Request - Integrate Soundcloud #70

Open
stateofdenialist opened this issue Dec 27, 2024 · 3 comments

Comments

@stateofdenialist

Thank you for this tool. I'm hoping you'll integrate SoundCloud at some point.

@MXC1

MXC1 commented Dec 28, 2024

This would be amazing to see. @fiso64, you might be able to take inspiration from 3jackdaws/soundcloud-lib for getting SoundCloud tracks. Alternatively, you could use browser-automation and/or web-scraping tools (e.g. Selenium or BeautifulSoup).

I've managed to get slsk-batchdl to work with SoundCloud URLs by scraping them with Selenium and BeautifulSoup, writing the results to a CSV, and then passing that CSV to slsk-batchdl. Not the most elegant solution, but it works.

Here are a couple of code snippets:

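# download_and_process_playlists.py
import os
import subprocess

# convert_soundcloud_to_csv() is the poster's own helper and is not shown here;
# it is assumed to write one CSV per playlist into ./soundcloud_playlists
from convert_soundcloud_to_csv import convert_soundcloud_to_csv


def process_soundcloud_playlists(soundcloud_playlists):
    # Hypothetical wrapper: the original lines were excerpted from a larger function
    # Process all SoundCloud playlists
    for sc_url in soundcloud_playlists:
        print(f"Processing SoundCloud playlist: {sc_url}")
        convert_soundcloud_to_csv(sc_url)

    # Find CSV files in the soundcloud_playlists directory
    # (this picks up every CSV there, including ones left over from earlier runs)
    soundcloud_csv_paths = []
    soundcloud_csv_dir = "./soundcloud_playlists"
    if os.path.exists(soundcloud_csv_dir):
        for file_name in sorted(os.listdir(soundcloud_csv_dir)):
            if file_name.endswith(".csv"):
                soundcloud_csv_paths.append(os.path.join(soundcloud_csv_dir, file_name))

    # Pass all CSV files from SoundCloud playlists to slsk-batchdl (sldl must be on PATH)
    for csv_path in soundcloud_csv_paths:
        print(f"Passing SoundCloud CSV to slsk-batchdl: {csv_path}")
        subprocess.run(["sldl", "--desperate", "--strict-artist", csv_path], check=True)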

# convert_soundcloud_to_csv.py
import csv
import logging
import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to your local chromedriver binary (not defined in the original snippet; adjust as needed)
CHROME_DRIVER_PATH = "/path/to/chromedriver"


def scrape_soundcloud_playlist(url, output_csv):
    """
    Scrapes a SoundCloud playlist and saves the track information to a CSV file.

    Args:
        url (str): The URL of the SoundCloud playlist.
        output_csv (str): The filename of the CSV file to save data.
    """
    # Tone down chatty third-party loggers before the browser starts
    logging.getLogger('urllib3.connectionpool').setLevel(logging.INFO)
    logging.getLogger('selenium.webdriver.remote.remote_connection').setLevel(logging.WARNING)

    logging.info("Initializing WebDriver.")
    # Set up the WebDriver
    service = Service(CHROME_DRIVER_PATH)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run in headless mode
    options.add_argument("--mute-audio")  # Mute audio in case SoundCloud starts playback
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    driver = webdriver.Chrome(service=service, options=options)

    try:
        logging.info(f"Loading URL: {url}")
        # Load the URL
        driver.get(url)
        time.sleep(5)  # Fixed wait for the JavaScript-rendered track list to appear

        logging.debug("Retrieving page source.")
        # Get the page source after rendering JavaScript
        page_source = driver.page_source
    finally:
        driver.quit()  # Always close the browser, even if loading fails

    logging.debug("Parsing page content.")
    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find the playlist items
    playlist_items = soup.select('li.trackList__item')

    # Prepare the data
    data = []
    for item in playlist_items:
        artist_tag = item.select_one('.trackItem__username')
        track_tag = item.select_one('.trackItem__trackTitle')
        if artist_tag and track_tag:
            artist = artist_tag.text.strip()
            track = track_tag.text.strip()
            logging.debug(f"Found track: Artist = {artist}, Track = {track}")
            data.append([artist, track])

    # Ensure the output directory exists (dirname is '' for a bare filename)
    output_dir = os.path.dirname(output_csv)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    # Save the data to a CSV file
    logging.info(f"Saving data to {output_csv}.")
    with open(output_csv, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Artist', 'Track'])  # Write the header
        writer.writerows(data)  # Write the data

    logging.info(f"Data successfully saved to {output_csv}.")
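Because the Selenium step needs a real browser, it can help to test the parsing step on its own against saved HTML. The following is a minimal sketch, assuming the same class names the snippet above selects (`trackList__item`, `trackItem__username`, `trackItem__trackTitle`), which SoundCloud may change at any time. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies. `TrackListParser`, `parse_tracks` and `SAMPLE` are hypothetical names, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TrackListParser(HTMLParser):
    """Collect (artist, title) pairs from SoundCloud-style track-list markup (assumed classes)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._item = None   # text collected for the current <li class="trackList__item">
        self._field = None  # "artist" or "title" while inside a matching element
        self._depth = 0     # element nesting depth inside that matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "trackList__item" in classes:
            self._item = {"artist": "", "title": ""}
        elif self._field is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
        elif self._item is not None:
            if "trackItem__username" in classes:
                self._field, self._depth = "artist", 1
            elif "trackItem__trackTitle" in classes:
                self._field, self._depth = "title", 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (e.g. <br/>) hold no text and do not change nesting
        pass

    def handle_endtag(self, tag):
        if self._field is not None:
            self._depth -= 1
            if self._depth == 0:
                self._field = None
        elif tag == "li" and self._item is not None:
            # Collapse internal whitespace, then keep only rows with both fields
            artist = " ".join(self._item["artist"].split())
            title = " ".join(self._item["title"].split())
            if artist and title:
                self.rows.append((artist, title))
            self._item = None

    def handle_data(self, data):
        if self._field is not None:
            self._item[self._field] += data


def parse_tracks(html):
    parser = TrackListParser()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = """
<ul>
  <li class="trackList__item">
    <a class="trackItem__username" href="/artist-one">  Artist One </a>
    <a class="trackItem__trackTitle" href="/artist-one/song"><span>Song</span><br/> Title</a>
  </li>
  <li class="trackList__item">
    <a class="trackItem__username" href="/artist-two">Artist Two</a>
  </li>
</ul>
"""

if __name__ == "__main__":
    print(parse_tracks(SAMPLE))  # → [('Artist One', 'Song Title')]
```

The second sample item has no title, so it is dropped, mirroring the `if artist_tag and track_tag` check in the snippet. The resulting rows can be written with `csv.writer` the same way. For anything beyond these three classes, BeautifulSoup's CSS selectors remain the more practical choice; this state machine only exists to keep the test dependency-free.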

@fiso64
Owner

fiso64 commented Dec 28, 2024

I've thought about this before but decided against working on it because SoundCloud doesn't make this easy (and scraping is annoying). I'm open to pull requests, though. The extractor code is one of the few well-structured parts of the program and should be easy to extend for anyone interested.

P.S.: When downloading from SoundCloud, I prefer to first get the original file if the uploader has provided one, then try Soulseek, and then fall back to downloading the re-encoded m4a or mp3 file from SoundCloud. I use a modified version of https://github.com/scdl-org/scdl that calls sldl whenever it is about to download a re-encoded file. This kind of integration would be hard to implement in sldl without copying much of scdl's logic.
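The download order described above is a fallback chain: try the best source first and move on only when it has nothing to offer. Below is a minimal sketch of just that control flow. `download_with_fallback` and the three fetchers are hypothetical stand-ins, not functions from scdl or sldl. In a setup like the one described, the Soulseek step would be where sldl gets invoked, and the other two steps would be handled by scdl.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher returns the path of the downloaded file, or None if this source
# cannot provide the track. Exceptions are deliberately not swallowed.
Fetcher = Callable[[str], Optional[str]]


def download_with_fallback(track_url: str,
                           sources: List[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) in order and return (name, path) of the first hit."""
    for name, fetch in sources:
        path = fetch(track_url)
        if path is not None:
            return name, path
    raise LookupError(f"no source could provide {track_url}")


# Hypothetical stand-ins for the three steps described above
def original_upload(url: str) -> Optional[str]:
    return None  # pretend the uploader did not provide an original file


def soulseek(url: str) -> Optional[str]:
    return "downloads/track.flac"  # pretend a Soulseek search found a match


def soundcloud_transcode(url: str) -> Optional[str]:
    return "downloads/track.m4a"  # re-encoded stream: lowest priority


SOURCES = [
    ("original", original_upload),
    ("soulseek", soulseek),
    ("soundcloud", soundcloud_transcode),
]

if __name__ == "__main__":
    print(download_with_fallback("https://soundcloud.com/artist/track", SOURCES))
    # → ('soulseek', 'downloads/track.flac')
```

Returning `None` for "not available here" keeps ordinary misses separate from real failures (network errors, crashes), which propagate instead of silently falling through to a lower-quality source.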

@stateofdenialist
Author

Thanks @fiso64 for the detailed response and @MXC1 for the snippets! And yes, I already use scdl, which is a super useful tool. I'll figure something out.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants