Request - Integrate Soundcloud #70

stateofdenialist · 2024-12-27T08:00:27Z

Thank you for this tool! I'm hoping you'll integrate SoundCloud at some point.

MXC1 · 2024-12-28T18:46:51Z

This would be amazing to see. @fiso64, you might be able to take inspiration from 3jackdaws/soundcloud-lib for fetching SoundCloud tracks. Alternatively, you could use browser automation and/or web-scraping tools (e.g. Selenium and BeautifulSoup).

I've managed to get slsk-batchdl working with SoundCloud URLs by scraping them with Selenium and BeautifulSoup, writing the results to a CSV file, and then passing that CSV to slsk-batchdl. Not the most elegant solution, but it works.

Here are a couple of code snippets:

//download_and_process_playlists.py

    # Process all SoundCloud playlists
    soundcloud_csv_paths = []
    for sc_url in soundcloud_playlists:
        print(f"Processing SoundCloud playlist: {sc_url}")
        convert_soundcloud_to_csv(sc_url)

    # Find CSV files in the soundcloud_playlists directory
    soundcloud_csv_dir = "./soundcloud_playlists"
    if os.path.exists(soundcloud_csv_dir):
        for file_name in os.listdir(soundcloud_csv_dir):
            if file_name.endswith(".csv"):
                soundcloud_csv_paths.append(os.path.join(soundcloud_csv_dir, file_name))

    # Pass all CSV files from SoundCloud playlists to slsk-batchdl
    for csv_path in soundcloud_csv_paths:
        print(f"Passing SoundCloud CSV to slsk-batchdl: {csv_path}")
        subprocess.run(["sldl", "--desperate", "--strict-artist", csv_path], check=True)
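A more compact variant of the discovery step above, offered as a sketch rather than a replacement: it uses `pathlib`, sorts the CSV files so the processing order is the same on every run, and separates building the `sldl` command lines from running them so the whole thing can be dry-run. The names `build_sldl_commands` and `run_sldl` are hypothetical; the `sldl` flags are simply copied from the snippet above.

```python
import subprocess
from pathlib import Path


def build_sldl_commands(csv_dir, extra_args=("--desperate", "--strict-artist")):
    """Return one sldl argv list per *.csv file in csv_dir (hypothetical helper).

    The default flags mirror the snippet above. A missing directory
    yields an empty list instead of raising.
    """
    directory = Path(csv_dir)
    if not directory.is_dir():
        return []
    # sorted() makes the processing order deterministic across runs
    return [["sldl", *extra_args, str(path)] for path in sorted(directory.glob("*.csv"))]


def run_sldl(commands, dry_run=False):
    """Print each command and execute it unless dry_run is set."""
    for argv in commands:
        print("Passing SoundCloud CSV to slsk-batchdl:", argv[-1])
        if not dry_run:
            # check=True raises CalledProcessError if sldl exits non-zero
            subprocess.run(argv, check=True)
```

Calling `run_sldl(build_sldl_commands("./soundcloud_playlists"), dry_run=True)` shows what would be passed to `sldl` without downloading anything.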

//convert_soundcloud_to_csv.py

import csv
import logging
import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

CHROME_DRIVER_PATH = "/path/to/chromedriver"  # placeholder: point this at your chromedriver


def scrape_soundcloud_playlist(url, output_csv):
    """
    Scrapes a SoundCloud playlist and saves the track information to a CSV file.

    Args:
        url (str): The URL of the SoundCloud playlist.
        output_csv (str): Path of the CSV file to write.
    """
    # Quieten noisy third-party loggers
    logging.getLogger('urllib3.connectionpool').setLevel(logging.INFO)
    logging.getLogger('selenium.webdriver.remote.remote_connection').setLevel(logging.WARNING)

    logging.info("Initializing WebDriver.")
    service = Service(CHROME_DRIVER_PATH)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run without a visible browser window
    options.add_argument("--mute-audio")  # Silence audio in case SoundCloud starts playback
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    driver = webdriver.Chrome(service=service, options=options)

    try:
        logging.info(f"Loading URL: {url}")
        driver.get(url)
        time.sleep(5)  # Fixed wait so the JavaScript-rendered track list can appear

        logging.debug("Retrieving page source.")
        page_source = driver.page_source  # HTML after JavaScript has run
    finally:
        driver.quit()  # Close the browser even if loading the page fails

    logging.debug("Parsing page content.")
    soup = BeautifulSoup(page_source, 'html.parser')

    # Each playlist entry is an <li class="trackList__item">
    data = []
    for item in soup.select('li.trackList__item'):
        artist_tag = item.select_one('.trackItem__username')
        track_tag = item.select_one('.trackItem__trackTitle')
        if artist_tag and track_tag:  # Skip entries missing either field
            artist = artist_tag.text.strip()
            track = track_tag.text.strip()
            logging.debug(f"Found track: Artist = {artist}, Track = {track}")
            data.append([artist, track])

    # Ensure the output directory exists (dirname is "" for a bare filename)
    os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)

    logging.info(f"Saving data to {output_csv}.")
    with open(output_csv, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Artist', 'Track'])  # Header row
        writer.writerows(data)

    logging.info(f"Data successfully saved to {output_csv}.")
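To unit-test the parsing step without launching Chrome, or to avoid the BeautifulSoup dependency, the extraction can be sketched with Python's built-in `html.parser` instead. This is a swapped-in technique, not what the snippet above uses. It assumes the rendered page still uses the `trackList__item`, `trackItem__username` and `trackItem__trackTitle` class names from that snippet (SoundCloud can change its markup at any time); `TrackListParser` and `extract_tracks` are hypothetical names.

```python
from html.parser import HTMLParser

# Class names taken from the BeautifulSoup snippet; assumed, not guaranteed stable.
ITEM_CLASS = "trackList__item"
FIELD_CLASSES = {"trackItem__username": "artist", "trackItem__trackTitle": "title"}


class TrackListParser(HTMLParser):
    """Collect (artist, title) pairs from rendered playlist HTML."""

    def __init__(self):
        super().__init__()
        self.tracks = []    # finished (artist, title) pairs, in page order
        self._item = None   # field -> collected text for the current <li>
        self._field = None  # field currently capturing text, if any
        self._tag = None    # tag name of the element being captured
        self._depth = 0     # nested same-name tags inside that element

    def _flush(self):
        if self._item is not None:
            artist = self._item.get("artist", "").strip()
            title = self._item.get("title", "").strip()
            if artist and title:  # like the original: skip incomplete entries
                self.tracks.append((artist, title))
        self._item = None
        self._field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and ITEM_CLASS in classes:
            self._flush()  # tolerate a missing </li> on the previous item
            self._item = {}
        elif self._field is not None:
            if tag == self._tag:
                self._depth += 1
        elif self._item is not None:
            for cls, field in FIELD_CLASSES.items():
                # Like select_one(): only the first matching element counts
                if cls in classes and field not in self._item:
                    self._item[field] = ""
                    self._field, self._tag, self._depth = field, tag, 0
                    break

    def handle_endtag(self, tag):
        if self._field is not None:
            if tag == self._tag:
                if self._depth == 0:
                    self._field = None  # captured element is closed
                else:
                    self._depth -= 1
        elif tag == "li" and self._item is not None:
            self._flush()

    def handle_data(self, data):
        if self._field is not None:
            self._item[self._field] += data

    def close(self):
        super().close()
        self._flush()  # in case the last <li> was never closed


def extract_tracks(page_source):
    """Return [(artist, title), ...] for every complete track entry."""
    parser = TrackListParser()
    parser.feed(page_source)
    parser.close()
    return parser.tracks
```

With this in place, `extract_tracks(driver.page_source)` would stand in for the BeautifulSoup section, and the parser can be tested against saved HTML files.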

fiso64 · 2024-12-28T20:51:08Z

I've thought about this before but decided against working on it, because SoundCloud doesn't make this easy (and scraping is annoying). I am open to pull requests, though. The extractor code is one of the few parts of the program that is well structured, and it should be easy to extend for anyone interested.

P.S.: When downloading from SoundCloud, I prefer to first get the original file if the uploader has provided one, then try Soulseek, and only then fall back to downloading the re-encoded m4a or mp3 file from SoundCloud. I use a modified version of https://github.com/scdl-org/scdl that calls sldl whenever it is about to download a re-encoded file. This kind of integration would be hard to implement in sldl without copying much of scdl's logic.
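The download order described above (uploader's original file, then Soulseek, then SoundCloud's re-encoded stream) is an ordered fallback chain. A minimal sketch of that control flow, assuming each stage is a callable that returns a local file path on success or `None` on failure; it is not fiso64's modified scdl, and `download_with_fallback` plus the three stage functions are hypothetical stubs.

```python
from typing import Callable, Optional, Sequence, Tuple

# A source takes a track description and returns a local file path, or None.
Source = Callable[[dict], Optional[str]]


def download_with_fallback(track: dict, sources: Sequence[Tuple[str, Source]]) -> Tuple[str, str]:
    """Try each (name, source) in order; return (name, path) from the first success."""
    failures = []
    for name, source in sources:
        try:
            path = source(track)
        except Exception as exc:  # one broken source should not end the chain
            failures.append(f"{name}: {exc}")
            continue
        if path:
            return name, path
        failures.append(f"{name}: no result")
    raise LookupError(f"all sources failed for {track}: " + "; ".join(failures))


# Hypothetical stand-ins for the three stages; real ones might use scdl for
# the two SoundCloud stages and sldl for the Soulseek stage.
def soundcloud_original(track):
    return None  # e.g. the uploader has not enabled downloads


def soulseek(track):
    return f"/music/{track['artist']} - {track['title']}.flac"


def soundcloud_reencoded(track):
    return f"/music/{track['artist']} - {track['title']}.m4a"


ORDER = [
    ("original", soundcloud_original),
    ("soulseek", soulseek),
    ("soundcloud re-encode", soundcloud_reencoded),
]

print(download_with_fallback({"artist": "Some Artist", "title": "Some Title"}, ORDER))
# prints ('soulseek', '/music/Some Artist - Some Title.flac')
```

Keeping the stages as a plain ordered list makes it easy to reorder them or drop one, e.g. skipping Soulseek entirely for tracks that only exist on SoundCloud.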

stateofdenialist · 2025-01-04T03:58:37Z

Thanks for the detailed response, @fiso64, and thanks @MXC1 for the snippets. And yes, I already use scdl, which is a super useful tool! I'll figure something out.
