
How to use crawler.crawl to full-page scrolling? #406

Open
helenatthais opened this issue Jan 3, 2025 · 5 comments

Comments

@helenatthais commented Jan 3, 2025

Despite the simulated full-page scrolling feature released with version 0.4.1, I'm struggling to make it work because I'm still not sure where to call the crawler.crawl function. The docs (https://crawl4ai.com/mkdocs/blog/releases/0.4.1/) cite the following example:

await crawler.crawl(
    url="https://example.com",
    scan_full_page=True,  # Enables scrolling
    scroll_delay=0.2      # Waits 200ms between scrolls (optional)
)
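For context, the code later in this thread passes the same flags through CrawlerRunConfig to crawler.arun rather than calling crawler.crawl directly; a minimal sketch along those lines (the URL is a placeholder):

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    # scan_full_page and scroll_delay go on the run config, mirroring
    # the blog example's keyword arguments
    config = CrawlerRunConfig(
        scan_full_page=True,  # enables full-page scrolling
        scroll_delay=0.2,     # 200 ms between scrolls (optional)
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        print(result.markdown)

asyncio.run(main())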

@TheCutestCat (Contributor) commented Jan 3, 2025

@helenatthais I have fixed this problem with this PR, and here is a discussion about some screenshot parameters that were not mentioned: link

@helenatthais (Author) commented Jan 3, 2025

I tried to execute the code from the referenced PR, but the full-page scrolling feature still doesn't work:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

async def main():
    # Configure the browser settings
    browser_config = BrowserConfig(headless=False, verbose=True)

    # Set run configurations, including cache mode and the full-page flags
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        screenshot=True,
        # Set these two flags
        scan_full_page=True,
        wait_for_images=True,
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://www.nytimes.com/ca/',
            config=crawl_config,
        )

if __name__ == "__main__":
    asyncio.run(main())

@TheCutestCat (Contributor)

@helenatthais Hi, could you please provide more details about your setup and how you're running the code? I've tested it in my local environment and everything seems to work fine.

One possible cause of the issue might be that the original crawl4ai package is still installed. Could you check if that's the case?
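A quick way to confirm which build is actually being imported (a sketch, assuming the package exposes __version__ as recent releases do; the printed path and version will vary by environment):

import crawl4ai

print(crawl4ai.__version__)  # should report 0.4.1 or later
print(crawl4ai.__file__)     # shows which installed copy is on the path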

@helenatthais (Author) commented Jan 3, 2025

Sure, I installed crawl4ai with pip install crawl4ai and recently upgraded it with the --upgrade flag. I'm trying to run the following code to scrape Google Maps reviews:

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
import json

async def main():
    browser_config = BrowserConfig(headless=False, verbose=True)
    
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        screenshot=False,
        scan_full_page=True,
        js_code="window.scrollTo(0, document.body.scrollHeight);",
        scroll_delay=2000,
        css_selector="div.GHT2ce.NsCY4, span.wiI7pd",
        exclude_external_links=True,
        exclude_social_media_links=True,
        exclude_external_images=True,
        simulate_user=True
    )
        
    async with AsyncWebCrawler(verbose=True, config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://www.google.com.br/maps/place/Dra+Regina+C%C3%A9lia+de+Aquino+Barbosa/@-22.8744795,-43.3429393,17z/data=!4m8!3m7!1s0x9962d7809bdfe3:0x9871497b1081f14e!8m2!3d-22.8744795!4d-43.3403644!9m1!1b1!16s%2Fg%2F1wf2320v?entry=ttu&g_ep=EgoyMDI0MTIxMS4wIKXMDSoASAFQAw%3D%3D",
            config=crawl_config
        )
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())
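One thing worth double-checking against the release-note example quoted at the top of this thread: there scroll_delay=0.2 means 200 ms, i.e. the unit appears to be seconds, so scroll_delay=2000 would pause roughly 33 minutes per scroll step. A sketch of what is likely intended:

from crawl4ai import CacheMode, CrawlerRunConfig

# scroll_delay is interpreted in seconds (0.2 == 200 ms per the release notes),
# so a 2-second pause between scrolls would be written as 2.0, not 2000
crawl_config = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS,
    scan_full_page=True,
    scroll_delay=2.0,
)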

@TheCutestCat (Contributor)

@helenatthais I understand. This is because my PR hasn't been merged into the main branch yet. You can either:

1. Wait for the new version of crawl4ai (which should be available soon), or
2. Use the modified original code (though this is a bit more complex) by implementing the changes shown in PR #403.
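If you want to try the unmerged changes without waiting for a release, one option is installing straight from the pull-request ref (a sketch, assuming the upstream repository is unclecode/crawl4ai and GitHub's standard refs/pull/<n>/head refs):

pip install "git+https://github.com/unclecode/crawl4ai.git@refs/pull/403/head"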
