> ## Documentation Index
>
> Fetch the complete documentation index at: https://docs.scrapegraphai.com/llms.txt
>
> Use this file to discover all available pages before exploring further.

# Python SDK

> Official Python SDK for ScrapeGraphAI

[![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)

## Installation

Install the package using pip:

```bash
pip install scrapegraph-py
```

## Features

* **AI-Powered Extraction**: Advanced web scraping using artificial intelligence
* **Flexible Clients**: Both synchronous and asynchronous support
* **Type Safety**: Structured output with Pydantic schemas
* **Production Ready**: Detailed logging and automatic retries
* **Developer Friendly**: Comprehensive error handling

## Quick Start

Initialize the client with your API key:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```

You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()` (a full sketch follows the SmartScraper parameters table below).

## Services

### SmartScraper

Extract specific information from any webpage using AI:

```python
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
```

#### Parameters

| Parameter         | Type    | Required | Description                                                                                            |
| ----------------- | ------- | -------- | ------------------------------------------------------------------------------------------------------ |
| website\_url      | string  | Yes      | The URL of the webpage to scrape.                                                                      |
| user\_prompt      | string  | Yes      | A textual description of what you want to extract.                                                     |
| output\_schema    | object  | No       | A Pydantic model describing the structure and format of the response.                                  |
| render\_heavy\_js | boolean | No       | Enable enhanced JavaScript rendering for JS-heavy websites (React, Vue, Angular, etc.). Default: False |
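As a minimal end-to-end sketch of the environment-variable setup mentioned in Quick Start (the URL and prompt are illustrative):

```python
import os

from scrapegraph_py import Client

# Illustrative only: in practice, export SGAI_API_KEY in your shell
# rather than hard-coding it in source.
os.environ.setdefault("SGAI_API_KEY", "your-api-key-here")

client = Client()  # no api_key argument; the key is read from the environment

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description",
)
print(response)
```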
Define a simple schema for basic data extraction:

```python
from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="The article title")
    author: str = Field(description="The author's name")
    publish_date: str = Field(description="Article publication date")
    content: str = Field(description="Main article content")
    category: str = Field(description="Article category")

response = client.smartscraper(
    website_url="https://example.com/blog/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

print(f"Title: {response.title}")
print(f"Author: {response.author}")
print(f"Published: {response.publish_date}")
```

Define a complex schema for nested data structures:

```python
from typing import List

from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str = Field(description="Employee's full name")
    position: str = Field(description="Job title")
    department: str = Field(description="Department name")
    email: str = Field(description="Email address")

class Office(BaseModel):
    location: str = Field(description="Office location/city")
    address: str = Field(description="Full address")
    phone: str = Field(description="Contact number")

class CompanyData(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Company description")
    industry: str = Field(description="Industry sector")
    founded_year: int = Field(description="Year company was founded")
    employees: List[Employee] = Field(description="List of key employees")
    offices: List[Office] = Field(description="Company office locations")
    website: str = Field(description="Company website URL")

# Extract comprehensive company information
response = client.smartscraper(
    website_url="https://example.com/about",
    user_prompt="Extract detailed company information including employees and offices",
    output_schema=CompanyData
)

# Access nested data
print(f"Company: {response.name}")

print("\nKey Employees:")
for employee in response.employees:
    print(f"- {employee.name} ({employee.position})")

print("\nOffice Locations:")
for office in response.offices:
    print(f"- {office.location}: {office.address}")
```

For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:

```python
from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")
```

**When to use `render_heavy_js`:**

* React, Vue, or Angular applications
* Single Page Applications (SPAs)
* Sites with heavy client-side rendering
* Dynamic content loaded via JavaScript
* Interactive elements that depend on JavaScript execution
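If you're unsure whether a page needs enhanced rendering, one practical pattern is to retry with `render_heavy_js=True` when a plain request comes back empty. This is a sketch, assuming the response is a dict with a `result` key as in the example above:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key")

def scrape_with_js_fallback(url: str, prompt: str):
    """Try a plain request first; retry with enhanced JS rendering if empty.

    Assumes the response is a dict with a 'result' key, as in the
    render_heavy_js example above.
    """
    response = client.smartscraper(website_url=url, user_prompt=prompt)
    if not response.get("result"):
        # Likely client-side rendered content; retry with heavy JS rendering
        response = client.smartscraper(
            website_url=url,
            user_prompt=prompt,
            render_heavy_js=True,
        )
    return response
```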
### SearchScraper

Search and extract information from multiple web sources using AI:

```python
from scrapegraph_py.models import TimeRange

response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?",
    time_range=TimeRange.PAST_WEEK  # Optional: filter results by time range
)
```

#### Parameters

| Parameter           | Type      | Required | Description                                                                                                                                                              |
| ------------------- | --------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| user\_prompt        | string    | Yes      | A textual description of what you want to achieve.                                                                                                                      |
| num\_results        | number    | No       | Number of websites to search (3-20). Default: 3.                                                                                                                        |
| extraction\_mode    | boolean   | No       | **True** = AI extraction mode (10 credits/page), **False** = markdown mode (2 credits/page). Default: True                                                              |
| output\_schema      | object    | No       | A Pydantic model describing the structure and format of the response (AI extraction mode only).                                                                         |
| location\_geo\_code | string    | No       | Optional geo code for location-based search (e.g., "us").                                                                                                               |
| time\_range         | TimeRange | No       | Optional time range filter for search results. Options: TimeRange.PAST\_HOUR, TimeRange.PAST\_24\_HOURS, TimeRange.PAST\_WEEK, TimeRange.PAST\_MONTH, TimeRange.PAST\_YEAR |

Define a simple schema for structured search results:

```python
from typing import List

from pydantic import BaseModel, Field

from scrapegraph_py.models import TimeRange

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    description: str = Field(description="Product description")
    price: str = Field(description="Product price")
    features: List[str] = Field(description="List of key features")
    availability: str = Field(description="Availability information")

response = client.searchscraper(
    user_prompt="Find information about iPhone 15 Pro",
    output_schema=ProductInfo,
    location_geo_code="us",          # Optional: geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: filter results by time range
)

print(f"Product: {response.name}")
print(f"Price: {response.price}")
print("\nFeatures:")
for feature in response.features:
    print(f"- {feature}")
```

Define a complex schema for comprehensive market research:

```python
from typing import List

from pydantic import BaseModel, Field

from scrapegraph_py.models import TimeRange

class MarketPlayer(BaseModel):
    name: str = Field(description="Company name")
    market_share: str = Field(description="Market share percentage")
    key_products: List[str] = Field(description="Main products in market")
    strengths: List[str] = Field(description="Company's market strengths")

class MarketTrend(BaseModel):
    name: str = Field(description="Trend name")
    description: str = Field(description="Trend description")
    impact: str = Field(description="Expected market impact")
    timeframe: str = Field(description="Trend timeframe")

class MarketAnalysis(BaseModel):
    market_size: str = Field(description="Total market size")
    growth_rate: str = Field(description="Annual growth rate")
    key_players: List[MarketPlayer] = Field(description="Major market players")
    trends: List[MarketTrend] = Field(description="Market trends")
    challenges: List[str] = Field(description="Industry challenges")
    opportunities: List[str] = Field(description="Market opportunities")

# Perform comprehensive market research
response = client.searchscraper(
    user_prompt="Analyze the current AI chip market landscape",
    output_schema=MarketAnalysis,
    location_geo_code="us",          # Optional: geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: filter results by time range
)

# Access structured market data
print(f"Market Size: {response.market_size}")
print(f"Growth Rate: {response.growth_rate}")

print("\nKey Players:")
for player in response.key_players:
    print(f"\n{player.name}")
    print(f"Market Share: {player.market_share}")
    print("Key Products:")
    for product in player.key_products:
        print(f"- {product}")

print("\nMarket Trends:")
for trend in response.trends:
    print(f"\n{trend.name}")
    print(f"Impact: {trend.impact}")
    print(f"Timeframe: {trend.timeframe}")
```
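Since the attribute access above suggests the SDK returns an instance of your `output_schema`, you can persist results with Pydantic's own serializers. A sketch, assuming Pydantic v2 (`model_dump_json`; on v1 use `.json()`):

```python
# Assumes `response` is the MarketAnalysis instance from the example above.
with open("market_analysis.json", "w", encoding="utf-8") as f:
    f.write(response.model_dump_json(indent=2))  # Pydantic v2; response.json() on v1
```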
Use markdown mode for cost-effective content gathering:

```python
from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Enable markdown mode for cost-effective content gathering
response = client.searchscraper(
    user_prompt="Latest developments in artificial intelligence",
    num_results=3,
    extraction_mode=False,          # Markdown mode: 2 credits per page vs 10
    location_geo_code="us",         # Optional: geo code for location-based search
    time_range=TimeRange.PAST_WEEK  # Optional: filter results by time range
)

# Access the raw markdown content
markdown_content = response['markdown_content']
reference_urls = response['reference_urls']

print(f"Markdown content length: {len(markdown_content)} characters")
print(f"Reference URLs: {len(reference_urls)}")

# Preview the markdown content
print("Content preview:", markdown_content[:500] + "...")

# Save to file for analysis
with open('ai_research_content.md', 'w', encoding='utf-8') as f:
    f.write(markdown_content)

print("Content saved to ai_research_content.md")
```

**Markdown Mode Benefits:**

* **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
* **Full content**: Get complete page content in markdown format
* **Faster**: No AI processing overhead
* **Perfect for**: Content analysis, bulk data collection, building datasets

Filter search results by date range to get only recent information:

```python
from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Search for recent news from the past week
response = client.searchscraper(
    user_prompt="Latest news about AI developments",
    num_results=5,
    time_range=TimeRange.PAST_WEEK  # Options: PAST_HOUR, PAST_24_HOURS, PAST_WEEK, PAST_MONTH, PAST_YEAR
)

print("Recent AI news:", response['result'])
print("Reference URLs:", response['reference_urls'])
```

**Time Range Options:**

* `TimeRange.PAST_HOUR` - Results from the past hour
* `TimeRange.PAST_24_HOURS` - Results from the past 24 hours
* `TimeRange.PAST_WEEK` - Results from the past week
* `TimeRange.PAST_MONTH` - Results from the past month
* `TimeRange.PAST_YEAR` - Results from the past year

**Use Cases:**

* Finding recent news and updates
* Tracking time-sensitive information
* Getting latest product releases
* Monitoring recent market changes

### Markdownify

Convert any webpage into clean, formatted markdown:

```python
response = client.markdownify(
    website_url="https://example.com"
)
```

## Async Support

All endpoints support asynchronous operations:

```python
import asyncio

from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
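Because every endpoint is async-capable, multiple pages can be scraped concurrently with `asyncio.gather`. A sketch built on the `AsyncClient` interface shown above (the URLs are illustrative):

```python
import asyncio

from scrapegraph_py import AsyncClient

# A sketch of concurrent scraping; assumes the AsyncClient interface
# shown above and that SGAI_API_KEY is set in the environment.
async def scrape_many(urls, prompt):
    async with AsyncClient() as client:
        tasks = [
            client.smartscraper(website_url=url, user_prompt=prompt)
            for url in urls
        ]
        return await asyncio.gather(*tasks)

results = asyncio.run(
    scrape_many(
        ["https://example.com/a", "https://example.com/b"],
        "Extract the main heading",
    )
)
print(results)
```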
## Feedback

Help us improve by submitting feedback programmatically:

```python
client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
```

## Support

* Report issues and contribute to the SDK
* Get help from our development team

## License

This project is licensed under the MIT License. See the [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/LICENSE) file for details.