# Python SDK
> Official Python SDK for ScrapeGraphAI
## Installation
Install the package using pip:
```bash
pip install scrapegraph-py
```
## Features
* **AI-Powered Extraction**: Advanced web scraping using artificial intelligence
* **Flexible Clients**: Both synchronous and asynchronous support
* **Type Safety**: Structured output with Pydantic schemas
* **Production Ready**: Detailed logging and automatic retries
* **Developer Friendly**: Comprehensive error handling
## Quick Start
Initialize the client with your API key:
```python
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
```
You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters:
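```python
from scrapegraph_py import Client

# SGAI_API_KEY is read from the environment, e.g. after running:
#   export SGAI_API_KEY="your-api-key-here"
client = Client()
```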
## Services
### SmartScraper
Extract specific information from any webpage using AI:
```python
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
```
#### Parameters
| Parameter | Type | Required | Description |
| ----------------- | ------- | -------- | ------------------------------------------------------------------------------------------------------ |
| website\_url | string | Yes | The URL of the webpage to scrape. |
| user\_prompt | string | Yes | A textual description of what you want to extract. |
| output\_schema | object | No | A Pydantic model that describes the structure and format of the response. |
| render\_heavy\_js | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: False |
Define a simple schema for basic data extraction:
```python
from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="The article title")
    author: str = Field(description="The author's name")
    publish_date: str = Field(description="Article publication date")
    content: str = Field(description="Main article content")
    category: str = Field(description="Article category")

response = client.smartscraper(
    website_url="https://example.com/blog/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

print(f"Title: {response.title}")
print(f"Author: {response.author}")
print(f"Published: {response.publish_date}")
```
Define a complex schema for nested data structures:
```python
from typing import List
from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str = Field(description="Employee's full name")
    position: str = Field(description="Job title")
    department: str = Field(description="Department name")
    email: str = Field(description="Email address")

class Office(BaseModel):
    location: str = Field(description="Office location/city")
    address: str = Field(description="Full address")
    phone: str = Field(description="Contact number")

class CompanyData(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Company description")
    industry: str = Field(description="Industry sector")
    founded_year: int = Field(description="Year company was founded")
    employees: List[Employee] = Field(description="List of key employees")
    offices: List[Office] = Field(description="Company office locations")
    website: str = Field(description="Company website URL")

# Extract comprehensive company information
response = client.smartscraper(
    website_url="https://example.com/about",
    user_prompt="Extract detailed company information including employees and offices",
    output_schema=CompanyData
)

# Access nested data
print(f"Company: {response.name}")
print("\nKey Employees:")
for employee in response.employees:
    print(f"- {employee.name} ({employee.position})")

print("\nOffice Locations:")
for office in response.offices:
    print(f"- {office.location}: {office.address}")
```
For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:
```python
from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")
```
**When to use `render_heavy_js`:**
* React, Vue, or Angular applications
* Single Page Applications (SPAs)
* Sites with heavy client-side rendering
* Dynamic content loaded via JavaScript
* Interactive elements that depend on JavaScript execution
### SearchScraper
Search and extract information from multiple web sources using AI:
```python
from scrapegraph_py.models import TimeRange

response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?",
    time_range=TimeRange.PAST_WEEK  # Optional: filter results by time range
)
```
#### Parameters
| Parameter | Type | Required | Description |
| ------------------- | --------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| user\_prompt | string | Yes | A textual description of what you want to search for and extract. |
| num\_results | number | No | Number of websites to search (3-20). Default: 3. |
| extraction\_mode | boolean | No | **True** = AI extraction mode (10 credits/page), **False** = markdown mode (2 credits/page). Default: True |
| output\_schema | object | No | A Pydantic model that describes the structure and format of the response (AI extraction mode only) |
| location\_geo\_code | string | No | Optional geo code for location-based search (e.g., "us") |
| time\_range | TimeRange | No | Optional time range filter for search results. Options: TimeRange.PAST\_HOUR, TimeRange.PAST\_24\_HOURS, TimeRange.PAST\_WEEK, TimeRange.PAST\_MONTH, TimeRange.PAST\_YEAR |
Define a simple schema for structured search results:
```python
from typing import List
from pydantic import BaseModel, Field
from scrapegraph_py.models import TimeRange

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    description: str = Field(description="Product description")
    price: str = Field(description="Product price")
    features: List[str] = Field(description="List of key features")
    availability: str = Field(description="Availability information")

response = client.searchscraper(
    user_prompt="Find information about iPhone 15 Pro",
    output_schema=ProductInfo,
    location_geo_code="us",  # Optional: geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: filter results by time range
)

print(f"Product: {response.name}")
print(f"Price: {response.price}")
print("\nFeatures:")
for feature in response.features:
    print(f"- {feature}")
```
Define a complex schema for comprehensive market research:
```python
from typing import List
from pydantic import BaseModel, Field
from scrapegraph_py.models import TimeRange

class MarketPlayer(BaseModel):
    name: str = Field(description="Company name")
    market_share: str = Field(description="Market share percentage")
    key_products: List[str] = Field(description="Main products in market")
    strengths: List[str] = Field(description="Company's market strengths")

class MarketTrend(BaseModel):
    name: str = Field(description="Trend name")
    description: str = Field(description="Trend description")
    impact: str = Field(description="Expected market impact")
    timeframe: str = Field(description="Trend timeframe")

class MarketAnalysis(BaseModel):
    market_size: str = Field(description="Total market size")
    growth_rate: str = Field(description="Annual growth rate")
    key_players: List[MarketPlayer] = Field(description="Major market players")
    trends: List[MarketTrend] = Field(description="Market trends")
    challenges: List[str] = Field(description="Industry challenges")
    opportunities: List[str] = Field(description="Market opportunities")

# Perform comprehensive market research
response = client.searchscraper(
    user_prompt="Analyze the current AI chip market landscape",
    output_schema=MarketAnalysis,
    location_geo_code="us",  # Optional: geo code for location-based search
    time_range=TimeRange.PAST_MONTH  # Optional: filter results by time range
)

# Access structured market data
print(f"Market Size: {response.market_size}")
print(f"Growth Rate: {response.growth_rate}")

print("\nKey Players:")
for player in response.key_players:
    print(f"\n{player.name}")
    print(f"Market Share: {player.market_share}")
    print("Key Products:")
    for product in player.key_products:
        print(f"- {product}")

print("\nMarket Trends:")
for trend in response.trends:
    print(f"\n{trend.name}")
    print(f"Impact: {trend.impact}")
    print(f"Timeframe: {trend.timeframe}")
```
Use markdown mode for cost-effective content gathering:
```python
from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Enable markdown mode for cost-effective content gathering
response = client.searchscraper(
    user_prompt="Latest developments in artificial intelligence",
    num_results=3,
    extraction_mode=False,  # Enable markdown mode (2 credits per page vs 10 credits)
    location_geo_code="us",  # Optional: geo code for location-based search
    time_range=TimeRange.PAST_WEEK  # Optional: filter results by time range
)

# Access the raw markdown content
markdown_content = response['markdown_content']
reference_urls = response['reference_urls']

print(f"Markdown content length: {len(markdown_content)} characters")
print(f"Reference URLs: {len(reference_urls)}")

# Process the markdown content
print("Content preview:", markdown_content[:500] + "...")

# Save to file for analysis
with open('ai_research_content.md', 'w', encoding='utf-8') as f:
    f.write(markdown_content)

print("Content saved to ai_research_content.md")
```
**Markdown Mode Benefits:**
* **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
* **Full content**: Get complete page content in markdown format
* **Faster**: No AI processing overhead
* **Perfect for**: Content analysis, bulk data collection, building datasets
Filter search results by date range to get only recent information:
```python
from scrapegraph_py import Client
from scrapegraph_py.models import TimeRange

client = Client(api_key="your-api-key")

# Search for recent news from the past week
response = client.searchscraper(
    user_prompt="Latest news about AI developments",
    num_results=5,
    time_range=TimeRange.PAST_WEEK  # Options: PAST_HOUR, PAST_24_HOURS, PAST_WEEK, PAST_MONTH, PAST_YEAR
)

print("Recent AI news:", response['result'])
print("Reference URLs:", response['reference_urls'])
```
**Time Range Options:**
* `TimeRange.PAST_HOUR` - Results from the past hour
* `TimeRange.PAST_24_HOURS` - Results from the past 24 hours
* `TimeRange.PAST_WEEK` - Results from the past week
* `TimeRange.PAST_MONTH` - Results from the past month
* `TimeRange.PAST_YEAR` - Results from the past year
**Use Cases:**
* Finding recent news and updates
* Tracking time-sensitive information
* Getting latest product releases
* Monitoring recent market changes
### Markdownify
Convert any webpage into clean, formatted markdown:
```python
response = client.markdownify(
    website_url="https://example.com"
)
```
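As a minimal sketch of handling the output (assuming the converted markdown is returned under a `result` key, mirroring the dictionary responses shown above), you can write it straight to disk:
```python
# A sketch: save the converted page to a markdown file.
# Assumes the markdown text is returned under the `result` key.
markdown = response["result"]

with open("example_com.md", "w", encoding="utf-8") as f:
    f.write(markdown)

print(f"Saved {len(markdown)} characters of markdown")
```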
## Async Support
All endpoints support asynchronous operations:
```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
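Because every call is awaitable, you can also fan out several requests concurrently with `asyncio.gather`. A sketch (the URLs below are placeholders):
```python
import asyncio
from scrapegraph_py import AsyncClient

# A sketch of scraping several pages concurrently;
# the URLs are placeholders.
async def scrape_many(urls):
    async with AsyncClient() as client:
        tasks = [
            client.smartscraper(
                website_url=url,
                user_prompt="Extract the main content"
            )
            for url in urls
        ]
        return await asyncio.gather(*tasks)

results = asyncio.run(scrape_many([
    "https://example.com/page-1",
    "https://example.com/page-2",
]))
for result in results:
    print(result)
```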
## Feedback
Help us improve by submitting feedback programmatically:
```python
client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
```
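The `request_id` ties the feedback to an earlier call. As a sketch (assuming the response dictionary exposes a `request_id` field, which is not shown in the examples above):
```python
# A sketch of chaining feedback to a prior request.
# Assumes the response dictionary includes a `request_id` field.
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading"
)

client.submit_feedback(
    request_id=response["request_id"],
    rating=5,
    feedback_text="Great results!"
)
```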
## Support
* Report issues and contribute to the SDK on [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)
* Get help from our development team
## License
This project is licensed under the MIT License. See the [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/LICENSE) file for details.