# ScrapeHero Crawler API Documentation

This guide explains how to interact with crawler projects in ScrapeHero Cloud using our REST API.
## Overview

The ScrapeHero Cloud API lets you programmatically control and monitor your crawler projects. You can:

- List and manage crawlers
- Monitor job runs
- Access scraped datasets
- Control crawler execution
## ⚠️ Important Notice for Existing Users

The legacy endpoint `cloud.scrapehero.com/api/v2` remains fully operational and supported.

- If you're an existing user, you can continue using this endpoint without modifying your integrations
- Both `cloud.scrapehero.com/api/v2` and `app.scrapehero.com/api/v2` provide identical functionality
## Authentication

Before using the API, generate an authorization token:

- Navigate to your project's dashboard
- Open the Integrations tab
- Generate your auth token

Include this token in the `Authorization` header of all API requests:

```
Authorization: Token your-auth-token-here
```
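As a minimal sketch, here is how you might attach that header to every request using Python's standard library (the token value is a placeholder — substitute your own):

```python
import urllib.request

API_BASE = "https://app.scrapehero.com/api/v2"

def build_request(path: str, auth_token: str) -> urllib.request.Request:
    """Build a GET request for an API path, carrying the auth header."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"Authorization": f"Token {auth_token}"},
    )

# Usage (token is a placeholder):
# resp = urllib.request.urlopen(build_request("/crawlers/", "your-auth-token-here"))
```

If you prefer a third-party HTTP client such as `requests`, setting the same header on a session object achieves the same effect.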
## API Endpoints

### List All Crawlers

Lists all crawler projects available to your account.

```
GET https://app.scrapehero.com/api/v2/crawlers/
```

Response:

```json
{
  "count": 0,
  "next": "http://example.com",
  "previous": "http://example.com",
  "results": [
    {
      "name": "string",
      "slug": "string"
    }
  ]
}
```
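The `next` and `previous` fields indicate that this listing is paginated. A sketch of walking every page, with the fetch step injected as a callable (in practice it would be an authenticated GET that decodes the JSON body):

```python
def iter_all_results(fetch, first_url):
    """Yield every item across a paginated listing.

    `fetch` is any callable that takes a URL and returns the decoded
    JSON page -- a dict with "results" and "next" keys, as shown above.
    Iteration stops when "next" is null/absent.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")
```

The same helper works for any paginated endpoint in this API, since they share the `count`/`next`/`previous`/`results` envelope.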
### Get Crawler Details

Retrieves details for a specific crawler.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/
```

Response:

```json
{
  "name": "string",
  "slug": "string"
}
```
### List Crawler Jobs

Retrieves all job runs for a crawler project.

**What is a job run?** A job run refers to each individual execution of a specific project. For example, if a project has been run 10 times, each of those executions is considered a separate job run.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/
```

Response:

```json
{
  "count": 0,
  "next": "http://example.com",
  "previous": "http://example.com",
  "results": [
    {
      "id": 0,
      "status": "waiting",
      "pages_crawled": "string",
      "start_time": "2019-08-24T14:15:22Z",
      "end_time": "2019-08-24T14:15:22Z",
      "dataset_count": "string"
    }
  ]
}
```
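Timestamps are ISO 8601 strings, so the job list is easy to filter and sort client-side. A sketch of picking the most recently completed run — note the `"finished"` status value is an assumption (the response example above only shows `"waiting"`), so check the status strings your account actually returns:

```python
from datetime import datetime

def latest_finished_job(jobs):
    """Return the job with the newest end_time among finished runs.

    `jobs` is the "results" list from the jobs endpoint. Returns None
    when no job has both a "finished" status and an end_time.
    """
    finished = [
        j for j in jobs
        if j.get("status") == "finished" and j.get("end_time")
    ]
    if not finished:
        return None
    # "Z" -> "+00:00" keeps fromisoformat happy on older Python versions.
    return max(
        finished,
        key=lambda j: datetime.fromisoformat(j["end_time"].replace("Z", "+00:00")),
    )
```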
### Get Job Details

Retrieves details for a specific job run.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/
```

Response:

```json
{
  "id": 0,
  "status": "waiting",
  "pages_crawled": "string",
  "start_time": "2019-08-24T14:15:22Z",
  "end_time": "2019-08-24T14:15:22Z",
  "dataset_count": "string",
  "crawler_input": "string",
  "datasets": [
    {
      "id": 0,
      "name": "string",
      "count": -9223372036854776000
    }
  ]
}
```
### List Datasets

Lists all datasets for a specific job.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/
```

Response:

```json
[
  {
    "id": 0,
    "name": "string",
    "count": -9223372036854776000
  }
]
```
### Get Dataset Details

Retrieves details for a specific dataset.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/{data_set_pk}/
```

Response:

```json
{
  "id": 0,
  "name": "string",
  "count": -9223372036854776000
}
```
### Stream Dataset Data

Streams the data from a specific dataset.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/{data_set_pk}/data/
```
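Because this endpoint streams, you would typically consume it line by line rather than loading the whole body into memory. Assuming the stream is newline-delimited JSON (an assumption — adjust if your dataset exports in another format such as CSV), a small decoder might look like:

```python
import json

def parse_jsonl(lines):
    """Decode an iterable of JSON lines into Python objects.

    Blank lines are skipped. `lines` can be any iterable of strings,
    e.g. the decoded lines of a streamed HTTP response.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```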
### Get Latest Job

Retrieves the most recent job for a crawler.

```
GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/latest-job/
```

Response:

```json
{
  "id": 0,
  "status": "waiting",
  "pages_crawled": "string",
  "start_time": "2019-08-24T14:15:22Z",
  "end_time": "2019-08-24T14:15:22Z",
  "dataset_count": "string",
  "crawler_input": "string",
  "datasets": [
    {
      "id": 0,
      "name": "string",
      "count": -9223372036854776000
    }
  ],
  "crawler_slug": "string"
}
```
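This endpoint is a natural fit for polling a run to completion: the response above includes `end_time`, which a finished job can be expected to populate. A sketch with the fetch and the wait step injected so the loop is testable (in practice `fetch_latest` would GET `latest-job` and `poll` would be `time.sleep` with a sensible interval):

```python
def wait_for_job(fetch_latest, poll=lambda: None, max_polls=60):
    """Poll the latest-job endpoint until the run reports an end_time.

    `fetch_latest` returns the decoded latest-job response shown above;
    `poll` is called between attempts (e.g. a time.sleep wrapper).
    Returns the final job dict, or None if max_polls is exhausted.
    """
    for _ in range(max_polls):
        job = fetch_latest()
        if job.get("end_time"):
            return job
        poll()
    return None
```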
### Control Crawler

Start or stop a crawler project.

```
POST https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/{action}/
```

Where `{action}` is either `start` or `stop`.

Response:

```json
{
  "detail": "string",
  "job": {
    "id": 0,
    "status": "waiting",
    "pages_crawled": "string",
    "start_time": "2019-08-24T14:15:22Z",
    "end_time": "2019-08-24T14:15:22Z",
    "dataset_count": "string"
  }
}
```
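Since `{action}` only accepts two values, it's worth validating it client-side before issuing the POST. A small URL-builder sketch:

```python
API_BASE = "https://app.scrapehero.com/api/v2"

def control_url(crawler_slug, action):
    """Build the POST URL for starting or stopping a crawler.

    Raises ValueError for anything other than "start" or "stop",
    so a typo fails locally instead of producing a 404 from the API.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    return f"{API_BASE}/crawlers/{crawler_slug}/{action}/"
```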
### Get Subscription Details

Retrieves your current subscription information.

```
GET https://app.scrapehero.com/api/v2/user/subscription/
```

Response:

```json
{
  "name": "string",
  "email": "user@example.com",
  "page_credits": {
    "remaining": 0,
    "total": 0
  },
  "plan_name": "Free",
  "renewal_date": "2019-08-24T14:15:22Z",
  "metadata": {}
}
```
## Error Handling

The API uses standard HTTP response codes:

- `200`: Success
- `400`: Bad request
- `401`: Unauthorized
- `403`: Forbidden
- `404`: Not found
- `500`: Internal server error
If an error occurs, the response will include a message explaining what went wrong and how to fix it.
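One way to surface those codes cleanly in client code is to map non-2xx responses onto an exception that carries the server's message. A sketch — the `"detail"` key used for the message is an assumption based on the control endpoint's response shape above:

```python
class APIError(Exception):
    """Raised for non-2xx responses; carries the status and server message."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message

def check_response(status, body):
    """Return the decoded JSON `body` on success, raise APIError otherwise."""
    if 200 <= status < 300:
        return body
    raise APIError(status, body.get("detail", "unknown error"))
```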
## Rate Limiting

The API implements rate limiting based on your subscription plan. Check your subscription details endpoint for your current limits.
## Need Help?

If you encounter any issues or need assistance:

- Email our support team at cloud-support@scrapehero.com
- Visit our documentation at app.scrapehero.com/docs/