ScrapeHero Crawler API Documentation

This guide explains how to interact with crawler projects in ScrapeHero Cloud using our REST API.

Overview

The ScrapeHero Cloud API lets you programmatically control and monitor your crawler projects. You can:

  • List and manage crawlers
  • Monitor job runs
  • Access scraped datasets
  • Control crawler execution

⚠️ Important Notice for Existing Users

The legacy endpoint cloud.scrapehero.com/api/v2 remains fully operational and supported.

  • If you're an existing user, you can continue using this endpoint without modifying your integrations
  • Both cloud.scrapehero.com/api/v2 and app.scrapehero.com/api/v2 provide identical functionality

Authentication

Before using the API, generate an authorization token:

  1. Navigate to your project's dashboard
  2. Open the Integrations tab
  3. Generate your auth token

Include this token in the Authorization header of all API requests:

Authorization: Token your-auth-token-here
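As a minimal sketch of attaching this header, here is a small helper using Python's standard library (the `authed_request` name and `API_BASE` constant are illustrative, not part of the API; the token value is a placeholder):

```python
import urllib.request

# Base URL from this guide; existing users may substitute
# https://cloud.scrapehero.com/api/v2 for the legacy endpoint.
API_BASE = "https://app.scrapehero.com/api/v2"

def authed_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request with the required Authorization header."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"Authorization": f"Token {token}"},
        method=method,
    )

# Example: a request object for the crawler listing endpoint.
req = authed_request("/crawlers/", "your-auth-token-here")
```

Sending the request is then `urllib.request.urlopen(req)`; any HTTP client (curl, requests, etc.) works the same way as long as the header is present.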

API Endpoints

List All Crawlers

Lists all crawler projects available to your account.

GET https://app.scrapehero.com/api/v2/crawlers/

Response:

{
  "count": 0,
  "next": "http://example.com",
  "previous": "http://example.com",
  "results": [
    {
      "name": "string",
      "slug": "string"
    }
  ]
}
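The `next` and `previous` fields indicate that listing endpoints are paginated. A sketch of walking every page, assuming `next` is a full URL or null as shown above (the `iter_results` name and the injectable `fetch` parameter are illustrative):

```python
import json
import urllib.request

def iter_results(first_url, token, fetch=None):
    """Yield items from `results`, following `next` links until exhausted."""
    if fetch is None:
        def fetch(url):
            req = urllib.request.Request(
                url, headers={"Authorization": f"Token {token}"}
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")
```

Called with `iter_results("https://app.scrapehero.com/api/v2/crawlers/", token)`, this yields each crawler's `{"name": ..., "slug": ...}` object in turn.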

Get Crawler Details

Retrieves details for a specific crawler.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/

Response:

{
  "name": "string",
  "slug": "string"
}

List Crawler Jobs

Retrieves all job runs for a crawler project.

What is a job run?

A job run is a single execution of a project. For example, if a project has been run 10 times, each of those executions is a separate job run.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/

Response:

{
  "count": 0,
  "next": "http://example.com",
  "previous": "http://example.com",
  "results": [
    {
      "id": 0,
      "status": "waiting",
      "pages_crawled": "string",
      "start_time": "2019-08-24T14:15:22Z",
      "end_time": "2019-08-24T14:15:22Z",
      "dataset_count": "string"
    }
  ]
}

Get Job Details

Retrieves details for a specific job run.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/

Response:

{
  "id": 0,
  "status": "waiting",
  "pages_crawled": "string",
  "start_time": "2019-08-24T14:15:22Z",
  "end_time": "2019-08-24T14:15:22Z",
  "dataset_count": "string",
  "crawler_input": "string",
  "datasets": [
    {
      "id": 0,
      "name": "string",
      "count": -9223372036854776000
    }
  ]
}

List Datasets

Lists all datasets for a specific job.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/

Response:

[
  {
    "id": 0,
    "name": "string",
    "count": -9223372036854776000
  }
]

Get Dataset Details

Retrieves details for a specific dataset.

"GET" https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/{data_set_pk}/

Response:

{
  "id": 0,
  "name": "string",
  "count": -9223372036854776000
}

Stream Dataset Data

Streams the data from a specific dataset.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/jobs/{job_pk}/datasets/{data_set_pk}/data/
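Because datasets can be large, it is worth reading the streamed body in chunks rather than loading it into memory at once. A sketch of the copy loop (the `stream_to_file` name and chunk size are illustrative; `resp` is any file-like response body, such as the object returned by `urllib.request.urlopen`):

```python
def stream_to_file(resp, dest, chunk_size=64 * 1024):
    """Copy a streamed response body to dest in fixed-size chunks.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        total += len(chunk)
    return total
```

Typical usage: open the endpoint with your auth header, then `stream_to_file(resp, open("dataset.out", "wb"))`.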

Get Latest Job

Retrieves the most recent job for a crawler.

GET https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/latest-job/

Response:

{
  "id": 0,
  "status": "waiting",
  "pages_crawled": "string",
  "start_time": "2019-08-24T14:15:22Z",
  "end_time": "2019-08-24T14:15:22Z",
  "dataset_count": "string",
  "crawler_input": "string",
  "datasets": [
    {
      "id": 0,
      "name": "string",
      "count": -9223372036854776000
    }
  ],
  "crawler_slug": "string"
}
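This endpoint is a natural fit for polling until a run ends. The guide only shows the "waiting" status value, so the sketch below takes the completion check as a caller-supplied predicate rather than guessing at terminal status names (the `wait_for_job` name and injectable `fetch_latest`/`sleep` parameters are illustrative):

```python
import time

def wait_for_job(fetch_latest, is_done, poll_seconds=30, sleep=time.sleep):
    """Poll the latest-job endpoint until is_done(job) is true.

    fetch_latest: callable returning the parsed latest-job response.
    is_done:      callable deciding whether the run has ended.
    """
    while True:
        job = fetch_latest()
        if is_done(job):
            return job
        sleep(poll_seconds)
```

For example, `is_done` could be `lambda job: job["status"] != "waiting"`, adjusted to whatever status values your project reports.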

Control Crawler

Start or stop a crawler project.

POST https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/{action}/

Where {action} is either start or stop.

Response:

{
  "detail": "string",
  "job": {
    "id": 0,
    "status": "waiting",
    "pages_crawled": "string",
    "start_time": "2019-08-24T14:15:22Z",
    "end_time": "2019-08-24T14:15:22Z",
    "dataset_count": "string"
  }
}
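Since only start and stop are valid actions, it is worth validating the value before building the URL. A sketch (the `control_crawler` name is illustrative; the URL pattern and auth header follow this guide):

```python
import urllib.request

def control_crawler(crawler_slug, action, token):
    """Build the POST request for the start/stop control endpoint."""
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    url = f"https://app.scrapehero.com/api/v2/crawlers/{crawler_slug}/{action}/"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method="POST",
    )
```

Sending the returned request with `urllib.request.urlopen` yields the `detail`/`job` response shown above.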

Get Subscription Details

Retrieves your current subscription information.

GET https://app.scrapehero.com/api/v2/user/subscription/

Response:

{
  "name": "string",
  "email": "user@example.com",
  "page_credits": {
    "remaining": 0,
    "total": 0
  },
  "plan_name": "Free",
  "renewal_date": "2019-08-24T14:15:22Z",
  "metadata": {}
}
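The `page_credits` object makes it easy to monitor how much of your quota remains before starting a large run. A small sketch (the `credits_remaining_pct` name is illustrative; it guards against a zero `total`):

```python
def credits_remaining_pct(subscription):
    """Percentage of page credits remaining, per the subscription response."""
    credits = subscription["page_credits"]
    if not credits["total"]:
        return 0.0
    return 100.0 * credits["remaining"] / credits["total"]
```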

Error Handling

The API uses standard HTTP response codes:

  • 200: Success
  • 400: Bad request
  • 401: Unauthorized
  • 403: Forbidden
  • 404: Not found
  • 500: Internal server error

If an error occurs, the response will include a message explaining what went wrong and how to fix it.
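A sketch of turning these codes into exceptions that carry the server's message (the `check_response` helper and the shape of the `detail` argument are illustrative; the code-to-reason mapping is taken from the list above):

```python
HTTP_ERRORS = {
    400: "Bad request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not found",
    500: "Internal server error",
}

def check_response(status, detail=None):
    """Return silently on 200; raise with the documented reason otherwise."""
    if status == 200:
        return
    reason = HTTP_ERRORS.get(status, "Unexpected status")
    message = f"{status} {reason}"
    if detail:
        message += f": {detail}"
    raise RuntimeError(message)
```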

Rate Limiting

The API implements rate limiting based on your subscription plan. Check your subscription details endpoint for your current limits.

Need Help?

If you encounter any issues or need assistance: