back to home

v1 · stable

CrawlGraph API

A small HTTP API for programmatic backlink lookups, release discovery, and gap-analysis jobs. JSON in, JSON out. Bearer-token auth. Designed to fit into n8n, scripts, dashboards — anywhere you'd write five lines of code instead of clicking.

Available to lifetime-tier customers. Free accounts get zero API calls — pick up a lifetime licence on the landing page if you don't have one yet.

1. Quickstart

Three lines of curl. Replace the token with your own from the account page.

curl -X POST https://crawlgraph.com/api/v1/backlinks \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com"}'

Response (excerpt)

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts": 7,  "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}

2. Authentication

Every request to /api/v1/* needs a bearer token in the Authorization header:

Authorization: Bearer cg_live_<your-key>
  • Keys are prefixed with cg_live_ and roughly 52 characters long.
  • Get a key → from your account page. You can have up to 10 active keys per user and label each one (e.g. production, n8n-bot).
  • The full key is shown only once at creation. If you lose it, revoke and create a new one — there's no recovery path.
  • All /api/v1/* endpoints require a valid, non-revoked key tied to an active lifetime user.

3. Quotas & rate limits

ResourceMonthly quotaCounter
backlinks calls1,000 / moper user
gap-analysis jobs50 / moper user
releases lookupsunlimitednot counted
  • Window is the calendar month in UTC. Hard reset on the 1st at 00:00 UTC — no rollover.
  • Only successful (2xx) calls count. Validation errors, auth failures, and quota rejections are free.
  • Failed gap jobs do not refund quota in v1. If something looks wrong, email support and quote the request_id.
  • A separate IP-based limiter caps bursts at roughly 60 requests per minute on /api/v1/* to protect the backend.

Response headers

Every 2xx response (and 429s) carries these headers so your client can pace itself:

HeaderMeaning
X-RateLimit-Limit-BacklinksMonthly cap (1000).
X-RateLimit-Remaining-BacklinksCalls left this month.
X-RateLimit-Limit-GapMonthly gap-job cap (50).
X-RateLimit-Remaining-GapGap jobs left this month.
X-RateLimit-ResetUnix timestamp of the next month rollover.
X-Request-IDEcho this on support tickets.
Retry-AfterSeconds until quota resets. Sent only on 429.

4. Errors

Every non-2xx response uses the same envelope:

{
  "error": "<code>",
  "message": "<human readable>",
  "request_id": "req_a1b2c3d4"
}
CodeStatusMeaning
auth_missing401Authorization header missing or malformed.
auth_invalid401Key unknown, revoked, or owner refunded.
quota_exceeded429Monthly quota hit; check Retry-After.
validation_error400Request body or query failed validation.
not_found404Resource doesn't exist or isn't yours.
internal_error500Server bug — quote the request_id.

5. Endpoints

POST/api/v1/backlinks

Synchronous backlink lookup for a single domain. Counts against the backlinks quota.

Request body

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",   // optional; default = latest
  "limit": 1000,                      // optional; default 1000, max 10000
  "sort": "authority"                 // optional; "authority" (default) | "hosts"
}

Response

{
  "domain": "example.com",
  "release_id": "CC-MAIN-2026-04",
  "release_label": "Apr 2026",
  "total_linking_domains": 4821,
  "returned": 1000,
  "results": [
    { "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
      "cg_authority": 84, "cg_rank": 1421 },
    { "linking_domain": "news.bar.org", "num_hosts":  7, "tld": "org",
      "cg_authority": 71, "cg_rank": 9402 }
  ]
}

Notes

  • limit caps at 10,000. The API is for programmatic use, not bulk export — use the dashboard for full datasets.
  • Field names match the internal service: linking_domain, num_hosts, tld. No rename layer.
  • cg_authority is a 0–100 log-rank percentile derived from Common Crawl's harmonic centrality (higher = more authoritative). cg_rank is the raw PageRank position across the whole graph (1 = top-ranked domain). Both are null for domains that don't appear in the ranks file.
  • sort="authority" (default) orders by cg_authority DESC then num_hosts DESC; sort="hosts" preserves the legacy num_hosts DESC order.
  • Malformed domain, unknown release_id, or out-of-range limit 400 validation_error.

curl

curl -X POST https://crawlgraph.com/api/v1/backlinks \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com", "limit": 500}'

Python (requests)

import os, requests

r = requests.post(
    "https://crawlgraph.com/api/v1/backlinks",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    json={"domain": "example.com", "limit": 500},
    timeout=30,
)
r.raise_for_status()
data = r.json()
print(data["total_linking_domains"], "linking domains")

Node (fetch)

const res = await fetch("https://crawlgraph.com/api/v1/backlinks", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CG_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ domain: "example.com", limit: 500 }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
console.log(data.total_linking_domains, "linking domains");

GET /api/v1/releases

GET/api/v1/releases

List the Common Crawl releases CrawlGraph has indexed. Read-only, not counted against quota — this is the caller learning what to pass.

Response

{
  "releases": [
    { "id": "CC-MAIN-2026-04", "label": "Apr 2026", "available": true },
    { "id": "CC-MAIN-2025-50", "label": "Dec 2025", "available": true }
  ]
}

curl

curl https://crawlgraph.com/api/v1/releases \
  -H "Authorization: Bearer cg_live_…"

Python (requests)

import os, requests

r = requests.get(
    "https://crawlgraph.com/api/v1/releases",
    headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
    timeout=30,
)
r.raise_for_status()
for rel in r.json()["releases"]:
    print(rel["id"], rel["label"])

Node (fetch)

const res = await fetch("https://crawlgraph.com/api/v1/releases", {
  headers: { Authorization: `Bearer ${process.env.CG_KEY}` },
});
const { releases } = await res.json();
for (const r of releases) console.log(r.id, r.label);

POST /api/v1/gap-analysis · GET /api/v1/gap-analysis/{job_id}

POST/api/v1/gap-analysis
GET/api/v1/gap-analysis/{job_id}

Async gap-analysis. POST submits a job (counts against the gap quota); GET polls for status and result. Jobs are retained for 7 days.

POST request body

{
  "my_domain": "example.com",
  "competitor_domains": ["a.com", "b.com", "c.com"]   // 1–5 entries
}

POST response (202)

{
  "job_id": "gap_a1b2c3",
  "status": "queued",
  "poll_url": "/api/v1/gap-analysis/gap_a1b2c3"
}

GET response — running

{
  "job_id": "gap_a1b2c3",
  "status": "running",
  "started_at": "2026-04-27T12:34:56Z",
  "progress_pct": 42
}

GET response — completed

{
  "job_id": "gap_a1b2c3",
  "status": "completed",
  "completed_at": "2026-04-27T12:36:18Z",
  "result": {
    "my_domain": "example.com",
    "competitor_domains": ["a.com", "b.com", "c.com"],
    "gaps": [
      { "linking_domain": "x.com", "found_on": ["a.com", "b.com"] }
    ],
    "total_gaps": 1284
  }
}

GET response — failed

{
  "job_id": "gap_a1b2c3",
  "status": "failed",
  "error": { "code": "internal_error", "message": "..." }
}

Notes

  • Max 5 competitors per request — matches the dashboard cap.
  • GET returns 404 not_found if the job isn't yours, even if the id exists.
  • Failed jobs don't refund quota in v1. Email support with the request_id if it matters.

curl

# 1. Submit
curl -X POST https://crawlgraph.com/api/v1/gap-analysis \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{"my_domain": "example.com", "competitor_domains": ["a.com","b.com"]}'

# 2. Poll
curl https://crawlgraph.com/api/v1/gap-analysis/gap_a1b2c3 \
  -H "Authorization: Bearer cg_live_…"

Python (requests)

import os, time, requests

H = {"Authorization": f"Bearer {os.environ['CG_KEY']}"}
sub = requests.post(
    "https://crawlgraph.com/api/v1/gap-analysis",
    headers=H,
    json={
        "my_domain": "example.com",
        "competitor_domains": ["a.com", "b.com"],
    },
    timeout=30,
).json()

job_id = sub["job_id"]
while True:
    j = requests.get(
        f"https://crawlgraph.com/api/v1/gap-analysis/{job_id}",
        headers=H, timeout=30,
    ).json()
    if j["status"] in ("completed", "failed"):
        break
    time.sleep(5)
print(j)

Node (fetch)

const H = { Authorization: `Bearer ${process.env.CG_KEY}` };

const submit = await fetch("https://crawlgraph.com/api/v1/gap-analysis", {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({
    my_domain: "example.com",
    competitor_domains: ["a.com", "b.com"],
  }),
}).then(r => r.json());

const jobId = submit.job_id;
let job;
do {
  await new Promise(r => setTimeout(r, 5000));
  job = await fetch(
    `https://crawlgraph.com/api/v1/gap-analysis/${jobId}`,
    { headers: H },
  ).then(r => r.json());
} while (job.status !== "completed" && job.status !== "failed");
console.log(job);

GET /api/v1/changes

GET/api/v1/changes

Quarter-over-quarter diff for a domain. Returns referring domains added or lost between two CC releases. Counts as one call against the backlinks quota — same bucket as POST /api/v1/backlinks.

Today this endpoint is a stub. We have ingested only one Common Crawl quarter so there is nothing to diff against. The shape below is final; the response body just carries comparison_available: false until the next release lands (estimated July 2026).

Query parameters

  • domain — optional; the target host (same shape as /backlinks).
  • from — optional; older release id. Defaults to the previous quarter (which doesn't exist yet → stub).
  • to — optional; newer release id. Defaults to the latest available release.

Response — stub (today)

{
  "comparison_available": false,
  "current_release": "cc-main-2026-jan-feb-mar",
  "next_available_after": "cc-main-2026-apr-may-jun",
  "next_available_estimate": "2026-07-15",
  "message": "first delta will be available after the next quarterly Common Crawl release ingests"
}

curl

curl "https://crawlgraph.com/api/v1/changes?domain=example.com" \
  -H "Authorization: Bearer cg_live_…"

6. Webhooks & changelog

Webhooks are coming in a future version. For now, polling the gap-analysis job endpoint is the only async pattern.

The current API is at v1. Breaking changes will land on /api/v2 with a deprecation window — your v1 integrations won't break overnight. Check back here for changelogs.

7. OpenAPI

For tooling integration (Postman, openapi-typescript, Insomnia, etc.) the OpenAPI 3.1 schema is served at /api/v1/openapi.json. It covers exactly the endpoints documented above and is regenerated on every deploy.