v1 · stable
CrawlGraph API
A small HTTP API for programmatic backlink lookups, release discovery, and gap-analysis jobs. JSON in, JSON out. Bearer-token auth. Designed to fit into n8n, scripts, dashboards — anywhere you'd write five lines of code instead of clicking.
Available to lifetime-tier customers. Free accounts get zero API calls — pick up a lifetime licence on the landing page if you don't have one yet.
1. Quickstart
One curl command. Replace the token with your own from the account page.
curl -X POST https://crawlgraph.com/api/v1/backlinks \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"domain": "example.com"}'

Response (excerpt)
{
"domain": "example.com",
"release_id": "CC-MAIN-2026-04",
"release_label": "Apr 2026",
"total_linking_domains": 4821,
"returned": 1000,
"results": [
{ "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
"cg_authority": 84, "cg_rank": 1421 },
{ "linking_domain": "news.bar.org", "num_hosts": 7, "tld": "org",
"cg_authority": 71, "cg_rank": 9402 }
]
}

2. Authentication
Every request to /api/v1/* needs a bearer token in the Authorization header:

Authorization: Bearer cg_live_<your-key>

- Keys are prefixed with cg_live_ and are roughly 52 characters long.
- Create keys from your account page. You can have up to 10 active keys per user and label each one (e.g. production, n8n-bot).
- The full key is shown only once, at creation. If you lose it, revoke it and create a new one — there's no recovery path.
- All /api/v1/* endpoints require a valid, non-revoked key tied to an active lifetime user.
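Building the header is a one-liner; the sketch below wraps it with a prefix check. `auth_header` is a hypothetical helper, not something the API ships — it assumes only the documented `cg_live_` prefix and does not enforce the approximate key length.

```python
def auth_header(key: str) -> dict:
    """Build the Authorization header for /api/v1/* calls.

    Only the documented cg_live_ prefix is checked; the roughly
    52-character length is not enforced here.
    """
    if not key.startswith("cg_live_"):
        raise ValueError("CrawlGraph keys start with cg_live_")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dict as `headers=` in requests, or spread it into fetch options.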
3. Quotas & rate limits
| Resource | Monthly quota | Counter |
|---|---|---|
| backlinks calls | 1,000 / mo | per user |
| gap-analysis jobs | 50 / mo | per user |
| releases lookups | unlimited | not counted |
- Window is the calendar month in UTC. Hard reset on the 1st at 00:00 UTC — no rollover.
- Only successful (2xx) calls count. Validation errors, auth failures, and quota rejections are free.
- Failed gap jobs do not refund quota in v1. If something looks wrong, email support and quote the request_id.
- A separate IP-based limiter caps bursts at roughly 60 requests per minute on /api/v1/* to protect the backend.
Response headers
Every 2xx response (and 429s) carries these headers so your client can pace itself:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit-Backlinks | Monthly cap (1000). |
| X-RateLimit-Remaining-Backlinks | Calls left this month. |
| X-RateLimit-Limit-Gap | Monthly gap-job cap (50). |
| X-RateLimit-Remaining-Gap | Gap jobs left this month. |
| X-RateLimit-Reset | Unix timestamp of the next monthly reset. |
| X-Request-ID | Echo this on support tickets. |
| Retry-After | Seconds until quota resets. Sent only on 429. |
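A minimal client-side pacing sketch built on these headers. `pace_delay` and its thresholds (slow down under 50 remaining calls, 60-second fallback wait) are illustrative choices on our part, not part of the API.

```python
def pace_delay(status: int, headers: dict) -> float:
    """Seconds to sleep before the next backlinks call, based on the
    rate-limit headers above. Thresholds are arbitrary examples."""
    if status == 429:
        # Retry-After is only sent on 429 and counts down to the quota reset.
        return float(headers.get("Retry-After", 60))
    remaining = int(headers.get("X-RateLimit-Remaining-Backlinks", 1))
    # Ease off as the monthly quota runs low.
    return 1.0 if remaining < 50 else 0.0
```

Call it after every response and `time.sleep()` for the returned value.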
4. Errors
Every non-2xx response uses the same envelope:
{
"error": "<code>",
"message": "<human readable>",
"request_id": "req_a1b2c3d4"
}

| Code | Status | Meaning |
|---|---|---|
| auth_missing | 401 | Authorization header missing or malformed. |
| auth_invalid | 401 | Key unknown, revoked, or owner refunded. |
| quota_exceeded | 429 | Monthly quota hit; check Retry-After. |
| validation_error | 400 | Request body or query failed validation. |
| not_found | 404 | Resource doesn't exist or isn't yours. |
| internal_error | 500 | Server bug — quote the request_id. |
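Because every non-2xx body shares this envelope, one small wrapper can turn it into a typed exception. `CrawlGraphError` and `raise_for_envelope` are hypothetical client-side names, not part of any official SDK.

```python
class CrawlGraphError(Exception):
    """Client-side wrapper for the documented error envelope."""
    def __init__(self, status: int, payload: dict):
        self.status = status
        self.code = payload.get("error")             # e.g. "quota_exceeded"
        self.request_id = payload.get("request_id")  # quote this to support
        super().__init__(payload.get("message", "unknown error"))

def raise_for_envelope(status: int, payload: dict) -> None:
    """Raise CrawlGraphError for any non-2xx response body."""
    if not 200 <= status < 300:
        raise CrawlGraphError(status, payload)
```

Catching `CrawlGraphError` gives you the code, message, and request_id in one place.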
5. Endpoints
POST /api/v1/backlinks
Synchronous backlink lookup for a single domain. Counts against the backlinks quota.
Request body
{
"domain": "example.com",
"release_id": "CC-MAIN-2026-04", // optional; default = latest
"limit": 1000, // optional; default 1000, max 10000
"sort": "authority" // optional; "authority" (default) | "hosts"
}

Response
{
"domain": "example.com",
"release_id": "CC-MAIN-2026-04",
"release_label": "Apr 2026",
"total_linking_domains": 4821,
"returned": 1000,
"results": [
{ "linking_domain": "blog.foo.com", "num_hosts": 12, "tld": "com",
"cg_authority": 84, "cg_rank": 1421 },
{ "linking_domain": "news.bar.org", "num_hosts": 7, "tld": "org",
"cg_authority": 71, "cg_rank": 9402 }
]
}

Notes
- limit caps at 10,000. The API is for programmatic use, not bulk export — use the dashboard for full datasets.
- Field names match the internal service: linking_domain, num_hosts, tld. No rename layer.
- cg_authority is a 0–100 log-rank percentile derived from Common Crawl's harmonic centrality (higher = more authoritative). cg_rank is the raw PageRank position across the whole graph (1 = top-ranked domain). Both are null for domains that don't appear in the ranks file.
- sort="authority" (default) orders by cg_authority DESC then num_hosts DESC; sort="hosts" preserves the legacy num_hosts DESC order.
- Malformed domain, unknown release_id, or out-of-range limit → 400 validation_error.
curl
curl -X POST https://crawlgraph.com/api/v1/backlinks \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"domain": "example.com", "limit": 500}'

Python (requests)
import os, requests
r = requests.post(
"https://crawlgraph.com/api/v1/backlinks",
headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
json={"domain": "example.com", "limit": 500},
timeout=30,
)
r.raise_for_status()
data = r.json()
print(data["total_linking_domains"], "linking domains")

Node (fetch)
const res = await fetch("https://crawlgraph.com/api/v1/backlinks", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.CG_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ domain: "example.com", limit: 500 }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
console.log(data.total_linking_domains, "linking domains");

GET /api/v1/releases
List the Common Crawl releases CrawlGraph has indexed. Read-only and not counted against quota — use it to discover valid release_id values.
Response
{
"releases": [
{ "id": "CC-MAIN-2026-04", "label": "Apr 2026", "available": true },
{ "id": "CC-MAIN-2025-50", "label": "Dec 2025", "available": true }
]
}

curl
curl https://crawlgraph.com/api/v1/releases \
-H "Authorization: Bearer cg_live_…"

Python (requests)
import os, requests
r = requests.get(
"https://crawlgraph.com/api/v1/releases",
headers={"Authorization": f"Bearer {os.environ['CG_KEY']}"},
timeout=30,
)
r.raise_for_status()
for rel in r.json()["releases"]:
print(rel["id"], rel["label"])

Node (fetch)
const res = await fetch("https://crawlgraph.com/api/v1/releases", {
headers: { Authorization: `Bearer ${process.env.CG_KEY}` },
});
const { releases } = await res.json();
for (const r of releases) console.log(r.id, r.label);

POST /api/v1/gap-analysis · GET /api/v1/gap-analysis/{job_id}
Async gap-analysis. POST submits a job (counts against the gap quota); GET polls for status and result. Jobs are retained for 7 days.
POST request body
{
"my_domain": "example.com",
"competitor_domains": ["a.com", "b.com", "c.com"] // 1–5 entries
}

POST response (202)
{
"job_id": "gap_a1b2c3",
"status": "queued",
"poll_url": "/api/v1/gap-analysis/gap_a1b2c3"
}

GET response — running
{
"job_id": "gap_a1b2c3",
"status": "running",
"started_at": "2026-04-27T12:34:56Z",
"progress_pct": 42
}

GET response — completed
{
"job_id": "gap_a1b2c3",
"status": "completed",
"completed_at": "2026-04-27T12:36:18Z",
"result": {
"my_domain": "example.com",
"competitor_domains": ["a.com", "b.com", "c.com"],
"gaps": [
{ "linking_domain": "x.com", "found_on": ["a.com", "b.com"] }
],
"total_gaps": 1284
}
}

GET response — failed
{
"job_id": "gap_a1b2c3",
"status": "failed",
"error": { "code": "internal_error", "message": "..." }
}

Notes
- Max 5 competitors per request — matches the dashboard cap.
- GET returns 404 not_found if the job isn't yours, even if the id exists.
- Failed jobs don't refund quota in v1. Email support with the request_id if it matters.
curl
# 1. Submit
curl -X POST https://crawlgraph.com/api/v1/gap-analysis \
-H "Authorization: Bearer cg_live_…" \
-H "Content-Type: application/json" \
-d '{"my_domain": "example.com", "competitor_domains": ["a.com","b.com"]}'
# 2. Poll
curl https://crawlgraph.com/api/v1/gap-analysis/gap_a1b2c3 \
-H "Authorization: Bearer cg_live_…"

Python (requests)
import os, time, requests
H = {"Authorization": f"Bearer {os.environ['CG_KEY']}"}
sub = requests.post(
"https://crawlgraph.com/api/v1/gap-analysis",
headers=H,
json={
"my_domain": "example.com",
"competitor_domains": ["a.com", "b.com"],
},
timeout=30,
).json()
job_id = sub["job_id"]
while True:
j = requests.get(
f"https://crawlgraph.com/api/v1/gap-analysis/{job_id}",
headers=H, timeout=30,
).json()
if j["status"] in ("completed", "failed"):
break
time.sleep(5)
print(j)

Node (fetch)
const H = { Authorization: `Bearer ${process.env.CG_KEY}` };
const submit = await fetch("https://crawlgraph.com/api/v1/gap-analysis", {
method: "POST",
headers: { ...H, "Content-Type": "application/json" },
body: JSON.stringify({
my_domain: "example.com",
competitor_domains: ["a.com", "b.com"],
}),
}).then(r => r.json());
const jobId = submit.job_id;
let job;
do {
await new Promise(r => setTimeout(r, 5000));
job = await fetch(
`https://crawlgraph.com/api/v1/gap-analysis/${jobId}`,
{ headers: H },
).then(r => r.json());
} while (job.status !== "completed" && job.status !== "failed");
console.log(job);

GET /api/v1/changes
Quarter-over-quarter diff for a domain. Returns referring domains added or lost between two CC releases. Counts as one call against the backlinks quota — same bucket as POST /api/v1/backlinks.
Today this endpoint is a stub. We have ingested only one Common Crawl quarter, so there is nothing to diff against. The shape below is final; the response body just carries comparison_available: false until the next release lands (estimated July 2026).
Query parameters
- domain — optional; the target host (same shape as /backlinks).
- from — optional; older release id. Defaults to the previous quarter (which doesn't exist yet → stub).
- to — optional; newer release id. Defaults to the latest available release.
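Since every response today is the stub, a client only needs to branch on comparison_available. `changes_ready` below is a hypothetical helper built on the documented stub fields, not part of the API.

```python
def changes_ready(payload: dict) -> bool:
    """True once /api/v1/changes can return an actual diff."""
    return bool(payload.get("comparison_available"))

# The stub every caller sees today, using fields from the example response:
stub = {
    "comparison_available": False,
    "current_release": "cc-main-2026-jan-feb-mar",
    "next_available_after": "cc-main-2026-apr-may-jun",
    "next_available_estimate": "2026-07-15",
}
```

`changes_ready(stub)` stays False until the next quarterly release ingests.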
Response — stub (today)
{
"comparison_available": false,
"current_release": "cc-main-2026-jan-feb-mar",
"next_available_after": "cc-main-2026-apr-may-jun",
"next_available_estimate": "2026-07-15",
"message": "first delta will be available after the next quarterly Common Crawl release ingests"
}

curl
curl "https://crawlgraph.com/api/v1/changes?domain=example.com" \
-H "Authorization: Bearer cg_live_…"

6. Webhooks & changelog
Webhooks are coming in a future version. For now, polling the gap-analysis job endpoint is the only async pattern.
The current API is at v1. Breaking changes will land on /api/v2 with a deprecation window — your v1 integrations won't break overnight. Check back here for changelogs.
7. OpenAPI
For tooling integration (Postman, openapi-typescript, Insomnia, etc.) the OpenAPI 3.1 schema is served at /api/v1/openapi.json. It covers exactly the endpoints documented above and is regenerated on every deploy.