common crawl release schedule: when the next crawl drops

common crawl publishes a new web crawl every 1-2 months. the hyperlink graph follows about six weeks later. here's the full release history, the next expected drop, and why the cadence matters for backlink data freshness.

pete the seo wizard
May 2, 2026 · 5 min read · 950 words

if you're using common crawl as a data source for research, ai training, or seo tools like crawlgraph, the release cadence tells you exactly how fresh your data is: a new web crawl lands every 1-2 months, and the hyperlink graph derived from each crawl follows about six weeks later. below: the full release history, the next expected release, and the warc-to-graph lag to plan around.

the cadence in one sentence

cc-main releases land roughly every 1-2 months. most are named cc-main-YYYY-WW after the iso week the crawl ran; a release that spans several months of crawling gets the composite form cc-main-YYYY-month-month-month. the corresponding hyperlink graph (used for domain-level backlink research) is published ~6 weeks after the warc files. the most recent release as of writing is cc-main-2026-jan-feb-mar, which is what crawlgraph currently queries.
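the week-numbered names encode the crawl date directly: the two digits after the year are the iso week the crawl ran. a minimal python sketch of recovering an approximate date from a release name (the composite month-named releases would need separate handling; the function name is mine):

```python
from datetime import date

def release_week_to_date(name: str) -> date:
    """approximate crawl date from a week-numbered cc-main release name,
    e.g. 'cc-main-2024-30' -> monday of iso week 30 of 2024."""
    parts = name.lower().split("-")  # ['cc', 'main', 'YYYY', 'WW']
    year, week = int(parts[2]), int(parts[3])
    return date.fromisocalendar(year, week, 1)  # monday of that iso week
```

release_week_to_date("cc-main-2024-30") gives 2024-07-22, which lines up with the "jul 2024" entry in the calendar below.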

release calendar (2024-2026)

full archive at commoncrawl.org/latest-crawl. the table below pulls the major recent releases relevant to backlink research.

| release | warc published | graph published | pages | hosts |
| --- | --- | --- | --- | --- |
| cc-main-2024-30 | jul 2024 | sep 2024 | 3.0B | 40M |
| cc-main-2024-42 | oct 2024 | dec 2024 | 3.1B | 42M |
| cc-main-2025-08 | feb 2025 | apr 2025 | 3.2B | 43M |
| cc-main-2025-26 | jun 2025 | aug 2025 | 3.0B | 44M |
| cc-main-2025-44 | oct 2025 | dec 2025 | 3.1B | 45M |
| cc-main-2026-jan-feb-mar | mar 2026 | apr 2026 | 5.9B (composite) | 45.5M |
we update this table within 48 hours of every release

the calendar is maintained against commoncrawl.github.io/cc-crawl-statistics. if you spot a release missing here that's already published upstream, email [email protected].

the warc-to-graph lag

the raw web crawl (warc files) is one product. the hyperlink graph derived from it is a separate product, and it always trails. the graph is the result of parsing every page in every warc, extracting outbound links, normalizing hostnames, and emitting two files: vertices.txt.gz (one row per domain, ~850 mb) and edges.txt.gz (one row per source-destination pair, ~16 gb).

that work takes time. expect the graph for a given crawl to land roughly 4-8 weeks after the warc files. if you're building backlink tooling, budget a ~6-week warc-to-graph lag, and note that the end-to-end delay from "the link went live" to "common crawl shows it in the graph" is longer still, since the page first has to be picked up by a crawl.
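the published graph files are plain gzipped text, so a first look doesn't need a cluster. a sketch of counting inbound edges for one vertex by streaming edges.txt.gz, assuming the whitespace-separated "&lt;from_id&gt; &lt;to_id&gt;" layout of the domain graph edge files (for the full 16 gb file you'd want a columnar query instead of a line scan):

```python
import gzip

def count_inbound(edges_path: str, target_id: int) -> int:
    """count edges pointing at one vertex id in a gzipped edge list.

    assumes one '<from_id> <to_id>' pair per line, as in the
    common crawl domain graph's edges files.
    """
    n = 0
    with gzip.open(edges_path, "rt") as fh:
        for line in fh:
            _src, dst = line.split()
            if int(dst) == target_id:
                n += 1
    return n
```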

the news crawl runs faster

common crawl also publishes a news crawl with a different cadence. warc files there publish within hours of being written — useful for time-sensitive research on news sites, but irrelevant for general-web backlink intelligence. the news crawl is decoupled because the monthly schedule was, in their words, "not well-adapted to news content."

when's the next release?

based on the historical cadence, the next cc-main release should land around june-july 2026, with the corresponding hyperlink graph available in august 2026. crawlgraph will reindex within days of the graph publication, and you'll see the new release name in the dropdown on the homepage.
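the june-july estimate is just extrapolation from the gaps in the release calendar. here's that arithmetic made explicit, with warc dates approximated to the first of each month from the table above (a naive median-gap estimator, not how common crawl actually schedules anything):

```python
from datetime import date, timedelta
from statistics import median

def predict_next(release_dates: list[date]) -> date:
    """estimate the next release as last release + median inter-release gap."""
    gaps = [(b - a).days for a, b in zip(release_dates, release_dates[1:])]
    return release_dates[-1] + timedelta(days=median(gaps))

# warc publication months from the release calendar, as first-of-month dates
warc_dates = [date(2024, 7, 1), date(2024, 10, 1), date(2025, 2, 1),
              date(2025, 6, 1), date(2025, 10, 1), date(2026, 3, 1)]
```

predict_next(warc_dates) lands on 2026-07-01, consistent with the june-july window; add the ~6-week graph lag and you get the august graph estimate.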

why the cadence matters for seo data freshness

every backlink tool runs on some upstream crawl. ahrefs and semrush run on their own proprietary crawls (theirs are continuous, not batched). crawlgraph runs on common crawl. that difference shows up most when a site is brand new — common crawl typically misses sites under ~30 days old, because the site wasn't live when the crawl was running.

for sites older than a quarter, the cadence is rarely the bottleneck. the top-50 referring domains are stable across releases. for fast-moving link-building campaigns where you need 15-minute granularity, common crawl isn't the right data source. for everything slower than that — the 95% of audits — it is.

on this site

when you query crawlgraph for a domain, the data comes from whatever release is shown in the dropdown. cold queries take 30-90 seconds (we read the parquet files directly); repeat queries are sub-100ms (cached). see how the underlying graph works and five free ways to find backlinks.
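the cold-vs-cached split described above can be sketched as an in-process cache keyed on (domain, release). everything here is hypothetical illustration, not crawlgraph's actual code; _scan_release stands in for the expensive parquet read:

```python
import functools

def _scan_release(domain: str, release: str) -> tuple[str, ...]:
    # hypothetical stand-in for the expensive parquet scan (30-90s cold)
    return (f"{release}:{domain}",)

@functools.lru_cache(maxsize=4096)
def backlinks(domain: str, release: str) -> tuple[str, ...]:
    # cache key includes the release name, so cached rows stay pinned
    # to the crawl they came from
    return _scan_release(domain, release)
```

keying on the release name means a reindex to a new release naturally misses the old cache instead of serving stale rows.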

faq

how often does common crawl release new data?

every 1-2 months. each release is named cc-main-YYYY-MM or cc-main-YYYY-month-month-month for composite releases. the corresponding hyperlink graph follows ~6 weeks later.

when will the next common crawl release be?

based on the historical cadence, the next cc-main release should land around june-july 2026 with the graph available in august 2026.

what's the difference between common crawl's main crawl and news crawl?

the main crawl runs every 1-2 months and covers the general web. the news crawl publishes warc files within hours of being written and only covers news sites. the news crawl was decoupled because the monthly main-crawl cadence is too slow for news content.

how long after a crawl is the hyperlink graph available?

roughly 4-8 weeks. the graph requires parsing every page in the crawl, extracting outbound links, and normalizing the result into vertices and edges files. expect a 6-week lag as the rule of thumb.

tags: common crawl, methodology, release schedule