common crawl publishes a new web crawl every 1-2 months. the hyperlink graph derived from each crawl follows about six weeks later. if you use common crawl as a data source (for research, ai training, or seo tools like crawlgraph), knowing the cadence tells you how fresh your data is. here's the recent release history, the next expected release, and the lag between warc and graph publication you should know about.
the cadence in one sentence
cc-main releases land roughly every 1-2 months. a single crawl is named cc-main-YYYY-WW (year plus week number, e.g. cc-main-2025-44); a release that spans several months of crawling is named cc-main-YYYY-month-month-month. the corresponding hyperlink graph (used for domain-level backlink research) is published ~6 weeks after the warc files. as of writing, the most recent release is cc-main-2026-jan-feb-mar, which is what crawlgraph currently queries.
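the release names in this article come in two shapes: a year plus a two-digit number (cc-main-2025-44) and a year plus month abbreviations (cc-main-2026-jan-feb-mar). a minimal sketch of telling them apart is below. the `ReleaseName` type and `parse_release` function are hypothetical helpers for illustration, not part of any common crawl or crawlgraph api, and the patterns only cover the names shown here.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# single crawls such as "cc-main-2025-44": year, then a two-digit suffix.
SINGLE = re.compile(r"^cc-main-(\d{4})-(\d{2})$")
# multi-month names such as "cc-main-2026-jan-feb-mar".
COMPOSITE = re.compile(r"^cc-main-(\d{4})-([a-z]{3}(?:-[a-z]{3})+)$")


@dataclass(frozen=True)
class ReleaseName:
    year: int
    number: Optional[int]    # two-digit suffix for single crawls, else None
    months: Tuple[str, ...]  # month abbreviations for composite names, else ()


def parse_release(name: str) -> ReleaseName:
    """split a release name into its parts; raise ValueError if it fits neither shape."""
    name = name.strip().lower()
    m = SINGLE.match(name)
    if m:
        return ReleaseName(int(m.group(1)), int(m.group(2)), ())
    m = COMPOSITE.match(name)
    if m:
        return ReleaseName(int(m.group(1)), None, tuple(m.group(2).split("-")))
    raise ValueError(f"unrecognized release name: {name!r}")


print(parse_release("cc-main-2025-44"))
print(parse_release("cc-main-2026-jan-feb-mar"))
```

rejecting anything that fits neither pattern keeps a typo in a release name from silently mapping to the wrong dataset.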
release calendar (2024-2026)
full archive at commoncrawl.org/latest-crawl. the table below lists the major recent releases relevant to backlink research.
| release | warc published | graph published | pages | hosts |
|---|---|---|---|---|
| cc-main-2024-30 | jul 2024 | sep 2024 | 3.0B | 40M |
| cc-main-2024-42 | oct 2024 | dec 2024 | 3.1B | 42M |
| cc-main-2025-08 | feb 2025 | apr 2025 | 3.2B | 43M |
| cc-main-2025-26 | jun 2025 | aug 2025 | 3.0B | 44M |
| cc-main-2025-44 | oct 2025 | dec 2025 | 3.1B | 45M |
| cc-main-2026-jan-feb-mar | mar 2026 | apr 2026 | 5.9B (composite) | 45.5M |
the calendar is maintained against commoncrawl.github.io/cc-crawl-statistics. if you spot a release missing here that's already published upstream, email [email protected].
the warc-to-graph lag
the raw web crawl (warc files) is one product. the hyperlink graph derived from it is a separate product, and it always trails. the graph is the result of parsing every page in every warc, extracting outbound links, normalizing hostnames, and emitting two files: vertices.txt.gz (one row per domain, ~850 mb) and edges.txt.gz (one row per source-destination pair, ~16 gb).
that work takes time. expect the graph for a given crawl to land roughly 4-8 weeks after the warc files. if you're building backlink tooling, plan for about a 6-week lag between a link being captured in the warc files and that link showing up in the graph.
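the 4-8 week range is simple date arithmetic. the sketch below turns a warc publication date into an earliest, typical, and latest graph date. `graph_window` is a hypothetical helper, and the day of the month in the example is an assumption, since the calendar above only gives the month.

```python
from datetime import date, timedelta


def graph_window(warc_published, min_weeks=4, typical_weeks=6, max_weeks=8):
    """return (earliest, typical, latest) expected graph publication dates."""
    return tuple(
        warc_published + timedelta(weeks=w)
        for w in (min_weeks, typical_weeks, max_weeks)
    )


# assumed day: the calendar says "mar 2026", so the 15th is a placeholder.
earliest, typical, latest = graph_window(date(2026, 3, 15))
print(earliest, typical, latest)  # 2026-04-12 2026-04-26 2026-05-10
```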
the news crawl runs faster
common crawl also publishes a news crawl with a different cadence. its warc files are published within hours of being written, which is useful for time-sensitive research on news sites but irrelevant for general-web backlink intelligence. the news crawl is decoupled because the monthly schedule was, in common crawl's words, "not well-adapted to news content."
when's the next release
based on the historical cadence, the next cc-main release should land around june-july 2026, with the corresponding hyperlink graph available in august 2026. crawlgraph will reindex within days of the graph publication, and you'll see the new release name in the dropdown on the homepage.
why the cadence matters for seo data freshness
every backlink tool runs on some upstream crawl. ahrefs and semrush run on their own proprietary crawls (theirs are continuous, not batched). crawlgraph runs on common crawl. that difference shows up most when a site is brand new — common crawl typically misses sites under ~30 days old, because the site wasn't live when the crawl was running.
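the ~30-day rule of thumb can be written as a quick check. this is a sketch of the heuristic only: `likely_missed` is a hypothetical function, the 30-day threshold is the approximate figure from the paragraph above, and the dates are invented.

```python
from datetime import date


def likely_missed(site_launched, crawl_date, min_age_days=30):
    """true when the site was younger than min_age_days on crawl_date, or not yet live."""
    return (crawl_date - site_launched).days < min_age_days


# invented dates: one site launched two weeks before the crawl, one months before.
print(likely_missed(date(2026, 3, 1), date(2026, 3, 15)))   # True
print(likely_missed(date(2025, 11, 1), date(2026, 3, 15)))  # False
```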
for sites older than a quarter, the cadence is rarely the bottleneck. the top-50 referring domains are stable across releases. for fast-moving link-building campaigns where you need 15-minute granularity, common crawl isn't the right data source. for everything slower than that, which is roughly 95% of audits, it is.
when you query crawlgraph for a domain, the data comes from whatever release is shown in the dropdown. cold queries take 30-90 seconds (we read the parquet files directly); repeat queries are sub-100ms (cached). see how the underlying graph works and five free ways to find backlinks.
faq
how often does common crawl release new data?
every 1-2 months. each release is named cc-main-YYYY-WW (year plus week number), or cc-main-YYYY-month-month-month for composite releases. the corresponding hyperlink graph follows ~6 weeks later.
when will the next common crawl release be?
based on the historical cadence, the next cc-main release should land around june-july 2026 with the graph available in august 2026.
what's the difference between common crawl's main crawl and news crawl?
the main crawl runs every 1-2 months and covers the general web. the news crawl publishes warc files within hours of being written and only covers news sites. the news crawl was decoupled because the monthly main-crawl cadence is too slow for news content.
how long after a crawl is the hyperlink graph available?
roughly 4-8 weeks. the graph requires parsing every page in the crawl, extracting outbound links, and normalizing the result into vertices and edges files. expect a 6-week lag as the rule of thumb.