common crawl release schedule: when the next crawl drops

common crawl publishes a new web crawl every 1-2 months. the hyperlink graph follows about six weeks later. here's the full release history, the next expected drop, and why the cadence matters for backlink data freshness.

pete the seo wizard
May 2, 2026 · 5 min read · 950 words

if you're using common crawl as a data source for research, ai training, or seo tools like crawlgraph, the release cadence tells you exactly how fresh your data is: a new web crawl lands every 1-2 months, and the hyperlink graph derived from each crawl follows about six weeks later. below: the full release history, the next expected release, and the warc-to-graph lag to plan around.

the cadence in one sentence

cc-main releases land roughly every 1-2 months. most are named cc-main-YYYY-WW after the iso week the crawl ran; a release that spans several months of crawling gets the composite form cc-main-YYYY-month-month-month. the corresponding hyperlink graph (used for domain-level backlink research) is published ~6 weeks after the warc files. the most recent release as of writing is cc-main-2026-jan-feb-mar, which is what crawlgraph currently queries.
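the week-numbered names encode the crawl date directly: the two digits after the year are the iso week the crawl ran. a minimal python sketch of recovering an approximate date from a release name (the composite month-named releases would need separate handling; the function name is mine):

```python
from datetime import date

def release_week_to_date(name: str) -> date:
    """approximate crawl date from a week-numbered cc-main release name,
    e.g. 'cc-main-2024-30' -> monday of iso week 30 of 2024."""
    parts = name.lower().split("-")  # ['cc', 'main', 'YYYY', 'WW']
    year, week = int(parts[2]), int(parts[3])
    return date.fromisocalendar(year, week, 1)  # monday of that iso week
```

release_week_to_date("cc-main-2024-30") gives 2024-07-22, which lines up with the "jul 2024" entry in the calendar below.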

release calendar (2024-2026)

full archive at commoncrawl.org/latest-crawl. the table below pulls the major recent releases relevant to backlink research.

| release | warc published | graph published | pages | hosts |
| --- | --- | --- | --- | --- |
| cc-main-2024-30 | jul 2024 | sep 2024 | 3.0B | 40M |
| cc-main-2024-42 | oct 2024 | dec 2024 | 3.1B | 42M |
| cc-main-2025-08 | feb 2025 | apr 2025 | 3.2B | 43M |
| cc-main-2025-26 | jun 2025 | aug 2025 | 3.0B | 44M |
| cc-main-2025-44 | oct 2025 | dec 2025 | 3.1B | 45M |
| cc-main-2026-jan-feb-mar | mar 2026 | apr 2026 | 5.9B (composite) | 45.5M |
we update this table within 48 hours of every release

the calendar is maintained against commoncrawl.github.io/cc-crawl-statistics. if you spot a release missing here that's already published upstream, email [email protected].

the warc-to-graph lag

the raw web crawl (warc files) is one product. the hyperlink graph derived from it is a separate product, and it always trails. the graph is the result of parsing every page in every warc, extracting outbound links, normalizing hostnames, and emitting two files: vertices.txt.gz (one row per domain, ~850 mb) and edges.txt.gz (one row per source-destination pair, ~16 gb).

that work takes time. expect the graph for a given crawl to land roughly 4-8 weeks after the warc files. if you're building backlink tooling, budget a ~6-week warc-to-graph lag, and note that the end-to-end delay from "the link went live" to "common crawl shows it in the graph" is longer still, since the page first has to be picked up by a crawl.
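the published graph files are plain gzipped text, so a first look doesn't need a cluster. a sketch of counting inbound edges for one vertex by streaming edges.txt.gz, assuming the whitespace-separated "&lt;from_id&gt; &lt;to_id&gt;" layout of the domain graph edge files (for the full 16 gb file you'd want a columnar query instead of a line scan):

```python
import gzip

def count_inbound(edges_path: str, target_id: int) -> int:
    """count edges pointing at one vertex id in a gzipped edge list.

    assumes one '<from_id> <to_id>' pair per line, as in the
    common crawl domain graph's edges files.
    """
    n = 0
    with gzip.open(edges_path, "rt") as fh:
        for line in fh:
            _src, dst = line.split()
            if int(dst) == target_id:
                n += 1
    return n
```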

the news crawl runs faster

common crawl also publishes a news crawl with a different cadence. warc files there publish within hours of being written — useful for time-sensitive research on news sites, but irrelevant for general-web backlink intelligence. the news crawl is decoupled because the monthly schedule was, in their words, "not well-adapted to news content."

when's the next release?

based on the historical cadence, the next cc-main release should land around june-july 2026, with the corresponding hyperlink graph available in august 2026. crawlgraph will reindex within days of the graph publication, and you'll see the new release name in the dropdown on the homepage.
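the june-july estimate is just extrapolation from the gaps in the release calendar. here's that arithmetic made explicit, with warc dates approximated to the first of each month from the table above (a naive median-gap estimator, not how common crawl actually schedules anything):

```python
from datetime import date, timedelta
from statistics import median

def predict_next(release_dates: list[date]) -> date:
    """estimate the next release as last release + median inter-release gap."""
    gaps = [(b - a).days for a, b in zip(release_dates, release_dates[1:])]
    return release_dates[-1] + timedelta(days=median(gaps))

# warc publication months from the release calendar, as first-of-month dates
warc_dates = [date(2024, 7, 1), date(2024, 10, 1), date(2025, 2, 1),
              date(2025, 6, 1), date(2025, 10, 1), date(2026, 3, 1)]
```

predict_next(warc_dates) lands on 2026-07-01, consistent with the june-july window; add the ~6-week graph lag and you get the august graph estimate.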

why the cadence matters for seo data freshness

every backlink tool runs on some upstream crawl. ahrefs and semrush run on their own proprietary crawls (theirs are continuous, not batched). crawlgraph runs on common crawl. that difference shows up most when a site is brand new — common crawl typically misses sites under ~30 days old, because the site wasn't live when the crawl was running.

for sites older than a quarter, the cadence is rarely the bottleneck. the top-50 referring domains are stable across releases. for fast-moving link-building campaigns where you need 15-minute granularity, common crawl isn't the right data source. for everything slower than that — the 95% of audits — it is.

on this site

when you query crawlgraph for a domain, the data comes from whatever release is shown in the dropdown. cold queries take 30-90 seconds (we read the parquet files directly); repeat queries are sub-100ms (cached). see how the underlying graph works and five free ways to find backlinks.
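the cold-vs-cached split described above can be sketched as an in-process cache keyed on (domain, release). everything here is hypothetical illustration, not crawlgraph's actual code; _scan_release stands in for the expensive parquet read:

```python
import functools

def _scan_release(domain: str, release: str) -> tuple[str, ...]:
    # hypothetical stand-in for the expensive parquet scan (30-90s cold)
    return (f"{release}:{domain}",)

@functools.lru_cache(maxsize=4096)
def backlinks(domain: str, release: str) -> tuple[str, ...]:
    # cache key includes the release name, so cached rows stay pinned
    # to the crawl they came from
    return _scan_release(domain, release)
```

keying on the release name means a reindex to a new release naturally misses the old cache instead of serving stale rows.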

faq

how often does common crawl release new data?

every 1-2 months. each release is named cc-main-YYYY-MM or cc-main-YYYY-month-month-month for composite releases. the corresponding hyperlink graph follows ~6 weeks later.

when will the next common crawl release be?

based on the historical cadence, the next cc-main release should land around june-july 2026 with the graph available in august 2026.

what's the difference between common crawl's main crawl and news crawl?

the main crawl runs every 1-2 months and covers the general web. the news crawl publishes warc files within hours of being written and only covers news sites. the news crawl was decoupled because the monthly main-crawl cadence is too slow for news content.

how long after a crawl is the hyperlink graph available?

roughly 4-8 weeks. the graph requires parsing every page in the crawl, extracting outbound links, and normalizing the result into vertices and edges files. expect a 6-week lag as the rule of thumb.

tags: common crawl, methodology, release schedule