a backlink gap analysis answers one question: which sites link to my competitors but not to me? the domains in that overlap already link to things in your category, already publish about your topic, and already decided your kind of site is worth a mention. that makes them the most pre-qualified outreach list you can build, and it is the single highest-roi move in link building.
most link prospecting starts cold: you find a site, guess whether it takes external links, guess whether your topic fits, then pitch into the dark. a gap analysis flips that. you start from sites that have already said yes to a competitor. the only thing you are changing is the target. this guide walks through what a gap analysis actually is, why it outperforms cold prospecting, and a concrete free process to run one end to end.
what a backlink gap analysis actually is
the mechanics are simple set math. take the referring domains of two or three competitors, take your own referring domains, and compute the set of domains that link to the competitors but not to you. that difference is the gap. each domain in it is a site that links into your category and has, so far, skipped you.
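the set math above can be sketched in a few lines of python. this is a minimal illustration, not any tool's implementation: the function name `backlink_gap` and the domain `shared-blog.net` are made up for the example, and the other domains are placeholders.

```python
def backlink_gap(competitor_domains: dict[str, set[str]], my_domains: set[str]) -> set[str]:
    """Return domains that link to at least one competitor but not to me."""
    # Union of every competitor's referring domains...
    linked_to_rivals = set().union(*competitor_domains.values())
    # ...minus everything that already links to my site.
    return linked_to_rivals - my_domains


# Hypothetical referring-domain lists, already deduped to the domain level.
competitors = {
    "rival-a.com": {"industry-roundup.io", "niche-review.com", "shared-blog.net"},
    "rival-b.com": {"industry-roundup.io", "shared-blog.net"},
}
mine = {"shared-blog.net"}

print(sorted(backlink_gap(competitors, mine)))
# ['industry-roundup.io', 'niche-review.com']
```

`shared-blog.net` drops out because it already links to you; the two remaining domains are the gap.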
the reason this matters is qualification. a referring domain that points at a direct competitor has cleared three filters that a cold prospect has not: it accepts external links, it publishes about your topic, and it considers a site like yours reference-worthy. you are not asking “will this site ever link out?” you already know it does. you are asking the much easier question: “will it also link to me?”
run the gap on referring domains, not individual backlinks. one site can give a competitor forty links, but for prospecting it is still one outreach target. counting raw links inflates the list and buries the real opportunities. dedupe to the domain first.
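if your export is a list of backlink urls rather than domains, the dedupe step might look like the sketch below. it is deliberately simplified: it only lowercases the host and strips a leading `www.`, so `blog.example.com` and `example.com` stay separate. collapsing to the registrable domain needs the public suffix list, which this example does not attempt. the function names and url paths are illustrative.

```python
from urllib.parse import urlsplit


def referring_domain(url: str) -> str:
    """Reduce a backlink URL to a lowercase host without a leading 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""  # hostname is already lowercased
    return host.removeprefix("www.")  # requires Python 3.9+


def dedupe_to_domains(backlink_urls):
    """Collapse many backlink URLs into the set of unique referring domains."""
    return {d for d in map(referring_domain, backlink_urls) if d}


links = [
    "https://www.industry-roundup.io/best-tools",
    "https://industry-roundup.io/best-tools?page=2",
    "http://Industry-Roundup.io/archive/2024",
    "https://niche-review.com/review",
]
print(sorted(dedupe_to_domains(links)))
# ['industry-roundup.io', 'niche-review.com']
```

four links become two outreach targets, which is the number that matters for prospecting.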
why it is the highest-roi link move
cold link prospecting has a brutal funnel. you research a hundred sites, maybe thirty are relevant, maybe ten take external links, maybe two reply, maybe one links. gap-sourced prospects skip the first two filters entirely because the competitor backlink already proved them. the reply rate and the conversion rate both climb, and your research time per won link drops hard.
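to make the funnel concrete, here is the arithmetic on the illustrative numbers above (100 researched, 30 relevant, 10 taking links, 2 replies, 1 link). the gap-sourced figure assumes reply and conversion rates stay exactly the same, which is the conservative case; these are not measured results.

```python
# Illustrative cold-prospecting funnel from the paragraph above.
funnel = [
    ("researched", 100),
    ("relevant", 30),
    ("take external links", 10),
    ("reply", 2),
    ("link", 1),
]


def stage_rates(stages):
    """Conversion rate from each stage to the next."""
    return [(later[0], later[1] / earlier[1]) for earlier, later in zip(stages, stages[1:])]


def sites_per_link(stages, start_stage=0):
    """How many sites you must start with, at start_stage, per won link."""
    return stages[start_stage][1] / stages[-1][1]


for name, rate in stage_rates(funnel):
    print(f"{name}: {rate:.0%}")

print(sites_per_link(funnel))                 # 100.0 (cold: start from raw research)
print(sites_per_link(funnel, start_stage=2))  # 10.0 (gap: first two filters already passed)
```

even with no improvement in reply rate, skipping the first two filters cuts the research load per won link by a factor of ten in this example.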
there is a second-order benefit. domains that link to multiple competitors are the strongest signal of all. a site linking to three of your rivals is almost certainly a roundup, a resource page, a category directory, or a journalist who covers the space. those are repeatable, high-intent targets, and they tend to be the ones worth pitching first.
| | cold prospecting | gap analysis |
|---|---|---|
| takes external links? | unknown | proven yes |
| covers your topic? | guess | proven yes |
| links to your category? | unknown | proven yes |
| research time per prospect | high | low |
| typical reply rate | low | higher |
| best for | net-new niches | established categories |
the gap column is not magic. it does not work if you have no real competitors, or if your category is so new that nobody links to anyone. but for any established space, it is the first list you should build and the last one you should run out of.
the free step-by-step process
here is the whole process. every step can be done for free. the only thing that varies is how much of the work you do by hand versus how much a tool does for you.
step 1: pick 2 to 3 real competitors
not aspirational competitors. real ones, ranking for the same queries you want, at roughly your size or a step above. pick two or three, not ten. the overlap math gets noisier with more competitors, and three is enough to surface the domains that link across the whole category. choose sites whose audience genuinely overlaps with yours, not just any big name in the industry.
step 2: pull their referring domains
for each competitor, get the list of unique referring domains. you cannot do this in google search console or bing webmaster tools, because those only show properties you have verified. you need a tool that indexes the whole web. pull your own referring domains the same way so you have every list (two or three competitors plus yourself) in hand.
step 3: compute the overlap that excludes you
this is the set difference. take the union of the competitor referring domains, then subtract every domain that already links to you. what is left is the raw gap. if you are doing this by hand, drop each list into a spreadsheet column, dedupe, and use a lookup to flag the domains that appear for a competitor but never for you.
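the spreadsheet lookup described above has a direct python equivalent. this sketch also records which competitors each gap domain links to, since that signal matters later when ranking. the function name `gap_with_sources` and the sample domains are assumptions for illustration.

```python
from collections import defaultdict


def gap_with_sources(competitor_domains, my_domains):
    """Map each gap domain to the sorted list of competitors it links to."""
    found_on = defaultdict(set)
    for competitor, domains in competitor_domains.items():
        for domain in domains:
            # Keep only domains that never appear in my own list.
            if domain not in my_domains:
                found_on[domain].add(competitor)
    return {domain: sorted(rivals) for domain, rivals in found_on.items()}


competitors = {
    "rival-a.com": {"industry-roundup.io", "niche-review.com", "shared-blog.net"},
    "rival-b.com": {"industry-roundup.io"},
    "rival-c.com": {"niche-review.com", "shared-blog.net"},
}
mine = {"shared-blog.net"}

for domain, rivals in sorted(gap_with_sources(competitors, mine).items()):
    print(domain, rivals)
# industry-roundup.io ['rival-a.com', 'rival-b.com']
# niche-review.com ['rival-a.com', 'rival-c.com']
```

the keys are the raw gap; the values tell you how many rivals each domain already links to.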
step 4: filter platform and cdn noise
the raw gap is dirty. it is full of domains nobody can realistically pitch or that are not editorial links at all. strip these before you rank anything:
- platforms and ugc: youtube.com, facebook.com, twitter.com, linkedin.com, medium.com
- cdns and asset hosts: cloudfront.net, akamai, gstatic.com, image and font hosts
- shorteners and trackers: bit.ly, utm-laden redirect domains, analytics endpoints
- your competitors themselves and obvious aggregator scrapers
these show up for almost every site, so they tell you nothing about the category and they are not realistic outreach targets. cutting them usually removes 30 to 50 percent of the raw list, and it makes the next step meaningful.
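a noise filter along these lines can be a small blocklist with suffix matching, so subdomains of a blocked host are caught too. the blocklist here contains only the concrete domains named in the list above; akamai, image and font hosts, and tracker endpoints are left out because the exact hostnames depend on your data, so treat the set as a starting point. the cloudfront subdomain in the example is invented.

```python
# Starter blocklist: only the concrete domains named in this guide.
NOISE_DOMAINS = {
    "youtube.com", "facebook.com", "twitter.com", "linkedin.com", "medium.com",
    "cloudfront.net", "gstatic.com", "bit.ly",
}


def is_noise(domain, competitors, blocklist=NOISE_DOMAINS):
    """True for competitors, blocked domains, and subdomains of blocked domains."""
    domain = domain.lower()
    if domain in competitors:
        return True
    return any(domain == blocked or domain.endswith("." + blocked) for blocked in blocklist)


def filter_gap(gap, competitors):
    """Drop noise from the raw gap before any ranking happens."""
    return {d for d in gap if not is_noise(d, competitors)}


raw_gap = {
    "youtube.com",
    "d111111abcdef8.cloudfront.net",  # hypothetical CDN subdomain
    "bit.ly",
    "rival-b.com",
    "industry-roundup.io",
}
print(filter_gap(raw_gap, competitors={"rival-a.com", "rival-b.com"}))
# {'industry-roundup.io'}
```

matching on `"." + blocked` rather than a bare suffix avoids false positives such as `notyoutube.com`.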
step 5: rank by authority
now sort the survivors by an authority metric so you pitch the valuable domains first. rank descending and pay special attention to two things: domains with high authority, and domains that appear for more than one competitor. a high-authority domain linking to two or three of your rivals is the top of your list, every time.
step 6: pitch
you now have a ranked, de-noised list of sites that already link to your category. the pitch writes itself because you know why they linked: they did a roundup, they keep a resource page, they reviewed a tool like yours, a writer there covers the beat. reference the competitor link honestly (“i noticed you linked to rival-a in your guide to X”), explain what you add that is not already covered, and ask for the specific placement. start at the top of the authority-ranked list and work down.
a domain in the gap that links to two or three competitors is almost always a roundup, resource page, or category directory. those convert better than one-off editorial mentions and they are repeatable. sort by number of competitors matched, then by authority, and pitch from the top.
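that two-level sort can be sketched as a single sort key. the field names (`linking_domain`, `cg_authority`, `found_on`) are borrowed from the api response shown further down; the `.example` domains and their scores are invented for the illustration.

```python
def rank_gaps(gaps):
    """Sort by competitors matched (desc), then authority (desc), then name."""
    return sorted(
        gaps,
        key=lambda g: (-len(g["found_on"]), -g["cg_authority"], g["linking_domain"]),
    )


gaps = [
    {"linking_domain": "big-news.example", "cg_authority": 88, "found_on": ["rival-a.com"]},
    {"linking_domain": "niche-review.com", "cg_authority": 58, "found_on": ["rival-a.com", "rival-c.com"]},
    {"linking_domain": "category-directory.example", "cg_authority": 64,
     "found_on": ["rival-a.com", "rival-b.com", "rival-c.com"]},
    {"linking_domain": "industry-roundup.io", "cg_authority": 71, "found_on": ["rival-a.com", "rival-b.com"]},
]

for g in rank_gaps(gaps):
    print(len(g["found_on"]), g["cg_authority"], g["linking_domain"])
# 3 64 category-directory.example
# 2 71 industry-roundup.io
# 2 58 niche-review.com
# 1 88 big-news.example
```

note that the highest-authority domain lands last here because it links to only one rival. an authority-only sort would have put it first and buried the three-competitor directory.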
doing the whole thing free
the bottleneck in the manual version is step 2: you cannot pull a competitor's referring domains from gsc, and the subscription tools charge $129 to $140 a month for it. crawlgraph runs the gap analysis for free on the common crawl webgraph (4.4 billion edges across roughly 120 million domains). it ships a dedicated gap-analysis and outreach-target finder that does steps 3 through 5 for you: it computes the overlap, filters the platform and cdn noise automatically, and ranks the survivors by cg_authority (a 0 to 100 score). the free tier runs the whole gap workflow and shows your top 5 gaps in the browser, inside the gap-analysis tool; the $99-once lifetime tier unlocks the full ranked list, csv export, and the api.
if you have the lifetime tier, the same job is one api call. submit your domain and your competitor domains, and the response comes back with the ranked, de-noised gap already computed:
```shell
curl -X POST https://crawlgraph.com/api/v1/gap-analysis \
  -H "Authorization: Bearer cg_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "my_domain": "yoursite.com",
    "competitor_domains": ["rival-a.com", "rival-b.com", "rival-c.com"]
  }'
```

the result carries the gap domains with their authority score and the list of competitors each one was found on, so you can sort straight into an outreach sheet:
```json
{
  "job_id": "gap_8c2f…",
  "status": "completed",
  "result": {
    "total_gaps": 1284,
    "gaps": [
      { "linking_domain": "industry-roundup.io", "cg_authority": 71, "found_on": ["rival-a.com", "rival-b.com"] },
      { "linking_domain": "niche-review.com", "cg_authority": 58, "found_on": ["rival-a.com", "rival-c.com"] }
    ]
  }
}
```

the lifetime tier includes 50 gap jobs per month and 1,000 backlink lookups per month, which is more than enough for one person running regular campaigns. the same job is available to ai clients through the hosted mcp server at crawlgraph.com/mcp (or npx -y crawlgraph-mcp locally for claude, cursor, and cline), so you can ask for a gap list in plain language. full reference lives at /docs/api.
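to turn the sample response above into an outreach sheet, one option is to flatten it into csv. this sketch relies only on the fields visible in that sample; the `competitors_matched` column, the function name, and the handling of any status other than `"completed"` are assumptions, so check /docs/api for the real contract.

```python
import csv
import io
import json


def gaps_to_csv(response_text: str) -> str:
    """Flatten a gap-analysis response into CSV rows for an outreach sheet."""
    payload = json.loads(response_text)
    # Assumption: only "completed" jobs carry a usable result.
    if payload.get("status") != "completed":
        raise ValueError(f"job not completed: {payload.get('status')!r}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["linking_domain", "cg_authority", "competitors_matched", "found_on"])
    for gap in payload["result"]["gaps"]:
        writer.writerow([
            gap["linking_domain"],
            gap["cg_authority"],
            len(gap["found_on"]),       # derived column, not part of the response
            ";".join(gap["found_on"]),
        ])
    return out.getvalue()


# The sample response from this guide, pasted as-is.
sample = """
{
  "job_id": "gap_8c2f…",
  "status": "completed",
  "result": {
    "total_gaps": 1284,
    "gaps": [
      {"linking_domain": "industry-roundup.io", "cg_authority": 71, "found_on": ["rival-a.com", "rival-b.com"]},
      {"linking_domain": "niche-review.com", "cg_authority": 58, "found_on": ["rival-a.com", "rival-c.com"]}
    ]
  }
}
"""
print(gaps_to_csv(sample))
```

the extra `competitors_matched` column makes it easy to sort the sheet by the multi-competitor signal first and authority second.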
the index is built on common crawl, which is smaller than ahrefs and refreshed quarterly rather than daily (the latest snapshot covers jan to mar 2026). for gap analysis that is rarely a problem, because the domains that link across a whole category are stable and do not churn week to week. but if you need today's links reflected today, a quarterly open dataset is not the right source. it is free and fully transparent; it is not real-time.
common mistakes to avoid
- too many competitors. ten competitors produces a huge, shallow overlap. two or three close rivals gives a sharper, more actionable list.
- ranking before filtering. if you sort by authority before stripping platform and cdn noise, the top of your list is youtube and cloudfront. filter first, rank second.
- counting backlinks instead of domains. dedupe to the referring domain. one outreach email goes to one site, no matter how many links it gave a competitor.
- ignoring the multi-competitor signal. the domains that link to several rivals are your best prospects. do not let them get lost in an authority-only sort.
- expecting same-week data. a link a competitor earned last week will not appear until the next common crawl snapshot lands. gap targets are stable, so this rarely bites, but it is worth knowing.
conclusion
a backlink gap analysis is the cheapest way to turn cold link prospecting into warm prospecting. pick two or three real competitors, pull their referring domains, subtract your own, strip the platform and cdn noise, rank what is left by authority, and pitch the multi-competitor domains first. every step can be done for free, by hand or with a tool that does the set math and the noise filtering for you.
the work is finding the overlap; the value is in pitching it well. start by running a domain through the homepage, see what the index has, and build your first gap list from there. for the data background, read "common crawl, explained for seos" and the broader free backlink guide.