Crawl budget is an important topic to understand when it comes to search engine optimization.
While it is not a setting you configure yourself, it does influence how Google crawls your site – which, in turn, influences where and how your site ranks.
But what is crawl budget, and how does crawl budget optimization benefit you compared to simply leaving the whole concept alone?
What is Crawl Budget?
A search engine like Google does not crawl pages equally and instantly. It could take days or even weeks for your site to be fully crawled, and not all new pages are going to be immediately indexed.
Crawl budget refers to the number of pages that Google will crawl on a site each day. It is not a hard number – it can vary quite heavily from one day to the next – and it depends on a range of factors that change your site’s total crawl budget.
Things like site errors, site size, and the number of links your site receives can influence your total crawl budget, which changes how many pages Google will index every day. While this might not sound all that important, the difference between same-day indexing and a multi-day wait can be a big one.
What are Crawlers?
Crawlers are the bots that let search engines “crawl” a site.
The crawling process focuses on indexing pages, with indexed pages being approved to appear in search results. In a sense, crawlers make your site visible to the search platforms that they come from.
Crawlers are important because they are the main way that your site is discovered by search platforms. Of course, crawling takes server resources, and the overall crawl demand across the internet is constantly high, which means that search platforms install a crawl rate limit.
While different search engine crawlers work in slightly different ways, the overall result is the same: a page that is crawled can then be indexed. A better site earns more crawl budget, allowing more pages to get indexed.
Why Would You Need to Optimize Your Crawl Budget?
Crawl budget becomes a problem when your overall crawl budget is lower than the number of pages on your site.
While it might not sound bad to wait a day for the other half of your site to be indexed, you do not get to control which half gets crawled first, and a day can be more than enough time for a competitor to overtake you.
If your pages are not indexed quickly, they provide no value in search. For example, a blog post about a current, topical subject might only become visible two or three days after it should have been.
If Google only crawls 2,000 pages on your site per day, but you have upwards of 200,000 pages overall, it could take 100 days (or even longer) to fully crawl your website.
Since you can’t choose to let Google crawl specific pages, the crawl process might completely miss important pages for most of that period.
In simple terms, a crawl budget is not always going to suit your site, and you do not get control over how search engines spend that crawl budget. This means that optimizing your site around your crawl budget is vital for getting the best results possible.
How to Understand Crawl Activity
It is a good idea to know which pages Google search crawlers are viewing on your site. Usually, you can check your site’s server logs for crawl activity – there are various tools to help identify Google crawl bot activity within your site’s logs.
You can sometimes also see this first-hand after the crawling process. If new or updated pages are not getting any organic traffic from search results, it is likely that the page is not indexed yet, so it cannot appear in search results at all.
Note that crawl frequency and budget are not necessarily an indicator of quality. The exact algorithm that Google uses for its crawling process is not fully known, just like most of its algorithms – while there are ways to optimize your site to better suit being crawled, slow crawling does not mean a bad site.
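As a rough illustration, server access logs can be filtered for Googlebot activity with a few lines of scripting. The log lines below are hypothetical examples in the common Apache/Nginx format, and matching on the user-agent string alone is a simplification – user agents can be spoofed, so a production check should also verify the bot’s IP range:

```python
import re

# Hypothetical sample lines in common Apache/Nginx access-log format.
LOG_LINES = [
    '66.249.66.1 - - [10/May/2024:06:25:24 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/May/2024:06:25:30 +0000] "GET /about HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]

def googlebot_hits(lines):
    """Return the request paths fetched by Googlebot, judged by user-agent string."""
    hits = []
    for line in lines:
        if "Googlebot" in line:
            match = re.search(r'"(?:GET|POST) (\S+) HTTP', line)
            if match:
                hits.append(match.group(1))
    return hits

print(googlebot_hits(LOG_LINES))  # → ['/products/widget']
```

Aggregating these paths over a few weeks shows which sections of the site crawlers actually spend their budget on.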
What Mistakes Impact Crawl Budget and Crawl Rate Limit?
In general, most factors affecting crawl budget relate either to server resources or to time spent during the crawl. This means that it is entirely possible to accidentally “waste” crawl budget by having your site optimized in the wrong way.
While the full algorithm for determining crawl demand is not publicly known, there are some things that simply cause more problems than they solve.
In many cases, this includes creating more pages in unnecessary ways or making it difficult for crawlers to understand where they are supposed to go.
Your site’s overall authority and backlink profile have an impact on how pages are crawled. In general, a more “valuable” site will have more crawl demand since it is more useful to the users that search engines are trying to benefit.
Authority is increased by having a lot of good-quality inbound links from major sites, meaning that link building can indirectly boost crawl demand. Link equity shared through both internal and external links can matter, providing benefits in terms of crawling.
Note that this also applies if crawlers find your site through a link. If a URL to your site is discovered from an outside source, the crawler will follow it, providing a slight benefit to the crawling process.
Sitemaps tell search engines the layout of your site, giving them an understanding of how your website is structured without them having to discover that themselves.
This makes it much easier to direct them to the important parts of your website rather than having to just hope that they will find them.
If XML sitemaps are built poorly or point to pages that are non-indexable or no longer exist, they may send crawlers to the wrong places or get pages crawled that should have been ignored.
The architecture of your site is the way that it is all laid out, both for human users and crawler bots.
This means your entire internal linking structure and the way that internal links allow users to navigate through the website, jumping from the home page to other pages and so on.
Crawlers need to follow links to explore your site. This means that a well-structured site can be crawled efficiently, while one with a lot of unnecessary links or a messy layout may become harder for bots to crawl properly.
Most crawlers begin at the homepage (which is usually the site URL with no extra URL parameters or sub-pages included) and then work their way “down” into the site.
This means that they can only find pages that are actually linked from another accessible page – if a whole bunch of product pages are only linked to by a category that users can’t navigate to directly, none of those pages get crawled.
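Because crawlers discover pages by following links, an orphaned page is simply one that a traversal of the link graph never reaches. The sketch below illustrates the idea with a hypothetical link graph – the page paths are made up for the example:

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/about", "/category/widgets"],
    "/about": ["/"],
    "/category/widgets": ["/products/widget-1"],
    "/products/widget-1": [],
    "/orphan-product": [],  # exists, but nothing links to it
}

def reachable_from_home(links, start="/"):
    """Breadth-first walk from the homepage, the way a crawler discovers pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

orphans = set(LINKS) - reachable_from_home(LINKS)
print(orphans)  # → {'/orphan-product'}
```

Any page that ends up in the orphan set needs an internal link (or a sitemap entry) before crawlers can find it.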
Session IDs are used to track user preferences and handle a range of other tasks on larger sites. Unfortunately, both session and tracking IDs can sometimes involve the server creating multiple versions of the same page, which effectively produces duplicate content.
While this means that crawlers have to check more pages, most search engines recognize that these pages are irrelevant and will not index them.
Even so, it can waste crawl budget and might even impact your site’s overall trust and ranking power by appearing like duplicate content to search platforms.
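One common mitigation, alongside rel="canonical" tags, is to normalize URLs by stripping session and tracking parameters so duplicate URLs collapse into one. The sketch below shows the idea with Python’s standard library; the parameter names are hypothetical examples of what a site might choose to strip:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical parameters that create duplicate versions of the same page.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url):
    """Drop session/tracking parameters, keeping only meaningful query keys."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(canonicalize("https://example.com/shop?sessionid=abc123&page=2"))
# → https://example.com/shop?page=2
```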
Faceted navigation – where users can use filters to sort product pages – can be a huge problem for your internal linking structure.
Because of how these systems work, each filter combination can produce an entirely different URL, which can mean that Google Search crawlers will treat each filter as a different page.
This bloats your number of pages, leads to a lot of alternate URLs that all focus on the same content, and means that many pages are basically duplicate content of other pages within the filter system.
These crawl budget issues can be prevented by hiding these filter pages from crawlers or using other methods to make sure that each filter option does not result in multiple pages.
Otherwise, your site can gain thousands of other “pages” that are all variations on the same search system.
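For example, filter URLs can be hidden from crawlers in robots.txt, since Googlebot supports `*` wildcards in Disallow rules. The parameter names below are hypothetical – they would need to match whatever query parameters your faceted navigation actually generates:

```
User-agent: *
Disallow: /*?filter=
Disallow: /*?sort=
Disallow: /*?color=
```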
Google will follow all links it finds unless told otherwise, regardless of whether the links actually go anywhere.
Broken links might only take up one more spot in your crawl budget, but the more broken links you have, the more crawl budget you are wasting.
This can be especially bad if you have internal links pointing to pages that do not exist or were re-named. A crawler may try to follow several links to the missing pages and waste even more budget, leaving other parts of your site un-crawled.
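A basic broken-link audit compares the internal links found in your HTML against the pages that actually exist. The sketch below uses a naive regex and hypothetical data to show the shape of the check; a real audit would use a proper HTML parser and live HTTP requests:

```python
import re

# Hypothetical crawl data: pages that exist, and one page's HTML.
EXISTING_PAGES = {"/", "/about", "/products/widget"}
HTML = '<a href="/about">About</a> <a href="/products/gadget">Gadget</a>'

def broken_internal_links(html, existing):
    """Return internal link targets that point at pages that do not exist."""
    hrefs = re.findall(r'href="(/[^"]*)"', html)
    return [h for h in hrefs if h not in existing]

print(broken_internal_links(HTML, EXISTING_PAGES))  # → ['/products/gadget']
```

Running a check like this across the whole site surfaces the links that quietly waste crawl budget.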
Redirect chains are when a crawler needs to follow multiple redirects in a row to reach specific pieces of content. A redirect could be as fast as mere milliseconds or as long as several full seconds, and that slows Google search crawlers down dramatically.
Even worse are infinite loops, where redirects accidentally dump the crawler back at the beginning of the chain. Since the crawler can’t find the destination page it was told about, it will eventually detect the loop and (usually) abandon your site, even if there are still pages left to crawl.
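Detecting chains and loops is straightforward if your redirect rules are available as data. The sketch below uses a hypothetical redirect map (old URL to new URL) and flags any chain that loops back on itself or runs too long:

```python
# Hypothetical redirect map, e.g. extracted from server configuration.
REDIRECTS = {
    "/old-page": "/interim-page",
    "/interim-page": "/final-page",
    "/loop-a": "/loop-b",
    "/loop-b": "/loop-a",
}

def follow_redirects(url, redirects, max_hops=10):
    """Follow a redirect chain; return (final_url, hops), or (None, hops) on a loop."""
    seen = set()
    hops = 0
    while url in redirects:
        if url in seen or hops >= max_hops:
            return None, hops  # loop, or an excessively long chain
        seen.add(url)
        url = redirects[url]
        hops += 1
    return url, hops

print(follow_redirects("/old-page", REDIRECTS))  # → ('/final-page', 2)
print(follow_redirects("/loop-a", REDIRECTS))    # → (None, 2)
```

Any chain longer than one hop is a candidate for flattening: point the old URL straight at the final destination.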
While site speed does not directly influence your crawl budget, it still has a long-term impact if search engines misunderstand why the speed is slow.
Google wants to avoid overloading your servers, and if the crawler notices incredibly slow site speeds, it might reduce your crawl budget to match.
This means that a slower site’s crawl budget may be reduced purely because it is a slower site, meaning fewer crawled pages with every crawler visit.
This happens even if the page itself is what is slow, rather than the server the site is hosted on.
How to Optimize Your Site’s Crawl Budget
If you want to gain a higher crawl budget and ensure that your site is crawled properly, then there are multiple ways to optimize your website for a better crawling process.
How you approach the issue depends on the changes you are willing to make and how in-depth you want to go.
Remember, as mentioned before, the crawl budget is impacted by a range of factors. Some sites will simply have a lower crawl budget than others for reasons that only the algorithm knows.
Optimizing your crawl budget is important, but always remember that there is probably an upper limit to how far you can push it before you need to make big changes to your site.
As long as you are happy with the crawl budget you end up achieving, that is what matters most.
Create a New XML Sitemap
Generating a new XML sitemap is an easy way to send crawlers to all of the right web addresses.
A good XML sitemap is important for showing crawlers where all of the pages on your site actually are, which makes it important to update it with new URLs each time major new pages are added.
There are tools that website owners can use to auto-generate a new XML sitemap, allowing for quick replacement of the old map.
Without a good site map to break down the URLs Googlebot can find, the crawler will have to basically wander through your site with no end goal.
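As an illustration, a minimal sitemap in the sitemaps.org format can be generated with Python’s standard library. The URLs below are placeholders – a real generator would pull the list of indexable pages from your CMS or a site crawl:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Hypothetical list of indexable URLs.
URLS = ["https://example.com/", "https://example.com/about"]

def build_sitemap(urls):
    """Build a minimal XML sitemap following the sitemaps.org protocol."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = Element("urlset", xmlns=ns)
    for url in urls:
        loc = SubElement(SubElement(urlset, "url"), "loc")
        loc.text = url
    return tostring(urlset, encoding="unicode")

print(build_sitemap(URLS))
```

The output is the `<urlset>` document you would save as sitemap.xml and reference from robots.txt or submit in Search Console.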
Check Page Availability
Crawlers rely on the status codes that servers return when a page is requested – for example, 200 means “OK,” whereas 301 designates that a page has “moved permanently” to a new location via a redirect. 304, meanwhile, tells the crawler the page has not changed, letting it re-use data from the last crawl instead of downloading the page again.
If a page can’t be accessed, it can’t be crawled. Some of these codes reflect access restrictions (such as 403, “forbidden”), while others are errors (the infamous 404, “not found”) or may be beyond your control (451, “unavailable for legal reasons”).
Make sure that each page works and that the servers hosting your site are not suffering any issues. Pages that return 4xx or 5xx codes generally will not be indexed, even if the content is technically still accessible.
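As a rough summary of the codes above, the sketch below classifies common HTTP status codes by how a crawler is likely to treat them. It reflects general HTTP semantics rather than Google’s exact (unpublished) behavior:

```python
def crawl_outcome(status):
    """Roughly classify how a crawler treats a given HTTP status code."""
    if status == 200:
        return "crawl and index"
    if status in (301, 302, 307, 308):
        return "follow redirect"
    if status == 304:
        return "reuse cached copy"
    if 400 <= status < 500:
        return "client error - not indexed"
    if 500 <= status < 600:
        return "server error - retried, crawl rate may drop"
    return "other"

print(crawl_outcome(200))  # → crawl and index
print(crawl_outcome(404))  # → client error - not indexed
```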
Perform Website Maintenance
One of the biggest factors that can impact your crawl limit is your website’s overall performance.
For example, as mentioned above, a poorly-optimized site that loads very slowly can drag out the crawling process.
This might result in you getting a lower crawl budget despite not actually doing anything wrong in terms of site architecture.
Sites with broken internal or external links, or other elements that get in the way of crawler bots, might also negatively impact the end result. Hacked pages and soft-error pages are also a known factor in having your crawl budget reduced unexpectedly.
Basically, if you are not keeping your site in good condition, you can expect overall crawl demand to drop dramatically.
Google cares about good-quality sites and does not want to rank poorly maintained websites very high, which often translates into worse crawl stats overall.
Improve Site Architecture
Better site architecture is both a technical SEO focus and something that influences the crawl budget. If your site’s layout makes sense and keeps all pages accessible, then crawlers are going to find the important pages properly and will know how to get around the website effectively.
On the other hand, a nonsensical layout with broken links and/or redirect loops makes it much harder for a crawler to get where it needs to go. Beyond that, this also harms your site’s overall SEO value, since internal links may not be as relevant.
Remember that pages need to be accessible via a link to be crawled. If a page is accessed only through external links pointing there through promotional ads or another site, that page might never get crawled, meaning that it is not indexed and will not appear in search results.
Robots.txt is a text file at the root of your site that allows you to manage crawling traffic. It lets you effectively control how crawl requests happen and where crawlers spend their crawl budget – mostly by telling them where they can and can’t go.
Some crawlers also honor a Crawl-delay directive that spaces out their requests, although Googlebot ignores this directive.
Doing this makes it a lot easier to maximize the benefits you get from your full crawl budget rather than having to worry about crawl budget being wasted.
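You can sanity-check a robots.txt file before deploying it with Python’s built-in parser, which applies the same Allow/Disallow matching a well-behaved crawler would. The rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed to check which URLs crawlers may fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /internal-search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "https://example.com/products/widget"))  # → True
print(parser.can_fetch("*", "https://example.com/cart/checkout"))    # → False
```

A quick check like this catches rules that accidentally block important pages before crawlers ever see the file.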
What Else to Know About Crawl Budget Optimization
While this might be a lot to take in if you have not really thought about crawlers before, this is only scratching the surface of how crawler bots are handled. Understanding crawl demand is not as easy as it sounds, and no two sites will have to tackle it in the same way.
Use Tools like Google Search Console
Tools like Google Search Console are invaluable for telling you how Google is actually interacting with your site.
There are also similar tools available for other search engines, such as Bing Webmaster Tools – all of which are important for understanding your website health and expected crawl rate.
While Google Search Console will not necessarily tell you the crawl rate of each page individually, the Crawl Stats report and its page indexing information can be very useful.
Even just being able to break down site visitor log files can help you understand where crawlers are spending their budget.
Know Where To Find Your Crawl Stats
Google’s Crawl Stats report makes it easy to see how Google’s crawlers have explored your website. Available through Google Search Console, the Crawl Stats report is invaluable for seeing what kinds of crawl requests and attempts have been made.
This can also help you get an understanding of your crawl health and how you should use your crawl budget wisely in the future. Having access to proper reports and server logs makes it much easier to tell what the next steps are, something that Google Search Console is perfect for.
Don’t Get Too Focused on Crawlers
Improving your crawl rate means nothing if your site is a mess of incorrect links, poor technical SEO, and awful optimization. There is no point in trying to increase your crawl limit if you have nothing to offer users, so be sure to focus on more than just the crawl rate when it comes to improving your site.
In fact, trying to boost your site’s quality through both off-site and technical SEO can really matter, especially if it removes major problems like server errors. Even just making sure that your site responds quickly can be enough to bump up your crawl limit.
Tools like Google Analytics, Bing Webmaster Tools, and Google Search Console are all very useful sources of data. It is pointless to worry about crawl budget if you do not even have any log files or Search Console reports to use as a reference point.
Make sure that you have actual information to work with. If your site slows down or crawlers begin to encounter server errors, make sure you can figure out why, whether that means using Google Search Console or digging into the code of your site itself.
Remember that Crawl Demand Changes
Just like SEO rankings, crawl budget and demand can change based on a lot of factors, some of which you will have no control over. Sometimes, you will just have to accept that a strategy that worked in the past might need to be tweaked or re-attempted.
While you can get your site into a comfortable position in the short term, there might be algorithm changes or adjustments to your content that directly impact how much crawl demand your site has. This is not always your fault, just a consequence of how the algorithm works.
Overall, Crawler Budget Optimization is Good
While things like “don’t hide URLs” and “keep pages accessible” might be obvious, many site owners are not familiar with tools like robots.txt or XML sitemaps. The techniques and tools here are only the top level of how deep this optimization can go.
However, whether you are making major sweeping changes or just tweaking a few things to try and make navigation easier for the crawler bots, there are no downsides to proper optimization. Even a small change can have a big impact on how your site is approached by crawlers.
Of course, you also have to remember that no two sites are the same. Something that might benefit your website will not necessarily benefit another website in the exact same way – you want to use strategies, techniques, and options that make sense for your site’s structure and size.
Whatever techniques you end up using, there are always more ways to optimize your website. Crawlers may only be one specific part of how search rankings work, but they are perhaps the most important if you care about getting your site noticed and properly listed in search results.
Searcharoo is a Link Building & Content Marketing company run by SEOs based in the UK.
Our goal from the start has been to provide premium links and content services, at fair and affordable prices.