Digitacy is an avenue for ardent individuals to learn and engage in discussions related to digital marketing, eCommerce, analytics and design.

Sitemaps and Crawl Errors on Google Search Console

Faisal Mumtaz

If it is not being indexed by Google, your content is as good as a secret journal kept in a diary under your bed. Although Google is pretty good at crawling the internet, newer or less popular websites still need to point Google in the right direction, or it might take ages for it to find out that you exist.

The easiest way to find out if a URL exists on Google is to use the "site:" operator in a Google search.

Example - site:walmart.com

This query populates all indexed pages from Walmart.com. If a page does not show up here, it is either excluded from indexing by the robots.txt file or Google has not crawled it yet.

If you're starting from scratch, the first thing you want to do is create an account on Google Search Console and submit your sitemap. A sitemap is an XML file listing all the pages on your website; it lets Google know about all the URLs you want it to crawl and index.
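A minimal sitemap follows the sitemaps.org protocol: a `urlset` root element containing one `url` entry per page. The domain and dates below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/about</loc>
  </url>
</urlset>
```

Only `loc` is required per entry; `lastmod` is an optional hint telling Google when the page last changed.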

You can look up a URL within Search Console as well to check its crawl status. Paste it into the search bar at the top of GSC and it will show you whether the page is on Google or not. If it's not on Google, you can click Request Indexing and Googlebot will crawl the page in a couple of hours or days (depending on your crawl budget).

Most platforms such as WordPress, Squarespace, Wix and Shopify generate sitemaps automatically, and the file can be found in your root folder. Look up yourdomain.com/sitemap.xml to confirm that it exists. If it doesn't, you will have to generate one using an online tool or a crawler such as Screaming Frog SEO Spider.
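Once you have found your sitemap, it is worth confirming that it parses and lists the URLs you expect. A quick sketch using only Python's standard library; the sitemap content here is a hypothetical example, and in practice you would fetch yourdomain.com/sitemap.xml instead:

```python
# Extract the <loc> URLs declared in a sitemap file.
import xml.etree.ElementTree as ET

# Namespace prefix used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return the list of <loc> URLs declared in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sitemap content for illustration.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc></url>
  <url><loc>https://yourdomain.com/about</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

Comparing this list against the pages you actually want indexed is a quick way to spot URLs that are missing from the sitemap before Google does.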

For newer websites, it can take a couple of days for your web pages to show up on Google after submitting the sitemap. Look at the Coverage report under Index in Google Search Console to find out whether all submitted URLs are valid and whether any errors have popped up.

Here are the different errors that can show up in GSC:

  • Server error (5xx) - Server is not accessible
  • Redirect Error - Redirecting to a page that doesn't exist
  • Submitted URL seems to be a soft 404 - Page returns 200 (OK) but looks like a 404 to Googlebot because it has very little content on it
  • Submitted URL marked noindex - Page exists on the sitemap but has a noindex directive, which instructs Googlebot to exclude the page
  • Submitted URL blocked by robots.txt - Page exists on the sitemap but is blocked from being crawled by the robots.txt file
  • Submitted URL returns unauthorized request (401) - Password-protected page on the sitemap
  • Submitted URL has crawl issue - Resources on the page such as images, JavaScript or CSS could not be loaded while Googlebot tried to crawl the page
  • Submitted URL not found (404) - Page not found
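For the robots.txt error in particular, you can check locally whether a URL would be blocked before it ever shows up in GSC. Python's standard library ships a robots.txt parser; the rules and URLs below are hypothetical examples:

```python
# Test robots.txt rules against specific URLs without touching GSC.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the robots.txt content as a list of lines.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Googlebot falls under the wildcard group here, so /private/ is blocked.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/blog/post"))     # True
```

In a real check you would call `rp.set_url("https://yourdomain.com/robots.txt")` followed by `rp.read()` to load the live file instead of parsing hard-coded lines.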

Most of these errors are straightforward and easy to fix. After fixing each error, click Validate Fix; Googlebot will re-crawl the page and remove the error from your GSC account once it confirms that the issue has been resolved.
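When triaging a batch of URLs outside of GSC, a first pass can simply bucket each one by its HTTP status code. A minimal sketch; the bucket names are informal labels mirroring the list above, not the exact strings GSC uses:

```python
# Map an HTTP status code to a rough GSC-style error bucket.
def gsc_error_bucket(status_code):
    if 500 <= status_code <= 599:
        return "Server error (5xx)"
    if status_code == 404:
        return "Submitted URL not found (404)"
    if status_code == 401:
        return "Submitted URL returns unauthorized request (401)"
    if 300 <= status_code <= 399:
        # A 3xx is fine only if the redirect target resolves; check it.
        return "Check redirect target"
    # 200 can still be a soft 404 or carry a noindex tag,
    # so a healthy status code is not proof the page will index.
    return "OK / needs manual review"

for code in (200, 301, 401, 404, 503):
    print(code, "->", gsc_error_bucket(code))
```

Note that soft 404s, noindex directives and robots.txt blocks all return ordinary status codes, which is why the 200 bucket still needs a manual look.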
