As discussed in the previous chapter, search engines are essentially answering machines: they exist to answer their users' queries. The primary aim of any search engine is to organize online content systematically and retrieve the most relevant content whenever a query is made.
However, your content first needs to be visible to search engines before they can show it to users. In this chapter, we're going to explore the steps you must take to make your content visible to search engines. Let's explore!
A lot of people who have just entered the world of SEO wonder about the significance of the different search engines. Most people know that Google dominates the market, but they still ask how crucial it is to optimize for Bing, Yahoo, and other search engines as well. The truth is that even though there are more than 30 significant web search engines, the SEO industry mostly concentrates on Google.
Why is that? The simplest answer is that the vast majority of people use Google to search the web. To quantify this, more than 90% of web searches take place on Google, which is roughly 20 times more than Bing and Yahoo combined.
Now that we’ve established which search engine you need to focus on primarily, let’s figure out how a search engine works.
To understand how search engines work, you need to understand their three main functions: crawling, indexing, and ranking.
Crawling is the discovery process in which search engines send out robots (also called crawlers or spiders) to look for new and updated content. Regardless of the format, whether it's a web page, a PDF, an image, or a video, content is discovered by following links.
Googlebot starts by fetching a few web pages and then follows the links on those pages to find new URLs. By hopping along this network of links, the crawler finds new content and adds it to Caffeine, Google's sizable database of discovered URLs, so it can be retrieved later when a searcher is looking for information that the content on that URL is a good match for.
An index is a sizeable database of all the content that search engines have found and deemed suitable for serving the users. In short, all the relevant content is processed and stored in the index.
The main aim of search engines is to answer users' queries. They accomplish this by combing their index for content that is relevant to a user's search, ordering that content, and serving it up in answer to the query. This ordering of search results by relevance to the user's query is what we call ranking. In general, the higher a page ranks, the more relevant the search engine considers that page to be to the query.
It is possible to block search engine crawlers from part or all of your site, or to instruct search engines to keep specific pages out of their index. There can be valid reasons for doing so, but if you want your content to appear on search engine results pages (SERPs), you need to make sure it is crawlable and indexable. If you fail to do so, your content is effectively invisible.
Our aim with this chapter is to help you understand how search engines work, and how you can make crawling, indexing, and ranking work in your website's favor instead of against it.
How Your Pages Can Be Found by Search Engines
As you've just learned, for your site to appear in the SERPs it must be crawled and indexed. If you already have a website, it's a good idea to check how many of your pages are in the index. This will tell you whether Google is crawling and finding all of the pages you want it to, and none of the pages you don't.
The advanced search operator "site:yourdomain.com" can be used to examine your indexed pages. Enter "site:yourdomain.com" into Google's search bar, and Google will return the results it has for your site in its index. The number of results Google displays (the "About XX results" figure near the top of the results page) gives you a fair idea of which pages on your site are indexed and how they currently appear in search results.
For more accurate results, use and keep an eye on the Index Coverage data in Google Search Console. You can open a free Google Search Console account if you don't already have one. Using this tool, you may submit sitemaps for your website and keep track of the number of submitted pages that have actually been indexed by Google.
You might not appear in any of the search results for a number of reasons: your site may be brand new and not yet crawled, it may not be linked to from any external websites, its navigation may make it hard for a robot to crawl, or crawler directives may be blocking search engines. We'll look at several of these below.
How You Can Instruct Search Engines to Crawl Your Website
If you used Google Search Console or the "site:domain.com" advanced search operator and discovered that some of your important pages are missing from the index and/or some of your unimportant pages have been mistakenly indexed, there are optimizations you can apply to better instruct Googlebot how you want your web content to be crawled. By telling search engines how to crawl your website, you gain more control over what ends up in the index.
Most people think about making sure Google can find their important pages, but it's easy to forget that there are certainly some pages you don't want Googlebot to find. Examples include old URLs with little information, duplicate URLs (like sort-and-filter criteria for e-commerce), specific promo code pages, staging or test pages, and so on.
To restrict Googlebot from accessing specific pages and regions of your website, use robots.txt.
Robots.txt
Robots.txt files are found in the root directory of websites (for example, yourdomain.com/robots.txt), and they provide instructions for the precise areas of your site that search engines should and shouldn't crawl as well as the rate at which they should do so.
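As a rough sketch, a robots.txt file for the example domain above might look like the following (the folder names and the "BadBot" crawler name are hypothetical placeholders; the Sitemap line points crawlers at your XML sitemap, which we cover later in this chapter):

# Allow all crawlers, but keep them out of the folders listed below
User-agent: *
Disallow: /staging/
Disallow: /promo-codes/

# Block one specific (hypothetical) crawler from the entire site
User-agent: BadBot
Disallow: /

# Tell crawlers where to find your XML sitemap
Sitemap: https://yourdomain.com/sitemap.xml

Remember that well-behaved crawlers honor these rules voluntarily; as the next section explains, not every robot does.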
How Googlebot Handles robots.txt Files
If Googlebot can't find a robots.txt file for a site, it proceeds to crawl the site. If it finds one, it will usually abide by its directives; if it runs into an error while trying to access the file, it won't crawl the site. Beyond Googlebot, though, not every web robot adheres to robots.txt. Bad actors (such as email address scrapers) build bots that ignore the protocol, and some malicious individuals even use robots.txt files to find where private content lives. Although it may seem sensible to block crawlers from private pages, such as login and administration pages, so that they don't appear in the index, listing those URLs in a publicly accessible robots.txt file also makes them easier for people with malicious intent to locate. It is better to noindex these pages and lock them behind a login form than to place them in your robots.txt file.
Defining URL Parameters in GSC
Some websites (most commonly e-commerce sites) make the same content available on multiple different URLs by appending certain parameters to them. If you've ever shopped online, you've probably used filters to narrow your search. For instance, you might search Amazon for "shoes" and then refine your results by style, color, and size. The URL changes slightly with each refinement.
How does Google decide which version of the URL to show to searchers? Google does a decent job of determining the representative URL on its own, but you can use Google Search Console's URL Parameters feature to tell Google exactly how you want it to treat your pages. By using this feature to tell Googlebot to "crawl no URLs with __ parameter," you're effectively asking it to ignore this content, which could result in the removal of those pages from search results. That's what you want if those parameters only create duplicate pages, but it's not ideal if you want those pages to be indexed.
Can Crawlers Access All of Your Key Content?
Now that you know some tactics for keeping search engine crawlers away from your unimportant content, let's look at the optimizations that can help Googlebot find your important pages. Sometimes a search engine will be able to crawl parts of your site but not others, with pages or sections left hidden for one reason or another. It's important to make sure that search engines can discover all the content you want indexed, not just your homepage.
Ask yourself these questions:
Is your content hidden behind login forms? Search engines won't crawl pages that require users to log in, fill out forms, or answer surveys before accessing the content. A crawler is simply not going to log in.
Are you relying on search forms? Robots cannot use search forms. Some people assume that if they place a search box on their site, search engines will be able to find anything their visitors search for, but that isn't the case.
Is text hidden within non-text content? Text that you want indexed shouldn't be placed inside non-text media such as images, videos, or GIFs. Even though search engines are getting better at recognizing images, there's no guarantee they'll be able to read and understand that text just yet. It's always best to put text within the HTML markup of your web page.
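As a simple illustration, compare the two approaches below; the file name and wording are made-up examples, but the point is that the crawlable version keeps its words in the HTML itself rather than baked into an image:

<!-- Crawlable: the headline and description live in the HTML -->
<h1>Funny Jokes for Every Occasion</h1>
<p>Browse our collection of clean, family-friendly jokes.</p>

<!-- Less reliable: the same words exist only inside the banner image -->
<img src="jokes-banner.jpg" alt="Funny jokes for every occasion">

Even in the second case, a descriptive alt attribute gives search engines some text to work with, but it is no substitute for real on-page copy.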
Just as a crawler needs to discover your site via links from other sites, it needs a path of links on your own site to guide it from page to page. A page you want search engines to find that isn't linked to from any other page is essentially invisible. Many websites make the critical mistake of structuring their navigation in ways that search engines can't follow, which hurts their ability to appear in search results.
Common navigational errors that can keep crawlers from seeing your entire site include navigation menus whose links aren't in the crawlable HTML (for example, JavaScript-only menus that some search engines still struggle with), mobile navigation that shows different links than your desktop navigation, and important pages that simply aren't linked to from your navigation at all.
Because of this, it's crucial that your website features easy-to-navigate pages and useful URL folder structures.
Information architecture is the practice of organizing and labeling the content on a website to make it efficient and findable for users. With good information architecture, users can navigate your website easily and find the information they need without much effort.
A sitemap is just what it sounds like: a list of the URLs on your site that crawlers can use to discover and index your content. Creating a file that meets Google's requirements and submitting it through Google Search Console is one of the easiest ways to make sure Google is finding your highest-priority pages. Submitting a sitemap doesn't replace good site navigation, but it can certainly help crawlers follow a path to all of your important content.
Even if no other websites link to yours, you might be able to get it indexed by submitting your XML sitemap in Google Search Console. There's no guarantee they'll include a submitted URL in their index, but it's worth a try.
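For reference, a bare-bones XML sitemap that follows the standard sitemap protocol looks roughly like this; the URLs and dates are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want search engines to discover -->
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/puppies/</loc>
    <lastmod>2023-01-10</lastmod>
  </url>
</urlset>

Save the file (sitemap.xml is the conventional name), host it on your site, and submit its URL in Google Search Console.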
When Crawlers Attempt to Access Your URLs, Do They Encounter Errors?
Crawlers may run into errors while trying to access the URLs on your website. You can check Google Search Console's "Crawl Errors" report to identify URLs where this might be happening; it shows both server errors and not-found errors. Server log files can show you this as well, along with a wealth of other information such as crawl frequency, but because accessing and analyzing server log files is a more advanced technique, we won't cover it in detail in this beginner's guide.
Before you can do anything meaningful with the crawl error report, it's important to understand server errors and "not found" errors.
4xx errors are client errors, meaning the requested URL contains bad syntax or cannot be fulfilled. One of the most common 4xx errors is the "404 - not found" error. It can occur because of a typo in the URL, a deleted page, or a broken redirect, to name just a few causes. When search engines hit a 404, they can't access the URL, and when users hit a 404, they can get frustrated and leave.
5xx errors are server errors, meaning the server that hosts the web page failed to fulfill the searcher's or search engine's request to access the page. Google Search Console's "Crawl Errors" report has a tab dedicated to these errors. They often happen because the request for the URL timed out, so Googlebot abandoned it. See Google's documentation for more on fixing server connectivity issues.
Thankfully, there's an effective way to tell both searchers and search engines that your page has moved: the 301 (permanent) redirect. Suppose, for example, that you move a page from example.com/young-dogs/ to example.com/puppies/. Users and search engines alike need a bridge from the old URL to the new one. That bridge is a 301 redirect.
When you implement a 301, link equity is transferred from the page's old location to the new URL, Google can find and index the new version of the page, and visitors who follow old links land on a working page instead of an error. When you don't implement a 301, none of that happens: the authority of the previous URL isn't passed on to the new version, and pages that keep returning 404s can eventually drop out of the index, taking their rankings and traffic with them.
Avoid redirecting URLs to irrelevant pages, i.e., URLs where the old URL's content doesn't actually live, since the 301 status code itself signals that the page has permanently moved to a new address. A page that already ranks for a query can lose that ranking if you 301 it to a URL with different content, because the content that made it relevant to that query is no longer there. 301s are powerful, so move URLs responsibly.
There is also the 302 redirect, but it should be reserved for short-term moves and for cases where passing link equity isn't a major concern. A 302 is something like a road detour: you're temporarily siphoning traffic through a certain route, but it won't stay that way forever.
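To make this concrete, here is a rough sketch of how the young-dogs-to-puppies move from the example above could be configured, assuming an Apache web server with mod_alias enabled (the 302 paths are hypothetical placeholders; other servers use their own syntax):

# Permanent move (301): send searchers, crawlers, and link equity to the new URL
Redirect 301 "/young-dogs/" "https://example.com/puppies/"

# Temporary detour (302): use only when the move is genuinely short-term
Redirect 302 "/summer-sale/" "https://example.com/holding-page/"

Whichever server you use, what matters is the status code it returns: 301 tells search engines the move is permanent, while 302 tells them it's temporary.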
Once you've made sure your site is optimized for crawlability, the next step is to make sure it can be indexed.
How Are Your Pages Interpreted and Stored by Search Engines?
That's right: just because a search engine can discover and crawl your site doesn't guarantee that it will be stored in its index. In the previous section on crawling, we discussed how search engines discover your pages. The index is where those discovered pages are stored. After a crawler finds a page, the search engine renders it much like a browser would and, in doing so, analyzes the page's contents. All of that information is stored in its index.
Continue reading to find out more about indexing and how to ensure that your website appears in this crucial database.
Can I access a Google crawler's view of my sites?
Yes, the cached version of your page shows a snapshot of the last time Googlebot crawled it. Google crawls and caches web pages at varying frequencies. You can view the cached version of a page by clicking the drop-down arrow next to the URL in the SERP and choosing "Cached".
You can also view the text-only version of your site to check whether your important content is being crawled and cached effectively.
Are Any Pages Ever Taken Out of the Index?
Yes, pages can be removed from the index. Some of the main reasons a URL might be removed include: the URL returns a "not found" (4xx) or server error (5xx); a noindex meta tag was added to the URL; the URL was manually penalized for violating the search engine's guidelines; or the URL has been blocked from crawling, for example by being placed behind a password.
If you believe a previously indexed page of your website is no longer showing up, you can use the URL Inspection tool to check the page's status, or use Fetch as Google, which has a "Request Indexing" feature, to submit individual URLs to the index. (Bonus: GSC's "fetch" tool also has a "render" option that lets you see whether there are any problems with how Google interprets your page.)
How You Can Instruct Search Engines to Index Your Website
Meta-directives for robots
You can give search engines instructions on how to handle your web page by using meta directives (often called "meta tags"). These include instructions such as "don't index this page in search results" or "don't pass any link equity to any on-page links." They are carried out either through robots meta tags in the <head> of your HTML pages (the most common approach) or through the X-Robots-Tag in the HTTP header.
1. Robots Meta Tag
You can use the robots meta tag within the HTML of your website. It can exclude all search engines or only specific ones. Below are the most common meta directives, along with the situations in which you might apply them; a sample tag follows the list.
Index/Noindex: tells the search engine whether the page should be crawled and kept in its index for retrieval. Choosing "noindex" tells crawlers that you want the page excluded from search results. You don't need to specify "index," because search engines assume they can index all pages by default.
Follow/Nofollow: tells search engines whether the links on the page should be followed. "Follow" means bots will follow the links on your page and pass link equity through to those URLs. "Nofollow" means the search engines will not follow the links or pass any link equity through them. By default, all pages are assumed to have the "follow" attribute.
Noarchive: prevents search engines from saving a cached copy of the page. By default, the engines keep visible copies of all the pages they have indexed, accessible to searchers through the cached link in the search results.
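Put together, a robots meta tag applying the noindex and nofollow directives described above sits in the page's <head> and looks roughly like this (the title is just a placeholder):

<!DOCTYPE html>
<html>
<head>
  <!-- Tell all crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
  <!-- Or target a single crawler, such as Googlebot only -->
  <meta name="googlebot" content="noindex">
  <title>Example page</title>
</head>
<body>...</body>
</html>

Use one tag per crawler you want to address; "robots" applies to all of them.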
2. X-Robots-Tag
The x-robots tag is used within the HTTP header of your URL and comes in handy when you want to block search engines at scale. It offers more flexibility than meta tags because it lets you use regular expressions, block non-HTML files, and apply sitewide noindex directives.
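For instance, to apply a noindex directive to every PDF on a site, you could have the server send an X-Robots-Tag header with each PDF response. On an Apache server with mod_headers enabled, a sketch of that configuration looks like this (adapt it to your own server and file types):

# Send "X-Robots-Tag: noindex, nofollow" with every PDF the server returns
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

The crawler sees the directive in the HTTP response itself, so this works even for files that have no HTML <head> in which to place a meta tag.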
You may avoid the main traps that might stop your key pages from being found by being aware of the various ways you can affect crawling and indexing.
How Do URLs Rank in Search Engines?
How do search engines make sure that someone who types a query into the search bar gets relevant results back? That process is known as ranking: ordering search results from most relevant to least relevant for a given query.
To determine relevance, search engines use algorithms: processes or formulas by which stored information is retrieved and ordered in meaningful ways. These algorithms have gone through many changes over the years to improve the quality of search results. Google, for example, adjusts its algorithms every day; some of these updates are minor quality tweaks, while others are core/broad algorithm updates deployed to tackle a specific problem, like Penguin, which targets link spam. For a list of both confirmed and unconfirmed Google updates going back to the year 2000, see our Google Algorithm Change History.
Why does the algorithm change so often? Although Google doesn't always reveal exactly why it does what it does, we do know that its aim when making algorithm adjustments is to improve overall search quality. When asked about algorithm updates, Google will typically answer with something like, "We're making quality updates all the time." This means that if your site suffered after an algorithm change, you should compare it against Google's Quality Guidelines and Search Quality Rater Guidelines, both of which are very telling in terms of what search engines value.
The Goals of Search Engines
Search engines have always had the same goal: to provide relevant answers to users' queries in the most helpful formats. If that's true, why does SEO seem so different now than it did in years past?
Think of it in terms of someone learning a new language. At first, their understanding of the language is rudimentary: "See Spot Run." Over time, they develop a deeper understanding and learn semantics, the study of meaning in language and the relationships between words and phrases. With enough practice, the student eventually knows the language well enough to answer even vague or incomplete questions.
When search engines were just beginning to learn our language, it was much easier to game the system with tricks and tactics that go against quality guidelines. Take keyword stuffing as an example. If you wanted to rank for a particular keyword, like "funny jokes," you might add the words "funny jokes" to your page over and over again, and make them bold, in hopes of boosting your ranking for that term.
This tactic made for terrible user experiences: instead of laughing at funny jokes, people were bombarded with annoying, hard-to-read text. It may have worked in the past, but it is never what search engines wanted.
The Role Links Play in SEO
When we talk about links, we could mean two things. Backlinks, or "inbound links," are links from other websites that point to your website, while internal links are links on your own site that point to your other pages (on the same site).
Early on, search engines needed help figuring out which URLs were more trustworthy than others so they could decide how to rank search results. Counting the number of links pointing to each site helped them do this.
Backlinks function very similarly to actual Word-of-Mouth (WoM) referrals.
This is why PageRank was created. PageRank, part of Google's core algorithm, is a link analysis algorithm named after Larry Page, one of Google's founders. It estimates the importance of a web page by measuring the quality and quantity of links pointing to it. The assumption is that the more important, relevant, and trustworthy a web page is, the more links it will have earned.
Having more natural backlinks from high-authority (trusted) websites increases your chances of appearing higher in search results.
The Function of Content in SEO
Links would be pointless if they didn't direct searchers to something. That something is content! Content is anything meant to be consumed by searchers: text, images, video, and more. If search engines are answer machines, content is the means by which the engines deliver those answers.
With hundreds of possible results for any given query, how do search engines decide which pages the searcher will find useful? How well the content on your page matches the intent of the query is a big part of determining where your page ranks for it. In other words, does the page match the words that were searched and help fulfill the task the searcher was trying to accomplish?
Because the focus is on user satisfaction and task accomplishment, there are no strict benchmarks for how long your content should be, how many times it should contain a keyword, or what you put in your header tags. All of those factors can play a role in how well a page performs in search, but the focus should be on the users who will actually be reading the content.
Today, with hundreds or even thousands of ranking signals, the top three have stayed fairly consistent: links to your website (which serve as third-party credibility signals), on-page content (high-quality content that fulfills a searcher's intent), and RankBrain.
What is RankBrain?
RankBrain is the machine learning component of Google's core algorithm. Machine learning is a type of computer program that continually improves its predictions over time through new observations and training data. In other words, it's always learning, and because it's always learning, search results should be constantly improving.
For example, if RankBrain notices that a lower-ranking URL is providing a better result to users than the URLs ranking above it, you can bet that RankBrain will adjust those results, moving the more relevant result higher and demoting the less relevant pages as a byproduct.
In what ways does this affect SEOs?
Because Google will keep using RankBrain to promote the most relevant, helpful content, we need to focus on fulfilling searcher intent more than ever before. Provide the best possible information and experience for searchers who might land on your page, and you've taken a big first step toward performing well in a RankBrain world.
Metrics of Engagement: Correlation, Cause, or Both?
When it comes to Google rankings, engagement metrics are most likely part correlation and part causation. When we say engagement metrics, we mean data that represents how searchers who arrive at your site from search results interact with it. This includes things like clicks (visits from search), time on page (how long a visitor spends on a page before leaving), bounce rate (the percentage of sessions in which users view only one page), and pogo-sticking (clicking an organic result and then quickly returning to the SERP to choose another result).
Google’s Stance on This
Though it has never used the term "direct ranking signal," Google has been clear that it absolutely uses click data to modify the SERP for particular queries.
Because Google needs to maintain and improve search quality, it seems inevitable that engagement metrics are more than just correlation. But it appears that Google stops short of calling them a "ranking signal," since those metrics are used to improve search quality overall, and the rank of individual URLs is just a byproduct of that.
The Change in Search Results
Back when search engines lacked much of the sophistication they have today, the phrase "10 blue links" was coined to describe the SERP's flat structure: for any search, Google would return a page with 10 organic results, each in the same format. Holding the #1 spot was the holy grail of SEO in that search landscape. But then something happened. Google began adding results in new formats, called SERP features, to its search result pages. These SERP features include, among others: Paid Ads, Knowledge Panels, Local (map) Packs, Featured Snippets, Sitelinks, and People Also Ask boxes.
And Google keeps adding new ones. It has even experimented with "zero-result SERPs," where only a single Knowledge Graph result was shown and there was nothing below it except a "see more results" option. The addition of these features caused some initial alarm for two main reasons. First, many of them push organic results further down the SERP. Second, fewer searchers are clicking on organic results, since more queries are being answered directly on the SERP itself.
So why would Google do this? It all comes back to the search experience. User behavior indicates that some queries are better satisfied by different content formats. Notice how the different types of SERP features correspond to the different types of query intent. Many factors influence whether your content ranks on the SERPs, but you need to pay special attention to its structure if you want it to be crawled, indexed, and ranked.
We’re going to discuss it more in detail in Chapter 3.