How is Google finding pages which don't have any links to them?

Name: How is Google finding pages which don't have any links to them?
Uploaded: 2010-09-14T19:56:00.000Z
Duration: 3 min 9 s
Channel: Google Search Central
Description: - Googlebot can index pages without direct site links by following obscure links from external sites. Additionally, it can fill out simple HTML forms to discover new pages, especially if the forms have limited inputs like dropdowns. This allows Google to index pages generated by a site's search feat

15.1K views

TL;DR

Googlebot can index pages via external links and form submissions.

Transcript

We've got an interesting question from Danny in Bucharest who wants to know, "How can Googlebot crawl and index pages that don't have any links to them on my website? I find each day two or three pages in the index that don't have any links to them on my site. The pages are generated by the search field of my website." Okay, so you almost threw me ...

Key Insights

Googlebot can index pages even without direct links by following obscure links from other sites. This means external sites might link to your pages without your knowledge.
Googlebot can fill out simple HTML forms to discover new pages, especially when the forms have limited input elements such as dropdowns.
Google published a paper in 2008 about crawling through HTML forms, highlighting their ability to discover pages through form submissions.
Complex forms, especially those requiring sensitive information like credit card numbers, are generally not crawled by Googlebot.
Webmasters can use robots.txt to disallow Googlebot from crawling certain areas, such as search results pages, so that those pages are not fetched in the first place.
Googlebot's crawling strategy aims to discover and index as much web content as possible for fast retrieval during searches.
Discovering pages this way is part of a broader strategy to return comprehensive search results in under half a second.
Webmasters can learn more about Googlebot's form crawling capabilities by searching for related documentation and guidelines.

Questions & Answers

Q: How does Googlebot index pages without direct links?

Googlebot can discover pages that your own site never links to by following links, sometimes obscure ones, from other websites. External sites may link to your pages without your knowledge, and Googlebot can then crawl and index those pages. This is part of Google's comprehensive crawling strategy.
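To make the external-link route concrete, here is a minimal Python sketch of how any crawler could pick up a search-results URL from a page on another site. It is not Googlebot's code. The hosts `example.com` and `blog.example.org`, the sample HTML, and the helper `links_to_host` are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they appear on.
                self.links.append(urljoin(self.base_url, value))


def links_to_host(html, page_url, target_host):
    """Return the links on a page that point at target_host (hypothetical helper)."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    return [u for u in parser.links if urlparse(u).hostname == target_host]


# Hypothetical post on someone else's blog that links to a search-results URL.
external_page = """
<p>Found this via <a href="https://example.com/search?q=widgets">their search</a>
and also see <a href="/about">our about page</a>.</p>
"""

found = links_to_host(external_page, "https://blog.example.org/post", "example.com")
print(found)
```

The site owner never linked to `/search?q=widgets`, yet a crawler reading the blog post now knows the URL exists.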

Q: Can Googlebot fill out forms to discover new pages?

Yes, Googlebot can fill out simple HTML forms to discover new pages. If a form has limited input elements, such as dropdowns, Googlebot can enumerate the values and submit the form to find new content. This capability helps Google index pages generated by a site's search feature.
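The enumeration idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's implementation: it assumes the page contains a single GET form whose only inputs are `<select>` dropdowns, and the names `SelectFormParser`, `candidate_urls`, the `limit` parameter, and the sample form are all hypothetical.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class SelectFormParser(HTMLParser):
    """Records the form action and the option values of each <select>.

    Assumes the page contains a single <form>.
    """

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.selects = {}     # select name -> list of option values
        self._current = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "select" and attrs.get("name"):
            self._current = attrs["name"]
            self.selects[self._current] = []
        elif tag == "option" and self._current and attrs.get("value"):
            self.selects[self._current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def candidate_urls(html, page_url, limit=50):
    """Build the URLs a GET form would produce for each dropdown combination."""
    parser = SelectFormParser()
    parser.feed(html)
    if parser.method != "get" or not parser.selects:
        return []
    names = list(parser.selects)
    target = urljoin(page_url, parser.action)
    urls = []
    for combo in product(*(parser.selects[n] for n in names)):
        urls.append(target + "?" + urlencode(dict(zip(names, combo))))
        if len(urls) >= limit:  # cap the number of combinations tried
            break
    return urls


# Hypothetical search form with two dropdowns.
form_html = """
<form action="/search" method="get">
  <select name="category">
    <option value="books">Books</option>
    <option value="music">Music</option>
  </select>
  <select name="sort"><option value="new">Newest</option></select>
</form>
"""

urls = candidate_urls(form_html, "https://example.com/")
print(urls)
# ['https://example.com/search?category=books&sort=new',
#  'https://example.com/search?category=music&sort=new']
```

Because the dropdowns offer only a handful of values, every combination can be tried, and each resulting URL is a page that no link on the site points to.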

Q: What did Google publish in 2008 about form crawling?

In 2008, Google published a paper describing how it crawls through HTML forms. The publication explained how Googlebot can discover new pages by submitting forms, especially simple forms with limited input elements. This lets Google index additional content from websites that expose it only through forms.

Q: Are there any limitations to Googlebot's form crawling?

Yes, Googlebot generally does not crawl complex forms, especially those requiring sensitive information like credit card numbers. The bot is designed to avoid such forms due to privacy and security concerns. Instead, it focuses on simple forms that can be submitted without sensitive data.
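The source does not say how Googlebot decides that a form is too complex or too sensitive. Purely as an illustration, a cautious crawler might apply a heuristic like the Python sketch below. The field-type set, the name hints, the `max_fields` threshold, and the function `looks_simple` are all assumptions made for this example, not documented Google behavior.

```python
from html.parser import HTMLParser

# Hypothetical rules: field types and name fragments a cautious crawler might avoid.
SENSITIVE_TYPES = {"password", "file"}
SENSITIVE_NAME_HINTS = ("card", "cvv", "ssn", "password")
IGNORED_TYPES = {"submit", "button", "reset", "hidden"}


class FormFieldCollector(HTMLParser):
    """Collects (type, name) for every input, select, and textarea."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "select", "textarea"):
            return
        attrs = dict(attrs)
        # <select> and <textarea> have no type attribute; use the tag name instead.
        default_type = "text" if tag == "input" else tag
        field_type = (attrs.get("type") or default_type).lower()
        name = (attrs.get("name") or "").lower()
        self.fields.append((field_type, name))


def looks_simple(form_html, max_fields=3):
    """Return True if the form has few fields and none that look sensitive."""
    collector = FormFieldCollector()
    collector.feed(form_html)
    fillable = [f for f in collector.fields if f[0] not in IGNORED_TYPES]
    for field_type, name in fillable:
        if field_type in SENSITIVE_TYPES:
            return False
        if any(hint in name for hint in SENSITIVE_NAME_HINTS):
            return False
    return len(fillable) <= max_fields


# Hypothetical forms for illustration.
search_form = """
<form action="/search" method="get">
  <select name="category"><option value="books">Books</option></select>
  <input type="submit" value="Go">
</form>
"""
checkout_form = """
<form action="/pay" method="post">
  <input type="text" name="card_number">
  <input type="text" name="cvv">
  <input type="submit" value="Pay">
</form>
"""

simple = looks_simple(search_form)
risky = looks_simple(checkout_form)
print(simple, risky)
```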

Q: How can webmasters prevent Googlebot from crawling certain pages?

Webmasters can prevent Googlebot from crawling certain pages by disallowing specific areas in the robots.txt file. For example, they can disallow crawling of search results pages by listing the appropriate directory in robots.txt, which controls what content Googlebot fetches.
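A rule like this can be checked before deployment with Python's standard `urllib.robotparser` module. The sketch below is a minimal example; the `/search` path and the `example.com` URLs are assumptions, so substitute the directory your own search results actually live under.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot out of on-site search results.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A search-results URL falls under the Disallow rule; a product page does not.
search_allowed = parser.can_fetch("Googlebot", "https://example.com/search?q=widgets")
product_allowed = parser.can_fetch("Googlebot", "https://example.com/products/widget")

print(search_allowed)   # False
print(product_allowed)  # True
```

The check runs entirely offline against the rules string, so it is a cheap way to confirm that a rule matches the intended URLs and nothing else.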

Q: What is the purpose of Googlebot's comprehensive crawling strategy?

Googlebot's comprehensive crawling strategy aims to discover and index as much web content as possible. This ensures that Google can provide users with fast and comprehensive search results, typically returning results in under half a second. The strategy involves indexing pages with and without direct links.

Q: How can webmasters learn more about Googlebot's capabilities?

Webmasters can learn more about Googlebot's capabilities by searching for official documentation and guidelines on Google's website. This includes information on how Googlebot crawls forms, the types of forms it can handle, and how to use robots.txt to manage what Googlebot crawls.

Q: What role does form submission play in Google's indexing process?

Form submission plays a role in Google's indexing process by allowing Googlebot to discover new pages generated by forms. When forms are simple and have limited inputs, Googlebot can fill them out to access and index additional content, enhancing the comprehensiveness of Google's search index.

Summary & Key Takeaways

Googlebot can index pages without direct site links by following obscure links from external sites. Additionally, it can fill out simple HTML forms to discover new pages, especially if the forms have limited inputs like dropdowns. This allows Google to index pages generated by a site's search feature.
In 2008, Google published a paper on crawling HTML forms, explaining its method for discovering pages through form submissions. However, complex forms requiring sensitive information are generally not crawled. Webmasters can control this behavior by using robots.txt to disallow crawling of specific areas such as search results.
Googlebot's strategy is to index as much web content as possible to provide comprehensive search results quickly. This includes indexing pages without direct links and discovering pages through form submissions. Webmasters can find more information on Google's form crawling capabilities through official documentation.
