How to get Millions of Pages Indexed by the Search Engines – 2
In my previous post, Large Website Indexing – How to get millions of pages indexed by the Search Engines, I told you a bit about my experience with very large websites and the underlying challenges every webmaster must face.
Duplicate content and poor website architecture are the biggest enemies
Large website indexing is more art than science. It's like being a veterinarian: the vet knows the animal is sick but can only "feel" their way toward a diagnosis before prescribing a cure.
One reason I consider large website indexing an art is the inaccuracy and inconsistency the search engines inject into our analyses, which leaves every assumption short of the "real" solution by a large and indefinite margin of error. Try running a site: command several times during the day, repeat the operation every day for a week, and see for yourself: the results are constantly shifting.
Analyzing a set of indexed pages can be tricky: you must work your way around the 1,000-result limit imposed by Google.
When you run the site: command, Google will tell you (approximately) how many pages are present in the index, but it will display no more than 1,000 of them. For example, let's say your widgets online shop has been successfully spidered and indexed by Google, and a site: command returns 35,000 pages: Google will show you no more than 1,000 of those, and then only if you are lucky, i.e. there is no duplicate content; otherwise you'll get the "omitted results" message you have probably seen before.
Should this occur, you need to go back to the first part of this series and solve whatever problems you have, eliminating your duplicate content issues.
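One practical way around the 1,000-result cap is to split the site: query into smaller slices, for example with the inurl: operator, so that each slice returns fewer than 1,000 results. Here is a minimal sketch of the idea; the domain and directory names are hypothetical examples, not from any real site:

```python
# Sketch: work around Google's 1,000-result display cap by segmenting
# a single site: query into one query per site section using inurl:.
# The domain and section paths below are made-up examples.

def segmented_site_queries(domain, sections):
    """Build one site: query per section so each slice stays well
    under the 1,000-result display limit."""
    queries = [f"site:{domain}"]  # baseline query for the total count
    for section in sections:
        queries.append(f"site:{domain} inurl:{section}")
    return queries

queries = segmented_site_queries(
    "widgets-shop.example.com",
    ["/category/", "/product/", "/brand/"],
)
for q in queries:
    print(q)
```

You would then run each generated query by hand (or segment further if a slice still exceeds 1,000 results) and collect the URLs shown for each slice.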
Use this schematic as your guideline to identify your pages. By doing so you'll be able to generate a mapping of all indexed pages versus existing ones and get a good idea of which pages are missing from the index.
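The mapping itself boils down to a set difference: take the URLs you know exist (from your sitemap or database) and subtract the URLs you collected from the index. A minimal sketch, with made-up example URLs:

```python
# Sketch: compare the URLs you know exist (e.g. from your sitemap or
# CMS database) against the URLs collected from site: queries, and
# list the pages missing from the index. All URLs are hypothetical.

def missing_from_index(existing_urls, indexed_urls):
    """Return the existing URLs that do not appear in the index."""
    return sorted(set(existing_urls) - set(indexed_urls))

existing = [
    "http://widgets-shop.example.com/product/red-widget",
    "http://widgets-shop.example.com/product/blue-widget",
    "http://widgets-shop.example.com/category/widgets",
]
indexed = [
    "http://widgets-shop.example.com/product/red-widget",
    "http://widgets-shop.example.com/category/widgets",
]
for url in missing_from_index(existing, indexed):
    print(url)  # pages to investigate for indexing problems
```

The pages this prints are the ones to examine first: they are live on your site but the search engine has not picked them up.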
You'll have to put yourself to the test and invent appropriate queries to extract the information. Here are a few advanced search guides for the major search engines; they are broadly similar but differ in the details:
- Unofficial Google advanced Search Sheet (Official Google Cheat Sheet)
- Ask.com advanced search tips
- YAHOO! search short cuts (Official YAHOO! Advanced Search)
- Bing/Live Advanced Search Operators
This stuff is complicated to explain without a real website to examine, so here's the deal: I'll run an analysis on a website submitted by one of you readers. The conditions:
- Your website must have at least an estimated 2,000 pages
- It must have been online for at least 12 months
- Prepare a very brief summary of your website with whatever information you have on its URL structure (if any – it will save me some time)
- Leave a comment expressing your interest in a free large website indexing analysis, and enclose the summary you prepared for me to review
- You must be willing to share the analysis by allowing me to post it on my blog
- I’ll choose a website and contact you directly for the URL
Now get to work and prepare your summary!