
Dynamic websites can be a problem, not because the search engines cannot index the content, but because the way your website is set up can generate a practically unlimited amount of duplicate content.
This problem was addressed in a post I wrote just a few days ago entitled Dynamic Websites: How to Avoid Indexing Duplicate Content. That post had a focus on website architecture and problems that can arise when the programmers work alone on the design and development.
But even if your website is well thought out, you may still need to filter out duplicate content – take for example contact forms, or forms that are used to make bookings for hotel rooms:


This can be another duplicate content nightmare. When the search engine spiders crawl your website they’ll go from page to page and may (partially) fill in the forms, generating a practically unlimited number of pages that will get indexed and inflate the apparent size of your website.
In cases like this the process of blocking out pages can be relatively easy – you need to block the form with your robots.txt file by indicating the exact page (or folder) generating the duplicate content. Forms shouldn’t really be indexed in the first place as they carry no useful information – all the more reason to keep them out of the search engine index.
So let’s say that you have booking forms on your website located at:
http://www.yourwebsite.com/Booking/hotel-booking.php
http://www.yourwebsite.com/Booking/car-rental-booking.php
…
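This is a fortunate case because all the booking forms have been placed in a single folder, so all you need to do is block out the entire folder in your robots.txt file with this instruction line (placed under a User-agent line such as "User-agent: *", and keeping in mind that the path is case-sensitive):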
Disallow: /Booking/
That’s pretty simple, isn’t it? The website where I spotted this problem had ballooned to 10,000 pages from the actual 346.
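Before uploading the file, it can be worth checking that the rule blocks what you expect. Here is a minimal sketch using Python's standard urllib.robotparser module. It assumes the Disallow line sits under a "User-agent: *" group, and the helper name is_blocked and the index.php URL are made up for illustration; only the booking URL comes from the example above.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Disallow line from this post,
# placed under a group that applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Booking/
"""


def is_blocked(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given robots.txt rules forbid `agent` from fetching `url`.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Booking form from the example above: should be blocked.
    print(is_blocked("http://www.yourwebsite.com/Booking/hotel-booking.php"))
    # Hypothetical page outside the folder: should stay crawlable.
    print(is_blocked("http://www.yourwebsite.com/index.php"))
    # Paths are matched case-sensitively, so a lowercase folder is a different path.
    print(is_blocked("http://www.yourwebsite.com/booking/hotel-booking.php"))
```

This only tells you how a parser that follows the standard rules reads the file; individual search engines may handle edge cases differently, so it is a sanity check rather than a guarantee.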
It will take the search engines some time to eliminate the duplicate content pages, so don’t worry about that. It takes as long as it takes, and there is no need to rush the process.
In Google Webmaster Tools you can actually see progress as the number of blocked pages increases day by day (depending on how often your site is crawled and how many pages are crawled on a daily basis).
UPDATE: Here is another interesting post related to dynamic websites and duplicate content: Cleaning Up the Retail Site Navigation Mess



{ 9 comments }
Didn’t ever think of this problem. Thanks for posting
Forms as a problem? This is something new for me and obviously well thought out. I hate messing around with the robots.txt file since I am still somewhat of a newbie, but this is always helpful info so we don’t get slapped by the big G!
I didn’t really think of this problem either. It’s something many webmasters must be aware of. Thank you for the heads up.
- John.
This is a great post. You hear so much MIS-information about “duplicate content” that it’s good to see someone talk about what really IS duplicate content and what you can do about it.
I never pay attention to such case too. But, you’re right that this can be a problem too. Thanks for underlining this matter for me before I get such problem.
Duplicate content really is such a mystery. I always like to hear someone elses point of view, it helps me kind of with my own theories.
I stopped using contact forms all together on my websites, I said hell with it. I most likely get some weird emails or some spam so I just don’t even bother anymore.
The nice thing about many blogs is that the permalinks in the title helps reduce the duplicate content problem created. I can see how it could be a problem in dynamic sites though.
On a different note, there might be a problem with the styling on your formatting for the comment input boxes for name, email, and website.
Nice picture you have there:))