Google's Duplicate Internet Content Filter in Action
By Tony Simpson
If you don't believe Google's Duplicate Content Filter exists, I have dramatic proof that it does, and that it is very effective.
On July 5, 2005 I published an article entitled "7 Top Ways to Avoid Link Theft" which was picked up and included as content on other websites.
Before the article was released I checked on Google whether any results already existed for the exact phrase "7 Top Ways to Avoid Link Theft" and there were no listings for that term.
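For readers who want to repeat this check, here is a minimal sketch of how an exact-phrase search URL can be built. Wrapping the phrase in double quotes is what asks for an exact match. The helper name exact_phrase_url is made up for illustration, and the sketch only builds the URL; it does not fetch or count results.

```python
from urllib.parse import urlencode

def exact_phrase_url(phrase: str) -> str:
    """Build a Google search URL for an exact phrase (hypothetical helper)."""
    # Double quotes around the phrase request an exact-phrase match.
    query = urlencode({"q": f'"{phrase}"'})
    return f"https://www.google.com/search?{query}"

url = exact_phrase_url("7 Top Ways to Avoid Link Theft")
print(url)
```

Opening the printed URL in a browser shows the result count for the quoted title.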
Over the next few weeks I searched Google for the title of my article and recorded how many results appeared. One week after publication there were 6,760 results, a week later there were 14,100, and the count reached a peak of 17,000 by July 26, 2005.
Four weeks after publication the results in Google had fallen slightly to 16,600.
Almost 6 weeks after publication the results listed in Google had fallen to 44.
In a matter of less than two weeks the number of search results on Google.com for the title of my article had gone from 16,600 to just 44.
In case you're thinking this is because all these other websites dropped my article and replaced it with other content, I should add that a search on Yahoo.com on the same day still showed 14,300 results for my article.
What's more, of these 44 results on Google, more than half are listings from the same websites. In other words, some sites have the same article duplicated on different pages of their website.
So Google's Internet Content Filter does not appear to remove duplicate listings within the websites it chooses to keep in the search results.
On August 28, 2005, 8 weeks after first publication, I distributed the article again to a new list of article sites to repeat the process. After 6 weeks the same article had reached a peak of 5,620 results on Google. Less than 2 weeks later the results had fallen to 217.
For me this was dramatic proof that Google's Duplicate Internet Content Filter is active and very effective. If you're wondering whether other major search engines have a duplicate content filter, I can confirm that Yahoo certainly does. The same article, which once showed 14,300 results on Yahoo, has fallen to 344 over the same time period.
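To put the size of these drops in perspective, here is a short sketch that turns the result counts quoted in this article into percentage reductions. The function name percent_drop and the labels are my own, chosen for illustration; the numbers are the ones reported above.

```python
def percent_drop(before: int, after: int) -> float:
    """Percentage reduction from an earlier result count to a later one."""
    return (before - after) / before * 100

# (count before the drop, count after the drop), as quoted in the article
observations = {
    "Google, first distribution": (16_600, 44),
    "Google, second distribution": (5_620, 217),
    "Yahoo": (14_300, 344),
}

for label, (before, after) in observations.items():
    drop = percent_drop(before, after)
    print(f"{label}: {before:,} -> {after:,} ({drop:.1f}% fewer results)")
```

In each case more than 96% of the listings disappeared.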
From these results it would seem Google takes about 6 to 8 weeks to remove duplicate content using its Duplicate Internet Content Filter.
But the remaining question is this: how does Google decide which of over 16,000 results to keep and which to reject?
I have witnessed situations where my own articles appear in results on other websites, but are not listed in the results for my own website.
So clearly Google does not take into account who originally wrote the article when deciding which sites will remain in its search results.
It also seems to have nothing to do with where Google first finds the article.
I have kept some articles on my own website for several weeks before releasing them for distribution to other websites.
In that time the Google spiders have visited my site several times and Google has had enough time to work out that the article was first found on my site.
It would be interesting to see if it's possible to work out what factors Google is using in its Internet Content Filter to decide which results to keep in its listing and which ones to remove. But that's for another article.
About The Author
Tony Simpson is a Web Designer and Search Engine Optimizer who brings a touch of reality to building a Web Business.