Google and duplicate content


If you’ve asked me recently about how I copy my posts onto LiveJournal, read this! 

After years of being the #1 hit for “Rich Lafferty” on Google, I noticed last night that I’ve pretty much completely disappeared from Google (while maintaining top links in Yahoo and MSN Search). At first I thought I’d been caught by really bad timing, because I had accidentally turned on search engine blocking in WordPress for about five minutes, but according to my logs, Google didn’t come by during that time. I also noticed that my site was no longer cached by Google.
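For anyone who wants to repeat the log check, here’s a rough Python sketch. It assumes an Apache-style “combined” access log where the user agent is the last quoted field. The sample lines, the time window, and the function name `googlebot_hits` are all made up for illustration; they are not my actual logs.

```python
import re
from datetime import datetime, timedelta, timezone

# Captures the bracketed timestamp and the last quoted field (the user agent)
# of a combined-format access log line. The log format is an assumption.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def googlebot_hits(lines, start, end):
    """Return the log lines from a Googlebot user agent between start and end."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        when = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if start <= when <= end:
            hits.append(line)
    return hits


# Invented sample data: one Googlebot hit inside the window, one ordinary
# browser hit inside it, and one Googlebot hit well after it.
SAMPLE = [
    '66.249.66.1 - - [10/Mar/2007:21:03:12 -0500] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Mar/2007:21:04:00 -0500] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux i686) Firefox/2.0"',
    '66.249.66.1 - - [10/Mar/2007:23:30:00 -0500] "GET /feed/ HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

TZ = timezone(timedelta(hours=-5))
START = datetime(2007, 3, 10, 21, 0, 0, tzinfo=TZ)
END = datetime(2007, 3, 10, 21, 10, 0, tzinfo=TZ)

for hit in googlebot_hits(SAMPLE, START, END):
    print(hit)
```

An empty result for the window in which blocking was switched on is what suggests the crawler never saw it. Keep in mind that matching on the user agent string alone trusts whatever the client claims to be.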

Digging around on SEO forums (I feel dirty!), I found a post on an official Google blog about how Google handles duplicate content. It seems that Google tries to list only one copy of duplicated content, and it’s not just PageRank that decides which copy it keeps. So when your content appears on two sites, Google guesses which site people will want to see, and the other one disappears.

I’m pretty sure that’s what happened to me; see, for instance, this search for a unique sentence that should appear both on richtext and on my LiveJournal mirror. And note that all of my content is still in Google, so it’s not like I was banned. Lastly, note that a search for Rich Lafferty turns up my LiveJournal pretty high in the results considering that my name only shows up once on that page.

Anyhow, I’ve turned on search engine blocking on my LiveJournal, which will hopefully suffice to inform the Googlebot that my own website is the best copy to point people to, not my LiveJournal copy. If you’ve done the same thing I have, copying your website content from WordPress to LiveJournal with LJXP, you probably want to do the same.
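If you want to confirm that blocking is actually in effect, one thing you can check is the site’s robots.txt. I don’t know exactly how LiveJournal or WordPress implement their blocking options (a robots meta tag in the page is another common mechanism, which this does not check), so treat the following as a minimal sketch that only covers the robots.txt case. The helper name `is_blocked` and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# A robots.txt that asks every crawler to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical journal entry URL, used only for illustration.
URL = "https://example.livejournal.com/1234.html"

print(is_blocked(BLOCK_ALL, URL))  # blocked for every crawler
print(is_blocked("", URL))         # an empty robots.txt blocks nothing
```

To test a live site you would fetch its robots.txt yourself and pass the text in; the sketch deliberately avoids network access so it runs anywhere.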

(Incidentally, while looking into this I discovered Google Webmaster Tools, which offers a lot of useful insight into what Google is doing with your website, lets you tweak how Google lists and spiders your site, and tells you which searches are leading people there.)