How to avoid duplicate content in WordPress

The way WordPress arranges and structures content is useful for readers, no doubt, but it becomes a problem when search engines index three times more pages than they should. On a fresh WordPress installation with 10 published posts, Google typically ends up indexing close to 30 URLs, all pointing back to just 10 unique articles.

You might think that having more pages in Google's index will bring more traffic, but you are more likely to get a penalty than a boost. Cleaning up this mess has clear advantages: better crawling results, better PageRank distribution and an index that holds only your original content.

The problem can be solved quite simply by adding a “noindex, follow” robots meta tag to the unwanted pages. Spiders can still crawl these pages and follow the links on them; however, the search engines will no longer include them in the index.

Using WordPress's built-in conditional tags, you can add the code below to your theme's header.php file, just before the closing </head> tag.

<?php // Page 2 and beyond of any paginated listing
if ( is_paged() ) {
  echo '<meta name="robots" content="noindex,follow" />';
} ?>

<?php // Author archive pages
if ( is_author() ) {
  echo '<meta name="robots" content="noindex,follow" />';
} ?>

<?php // Trackback requests
if ( is_trackback() ) {
  echo '<meta name="robots" content="noindex,follow" />';
} ?>
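If you would rather keep this logic out of the template, the same checks can be combined into one function and attached to the wp_head action from your theme's functions.php. This is only a minimal sketch: the function name mytheme_noindex_duplicates is made up for illustration, and it assumes your theme calls wp_head() inside its <head> section.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of WordPress):
// prints a single robots meta tag for the page types covered above.
function mytheme_noindex_duplicates() {
    // is_paged():     page 2 and beyond of a paginated listing
    // is_author():    author archive pages
    // is_trackback(): trackback requests
    if ( is_paged() || is_author() || is_trackback() ) {
        echo '<meta name="robots" content="noindex,follow" />' . "\n";
    }
}

// Register the helper only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_head', 'mytheme_noindex_duplicates' );
}
```

Use either this function or the header.php snippets, not both, otherwise the tag will be printed twice.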

I am allowing Googlebot and the other search engine spiders to crawl and index my categories. They contain only excerpts of my blog posts, so their content is not really the same as the content of the single post pages.

You can use the following code if you also want to exclude the categories from the Google index.

<?php if ( is_category() ) {
  echo '<meta name="robots" content="noindex,follow" />';
} ?>

Google likes pages that have large amounts of content, so pages like categories or archives will most likely receive more credit than single post pages.

After using the code above you might notice a small decrease in traffic, but it should be temporary and last only until the link juice is redistributed among the pages that remain in the index.

You can check that everything went well by running a site:example.com search query and seeing whether the pages you wanted to remove from the Google index are still there. Depending on the crawl rate of your website, it can take anywhere from a couple of hours to a few days for the changes to appear in the results page.
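While you wait for the index to update, you can confirm that the tag is actually being printed by looking at the HTML of an affected page. The helper below is a rough sketch rather than an official tool: has_noindex_meta is a hypothetical name, the URL in the usage comment is a placeholder, and it assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper, not part of WordPress: returns true when the HTML
// contains a robots meta tag whose content includes "noindex".
function has_noindex_meta( string $html ): bool {
    if ( trim( $html ) === '' ) {
        return false; // nothing to parse
    }

    // Collect parser warnings internally so sloppy markup does not emit notices.
    $previous = libxml_use_internal_errors( true );
    $doc      = new DOMDocument();
    $loaded   = $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    if ( ! $loaded ) {
        return false;
    }

    foreach ( $doc->getElementsByTagName( 'meta' ) as $meta ) {
        $name    = strtolower( $meta->getAttribute( 'name' ) );
        $content = $meta->getAttribute( 'content' );
        if ( $name === 'robots' && stripos( $content, 'noindex' ) !== false ) {
            return true;
        }
    }
    return false;
}

// Example usage (placeholder URL):
// $html = file_get_contents( 'https://example.com/page/2/' );
// var_dump( $html !== false && has_noindex_meta( $html ) );
```

Checking the page source this way tells you the tag is in place; only the site: query shows whether Google has acted on it yet.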
