
Duplicate content has been causing major issues for online retailers for many years, primarily due to the negative impact it has on search engine rankings.

Due to the size and complexity of online retail websites, there are far more areas than on a typical site that need to be addressed and monitored in order to eliminate duplicate content issues.

Here are nine potential causes of duplicate content, along with resolutions to help you overcome them.

1. Duplicate content caused by faceted navigation

This issue is very common for ecommerce sites and is likely to be the most damaging on this list for SEO. A single category page on some retail websites could have over 100 variations of its URL, due to the many combinations of parameters for facets / filters.

Here is an example of how a duplicate content issue caused by faceted navigation could arise:

[Image: duplicate content from faceted navigation. Top version = unfiltered category page; bottom version = filtered version of the same page.]

The example above illustrates how a query string is appended to the existing URL to filter the results; however, the content on the page remains the same, resulting in duplicate content. Search engines will be able to crawl these duplicate pages and, in addition to the SEO issues caused by duplicate content, these pages will also eat up crawl budget.
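To put this in concrete terms (the URLs below are hypothetical), an unfiltered category page and a filtered version of it might look like this:

http://www.example.com/shoes
http://www.example.com/shoes?colour=brown&size=6

Both URLs return essentially the same content, so search engines see two competing versions of one page, and every additional filter combination multiplies the problem.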

I’ve seen plenty of websites held back by indexation issues caused by faceted navigation and have also seen Google send messages via Webmaster Tools. 

Preventing these pages from being indexed

There are a number of ways you can prevent search engines from accessing / indexing faceted navigation pages; here are the ones that I would recommend:

Meta robots rules

Assigning meta robots rules to filter pages is the best solution in my opinion and I’ve had the most success with this in the past. 

I would always recommend using the following meta robots tag:

<meta name="robots" content="noindex,follow">

The ‘noindex’ value tells search engines not to index the page, and the ‘follow’ value tells them to continue following the links on the page.

Parameter handling in Google Webmaster Tools

Although I’ve had mixed results with it in the past, the parameter handling tool within Google Webmaster Tools has definitely improved, and lots of SEOs I know use it as their primary way to address dynamic pages.

Webmaster Tools is generally pretty good at identifying the parameters on your website; however, you can also manually add additional ones if it doesn’t find the parameters you're looking to address.

[Image: parameter handling in Google Webmaster Tools.]

For the example above, you can see that I’ve told Google that the parameter sorts page content and have asked them not to crawl any of these URLs. There are a number of options for this, enabling you to have exceptions for pages you would like to be indexed.

Canonical tag

The canonical tag was introduced in 2009 to help webmasters tell search engines that a URL is a variation of another URL. The canonical tag can be used to tell search engines that filter pages are duplicate versions of the original category page.
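As a simple illustration (the URLs are hypothetical), a filtered page such as http://www.example.com/shoes?colour=brown could include the following tag in its <head> to point search engines back to the unfiltered category page:

<!-- placed on the filtered URL; example.com is a placeholder -->
<link rel="canonical" href="http://www.example.com/shoes">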

I would recommend using the meta robots rules over the canonical tag as I’ve seen plenty of examples where search engines have ignored the canonical tag and continued to index pages. 

2. Duplicate content caused by product ordering

Similar to faceted navigation, product-ordering (sort) parameters create duplicate variations of pages (with the same content and meta content), which can be accessed by search engines.

Often, these pages manage to slip under the radar for retailers, as they won’t generate the same volume of URLs as faceted navigation.

I would recommend adopting the same methods for resolving this issue as for duplicate content caused by faceted navigation: meta robots rules (recommended as the best option), the canonical tag, or parameter handling in Google Webmaster Tools.
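To illustrate with a hypothetical URL, a sorted version of a category page such as:

http://www.example.com/shoes?dir=asc&order=price

would simply carry the same meta robots tag in its <head> as a faceted page:

<meta name="robots" content="noindex,follow">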

3. Duplicate content caused by hierarchical URLs

A few years ago, hierarchical URLs were considered to be best practice for ecommerce websites, as they illustrate the structure of the website. Now, as SEO has evolved, hierarchical URLs can often be a cause of duplicate content issues, as they create multiple URL variations of the same product when it's featured within more than one category.

In most cases, these products will have the same or very similar content, which will prove detrimental to search engine rankings.

I would usually recommend, if possible, creating rewrite rules to change these pages to top-level product URLs. If you’re unable to do this, I would recommend using the canonical tag to pass value to the preferred page and to make it clear to search engines which one is the primary version.
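As an illustration (the URLs are hypothetical), a product reachable through two category paths:

http://www.example.com/mens/boots/leather-brogue-boot
http://www.example.com/sale/leather-brogue-boot

could include the following tag in the <head> of both versions, pointing to a single top-level product URL:

<!-- added to both category-path versions; placeholder URLs -->
<link rel="canonical" href="http://www.example.com/leather-brogue-boot">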

4. Duplicate content caused by search pages

Like faceted navigation, catalogue search pages are another prime example of a common duplicate content perpetrator, with lots of large and small retailers leaving them accessible to search engines.

I’ve seen scenarios where retailers have had well over 100,000 search pages indexed, which has caused significant issues for them with their rankings.

The easiest way to prevent search engines from accessing these pages is to block the directory in the robots.txt file.

Example: 

To block pages like this: /shop/catalogsearch/result/?q=testquery

You would add this line into the robots.txt file: Disallow: /shop/catalogsearch/

If these pages have already been indexed for your website, I would recommend requesting removal of the directory within Google Webmaster Tools – your pages will generally be removed from the index within 12 hours.

5. Duplicate content caused by internationalisation

Something that I see time and time again from retailers is the introduction of international versions of their websites before they’ve translated all of their content. The result of this is lots of duplicate versions of products and categories with slightly different URLs.

This is not always the case: some platforms will not manage international products with multi-site functionality (which would create replicas of the initial site architecture), so they are less likely to have this issue.

In my opinion, the only truly effective way to resolve this situation is to add the international content, although you could temporarily block access to the pages until the content has been added.
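If you do opt to block the untranslated sections temporarily, one way to do it is with robots.txt rules along these lines (the directory names below are hypothetical, and the rules should be removed once the translated content goes live):

# hypothetical untranslated country sections
User-agent: *
Disallow: /de/
Disallow: /fr/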

For those looking to launch international versions of their website, I would strongly recommend using university students to translate the content, as they’re quick and affordable. In the past we’ve had great results from posting adverts on university websites.

6. Duplicate content caused by pagination

Pagination is another really common source of duplicate content for online retailers, and before the introduction of the rel=next and prev tags it was seen as a big issue for SEO.

The rel=next and prev tags, which were introduced by Google in 2011, allow webmasters to tell search engines which pages form part of a paginated series and prevent them from being treated as duplicate content.
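As an example of the markup (with hypothetical URLs), page two of a paginated category would reference its neighbouring pages from its <head> like this:

<!-- on page 2 of a paginated category; example.com URLs are placeholders -->
<link rel="prev" href="http://www.example.com/shoes?p=1">
<link rel="next" href="http://www.example.com/shoes?p=3">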

7. Duplicate content caused by session IDs

Session IDs are one of the most annoying things to have to face in SEO, as URLs are created based on user sessions and can cause an unlimited number of new duplicate pages to be created and indexed.

Ecommerce websites commonly have issues with session IDs, as a unique ID is appended to the URL when there’s a change in the host name. So when users move from one subdomain to another (often because of an SSL certificate), a session ID is appended to the URL of the next page they visit.
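For illustration (the URL and parameter name are hypothetical), a session-ID version of a product page might look like this, with every new session producing yet another variation of the same page:

http://www.example.com/leather-brogue-boot?SID=a1b2c3d4e5f6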

Session IDs can be a complete nightmare to eliminate, but the best (and only real) solution is to resolve the issue properly and stop the session IDs from being created.

You can also use the parameter handling section of Google Webmaster Tools to tell search engines to ignore session IDs. 

8. Duplicate content caused by print pages

Often, more so with older ecommerce websites, there is an option on product pages to display a printer-friendly version of the page, which shows the same content on a different URL. These pages are duplicate versions of the product pages and therefore count as duplicate content.

In order to prevent these pages from being indexed, you need to either apply meta robots rules (noindex, follow) to the pages if they’re dynamic or disallow the directory in the robots.txt file.
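For example, if the printer-friendly versions all sit under a dedicated directory (the path below is hypothetical), a single robots.txt rule will cover them:

# hypothetical directory holding printer-friendly pages
Disallow: /print/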

9. Duplicate content caused by review pages

Customer reviews are displayed in different ways depending on the way the site has been built (or the platform it’s been built on). Some websites display all of the reviews on product pages and then have separate (often paginated) pages with just the reviews.

Here’s an example of this:

[Image: duplicate content caused by review pages.]

As you can see, the review pages contain the same customer review content but on a different URL. 

In order to prevent these pages from being indexed, you just need to disallow the directory in the robots.txt file, or if they’re dynamic, apply meta robots rules (noindex, follow).

If you have had any other issues with duplicate content, please feel free to ask questions within the comments below or email me at paul (@) gpmd.co.uk. 

Paul Rogers

Published 8 January, 2013 by Paul Rogers

Paul works as Digital Marketing Manager for Session Digital / Inviqa and is also the Co-Founder of MageSEO. He is a contributor to Econsultancy.

Comments (20)

Adam Palczewski, Global Digital Operations Director at Mindshare / WPP

Thanks for this post Paul. Just to challenge you even more, here is another big one I would add to the list.

<< 10. Duplicate Content caused by owning multiple brands selling the same products all using the same CMS platform. >>

Imagine 9 separate brands/urls with duplicate content... this is the challenge brands like ShopDirect have.

Regards,
Adam

almost 4 years ago

Michael

A great post and I totally agree with Adam's comment. Having worked on sites where there are duplicate content issues because of the product feeds that drive much of ecommerce, I know this frustration.

With so many sites accepting the feed, it's vital to ensure that your most important pages are edited and that they offer more than the other x hundred sites; standing out to both users looking at pages and bots indexing them.

almost 4 years ago

Alison

I see many parallels here with the problems experienced when trying to decide the best page-naming policy within a web analytics tool.
The need to know which page is viewed but also how it was reached takes up a lot of the effort in new implementations.

almost 4 years ago

Jan-Willem Bobbink

What about crawl budget and faceted navigation? Even if you noindex all those filtered pages, Google still needs to crawl them all. You should definitely use a different solution for that. Think of URLs with hashtags or simply use AJAX for filtering? Make sure you add an option to the CMS for making exceptions on that for keywords like "brown converse sneakers size 6" if there is enough volume to create a specific landing page for it.

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Something that I forgot to mention in this article is that I'd recommend implementing all three solutions for faceted navigation and order pages to cover all of the bases.

Also, I'd recommend adding nofollow to links to pages that you don't want Google to access, like faceted navigation pages.

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi all,

Adam:
Thanks for commenting, I agree, this is a nightmare, I used to work for a retail company that had relatively similar product ranges on around 10 websites and duplicate content was always an issue.

Michael:
Thanks for commenting, I agree, prioritising the products is key, I always used to try and get different people writing content for different sites, but it wasn't always possible. We also had a lot of legacy issues with the site and we prioritised the products by traffic.

Jan-Willem:
Thanks for commenting, I agree with the creation of landing pages, we have built a Magento module that automatically populates categories, which are created based on filters and product attributes. This has been really important in some of the work we've done for our clients.

I know that pages that have the noindex tag are still crawlable, it only prevents the pages from being indexed. I would use parameter handling to tell Google not to access the faceted pages too.

I agree that AJAX filtering is a great way of avoiding this issue, however it's a bit of a luxury for a lot of retail websites.

almost 4 years ago

Gole

Hi Paul, it is really a great read.

I have one more confusion. Suppose I have a product review site where i get user reviews on the products. I am showing each review on the product page and also creating separate pages for each review, also showing reviews on particular person's profile page on same site. I know it's not a good practice.

Can you please suggest what to do in this case

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi Gole,

A lot depends on if there's any unique content on the pages with the duplicate review content. If not, I would probably suggest removing them, canonicalising them or redirecting them.

Since Panda, having lots of pages with little or no unique content is a risk.

almost 4 years ago

Richard Hatfield, Director at Allies Limited

Thanks for the post Paul. Some great points here that highlight the complexity of ecommerce. Should the retailers with significant Google PPC budgets have to 'jump through these hoops', or should Google, with their huge tech resources, be doing more to handle this problem for their customers? I am sure a solution to this would be mutually beneficial.

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi Richard,

Thanks for your comment, it's a great point. I think they probably should be doing more, but more to the point, what they're doing now should've been done years ago.

They've improved the parameter handling resource in GWT a lot over the last 12 months, but it could still be a lot better. It should be easier for users to prevent Google from crawling pages and types of pages.

I'm hoping to see more improvements from GWT this year, with indexation being a big part of the changes.

Thanks,

Paul

almost 4 years ago

Richard Hatfield, Director at Allies Limited

Fingers crossed!

Rick

almost 4 years ago

Charles Smith

Here is another issue; I am a second hand book seller and upload our stock onto Amazon, plus we have the same stock on our own site.

Is there a way of preventing Google from treating our site as duplicate content?

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi Charles,

Google tends to look at the original source of the content (the one that it crawls first) as the author. I've had issues in the past with copywriters writing content and adding it to their site as a case study before we published it.

The only real way of avoiding this is to write unique content for your own website and create separate content that can be distributed to other websites.

Unless you could get the other site to agree to add a canonical tag directed back to your website, which would be unlikely.

almost 4 years ago

Charles Smith

Thanks for the reply Paul.

We have 40,000 books online, so it's not possible to create unique content to try and avoid duplicate copy.

And a book title, its author and ISBN number can't be made unique - it has to be the same.

Are you saying that whichever site we upload our stock list to first will be seen as the original content, and the second site the duplicate?

So if we uploaded to our site first and then waited a few days and uploaded the same stock to Amazon, would the Amazon site be regarded by Google as duplicate, and so not indexed?

I am sure Amazon would have something to say about that.

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Whenever I've dealt with Amazon for retailing, they've always specified that the content needs to be unique.

Google does usually attribute the content to the original author, as this prevents people from stealing each other's content to improve their rankings.

You're more than welcome to email me if you have any additional questions (paul @ gpmd.co.uk)

almost 4 years ago

Ashley Friedlein, Founder, Econsultancy & President, Centaur Marketing (Staff)

Interesting post Paul. We (on this site) have to deal with the internationalisation issue.

From July of last year we started serving different URLs for every page of the site for *every* country in the world using subdirectories i.e. on the same domain. This is so that we have uniquely addressable URLs for each country as we begin to make the content different for each country. However, until we get round to that, most of the content is identical.

The way we deal with duplicate content is to:
1. Canonicalise URLs to the 'master' version which has no country-level sub-directory.
2. But then use Sitemaps to 'tell' Google about the different country versions using the "hreflang=" attributes, despite the fact the sites are actually all currently in English.

We had hoped that in the SERPs Google would show the country-level URLs. It did do a bit for a while but now seems to show the master/canonical URL only. This doesn't matter too much because we auto-redirect users once they click to the relevant country URL anyway.

Some data / charts to look at:
1. Econsultancy's SEO traffic for the five months preceding the change and the five months after it (http://assets.econsultancy.com/images/0002/7834/Econsultancy_SEO_visits_2012_H1_and_H2_comparison.jpg). As you can see, not much impact either way.

2. From GWT a chart showing a big increase in the no. of URLs Google indexed on our domain over last year: http://assets.econsultancy.com/images/0002/7832/Econsultancy_URLs_indexed_2012.jpg

3. BUT, GWT showing that Google crawled a *lot* more URLs but didn't index them - we're presuming because it knows that they are duplicate: http://assets.econsultancy.com/images/0002/7833/Econsultancy_URLs_crawled_but_not_indexed_2012.jpg

As a broader point, our experience is that duplicate content certainly doesn't do your search rankings any good, but equally it seems to be rare that it does a lot of damage? So I do wonder sometimes whether (if you're using Sitemaps and GWT to 'educate' Google about your site) it is better to leave Google to figure stuff out rather than spend too much time worrying about duplicate content? In theory duplicate content can weaken your internal links and tidying up can help you 'link sculpt'. Whilst I'm sure this helps, I do wonder whether the effort is worth the return vs other things you could be doing (e.g. creating a better website and content in the first place).

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi Ashley, thanks for your comment.

We've had very similar situations with clients that have multi-national sites, where they've introduced new countries before they've had the content to add.

I would say that duplicate content can be a much bigger issue for ecommerce sites - I've worked with a number of clients who have seen significant drops in traffic (and have received messages from Google) due to dynamic pages being indexed. Duplicate content has also become a lot more important to look out for post-panda.

Thanks again for your comment; I found hearing about what you're doing at Econsultancy really interesting.

Thanks,

Paul

almost 4 years ago

Jan

Are you missing one issue?
The non-www and www versions of the website.

Jan

almost 4 years ago

Paul Rogers, Digital Marketing Manager at Session Digital / Inviqa

Hi Jan,

Thanks for your comment. I didn't include all of the duplicate content issues that could impact websites as I wanted to keep it fairly specific to ecommerce.

I know there are a few in there that would be relevant to non-retail sites too, but they're all potential issues that should be eliminated by retailers.

almost 4 years ago

diana s, seo manager at q

thank you for this very interesting post. regarding faceted navigation, do you recommend (in addition to meta robots, rel canonical, GWT and nofollow) blocking those pages via a robots.txt regex line?

thank you

about 3 years ago
