Often in SEO, we discuss the issue of duplicated content. It’s something that many people working in digital marketing and copywriting are aware of. But what do we know about the real impact of duplicated content?
Probably one of the most common myths in SEO is that duplicate content is penalised. Google reads the content of a page from an HTML perspective, and when you take into account all the replicated elements shared between pages, including templates, most of the content on a page appears duplicated to Google.
What Google is really bothered about is copied content. But we’ll come onto that.
Basically, Google wants quality pages. This is how they get their evaluators to look at things:
“Important: The Lowest rating is appropriate if all or almost all of the MC (main content) on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.” – Google Search Quality Evaluator Guidelines July 2017
It’s interesting to note that at no point in the Google Search Quality Evaluator Guidelines is the phrase “duplicated content” used. There is, however, a reference to “Copied Main Content”. And whilst we know duplicated content is a thing, it’s treated differently from copied content. The important thing from an SEO perspective is that we are creating unique content that’s not lifted.
Every page has some duplicated content. It has to or the page wouldn’t look the same as the other pages on the website. Duplicated content in this sense is not used to manipulate the rankings. From Google’s perspective, that’s what they are most bothered about. People who are trying to game the system.
Copied content is penalised algorithmically and/or manually, so why do we all have our knickers in a twist about duplicate content?
Probably every SEO consultant I’ve ever spoken to bangs on about duplicate content in websites being one of the big reasons pages don’t rank, but Google says differently.
If you search YouTube you will find hundreds of videos about Google’s Duplicate Content Penalty. However, this video from Google’s Andrey Lipattsev spells out that there is no duplicate content penalty.
Lipattsev goes to great lengths to specify that there is no penalty if your page doesn’t rank above a competitor’s using the same content. It’s just that Google has worked out that the content is copied and not unique.
So to round up Google’s position on duplicate content: There is no duplicate content penalty. All Google is looking for is unique content. This is where a good SEO copywriter comes in. But remember that content doesn’t have to just be unique, it has to add a certain amount of value to the topic. If there is duplicated content Google is clever enough to just filter it out of the results. Although there’s not a specific duplicate content penalty, it’s probably best to try to avoid it as much as possible. But sorting out every bit of duplicate content on a website isn’t going to be the silver bullet you’re looking for.
Google wants us to create signals which demonstrate which pages are canonical. So if you’re creating pages with significant amounts of duplication with another page then you should link back to the other page wherever possible.
So the best SEO strategy remains creating a lot of quality content, making it obvious to Google which is the canonical or lead content page, and distributing your content using good old-fashioned content marketing and social media marketing, whilst trying to generate links back to your quality content to demonstrate relevance and authority to the search engine.
Probably one of the worst approaches you could take is trying to optimise thin or duplicate pages. You will get very little back in return. On the other hand, there will be some small gains.
What are the best SEO copywriting practices for duplicate content?
As discussed above, most business owners, marketers and even SEO consultants are confused about the issue of duplicate content penalties. Google loves a bit of obfuscation on a subject and they’ve got it with this one! Even though they’ve gone out of their way to state there are no duplicate content penalties, we know that duplicated content can have some issues from an SEO perspective.
The thing that Google is currently looking out for is content that’s added to a website with the intention of manipulating rankings. This could be very thin content, boilerplate content or spun content that’s near to duplicated. This is where the problems occur and this is the type of content you want to be looking at rectifying if you are looking for rankings.
There is a lot of content out there that was inspired by other sources. Sometimes you can create a piece of content and not realise how much you’ve used another source as the structure for your piece. This in some ways is still plagiarism. This type of content is harder to detect.
So what are we looking for where content has been copied from other sources? The easiest to detect is a copy and paste job. That is scraped or simply copied content. Other times, the wording is minimally changed. Google’s quality guidelines state:
“This type of copying makes it difficult to find the exact matching original source. Sometimes just a few words are changed, or whole sentences are changed, or a “find and replace” modification is made, where one word is replaced with another throughout the text. These types of changes are deliberately done to make it difficult to find the original source of the content. We call this kind of content “copied with minimal alteration.”
I think it’s an interesting admission that just a few words being changed in a piece of content can be enough to make Google confused about which source is the canonical source. This should make us less worried about duplicate content between pages on a site.
And if we get past the issue of there actually being a duplicate content penalty, Google’s John Mueller has said “we do have things around duplicate content” adding “that are penalty worthy.”
So no penalty. But they want content to be unique and actively do things about it.
How do SEO consultants classify duplicate content?
Probably the best place to start is Google. Google state that duplicate content is:
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
Most people in the modern era of internet marketing realise that they can’t just go about copying and pasting content from other sources, so this is more about the website’s foundation content.
If you republish content that appears elsewhere you run the risk of it damaging your rankings. I think press releases are a good example of why Google doesn’t like talking about duplicate content penalties. In this circumstance it’s not unheard of for content appearing on your own website to appear in one form or another on multiple websites. Yes, it’s duplicated. But it’s not manipulative. It’s just gone viral and you’re proud of it so you’ve shared it on your news feed.
I often get approached by businesses who have a number of websites all essentially selling the same products, often using identical content to sell them. For starters, this strategy isn’t really in keeping with Google’s Webmaster Guidelines, which make it clear you should have one website or you should have a really good reason for having multiple websites, i.e. you sell radically different products or services, perhaps to different industries or consumers, and it would just make no sense to have them on the same site. If you have a number of websites all selling the same product you’re not doing that, you are just trying to manipulate Google’s algorithm. If that’s your game, expect to struggle to rank well.
You’re not going to rank well because you’ll be cannibalising your content. Google sets out to filter copy found on other websites that is the same as your own. They do this so they don’t get a SERP full of the same content. But it’s easy to see how people could perceive this as a duplicate content penalty.
There are a couple of other things to think about. The first is the danger of a manual action. As I’ve already mentioned, Google uses manual evaluators to make sure quality is kept high. They are the checks and balances. There are also times that Google will manually remove a page or site due to manipulative behaviour.
Possibly worse than that though is getting hit by Google Panda. The Google Panda Update was put in place to lower the ranks of poor quality websites. At the time there was a prevalence of content farms and private blog networks. Panda went about systematically reducing their ability to rank highly. The great news is this means higher quality websites will naturally move to the top of the rankings.
Basically, if you want your website to rank, the best idea is to make sure your content isn’t available anywhere else and when you’re creating content then try to work out how to create content that’s better than anything else out there. Moz talks about the 10x Content rule of content creation. I believe this is definitely true for SEO copywriting. This is a way to help you create the kind of RICH, UNIQUE, RELEVANT, INFORMATIVE and REMARKABLE content Google is looking for.
As I’ve already noted there are situations where duplicated content is necessary. And onpage duplicated content is one of the places where Google looks at it differently.
As an SEO consultancy, GrowTraffic advocates creating a lot of content. Our SEO copywriters go out of their way to create the type of killer content that really helps a website to rank. And we advocate creating a lot of it!
Often clients think they can rely on fiddling around with the optimisation of their website and see the results come flooding in. Experience tells me that although onsite optimisations can be made and are important, it’s content creation and distribution that’s really going to make the difference. And you’ve got to make it good, and linkable, so you’ll actually get people linking back to it. This is what SEO consultants are looking to achieve during the SEO copywriting phase.
What’s boilerplate content anyway?
When reviewing a Google patent, Bill Slawski explained that “Computer programmers will sometimes use the term “boilerplate” code to refer to standard stock code that they often insert into programs. Lawyers use legal boilerplate in contracts – often the small print on the back of a contract that doesn’t change regardless of what a contract is about.”
Google has got loads better at dealing with boilerplate content. One of the things John Mueller says Google is looking out for is pages which “stand on their own”. What exactly does that mean? And how could they determine if the content does stand on its own?
One of the key things they could do is work out how much of your page’s content is the same as or similar to blocks of content on other pages of the website.
It’s not difficult to imagine that Google will easily discount the content blocks that are duplicated and then concentrate on the remaining elements of content on the page. In this sense, the best idea is to make sure you don’t have too much boilerplate content on your site. This is something to consider when building a website.
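To make that idea concrete, here is a minimal Python sketch of how you might measure it yourself. It is only an illustration: Google hasn’t published how it does this, and the function names (split_blocks, boilerplate_ratio) and the one-block-per-line assumption are mine, not anything from Google.

```python
def split_blocks(page_text):
    """Split a page into normalised, non-empty text blocks (assumes one block per line)."""
    return [" ".join(line.lower().split())
            for line in page_text.splitlines() if line.strip()]


def boilerplate_ratio(page_text, other_pages):
    """Return (share of blocks also found on other pages, blocks unique to this page)."""
    blocks = split_blocks(page_text)
    if not blocks:
        return 0.0, []
    # Collect every block that appears on any other page of the site.
    seen_elsewhere = set()
    for other in other_pages:
        seen_elsewhere.update(split_blocks(other))
    unique = [block for block in blocks if block not in seen_elsewhere]
    shared = len(blocks) - len(unique)
    return shared / len(blocks), unique


# Hypothetical pages: same header, footer and contact block, different main copy.
page_a = "Header nav\nAbout red widgets\nFooter links\nContact us"
page_b = "Header nav\nAbout blue widgets\nFooter links\nContact us"
ratio, unique_blocks = boilerplate_ratio(page_a, [page_b])
```

A high ratio with very little left in unique_blocks is a hint that the page might not “stand on its own”.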
My issue always comes in when content is duplicated or spun and actually ranks well. I’ve previously gone about changing the content of hundreds of pages to make them unique but found these SEO copy efforts have been fruitless or have resulted in a loss of rankings.
But plenty of websites have this spun content and they rank. My experience is these websites rank either for a while or for search queries that don’t deliver a lot of search traffic and aren’t very competitive. For competitive terms, I’ve found they tend not to do well.
One of the rules I use to work out if I really need to create a page of content is to ask myself “How is this different from the other page I’ve written?”. Too many pages all going after the same terms end up with cannibalisation, which essentially means you can’t rank for the terms you’re targeting on either page.
I worked with a business that created a lot of high-value reports in PDF format which outranked their services pages. These were great pieces of content. But they didn’t do anything to make anyone want to call. Handling things like PDFs, print-only pages and mobile only websites etc is essential so you don’t confuse Google.
Should you rewrite product descriptions to make the copy unique?
Having written a lot of copy for SEO in my time I’m in two minds on the answer to this question. On the face of it, you almost certainly should rewrite product descriptions to make them unique.
The key thing is to be wary about the amount of content you end up spinning on your site because Google may have ways of discounting the benefits.
Is Copied Content A Good SEO Copywriting Strategy?
You might get some traction with using copied main content for a short period of time, however, it will be short and you will ultimately get penalised. Also, I don’t think you can characterize this as an SEO copywriting technique as it’s just copying and pasting!
If you want to outrank your competitors from an SEO copywriting perspective, the best option is to follow Moz’s 10x Content strategy. Really create a reason why you should outrank them. Create killer content.
Here’s what Google has to say about Copied Main Content
7.4.5 Copied Main Content
Every page needs MC. One way to create MC with no time, effort, or expertise is to copy it from another source. Important: We do not consider legitimately licensed or syndicated content to be “copied” (see here for more on web syndication). Examples of syndicated content in the U.S. include news articles by AP or Reuters.
The word “copied” refers to the practice of “scraping” content, or copying content from other nonaffiliated websites without adding any original content or value to users (see here for more information on copied or scraped content).
If all or most of the MC on the page is copied, think about the purpose of the page. Why does the page exist? What value does the page have for users? Why should users look at the page with copied content instead of the original source? Important: The Lowest rating is appropriate if all or almost all of the MC on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.
7.4.6 More About Copied Content
All of the following are considered copied content:
- Content copied exactly from an identifiable source. Sometimes an entire page is copied, and sometimes just parts of the page are copied. Sometimes multiple pages are copied and then pasted together into a single page. Text that has been copied exactly is usually the easiest type of copied content to identify.
- Content which is copied, but changed slightly from the original. This type of copying makes it difficult to find the exact matching original source. Sometimes just a few words are changed, or whole sentences are changed, or a “find and replace” modification is made, where one word is replaced with another throughout the text. These types of changes are deliberately done to make it difficult to find the original source of the content. We call this kind of content “copied with minimal alteration.”
- Content copied from a changing source, such as a search results page or news feed. You often will not be able to find an exact matching original source if it is a copy of “dynamic” content (content which changes frequently). However, we will still consider this to be copied content. Important: The Lowest rating is appropriate if all or almost all of the MC on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.
Is there a duplicate content penalty within a domain?
As we have seen, through John Mueller’s video, Google spells out that there is no duplicate content penalty.
Google has given us some explicit guidelines when it comes to managing duplication of content. Mueller very clearly states: “We don’t have a duplicate content penalty. It’s not that we would demote a site for having a lot of duplicate content.”
When talking about very similar pages on a domain he goes on to say: “You don’t get penalized for having this kind of duplicate content” but he does say that pages should be unique and provide real value.
I take that to mean that Google doesn’t have to rank duplicate content, it could ignore it. Ignoring duplicate content isn’t the same as having a Google duplicate content penalty. The canonical copy of the content will continue to rank.
If you think about SEO copywriting for e-commerce websites, you’ve got to think that many products have a lot of different types of variants. I once worked with a marketer from Brother who told the story about how new products would become available and the only difference would be the colour. Naturally, there are a lot of products that will be very similar if not near identical and these variants will need the same or very similar copy producing for them. One of the most important things you can do is not to create lots of different pages for those products. Just have a dropdown selection for these kinds of variations in what is essentially the same product.
If you have some pages with variants you’re probably not going to do too badly, but if you have loads of variants you are going to need to do something about it so Google doesn’t start thinking about them as doorway pages.
Canonicalisation is important in this sense. And I’m not talking about canonical tags, I’m talking about putting in place the type of link architecture and site structure that encourages Google to determine which is the most important page on the website about this subject. In this blog, I’m talking about being an SEO consultant and the role of SEO copywriting, so I’ve linked back to those services pages. But I wouldn’t link from the services pages to this blog, other than through related content links.
But remember, creating all that unique content is costly. You don’t have to employ SEO copywriters for everything on your website but it’s a good idea to get someone in the know involved at some point.
When thinking about SEO I often say to people if it feels like you are creating something to manipulate your search results and not to help your users then you need to think again. Duplicate content will only damage your rankings if it’s there to try to deceive Google’s search results.
The problem comes when you’re trying to create content for big sites. You’re naturally going to want to cut corners from time to time. But this is where you start to come unstuck. Wherever possible, your content should be unique and written by a real person to overcome this.
And steer clear of automated writing techniques. At least for now. Automated writing is a technique in which a spider crawls the net for copy in a relevant topic and then creates copy for you. I’ve tried it a couple of times but at the moment it’s still not great. And whilst it’s copied in parts and unique in others I still think you can tell it’s computer generated. The thing is, I think it will get there. This is something to keep your eyes on over the next couple of years.
Do categories and tags damage my rankings?
If you’re running on something like WordPress you’ll find that you’ve got duplicated blocks of content on multiple pages across your website. I think Google is good at understanding what’s going on here, but I still choose to noindex either the categories or the tags just to be on the safe side.
In most cases, duplicate content only impacts the page on which the duplicate content is found, however, based on Google’s comments there could be a sitewide impact. Again, this is likely to be a manual penalty so there would have to be some serious wrongdoing!
Ultimately, if you don’t want your results to be filtered out of Google’s rankings, make sure you’ve created really unique content.
Does Near Duplicate Content damage your rankings?
Google’s Gary Illyes defined near duplicate content by saying: “Think of it as a piece of content that was slightly changed, or if it was copied 1:1 but the boilerplate is different.”
This means there are a couple of different ways of describing near duplicate content. The first covers pages of content which are taken from another website and changed slightly. The second covers pages where the main content is the same but the header and footer (etc.) are changed.
Google has been working on improving the way it detects near duplicate content since around 2008, so I’d suggest if you’re thinking about taking someone else’s work and changing it a bit you’re not going to get very far. What interests me more is when you take another piece of content, then dramatically improve it. Whilst it’s essentially still the same piece of content, it follows that 10x content principle.
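If you want a rough feel for how “slightly changed” copy can still be spotted, one well-known textbook approach is to compare overlapping runs of words (shingles) between two texts. The Python sketch below assumes that approach purely for illustration. It is not a description of Google’s detection, and the function names and the shingle size of 3 are my own choices.

```python
def shingles(text, size=3):
    """Return the set of overlapping word runs ("shingles") in a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


original = "the quick brown fox jumps over the lazy dog"
tweaked = "the quick brown fox leaps over the lazy dog"   # one word swapped
unrelated = "seo copywriting needs unique content with real value"

score_tweaked = jaccard_similarity(original, tweaked)
score_unrelated = jaccard_similarity(original, unrelated)
```

Swapping a single word still leaves a measurable overlap, whereas genuinely different copy shares nothing. Where you would draw the line between “near duplicate” and “unique” is a judgment call I haven’t tried to encode here.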
Why does duplicate content still rank in Google?
So SEO consultants like me bang on about the detrimental impact of duplicate content for your rankings. We encourage people to implement SEO copywriting to create keyword targeted, unique content. But guess what? Duplicate content still ranks in Google.
If you are after a short term benefit, you might even want to work out how you can implement a duplicate content strategy. If you’ve got killer domain authority you might want to implement a strategy that does add layers of category pages with a small amount of spun content.
You’ll find most of these terms will have to be fairly long tail search terms, probably with an element of localisation added in them to really reduce the competitiveness of the queries you’re going after. I’ve experimented with this technique a number of times, both for myself and through clients’ websites, and it’s something that really does work. I look at these like category pages on e-commerce websites. Just be careful. Before you do it have a look and see what the SERPs are like for your desired terms. If they aren’t competitive you’ll probably get away with it. Don’t expect to get away with it for the money terms though.
You might want to consider using this spun content across the category-like pages. But make sure you’re creating content that’s in some way better than the content that’s already there and think about putting in place a plan to do some SEO copywriting on the content to make it more unique.
But if your website is like most of the websites on the internet you’re going to need to tread very carefully. It might work at first. But sooner rather than later you are going to be damaged by your duplicate content issue.
I think the key thing to remember here is if you implement a technique you know is a bit spammy and that creates a load of low-quality pages, you need to bear in mind that this could create an issue down the road. You need to be able to undo what you’ve done. Because if you get a sitewide penalty this could seriously damage your business. And I’ve seen businesses go under because of changes in Google.
I’d say this is especially true in the content marketing era. As more and more content is created, Google is going to have to get more creative in the way it deals with copy. And I’d guess that SEO copywriting is going to become a much more sought-after skillset amongst SEO consultants and marketers alike. And like what happened in the content farming era, I expect Google will have to crack down on those websites that just produce poor quality content and create benefits for those websites that consistently create quality content.
Should I noindex duplicate content?
The noindex robots meta tag allows you to block Google’s search engine from indexing your page. This can be handy if you don’t want something to appear in the rankings. You’re going to use this tag for gated content used in lead generation activity. You may also want to use this on your PPC landing pages. However, there are some pages you’ll want to add this tag to from an SEO perspective.
As discussed above, you might want to add a noindex tag to some of your tag pages. Especially where you have tags that are identical or near identical to your categories. Remember when deciding whether to noindex tags vs categories, categories are generally more important than tags.
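If you’re auditing a site, it helps to check which pages actually carry a noindex directive. Here is a small Python sketch using only the standard library. The class and function names are my own, and it only looks at the robots meta tag in the HTML (it doesn’t check HTTP headers or crawler-specific tags such as googlebot).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def is_noindexed(html):
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical tag archive page that has been noindexed.
tag_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
category_page = '<html><head><meta name="robots" content="index, follow"></head></html>'
```

Running is_noindexed over your tag and category archives is a quick way to confirm the setup matches what you intended.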
This does kind of fly in the face of some advice Google had given a while back. But I generally think it’s the best way of going about things. Especially if you’re dealing with a pre-existing website. If the site you’re looking at is new, it’s possibly OK to follow Google’s guidelines and noindex the pages containing duplicated content.
Google’s change in stance is because they want to see everything that’s going on on your website and then make the decision. As John Mueller says: “We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods.” Here is a list of meta tags Google understands.
Historically, Google used to say: “Block appropriately: Rather than letting our algorithms determine the “best” version of a document, you may wish to help guide us to your preferred version. For instance, if you don’t want us to index the printer versions of your site’s articles, disallow those directories or make use of regular expressions in your robots.txt file.”
When discussing duplicate content issues, Google has changed its tune a bit. Now Google is saying: “Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel=”canonical” link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Search Console.”
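The rel="canonical" link element mentioned in that quote lives in the head of a page. As a rough illustration, this Python sketch pulls the canonical URL out of a page’s HTML so you can see which version each duplicate is pointing at. The names are my own and the example URLs are made up.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared on the page, or None if there isn't one."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical printer-friendly page pointing back at the main article.
print_page = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/article/">'
    '</head></html>'
)
plain_page = '<html><head><link rel="stylesheet" href="style.css"></head></html>'
```

If a group of near-identical URLs all return the same canonical_url, you’ve given Google a consistent signal about which one you want to rank.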
As we suggest using our SEO copywriting services for most of the websites we work on, we generally don’t need to worry too much about noindexing duplicated content. I’m generally happy to let Google find the content on the sites we work on. But if there are any sections of copied content, this must be noindexed and all the links need to be nofollowed. I did this for a site where they had been scraping 4 news articles a day for around 10 years!
Google goes on to say: “DC on a site is not grounds for action on that site unless it appears that the intent of the DC is to be deceptive and manipulate search engine results.” I think that’s fairly clear!
Do canonical links help with duplicate content issues?
OK, so we’ve talked a bit about canonical links but are they really going to sort out your duplicate content issues? They’ll help you overcome problems caused by certain searches, content management system idiosyncrasies or product variations where the pages are essentially the same.
You can use the canonical tag to point to a different domain if you’ve copied the content from another domain but my advice would be to noindex that page as well.
However, most duplicate content is only partially duplicated, so canonicals won’t help you to deal with that (as the page you’re dealing with is the canon).
How to use Google Search Console to overcome duplicate content issues
One of the key things you can do in Google Search Console is let Google know how you’d like the site to be indexed. For example, you can go with http://www.example.com or http://example.com. This will help overcome potential duplicate content across the whole site where a 301 redirect between the two versions hasn’t been properly set up. Again, a self-referencing canonical will also help here.
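The www versus non-www choice comes down to picking one host and sending everything else to it. Here is a minimal Python sketch of that decision, assuming www.example.com is the preferred version. The constant and function names are hypothetical, and a real site would normally do this in the web server or CMS settings rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # assumption: the www version is the one you want indexed


def preferred_url(url, preferred_host=PREFERRED_HOST):
    """Rewrite a www or non-www URL so it uses the preferred host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    # Only touch the two variants of our own domain (ports are not handled here).
    if host in {bare_host, "www." + bare_host}:
        host = preferred_host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def redirect_for(url, preferred_host=PREFERRED_HOST):
    """Return (301, target) if the URL should redirect, or None if it is already preferred."""
    target = preferred_url(url, preferred_host)
    return None if target == url else (301, target)
```

For example, redirect_for("http://example.com/page") gives a 301 to the www version, while the www URL itself needs no redirect.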
Does your CMS Create Duplicate Pages?
I tend to think of myself as a bit of a WordPress SEO consultant. I tend to do my web design and web development in WordPress – although I’ll also use Magento for e-commerce websites I can work on pretty much anything. Every CMS has its own idiosyncrasies.
When discussing content management systems and duplicate content Google says:
“Understand your content management system: Make sure you’re familiar with how content is displayed on your website. Blogs, forums, and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.
WordPress, Magento, Joomla, Drupal – they all come with slightly different duplicate content (and crawl equity performance) challenges.”
Can you be penalised for syndicated content?
This is a big fat yes. When talking about how Google handles content marketing for link acquisition they said: “Google does not discourage these types of articles in the cases when they inform users, educate another site’s audience or bring awareness to your cause or company. However, what does violate Google’s guidelines on link schemes is when the main intent is to build links in a large-scale way back to the author’s site …
For websites creating articles made for links, Google takes action on this behavior because it’s bad for the Web as a whole. When link building comes first, the quality of the articles can suffer and create a bad experience for users.”
Whenever I carry out outreach marketing, I generally talk to people about creating content that will be of benefit. I never talk to them about getting links back to the website I’m working on. If links come, great, but they’re a bonus.
I’ve always been a big advocate of adding your content to RSS feeds especially when thinking about long tail searches, it does mean there is duplicated content that’s out there, but I’m sure Google.
But as any good SEO consultant should tell you, if you are trying to create unnatural links, beware!
How does Google deal with thin content?
Google says you shouldn’t build stub pages. After all, people don’t want to find placeholder pages. If you don’t have the content yet, don’t publish the page. I always think you should be able to build a page within an hour, so if you don’t know what the SEO copy should be, just riff it, then come back to it.
How can you reduce the amount of similar content?
From an SEO copywriting perspective, the answer is really simple. You’ve just got to create more unique content. Simply make the similar content less similar and create a reason for it to exist.
And I think that’s something you always need to bear in mind when creating copy for SEO. It’s something I often speak about as an SEO consultant.
If you can’t find the reason for your content to exist merge it with another piece and put some 301 redirects in place to let Google know what’s going on.
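When you merge pages over time, it’s easy to end up with old URLs that redirect to other old URLs. As a rough sketch, this Python snippet takes a simple map of old paths to new paths and works out the final destination for each, so every retired page can 301 straight to the page that replaced it. The paths and function names are hypothetical.

```python
def resolve(path, redirects):
    """Follow the redirect map from path to its final destination."""
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen:
            raise ValueError("redirect loop involving " + path)
        seen.add(path)
    return path


def flatten(redirects):
    """Return a map where every old path points directly at its final destination."""
    return {source: resolve(source, redirects) for source in redirects}


# Hypothetical merges: two thin posts folded into one guide, which was later renamed.
redirects = {
    "/blog/thin-post-a": "/blog/combined-guide",
    "/blog/thin-post-b": "/blog/combined-guide",
    "/blog/combined-guide": "/blog/definitive-guide",
}
flat = flatten(redirects)
```

Each entry in flat can then be set up as a single 301 in whatever redirect tool your server or CMS provides.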
I used to work with a ticket provider that listed events happening closest to a given town or city. Invariably some towns and cities were featuring the same listings because of their proximity to each other, or because one town/city didn’t have any events listed. The solution in that scenario is to create a regional listing with information about both specific locations rather than several pages with very similar content.
However, having said that, I think the two pages with similar content are reasonable, add value and aren’t intentionally manipulative. So, in reality, you don’t need to worry too much about it. Just be careful not to fall foul of Google Panda, which aims at removing poor quality content from the index.
How to deal with duplication caused by pagination
I’ve seen this a lot with WordPress websites, and it’s something an SEO consultant should be able to advise on. But don’t get too hung up on it.
Paginated pages happen where there is an archive of posts and the archive page is limited to a set number of posts. Think about blog pages. They generally aren’t duplicated pages; however, there are sometimes duplicated page titles and page descriptions, which can trigger warnings in Google Search Console or in various web checking tools.
I can’t stress enough that you really shouldn’t spend too much time worrying about it. Google understands how pagination works. It’s been dealing with websites with pagination issues for a long time.
There are a few ways to help Google out here. This is what they suggest:
- Do nothing. Paginated content is very common, and Google does a good job returning the most relevant results to users, regardless of whether the content is divided into multiple pages.
- Specify a View All page. Searchers commonly prefer to view a whole article or category on a single page. Therefore, if we think this is what the searcher is looking for, we try to show the View All page in search results. You can also add a rel="canonical" link to the component pages to tell Google that the View All version is the version you want to appear in search results.
- Use rel="next" and rel="prev" links to indicate the relationship between component URLs. This markup provides a strong hint to Google that you would like us to treat these pages as a logical sequence, thus consolidating their linking properties and usually sending searchers to the first page.
I generally noindex paginated tags. However, I would try to sort this out properly if possible. If it looks too difficult or will take a lot of time, I would go by Google’s advice and do nothing. I’m generally confident that Google can work out what’s going on with a paginated archive.
How should you use rel="next" and rel="prev"?
Using these tags is about letting Google know how you want the sequence of pages to be indexed, often giving them an indication of the importance or timeliness of the content.
Google gives the following advice. Imagine you have an article paginated across four pages, from http://www.example.com/article-part1.html to http://www.example.com/article-part4.html:
- In the <head> section of the first page (http://www.example.com/article-part1.html), add a link tag pointing to the next page in the sequence, like this:
<link rel="next" href="http://www.example.com/article-part2.html">
Because this is the first URL in the sequence, there’s no need to add markup for rel="prev".
- On the second and third pages, add links pointing to the previous and next URLs in the sequence. For example, you could add the following to the second page of the sequence:
<link rel="prev" href="http://www.example.com/article-part1.html">
<link rel="next" href="http://www.example.com/article-part3.html">
- On the final page of the sequence (http://www.example.com/article-part4.html), add a link pointing to the previous URL, like this:
<link rel="prev" href="http://www.example.com/article-part3.html">
Because this is the final URL in the sequence, there’s no need to add a rel="next" link.
Google treats rel="previous" as a syntactic variant of rel="prev". Values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL.
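If your templates are built in code, the rules above boil down to a few lines. This is a minimal sketch rather than anything Google or a particular CMS provides: the function name is my own, and it assumes you already have the full ordered list of URLs in the series.

```python
from html import escape
from typing import List


def pagination_links(urls: List[str], index: int) -> List[str]:
    """Build the <link> tags for the page at position `index` in `urls`."""
    tags = []
    if index > 0:  # every page except the first points back
        tags.append(f'<link rel="prev" href="{escape(urls[index - 1])}">')
    if index < len(urls) - 1:  # every page except the last points forward
        tags.append(f'<link rel="next" href="{escape(urls[index + 1])}">')
    return tags


# The four-part article from Google's example.
pages = [f"http://www.example.com/article-part{n}.html" for n in range(1, 5)]
second_page_tags = pagination_links(pages, 1)
```

The call for the second page returns both a prev and a next tag, while the first and last pages get one tag each. The escape call is there so that any ampersands in a URL are written as valid HTML.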
Some things to note:
- rel="prev" and rel="next" act as hints to Google, not absolute directives.
- If a component page within a series includes parameters that don’t change the page’s content, such as session IDs, then the rel="prev" and rel="next" values should also contain the same parameters. This helps our linking process better match corresponding rel="prev" and rel="next" values. For example, the page http://www.example.com/article?story=abc&page=2&sessionid=123 should contain the following:
<link rel="prev" href="http://www.example.com/article?story=abc&page=1&sessionid=123" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3&sessionid=123" />
- rel="next" and rel="prev" are orthogonal concepts to rel="canonical". You can include both declarations. For example, http://www.example.com/article?story=abc&page=2&sessionid=123 may contain:
<link rel="canonical" href="http://www.example.com/article?story=abc&page=2" />
<link rel="prev" href="http://www.example.com/article?story=abc&page=1&sessionid=123" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3&sessionid=123" />
- If Google finds mistakes in your implementation (for example, if an expected rel="prev" or rel="next" designation is missing), we’ll continue to index the page(s), and rely on our own heuristics to understand your content.
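To show how the canonical URL in that last example relates to the session URL, here is a small sketch that strips parameters which don’t change the content. The set of ignored parameters is an assumption for this example; you would need to decide which parameters are safe to drop on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page shows.
IGNORED_PARAMS = {"sessionid"}


def canonical_url(url: str) -> str:
    """Drop content-neutral parameters, keeping the others in their order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


page = "http://www.example.com/article?story=abc&page=2&sessionid=123"
canonical = canonical_url(page)
```

Note that only the canonical loses the session ID. The prev and next links keep it, as Google’s example shows.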
Again, I’d proceed with caution. I still feel Google is very good at working this stuff out, and this markup is easy to get wrong in ways that cause problems further down the line.
Most modern CMSs don’t do this anymore, but I remember when you would find only the first page in a paginated series earning the canonical juice, because every other page in the series pointed its canonical back to it. It’s something to watch out for, but most sites don’t work this way anymore and most developers are aware of the potential issue.
Google themselves say not to worry about this too much. They say that they can work out which posts are time-related, and they go on to say they are more interested in semantically related content.
How to deal with duplicate content issues caused by website search pages
Google was pretty clear about what to do with the search pages created on your website. They say: “Use the robots.txt file on your web server to manage your crawling budget by preventing crawling of infinite spaces such as search result pages. Keep your robots.txt file up to date.”
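Before relying on a rule like that, it’s worth checking that it blocks what you think it blocks. Here is a minimal sketch using Python’s standard robots.txt parser. The /search path is an assumption for the example, so swap in whatever path or pattern your own site search uses.

```python
from urllib.robotparser import RobotFileParser

# Assumption: internal search results live under /search on this site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Report whether the rules above allow `agent` to crawl `url`."""
    return parser.can_fetch(agent, url)


search_allowed = is_crawlable("http://www.example.com/search?q=tickets")
blog_allowed = is_crawlable("http://www.example.com/blog/page/2")
```

With these rules the search results URL is blocked and the blog archive page is still crawlable. Bear in mind this is Python’s reading of the file, which may not match Google’s own parser in every edge case.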
Interestingly I can’t find a reference to this in Google Webmasters which may suggest Google has already got better at understanding and discounting these pages.
What tools check for duplicate content?
When first considering how duplicate content is impacting your website you will need to query your whole site to try to work out what’s going on.
The first thing I do is head over to Google’s search and copy and paste a paragraph of content from my website to see what comes up. You’ll be surprised at what you find in doing this! It may also be worth putting speech marks around your search query to really home in on the results you want.
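If you have a lot of paragraphs to check, you can build those speech-mark searches in bulk. This is just a sketch that produces the search URL for you to open in a browser. The function name is my own, and it doesn’t fetch or scrape any results.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(snippet: str) -> str:
    """Wrap a snippet in speech marks and build a Google search URL for it."""
    query = f'"{snippet.strip()}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


url = exact_phrase_search_url("duplicate content")
```

A sentence or two usually works better than a whole paragraph, because very long queries can get cut short.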
If this doesn’t help, have a look at services such as Copyscape, where you can enter a web address to see if anyone has lifted your content, or to identify whether your content is too close to that of another website.
If you are looking for duplicate content within your own website, you should be looking at software such as Deepcrawl or Screaming Frog (or any number of other SEO tools out there).
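To give a feel for one simple way a check like this can work, here is a sketch that groups pages whose text is identical once case and spacing are ignored. It is not how Deepcrawl or Screaming Frog work internally, and the page paths and text are invented for the example.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalised body text is exactly the same."""
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(normalise(body).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl output: URL path mapped to the page's main text.
site = {
    "/events/town-a": "Events  happening this weekend",
    "/events/town-b": "events happening this weekend ",
    "/about": "Who we are",
}
duplicates = find_duplicates(site)
```

This only catches exact matches after tidying up. Pages that are merely similar need a fuzzier comparison, which is where the dedicated tools earn their keep.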
What to do if you find someone has copied your content?
Many people will tell you not to bother doing anything if you find that someone has significantly copied your website. However, I often do act on this.
First of all, I contact the website owner directly. I let them know I think they’ve used my work and the output is a bit too close to my own. Often you’ll find this has happened completely by accident. We all use bits of content we find around the web for inspiration.
If it’s directly lifted I’m a little less forgiving, but I still give them a chance to take it down.
If there is no response within a reasonable amount of time, I report them to Google for copying my content. This generally sorts things out within a short period of time.
If you find evidence of plagiarism, you can file a DMCA or contact Google, but I haven’t ever bothered with that, and many folks have republished my articles over the years.
I once even found my article in a paid advert in a magazine.