Since Google’s Panda update, webmasters have been trying to avoid a “duplicate content penalty.” Google has said it doesn’t penalize pages for duplicate content, but you still need to take the issue seriously: it affects your whole site, not just the pages that host it, and if you’ve got a lot of it you can seriously hinder your ability to rank in search results.
If you’re not careful, you could be inadvertently publishing duplicate content a few different ways:
- Multiple URLs pointing to the same content
- Multilingual versions of the same page
- Paginated content
The good news here is that there are some on-page methods you can use to get rid of duplicate content on your site: rel="canonical", hreflang and rel="prev"/rel="next" (pagination).
It’s well worth your time and effort to implement these fixes to make your site more findable on search engines. Let’s get started!
hreflang: Find Your Targeted Audience
What is it?
Introduced by Google in 2011, the hreflang tag lets you tell a search engine that a page is related to other pages in different languages and/or regions. If your website is https://example.com, and you’ve got the same page in Spanish on https://example.com/es, use the hreflang tag to tell search engines to serve that page to Spanish-speaking searchers.
It’s important to note that hreflang is a signal, not a directive, in search results. So if you have pages that are too similar (like English pages targeting the US and Canada), you run the risk of the wrong version ranking for a search term. Multilingual sites need to be a part of your overall marketing strategy.
How do I do it?
The hreflang annotation is implemented in the <head> section of an HTML page. For non-HTML pages (such as PDFs) the annotation can be sent in the HTTP header instead. When done correctly the hreflang annotation should look like this:
- HTML: <link rel="alternate" hreflang="en" href="http://ift.tt/2aKQLl1" />
- HTTP: Link: <http://ift.tt/2ayDMj4>; rel="alternate"; hreflang="en"
You must include links to every version of your page. If you have English, Spanish and French copies, put links to all three in the <head> of each page.
If you have two or more pages in the same language but targeted to different geographies (say, the US, Canada and the UK), you can extend the hreflang value to include the country code like this:
- <link rel="alternate" hreflang="en-us" href="http://ift.tt/2aKQLl1" />
- <link rel="alternate" hreflang="en-ca" href="http://ift.tt/2aKR2Vf" />
- <link rel="alternate" hreflang="en-gb" href="http://ift.tt/2ayE09N" />
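If your pages come from a template, it helps to generate the whole set of tags from one list so no version (including the page itself) is left out. The sketch below is a hypothetical Python helper, `hreflang_link_tags`, assuming you already know every version’s URL; the example.com URLs are placeholders, not the ones above.

```python
from html import escape
from typing import Dict, List


def hreflang_link_tags(alternates: Dict[str, str]) -> List[str]:
    """Build one <link rel="alternate"> tag per language/region version.

    `alternates` maps an hreflang value (e.g. "en-us") to that version's URL.
    The SAME full list goes into the <head> of every version, so each page
    references all of its siblings and itself.
    """
    tags = []
    # Sorted so the output is stable and easy to diff between deploys.
    for code, url in sorted(alternates.items()):
        tags.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        )
    return tags


# Hypothetical URLs for illustration only.
versions = {
    "en-us": "https://example.com/",
    "en-ca": "https://example.com/ca/",
    "en-gb": "https://example.com/uk/",
}
for tag in hreflang_link_tags(versions):
    print(tag)
```

Emitting the identical block on all three pages is what keeps the annotations reciprocal.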
If you’ve got a non-HTML page in multiple languages or regions, list every version in a single Link header and separate the annotations with commas like this:
Link: <http://ift.tt/2ayDMj4>; rel="alternate"; hreflang="en-us",
      <http://ift.tt/2aKRoeG>; rel="alternate"; hreflang="en-ca",
      <http://ift.tt/2ayDt7Y>; rel="alternate"; hreflang="en-gb"
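A sketch of building that header value in Python, assuming a hypothetical helper `hreflang_link_header` and placeholder PDF URLs. The point it illustrates: all versions go into one comma-separated value, with no comma after the last entry.

```python
from typing import Dict


def hreflang_link_header(alternates: Dict[str, str]) -> str:
    """Combine every version into ONE Link header value, comma-separated."""
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in alternates.items()  # dicts keep insertion order
    ]
    return ", ".join(parts)  # join() never leaves a trailing comma


# Hypothetical PDF URLs for illustration only.
pdf_versions = {
    "en-us": "https://example.com/guide.pdf",
    "en-ca": "https://example.com/ca/guide.pdf",
    "en-gb": "https://example.com/uk/guide.pdf",
}
print("Link: " + hreflang_link_header(pdf_versions))
```

How the header is actually attached depends on your web server or framework; this only builds the value.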
There’s also a third option for implementing hreflang annotations: your XML sitemap. Instead of adding markup to your pages, list the language and regional versions of each URL in your sitemap. Just like with the other annotations, include a URL for each version, including the page itself.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://ift.tt/2ayEbCb</loc>
<xhtml:link
rel="alternate"
hreflang="en-us"
href="http://ift.tt/WjceXj"
/>
<xhtml:link
rel="alternate"
hreflang="en-ca"
href="http://ift.tt/2ayE75y"
/>
<xhtml:link
rel="alternate"
hreflang="en-gb"
href="http://ift.tt/2aKR27B"
/>
</url>
</urlset>
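Writing these entries by hand gets tedious quickly. A minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming a hypothetical helper `hreflang_sitemap` and placeholder URLs: it emits one `<url>` per version, and every `<url>` lists all versions, itself included.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize the sitemap namespace as the default and XHTML with "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def hreflang_sitemap(alternates: Dict[str, str]) -> str:
    """One <url> per version; every <url> lists all versions, itself included."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for code, href in alternates.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=code, href=href)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs for illustration only.
versions = {
    "en-us": "https://example.com/",
    "en-ca": "https://example.com/ca/",
    "en-gb": "https://example.com/uk/",
}
print(hreflang_sitemap(versions))
```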
What could go wrong?
A common problem when inserting hreflang annotations is “Return Tag Errors.” These errors come from hreflang annotations that don’t link to each other. Annotations are a two-way street: if your English page links to your German page, your German page must link back to your English page. Possibly the most common Return Tag Error is omitting the self-reference: your English page needs to link to itself.
To check for Return Tag Errors, look in Google Search Console’s International Targeting data under Search Traffic. This will tell you how many hreflang tags Google found and how many have errors.
Another common problem implementing hreflang annotations is incorrect language or country codes. The hreflang value must be in ISO 639-1 format for language and ISO 3166-1 alpha-2 format for country. Using ‘uk’ for the United Kingdom is the most common culprit; in this system the value should be ‘gb’ for Great Britain.
Note that your hreflang value must start with the language code, and that region targeting is limited to countries: you can’t target the European Union or North America, for example.
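A rough pre-publish check for these mistakes could look like the following. It is a format check only, under the assumption that values are `ll` or `ll-cc`; it does not look codes up in the full ISO lists, and the small table of known mistakes (`REGION_FIXES`, `NOT_COUNTRIES`) is hypothetical and meant to be extended.

```python
import re
from typing import Optional

# Language (ISO 639-1, two letters) optionally followed by a region
# (ISO 3166-1 alpha-2, two letters). Format only: "zz-zz" would still pass.
HREFLANG_RE = re.compile(r"([a-z]{2})(?:-([a-z]{2}))?", re.IGNORECASE)

# Hypothetical table of common mistakes; extend it for your own site.
REGION_FIXES = {"uk": "gb"}
NOT_COUNTRIES = {"eu"}


def check_hreflang(value: str) -> Optional[str]:
    """Return a problem description, or None if the value looks well-formed."""
    match = HREFLANG_RE.fullmatch(value)
    if match is None:
        return f"{value!r}: expected 'll' or 'll-cc', language code first"
    region = (match.group(2) or "").lower()
    if region in REGION_FIXES:
        return f"{value!r}: use '{REGION_FIXES[region]}' as the region, not '{region}'"
    if region in NOT_COUNTRIES:
        return f"{value!r}: '{region}' is not a country"
    return None


for code in ["en-us", "en-uk", "en-eu", "en_us", "es"]:
    print(code, "->", check_hreflang(code) or "ok")
```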
rel="canonical": Which Page is the Original?
What is it?
If you use a content management system, syndicate content or have an e-commerce shopping site, it’s easy to wind up with multiple URLs or domains all pointing to the same content. To combat this, use the rel="canonical" tag to tell search engines where to find the original. When a search engine sees this annotation, it knows the current page is a copy and where to find the canonical content.
How do I do it?
Start by deciding which URL you want to be canonical. In general, you should pick your best optimized URL as your canonical URL. Take it a step further and set your preferred domain in Google Search Console.
A nice benefit of setting a preferred domain is that search engines will take this into account when crawling links to your page; links to example.com will pass link juice to your preferred domain of www.example.com. The same goes for other indexing factors, such as trust and authority.
To properly tell a search engine that content is copied from your canonical URL, place the rel="canonical" annotation in the <head> of your page. It should look like this:
- <link rel="canonical" href="http://ift.tt/2aKQLl1" />
If you’ve got a non-HTML version of a document (like a PDF available for download) you can include the canonical reference in the HTTP header like this:
- Link: <http://ift.tt/2aKRhiY>; rel="canonical"
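Because both forms should carry exactly the same URL, it can help to generate them from one value. A minimal sketch, assuming a hypothetical helper `canonical_markup` and a placeholder URL; it also refuses relative or protocol-relative URLs, which lead to the exact-match problems covered next.

```python
from typing import Tuple
from urllib.parse import urlsplit


def canonical_markup(canonical_url: str) -> Tuple[str, str]:
    """Return (HTML <link> tag, HTTP Link header value) for one canonical URL."""
    parts = urlsplit(canonical_url)
    # Require an absolute URL with an explicit http/https scheme and host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("canonical URL must be absolute, e.g. https://www.example.com/page")
    html_tag = f'<link rel="canonical" href="{canonical_url}" />'
    header = f'<{canonical_url}>; rel="canonical"'
    return html_tag, header


# Hypothetical PDF URL for illustration only.
tag, header = canonical_markup("https://www.example.com/guide.pdf")
print(tag)
print("Link: " + header)
```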
What could go wrong?
While the rel="canonical" tag seems simple enough to implement, getting it wrong can have a major impact on your search performance. There are a few common misapplications of canonicalization that you need to be sure to avoid:
Paginated content all pointing to page one: When you add the canonical annotation to paginated content, match each page to its own canonical URL (page 1 to your canonical page 1 URL, page 2 to page 2, and so on). We’ll cover this in a bit more detail later.
Canonical URLs that are not 100% exact matches: If your canonical tags use protocol-relative links (for example, //example.com/page), leaving off http/https means search engines can still see duplicate content at the http and https addresses. Always make your preferred URLs 100% exact matches, protocol included.
Pointing to canonical URLs that return a 404 error: Search engines will ignore tags that point to a dead page.
Multiple canonical tags: Search engines only support one rel="canonical" annotation per page. You can end up with multiple when a webmaster copies a page template that already includes rel="canonical", or when a plugin inserts rel="canonical" automatically. In cases of multiple canonical tags, Google will simply ignore all of them.
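Several of these mistakes can be caught by parsing a page’s HTML. The sketch below uses Python’s standard `html.parser`; `audit_canonical` and its `paginated` flag are hypothetical names, and you would have to supply the page HTML yourself. Checking whether a canonical URL returns a 404 needs an HTTP request and is left out.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.hrefs.append(attributes.get("href") or "")


def audit_canonical(html_text: str, page_url: str, paginated: bool = False) -> List[str]:
    """Return a list of problems with the page's canonical annotation(s)."""
    parser = CanonicalCollector()
    parser.feed(html_text)
    parser.close()
    problems = []
    if len(parser.hrefs) > 1:
        problems.append(f"{len(parser.hrefs)} canonical tags: Google ignores them all")
    for href in parser.hrefs:
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"{href!r} is not an absolute http(s) URL")
    # Paginated pages should be canonical to themselves, not to page one.
    if paginated and len(parser.hrefs) == 1 and parser.hrefs[0] != page_url:
        problems.append(f"paginated page points its canonical at {parser.hrefs[0]!r}")
    return problems
```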
rel="prev"/"next": Avoid Duplicate Title Tags & Meta Descriptions
What is it?
There are a few reasons you might want to break your content into multiple pages: you’ve got a long article or series of articles, your retail site has a long list of products within a category or you’re hosting a discussion forum with a lot of large comment threads. Paginated content generally won’t cause many problems with duplicate content in the body of a page, but will affect one very important aspect of your on-page SEO: title tags and meta descriptions. You can find any instances of duplicate titles and descriptions in Search Console in the HTML Improvements report under Search Appearance.
To tell search engines that you’ve got paginated content, use the rel="prev" and rel="next" annotations. These tags tell Google that your pages make up a connected series, consolidating their index properties (links, authority, etc.) and sending search visitors to page one.
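As a rough illustration of how these annotations fit together, here is a hypothetical Python helper, `pagination_tags`, that builds the head tags for one page in a series. It assumes a made-up URL scheme (`blog_url`) and also gives each page a self-referencing canonical, in line with the pagination advice in the canonical section; it is a sketch, not the full implementation.

```python
from typing import Callable, List


def pagination_tags(page: int, last_page: int, url_for: Callable[[int], str]) -> List[str]:
    """Head tags for page `page` of a series running from 1 to `last_page`.

    Each page is canonical to itself (not to page 1) and links to its
    neighbours: the first page has no rel="prev", the last has no rel="next".
    """
    if not 1 <= page <= last_page:
        raise ValueError("page out of range")
    tags = [f'<link rel="canonical" href="{url_for(page)}" />']
    if page > 1:
        tags.append(f'<link rel="prev" href="{url_for(page - 1)}" />')
    if page < last_page:
        tags.append(f'<link rel="next" href="{url_for(page + 1)}" />')
    return tags


def blog_url(n: int) -> str:
    # Hypothetical URL scheme: page 1 has no query string.
    return "https://example.com/blog/" if n == 1 else f"https://example.com/blog/?page={n}"


for tag in pagination_tags(2, 3, blog_url):
    print(tag)
```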
How do I do it?
Continue reading “Avoid Duplicate Content with These Three Techniques”
by Greg Snow-Wasserman via SitePoint