Technical Issues Can Cause Content to be Indexed on Scraper Sites Before Original Site
If content is being indexed on scraper sites before it is indexed on the original site, this could be due to technical issues on the original site. For example, Googlebot might be unable to find the main hub or category pages, or may be getting stuck in crawl traps by following excessive parameter URLs.
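One way to address the crawl-trap side of this, assuming the excess URLs come from faceted navigation parameters, is to block those parameters in robots.txt. A minimal sketch (the sort and filter parameter names are hypothetical placeholders):

```
# Keep Googlebot out of parameter-based crawl traps.
# Assumes faceted navigation appends ?sort= and ?filter= parameters;
# substitute the parameter names your site actually uses.
User-agent: *
Disallow: /*?*sort=
Disallow: /*?*filter=

# Listing hub and category pages in a sitemap also helps Googlebot
# find them directly rather than via deep parameter URLs.
Sitemap: https://www.example.com/sitemap.xml
```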
Google’s Algorithms Should Be Able to Detect & Prioritize Original Content From Near-Duplicate Versions
Google’s algorithms should ideally be able to detect spun content that has been rewritten from another source, and treat the original content as more valuable.
Having Multiple Pages for Different Product Variations Isn’t a Problem
John recommends one of two approaches for products with multiple variations: either ensure each individual variation page is indexed, or have one main product page with each variation option available. The best method depends on the size of the site and the uniqueness of each variation.
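For the second approach, each variation URL can declare the main product page as its canonical so that signals are consolidated onto one page. A minimal sketch, assuming a hypothetical ?colour parameter:

```html
<!-- Served at https://www.example.com/products/t-shirt?colour=red (hypothetical URL) -->
<head>
  <!-- Consolidate all colour variations onto the main product page -->
  <link rel="canonical" href="https://www.example.com/products/t-shirt">
</head>
```

Under the first approach, each variation page would instead reference itself as canonical and carry enough unique content to be worth indexing separately.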
GSC Data Across Duplicate Language Versions Will Only Be Shown for Selected Canonical
Even if you have hreflang set up correctly, Google can fold together similar language version pages and choose one to index, meaning that Google Search Console data will only be shown for the one page selected as canonical.
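For reference, a complete hreflang setup lists every language version on each page, including a self-reference. A minimal sketch with illustrative URLs and locale codes:

```html
<head>
  <link rel="alternate" hreflang="en-gb" href="https://www.example.com/en-gb/page/">
  <link rel="alternate" hreflang="en-us" href="https://www.example.com/en-us/page/">
  <link rel="alternate" hreflang="x-default" href="https://www.example.com/page/">
</head>
```

Even with this markup in place, if the en-gb and en-us pages are near-identical Google may still fold them together, and Search Console will report data only against the version it selects as canonical.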
Having Sections of Duplicate Content on a Site Is Fine
Google will not demote your site if you have sections of duplicate content across several different pages. Instead, it will recognise that the content is contained on several pages, try to filter the duplicates out of the search results, and show just one page.
HTML & AMP Pages Containing the Same Content Will Not Be Negatively Seen As Duplicate Content
Having the same content on both HTML and AMP pages is not negatively seen as duplicate content by Google. However, it can lead to the pages competing against each other in search results. To avoid this, John recommends consolidating the value of both pages using the relevant rel link (from the HTML page to its AMP version) and a canonical tag (from the AMP page back to the HTML version), as shown below.
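As a sketch of that pairing (the URLs are placeholders), the HTML page declares its AMP version and the AMP page points back to the HTML version as canonical:

```html
<!-- On the regular HTML page: -->
<link rel="amphtml" href="https://www.example.com/page/amp/">

<!-- On the AMP page: -->
<link rel="canonical" href="https://www.example.com/page/">
```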
International Websites on Separate Subdomains Will Not Be Penalized for Duplicate Content
Google will not penalize international websites that exist on separate subdomains for having duplicate content. Instead, it will recognise that the pages are identical and, in most cases, index both, but will only pick one URL to show in search results.
Pages with Internally Duplicated Content Are Indexed Separately but Folded Together in Search
Google will index pages containing duplicate blocks of text separately, but will work out which of those pages is most relevant for each query and show just one of them in the search results.
Directory Sites Should Have Unique, Valuable Content to Perform in Search
To rank in search, directories should provide unique information that gives users a reason to visit the directory rather than going straight to the website of the business whose contact details they want.