Notes from the Google Webmaster Hangout on 23rd February, when John Mueller discusses language markup, unique content and the rolling Panda algorithm update.
Anchor Text Counts Towards Target Page Relevancy
Google does include anchor text in links to a page for ranking relevancy.
Google Ignores Language Tags
Apart from hreflang, Google doesn’t use HTML language tags to detect the language of a page; it detects the language from the text itself.
They are still used by Bing and by translation services.
If you use the wrong language tag, Google will just ignore it and it won’t impact search results.
If the language used in your hreflang tags doesn’t match the language of the target content, the tags will probably be ignored.
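As an illustration, a minimal hreflang setup for a bilingual site might look like the following (the URLs are hypothetical, and each target page must actually be written in the declared language):

```html
<!-- In the <head> of https://www.example.com/en/ (English content) -->
<link rel="alternate" hreflang="en" href="https://www.example.com/en/" />
<!-- German alternate: the target page must actually be in German,
     or Google will probably ignore the annotation -->
<link rel="alternate" hreflang="de" href="https://www.example.com/de/" />
<!-- Fallback for users whose language isn't listed -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
```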
Google Displays Translate Links for Mixed Language Content
Google may show a translate link in search results if your page contains some foreign-language text.
You can use the notranslate meta tag to prevent this, but it also prevents Chrome translate function from working automatically.
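A sketch of the notranslate meta tag, which applies to the whole page:

```html
<!-- Tells Google not to offer a translate link in search results for this page;
     note this also stops Chrome from offering automatic translation -->
<meta name="google" content="notranslate" />
```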
Add Unique Content to Pages with Duplicate Content
If multiple sites are using the same affiliate/product content, Google will try to establish the original source, and allow a few other versions to rank. They will be treated as separate sites which can rank independently for different queries. The suggestion is that this is OK provided you include some unique content.
Later he also suggests this for pages with embedded videos.
Discussed again a bit later
Google Won’t Index URLs with Default Ports
URLs with the default port number (e.g. http://www.example.com:80/) are not treated as duplicates, so you don’t need to worry about them.
80 is the default port for HTTP, and 443 is the default port for HTTPS.
Google Identifies Unique Sites on Subdomains
Google has a process to recognise when separate sites are hosted on subdomains, and when they are used for a single website.
Penalties and malware detection warnings are then applied per site, but they can affect the entire domain if Google hasn’t recognised the subdomains as separate sites.
Google Doesn’t Respond to Spam Reports
Google doesn’t respond to individual spam reports, but uses them to assess new algorithms which can then be applied to all websites.
Pages May Rank Despite Spam
Google might be ignoring some spammy elements of a page, but continue to rank it well based on other signals. But don’t assume a spammy technique is always helping.
Panda Real-Time Continuously Rolling Indiscrete Update Clarification
“The Panda algorithm is essentially something that is rolling more or less continuously in the sense that one update rolls out, and the fixed update gets prepared and starts rolling out as well. So it’s not that you’d see this discrete point in time where you could say this was that update, this was the next update, it’s rolling continuously. So from there it doesn’t make sense to look at specific dates.”
So that’s cleared that up.
PDFs Crawled Less Often than HTML
PDFs won’t be crawled as often as HTML pages because the content is generally more stable.
Google Limits Indexed Pages
Google has some kind of limit on the amount of content it will index per site, which may explain why some content such as PDFs isn’t indexed on larger sites.
Robots.txt Can Reference External Sitemaps
Google will use Sitemaps hosted on an external domain if they are referenced in the robots.txt.
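For example, a robots.txt referencing a sitemap hosted on a different domain (all domains here are hypothetical):

```text
# https://www.example.com/robots.txt
User-agent: *
Allow: /

# Sitemap hosted on an external domain is still accepted
# because it is referenced here
Sitemap: https://cdn.example-host.com/sitemaps/example-com.xml
```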
External Links Don’t Influence Rankings
Linking to external websites doesn’t directly affect your own rankings.
Google Replaces Poor Descriptions
If Google is replacing your meta descriptions with its own snippets, it won’t affect your rankings, as descriptions are not used for ranking; but it does suggest Google doesn’t think your descriptions are relevant to the page.
The reasons may be:
– Contains spammy keywords
– Too vague
– Too promotional
– Doesn’t match the content on the page
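By way of contrast, a description that avoids those problems is simply a short, specific summary of the page (hypothetical example):

```html
<!-- Specific, matches the page content, not vague, promotional, or keyword-stuffed -->
<meta name="description"
      content="Step-by-step guide to setting up hreflang tags for a bilingual English/German site, with common mistakes to avoid." />
```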
Google Detects Pages with User Generated Content Spam
Google has an algorithm for detecting pages which contain user-generated spam, and may ignore the content on those pages.