Google's SMITH algorithm outperforms BERT on long-form text

11 January 2021

SMITH matches passages within the context of the entire content of a document

A recent research paper from Google describes work being done on a Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder designed to match long queries to long content - a task that the BERT algorithm finds difficult.

Quoting from the abstract of the paper:

"In recent years, self-attention based models like Transformers and BERT have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length. In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. Our model contains several innovations to adapt self-attention models for longer text input...Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT. Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048."

In plain English: BERT and tools like it rely on semantic matching to identify information in sentences within website content that's related to the language used in a search query. But they have trouble with matching long content to long queries.
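To put numbers on the "quadratic computational complexity" mentioned in the abstract, here is a minimal back-of-the-envelope sketch. It only counts how many token pairs a self-attention layer has to score; it is not SMITH's actual cost model. The block size of 64 tokens and both function names are assumptions chosen for illustration and are not taken from the paper.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Token pairs scored when every token attends to every other token."""
    return num_tokens * num_tokens


def hierarchical_attention_pairs(num_tokens: int, block_size: int) -> int:
    """Rough pair count for a two-level scheme (illustrative assumption).

    Level 1: tokens attend only to tokens inside their own block.
    Level 2: one vector per block attends to every other block vector.
    """
    num_blocks = -(-num_tokens // block_size)  # ceiling division
    within_blocks = num_blocks * block_size * block_size
    across_blocks = num_blocks * num_blocks
    return within_blocks + across_blocks


if __name__ == "__main__":
    for n in (512, 2048):
        print(
            n,
            full_attention_pairs(n),
            hierarchical_attention_pairs(n, block_size=64),  # 64 is hypothetical
        )
```

Going from 512 to 2048 tokens is a 4x increase in length but a 16x increase in pairs for flat self-attention, which is why a hierarchical design becomes attractive for long documents.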

Unlike BERT, which is designed to understand words within sentences, SMITH is also pre-trained to predict masked blocks of sentences from the surrounding text. That lets it model page structure (sections, passages, sentences) and match queries to passages within the entire content of a page.
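The general shape of that idea (split a long document into blocks, encode each block, combine the block encodings into one document vector, then compare two documents with the same encoder) can be sketched in a few lines. This is a toy illustration only: it swaps the Transformer encoders described in the paper for simple word counts, and every function name and the two-sentences-per-block setting are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_into_blocks(text, sentences_per_block=2):
    """Split text into blocks of a few sentences each (block size is hypothetical)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]


def encode_block(block):
    """Stand-in for a block encoder: a length-normalised word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", block.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def encode_document(text):
    """Stand-in for a document encoder: the average of the block vectors."""
    blocks = split_into_blocks(text)
    doc = Counter()
    for block in blocks:
        for term, weight in encode_block(block).items():
            doc[term] += weight
    return {term: w / len(blocks) for term, w in doc.items()} if blocks else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_score(doc_a, doc_b):
    """'Siamese' matching: both documents go through the same encoder."""
    return cosine(encode_document(doc_a), encode_document(doc_b))
```

Calling `match_score(page_text, other_page_text)` returns a value between 0 and 1, with higher values meaning more shared vocabulary. A real system would replace `encode_block` and `encode_document` with learned neural encoders; the point here is only the two-level, shared-encoder structure.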

The researchers have concluded that SMITH is better than BERT at understanding and matching queries to the content of long pages:

“The experimental results on several benchmark datasets show that our proposed SMITH model outperforms previous state-of-the-art Siamese matching models including HAN, SMASH and BERT for long-form document matching...The SMITH model which enjoys longer input text lengths compared with other standard self-attention models is a better choice for long document representation learning and matching.”

It's unknown at this point whether Google is actually using SMITH in its ranking algorithm. But any tool as promising as this one seems to be will likely be used sooner or later.

However, because BERT and SMITH, by design, have different capabilities, Google will likely continue to use both.

- David Boggs
External Article: https://arxiv.org/abs/2004.12297