A recent research paper from Google describes work being done on a Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder designed to match long queries to long content - a task that the BERT algorithm finds difficult.
Quoting from the abstract of the paper:
"In recent years, self-attention based models like Transformers and BERT have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length. In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. Our model contains several innovations to adapt self-attention models for longer text input...Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT. Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048."
In plain English: BERT and tools like it rely on semantic matching to identify information in sentences within website content that's related to the language used in a search query. But they have trouble matching long content to long queries.
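The length limit mentioned in the abstract comes down to simple arithmetic. Self-attention scores every token against every other token, so the number of scores grows with the square of the input length. Here is a minimal sketch of that growth. The function name `attention_pairs` is made up for illustration, and the count ignores constant factors such as the number of layers and attention heads:

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token scores in one full self-attention pass."""
    return num_tokens * num_tokens


bert_limit = 512    # maximum input length the abstract cites for BERT baselines
smith_limit = 2048  # maximum input length the abstract cites for SMITH

for n in (bert_limit, smith_limit):
    print(f"{n} tokens -> {attention_pairs(n):,} attention scores")

# Quadrupling the input length multiplies the number of scores by 4 * 4.
ratio = attention_pairs(smith_limit) / attention_pairs(bert_limit)
print(f"4x the length costs {ratio:.0f}x the attention scores")
```

In other words, a model that simply fed four times as much text through ordinary self-attention would pay roughly sixteen times the cost, which is why a different design is needed for long documents.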
Unlike BERT, which is designed to understand words within sentences, SMITH is able to predict what comes later on a page from the content at the top, and also to understand page structure (sections, passages, sentences) and match queries to passages within the entire content of a page.
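A rough way to picture the hierarchical idea: split the page into blocks, run attention inside each block, then run attention once more across the block summaries. The sketch below only counts attention scores under that scheme. It is not the SMITH implementation, and the block size of 32 tokens and both function names are assumptions chosen for illustration:

```python
def full_attention_cost(num_tokens: int) -> int:
    """Scores needed when every token attends to every other token."""
    return num_tokens ** 2


def hierarchical_attention_cost(num_tokens: int, block_size: int) -> int:
    """Scores needed when attention runs inside blocks, then across blocks."""
    num_blocks = -(-num_tokens // block_size)     # ceiling division
    within_blocks = num_blocks * block_size ** 2  # attention inside each block
    across_blocks = num_blocks ** 2               # attention over block summaries
    return within_blocks + across_blocks


doc_len = 2048  # maximum input length the paper's abstract cites for SMITH
block = 32      # assumed block size, for illustration only

flat = full_attention_cost(doc_len)
hier = hierarchical_attention_cost(doc_len, block)
print(f"flat attention:         {flat:,} scores")
print(f"hierarchical attention: {hier:,} scores")
```

With these assumed numbers, the two-level layout needs 69,632 scores where flat attention needs 4,194,304, which shows why breaking a page into smaller units makes longer inputs practical.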
The researchers have concluded that SMITH is better than BERT at understanding and matching queries to the content of long pages:
"The experimental results on several benchmark datasets show that our proposed SMITH model outperforms previous state-of-the-art Siamese matching models including HAN, SMASH and BERT for long-form document matching...The SMITH model which enjoys longer input text lengths compared with other standard self-attention models is a better choice for long document representation learning and matching."
It's unknown at this point whether Google is actually using SMITH in its ranking algorithm. But a tool this promising will likely be put to use sooner or later.
However, because BERT and SMITH, by design, have different capabilities, Google will likely continue to use both.