Multi-term autocompletion

Term completion is a feature where a user gets suggestions for search terms and matching search results as they type the query. We call a completion multi-term when it can combine terms from different attributes in an open-ended fashion. In the following example, a user entered “fortis” (a brand) and started typing “hammer” (a category):

[Image: Auto-completion]

After “hammer” is completed, the search suggests further terms that occur in documents containing both “fortis” and “hammer.”

The Elasticsearch API offers the completion suggester, which works great in many cases but has one major drawback: it can only suggest fixed terms that are saved to Elasticsearch at index time. So in the preceding example, the terms “fortis” and “hammer” as well as both compound variations (“fortis hammer” and “hammer fortis”) must be indexed.
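To get a feel for how quickly such a fixed list grows, the following Python sketch enumerates every ordered combination of up to two terms. It only illustrates the combinatorics; the helper name `fixed_suggestions` and the `max_terms` parameter are assumptions made for this example and are not part of the Elasticsearch API.

```python
from itertools import permutations


def fixed_suggestions(terms, max_terms=2):
    """Enumerate every ordered combination of up to max_terms distinct terms.

    Hypothetical helper: it shows how many fixed inputs would have to be
    indexed if every combination had to be known at index time.
    """
    suggestions = []
    for length in range(1, max_terms + 1):
        for combo in permutations(terms, length):
            suggestions.append(" ".join(combo))
    return suggestions


print(fixed_suggestions(["fortis", "hammer"]))
# ['fortis', 'hammer', 'fortis hammer', 'hammer fortis']

# Eight distinct terms already yield 8 single terms + 56 ordered pairs.
print(len(fixed_suggestions(list("abcdefgh"))))  # 64
```

With combinations of three or more terms the list grows even faster, which is why a query-time approach can be attractive.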

To avoid this, we recommend indexing all terms you want to offer autocompletion for (category names, facet values, brands, and other categorical terms) in a single field called completion_terms:

"completion_terms": [
  "Fortis",
  "1000",
  "1250",
  "1500",
  "2000",
  "Fäustel",
  "Handwerkzeug",
  "Hammer"
]

The field is analyzed with a very simple analyzer based on the Elasticsearch keyword tokenizer. The analyzer is only used to remove some stop words.

"completion_terms": {
  "type": "string",
  "analyzer": "completion_analyzer"
}
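The effect of such an analyzer can be approximated in a few lines of Python. This is a rough emulation, assuming the analyzer consists only of the keyword tokenizer (which emits each field value as one token) followed by a stop-word filter. The stop-word list and the function name are made up for illustration, and the source does not say how the real analyzer handles case.

```python
# Hypothetical stop words; the real list is not given in this document.
STOP_WORDS = {"und", "mit", "the"}


def completion_analyzer(values, stop_words=STOP_WORDS):
    """Emulate a keyword tokenizer followed by a stop-word filter.

    Each value stays a single token, so multi-word values are not split.
    A value is dropped only if the whole value is a stop word (exact,
    case-sensitive comparison; an assumption of this sketch).
    """
    return [value for value in values if value not in stop_words]


print(completion_analyzer(["Fortis", "und", "Fortis Hammer", "Handwerkzeug"]))
# ['Fortis', 'Fortis Hammer', 'Handwerkzeug']
```

Because values are kept whole, aggregating on the field returns complete terms such as “Handwerkzeug” rather than word fragments.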

To have products match partial search terms (like “fortis ham”), we apply an edge_ngram filter to a field that contains the same data as completion_terms. Only documents that match the current search query are considered when building autocompletion terms. The terms are fetched by aggregating on the completion_terms field and showing those with the highest number of occurrences. All of this happens in a single query. The aggregation part of that query (the part used for the autocompletion) looks as follows:

"aggs": {
  "autocomplete": {
    "terms": {
      "field": "completion_terms",
      "size": 100
    }
  }
}
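The interplay of prefix matching and aggregation can be sketched in plain Python, without Elasticsearch. The sample documents, the function names, the lowercasing, and the rule that every query token must match are assumptions made for this illustration. The real behavior depends on the index mapping and the query that is actually sent.

```python
from collections import Counter

# Hypothetical sample documents; only the completion_terms field is modeled.
DOCUMENTS = [
    {"completion_terms": ["Fortis", "Hammer", "Handwerkzeug", "1000"]},
    {"completion_terms": ["Fortis", "Hammer", "Fäustel", "1250"]},
    {"completion_terms": ["Fortis", "Zange", "Handwerkzeug"]},
    {"completion_terms": ["Acme", "Hammer", "Handwerkzeug"]},
]


def matches(document, query):
    """Emulate the edge_ngram match: each query token must be a prefix of some term."""
    terms = [term.lower() for term in document["completion_terms"]]
    return all(
        any(term.startswith(token) for term in terms)
        for token in query.lower().split()
    )


def autocomplete(documents, query, size=100):
    """Emulate the terms aggregation: count terms across matching documents only."""
    counts = Counter()
    for document in documents:
        if matches(document, query):
            counts.update(document["completion_terms"])
    return counts.most_common(size)


# Only the first two documents match both "fortis" and the prefix "ham".
for term, doc_count in autocomplete(DOCUMENTS, "fortis ham"):
    print(term, doc_count)
```

In this sketch, “Fortis” and “Hammer” each occur in both matching documents, while terms such as “Zange” and “Acme” never appear because their documents do not match the query.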

The main benefit of this approach is that you can continuously suggest new terms as the user types. The main drawback is speed: the out-of-the-box completion suggester is optimized much more heavily for speed.