Incorporating static relevance signals into the score
Many domains have static signals that are known to be correlated with relevance. For instance, PageRank and URL length are two features commonly used in web search to tune the score of web pages independently of the query.
There are two main queries that allow combining static score contributions with
textual relevance, e.g. as computed with BM25:
- script_score query
- rank_feature query
For instance, imagine that you have a pagerank field that you wish to
combine with the BM25 score so that the final score is equal to
score = bm25_score + pagerank / (10 + pagerank).
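To make the target formula concrete, the following is a minimal Python sketch of the arithmetic only. The helper names saturation and combined_score and the sample BM25 value of 2.5 are illustrative assumptions, not part of any Elasticsearch client API.

```python
def saturation(value: float, pivot: float) -> float:
    """Return value / (pivot + value); the result is in [0, 1) for value >= 0."""
    return value / (pivot + value)


def combined_score(bm25_score: float, pagerank: float, pivot: float = 10.0) -> float:
    """Add the saturated static signal to the textual relevance score."""
    return bm25_score + saturation(pagerank, pivot)


# Hypothetical BM25 score of 2.5 combined with a few pagerank values.
for pagerank in (0.0, 10.0, 90.0):
    print(pagerank, combined_score(2.5, pagerank))
```

When pagerank equals the pivot (10 here), the static contribution is exactly 0.5, and it approaches 1 as pagerank grows, so the static signal can never add more than 1 to the BM25 score.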
With the script_score query, the query would look like this:
GET index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match": { "body": "elasticsearch" }
      },
      "script": {
        "source": "_score + saturation(doc['pagerank'].value, 10)"
      }
    }
  }
}
With the rank_feature query, it would look like this:
GET _search
{
  "query": {
    "bool": {
      "must": {
        "match": { "body": "elasticsearch" }
      },
      "should": {
        "rank_feature": {
          "field": "pagerank",
          "saturation": { "pivot": 10 }
        }
      }
    }
  }
}
While both options would return similar scores, there are trade-offs:
script_score provides a lot of flexibility, enabling you to combine the text
relevance score with static signals as you prefer. On the other hand, the
rank_feature query only exposes a couple of ways to incorporate static signals
into the score. However, it relies on the rank_feature and rank_features
fields, which index values in a special way that allows the rank_feature query
to skip over non-competitive documents and get the top matches of a query faster.
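
Because the rank_feature query depends on the field being mapped with the rank_feature (or rank_features) type, the index needs a matching mapping before documents are indexed. The following is a minimal sketch of such a mapping, written as a Python dict that could serve as the body of an index-creation request. The variable name index_body and the text mapping for body are assumptions made for illustration, carried over from the example queries.

```python
import json

# Hypothetical index-creation body; field names follow the example queries.
index_body = {
    "mappings": {
        "properties": {
            # Full-text field scored with BM25 by the match query.
            "body": {"type": "text"},
            # Static signal indexed so that the rank_feature query can use it.
            "pagerank": {"type": "rank_feature"},
        }
    }
}

# Print the JSON that would be sent as the request body.
print(json.dumps(index_body, indent=2))
```

A field used only from a script_score script does not need this type; a regular numeric field is enough there, since the script reads the value through doc values.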