Field-length norm

How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body field. The field-length norm is calculated as follows:

norm(d) = 1 / √numTerms 

The field-length norm (norm) is the inverse square root of the number of terms in the field.
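As a quick illustration of the formula above, here is a minimal Python sketch. The function name field_length_norm is hypothetical and chosen only for this example; it computes the idealized value of the formula, not the value Lucene actually stores in the index.

```python
import math


def field_length_norm(num_terms: int) -> float:
    """Return 1 / sqrt(num_terms), the idealized field-length norm."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)


# Compare a one-term field, a short title-like field, and a long body-like field.
for num_terms in (1, 4, 100):
    print(num_terms, field_length_norm(num_terms))
```

The norm shrinks as the field grows, so a match in a four-term field carries five times the length weight of a match in a hundred-term field.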

While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:

PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}

This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.

For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.

Putting it together

These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.

When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.

When you run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the score are the ones explained in the preceding sections:

PUT /my_index/doc/1
{ "text" : "quick brown fox" }

GET /my_index/doc/_search?explain
{
  "query": {
    "term": {
      "text": "fox"
    }
  }
}

The (abbreviated) explanation from the preceding request is as follows:

weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641 (1)
result of:
    fieldWeight in 0                         0.15342641
    product of:
        tf(freq=1.0), with freq of 1:        1.0 (2)
        idf(docFreq=1, maxDocs=1):           0.30685282 (3)
        fieldNorm(doc=0):                    0.5 (4)

(1) The final score for term fox in field text in the document with internal Lucene doc ID 0.

(2) The term fox appears once in the text field in this document.

(3) The inverse document frequency of fox in the text field in all documents in this index.

(4) The field-length normalization factor for this field.
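The numbers in the explanation can be checked by hand. The Python sketch below multiplies the three factors together. It assumes the classic Lucene TF/IDF formulas for the first two factors, which are not restated in this section: tf as the square root of the term frequency, and idf as 1 + ln(maxDocs / (docFreq + 1)). The function names and the FIELD_NORM constant are hypothetical. The field norm is copied from the explain output rather than computed, because 1 / √3 is roughly 0.577 while the reported value is 0.5; that gap is consistent with the norm being stored in a single byte at reduced precision.

```python
import math


def tf(freq: float) -> float:
    """Term frequency factor (assumed formula: square root of the frequency)."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed formula: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Value copied from the explain output, not derived from 1 / sqrt(3).
FIELD_NORM = 0.5

# One occurrence of "fox", in one matching document out of one document.
score = tf(1.0) * idf(doc_freq=1, max_docs=1) * FIELD_NORM
print(f"tf={tf(1.0)} idf={idf(1, 1):.8f} score={score:.8f}")
```

Under these assumptions, the product agrees with the fieldWeight value in the explanation to the precision shown there.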

Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.

 

 
