The difference between lemma and stem

Let us first see how Wikipedia defines a stem and a lemma.

Lemma (morphology): In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the citation form or headword by which it is indexed. Lemmas have special significance in highly inflected languages such as Turkish and Czech. The process of determining the lemma for a given word is called lemmatisation.

Word stem: In linguistics, a stem is a part of a word. The term is used with slightly different meanings. In one usage, a stem is a form to which affixes can be attached. Thus, in this usage, the English word friendships contains the stem friend, to which the derivational suffix -ship is attached to form a new stem friendship, to which the inflectional suffix -s is attached. In a variant of this usage, the root of the word (in the example, friend) is not counted as a stem. In a slightly different usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants. Thus, in this usage, all derivational affixes are part of the stem. For example, the stem of friendships is friendship, to which the inflectional suffix -s is attached.

Difference between stem and lemma:

Stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-". This is because there are words such as production. In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, the definition of the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/. Some lexemes have several stems but one lemma. For instance "to go" (the lemma) has the stems "go" and "went". (The past tense is based on a different verb, "to wend". The "-t" suffix may be considered as equivalent to "-ed".)
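To make the "produce" and "go" examples concrete, here is a minimal sketch in plain Python. It is only a toy: the `LEMMAS` table is a hypothetical hand-written lexicon, and `shared_stem` simply takes the longest common prefix of the written forms, which mirrors the "part that never changes" definition and nothing more. Real lemmatisers and stemmers work quite differently.

```python
from os.path import commonprefix

# Hypothetical hand-written lexicon (an assumption for illustration only):
# each inflected form maps to its dictionary form, the lemma.
LEMMAS = {
    "produce": "produce",
    "produces": "produce",
    "produced": "produce",
    "producing": "produce",
    "goes": "go",
    "went": "go",
}


def lemma_of(word):
    """Look the word up in the toy lexicon; unknown words map to themselves."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def shared_stem(forms):
    """Return the longest prefix shared by all written forms (may be empty)."""
    return commonprefix([form.lower() for form in forms])


print(lemma_of("produced"))
print(shared_stem(["produce", "produced", "producing", "production"]))
print(repr(shared_stem(["go", "went"])))
```

For the "produce" family the shared prefix is `produc`, while the lemma is `produce`. For "go" and "went" the shared prefix is empty even though both forms have the lemma `go`, which is the "several stems but one lemma" case described above.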

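From the above we can see that a lemma is a word restored to its dictionary form, so there is generally exactly one result, whereas a stem is the word's stem, and what counts as the stem varies slightly with the definition used. Below we look at what programs actually produce. The lemmas come from Stanford's NLP tools, and the stems come from the stem module of the NLTK package, using three algorithms: Snowball, Porter and Lancaster.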

Original sentence: This work shows that single and double Ala substitutions of His18 and Phe21 in IL-8 reduced up to 77-fold the binding affinity to IL-8 receptor subtypes A (CXCR1) and B (CXCR2) and to the Duffy antigen.

lemma: this work show that single and double alum substitution of his18 and phe21 in il-8 reduce up to 77-fold the binding affinity to il-8 receptor subtype a -lrb- cxcr1 -rrb- and b -lrb- cxcr2 -rrb- and to the duffy antigen .

snowstem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .

porterstem: Thi work show that singl and doubl Ala substitut of His18 and Phe21 in IL-8 reduc up to 77-fold the bind affin to IL-8 receptor subtyp A ( CXCR1 ) and B ( CXCR2 ) and to the Duffi antigen .

lancasterstem: this work show that singl and doubl ala substitut of his18 and phe21 in il-8 reduc up to 77-fold the bind affin to il-8 receptor subtyp a ( cxcr1 ) and b ( cxcr2 ) and to the duffi antigen .
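The stemmer comparison above could be reproduced roughly as follows. This is a sketch rather than the script used for this post: it assumes the `nltk` package is installed, it splits on whitespace instead of using a real tokenizer, and `stem_sentence` is a helper name made up for illustration. The exact output can differ between NLTK versions (for example in whether the Porter stemmer lowercases its input), so no expected output is shown. The Stanford lemmatisation step is not covered here.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer


def stem_sentence(tokens):
    """Stem a pre-tokenized sentence with three NLTK stemmers.

    Returns a dict mapping the stemmer name to the stemmed sentence.
    """
    stemmers = {
        "snowball": SnowballStemmer("english"),
        "porter": PorterStemmer(),
        "lancaster": LancasterStemmer(),
    }
    return {
        name: " ".join(stemmer.stem(token) for token in tokens)
        for name, stemmer in stemmers.items()
    }


# Naive whitespace tokenization of part of the example sentence (an assumption;
# the results in the post were produced from properly tokenized text).
tokens = "single and double Ala substitutions reduced the binding affinity".split()
for name, stemmed in stem_sentence(tokens).items():
    print(f"{name}: {stemmed}")
```

None of the three stemmers needs a dictionary: each applies suffix-stripping rules to a single token, which is why the results include non-words such as "singl" and "affin", whereas a lemmatiser returns real dictionary forms.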
