This is an introduction to the pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type corresponding to categorical variables in statistics. A categorical variable takes on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or ratings on Likert scales.

In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs. ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (addition, division, ...) are not possible.
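
As a minimal sketch of this point, the snippet below builds an ordered categorical and shows that order-based operations work while arithmetic is rejected. The variable names and the Likert-style labels are illustrative assumptions, not part of the original text.

```python
import pandas as pd

# Hypothetical survey answers; the category order is given explicitly.
ratings = pd.Series(
    pd.Categorical(
        ["agree", "strongly agree", "disagree", "agree"],
        categories=["disagree", "agree", "strongly agree"],
        ordered=True,
    )
)

# Order-based operations are defined for ordered categoricals.
lowest = ratings.min()
highest = ratings.max()
above_disagree = (ratings > "disagree").tolist()

# Arithmetic is not defined and raises a TypeError.
try:
    ratings + ratings
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print(lowest, highest, above_disagree, arithmetic_allowed)
```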

All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.
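
The internal layout described above can be inspected through the `categories` and `codes` attributes of `pd.Categorical`. The following is a small sketch with made-up blood type values; note that missing values, and values outside the given categories, are stored with the code -1.

```python
import numpy as np
import pandas as pd

# Categories are inferred from the data (sorted) when not given explicitly.
blood = pd.Categorical(["A", "O", "AB", "O", np.nan, "A"])
categories = list(blood.categories)
codes = blood.codes.tolist()

# A value that is not among the declared categories becomes NaN (code -1).
restricted = pd.Categorical(["A", "X"], categories=["A", "B"])
restricted_codes = restricted.codes.tolist()

print(categories)        # the distinct values
print(codes)             # integer positions into `categories`
print(restricted_codes)
```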

The categorical data type is useful in the following cases:

  • A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory.
  • The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.
  • As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
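
The first two cases can be sketched as follows. The data is invented for illustration, and the exact byte counts depend on the pandas version and string storage, so only the relative sizes are compared.

```python
import pandas as pd

# Case 1: a string column with few distinct values uses less memory as a category.
words = pd.Series(["one", "two", "three"] * 1000)
as_category = words.astype("category")
string_bytes = words.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# Case 2: sorting follows the declared category order, not the lexical order.
values = ["three", "one", "two"]
lexical_sorted = sorted(values)
logical = pd.Series(
    pd.Categorical(values, categories=["one", "two", "three"], ordered=True)
)
logical_sorted = logical.sort_values().tolist()

print(string_bytes, category_bytes)
print(lexical_sorted, logical_sorted, logical.max())
```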

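Summary: the Categorical data type is for variables such as gender, blood type, or school class, which can only take one of a fixed set of values. Categorical data can be ordered, but it cannot be used in numerical calculations.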
