Notes from:

Python for Data Analysis, 2nd Edition        

Reading it entirely in English was tiring, so the accuracy of these notes depends on Baidu Translate.

Background:

Reference post covering various methods:

  https://www.cnblogs.com/baili-luoyun/p/10250177.html

    Summary: this post is a basic introduction to pandas

      I: pandas has two main data structures

        Series and DataFrame

      II: Series

        1: Definition:

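  A Series is a one-dimensional array-like object made up of a sequence of values (of any NumPy data type) and an associated set of data labels, called its index.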

        2: String representation

  The string representation of a Series shows the index on the left and the values on the right.

        3: Creating a Series

import numpy as np
import pandas as pd

obj = pd.Series([4, 5, 6, 7, 8])  # create a Series with the default integer index
print(obj)
print(obj.index)
print(obj.values)
>>>>>>>>>
0 4
1 5
2 6
3 7
4 8
dtype: int64
RangeIndex(start=0, stop=5, step=1)
[4 5 6 7 8]

        4: Accessing values through the index

          1>: Single label

obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])  # specify a custom index
print(obj1)
# look up a value by its index label
print(obj1['d'])
>>>>

d 4
a 6
b -7
c -8
dtype: int64
4

          2>: Multiple labels

# select several labels at once by passing a list
print(obj1[['d', 'a', 'c']])
>>>>
d 4
a 6
c -8
dtype: int64

          3>: Boolean filtering

print(obj1[obj1 < 0])
>>>>
b -7
c -8
dtype: int64
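
The filter above works because the comparison `obj1 < 0` is itself a boolean Series sharing the index of `obj1`. A minimal sketch of that intermediate step (the names `mask` and `negatives` are introduced here purely for illustration):

```python
import pandas as pd

obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])

# The comparison produces a boolean Series aligned with obj1's index
mask = obj1 < 0
print(mask)

# Indexing with the mask keeps only the entries where it is True
negatives = obj1[mask]
print(negatives)

# Masks can be combined with & (and), | (or), ~ (not);
# each comparison needs its own parentheses
combined = obj1[(obj1 < 0) | (obj1 > 5)]
print(combined)
```

Note that the index labels travel with the values through the filter.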

          4>: Multiplication

print(obj1 * 2)
>>>>>>>>>>
d 8
a 12
b -14
c -16
dtype: int64

          5>: Applying NumPy element-wise functions

print(np.exp(obj1))
>>>>>
d 54.598150
a 403.428793
b 0.000912
c 0.000335
dtype: float64

          6>: The index as a mapping (membership tests)

print('b' in obj1)
print('e' in obj1)
>>>>>
True
False
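
Because a Series behaves somewhat like a dict from labels to values, `in` tests the labels rather than the values. A small sketch of that distinction, assuming the same `obj1` as above (the other variable names are illustrative only):

```python
import pandas as pd

obj1 = pd.Series([4, 6, -7, -8], index=['d', 'a', 'b', 'c'])

# `in` checks index labels, not values
has_label = 'b' in obj1            # 'b' is a label
value_as_label = 6 in obj1         # 6 is a value, not a label

# To test for a value, look in .values instead
value_present = 6 in obj1.values

# .get returns a default instead of raising KeyError for a missing label
missing = obj1.get('e', 0)

print(has_label, value_as_label, value_present, missing)
```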

        5: Creating a Series from a dict:

          1>: Creating a Series from a dict

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
print(obj3)
>>>>
Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
dtype: int64

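          2>: Building a Series from a dict with an explicit index

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
print(obj3)
# pass an explicit index: labels missing from sdata ('California') get NaN,
# and dict keys not listed in the index ('Utah') are dropped
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
print(obj4)
>>>>>>>>>>>>>>
Ohio 35000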
Texas 71000
Oregon 16000
Utah 5000
dtype: int64
California NaN
Ohio 35000.0
Oregon 16000.0
Texas 71000.0
dtype: float64
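
The same label matching shows up when two Series are combined arithmetically. The following is only a sketch, reusing `sdata` from above, of how values are paired by label and how a label present on just one side produces NaN:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Values are paired by label before adding; 'California' (missing in obj3)
# and 'Utah' (missing in obj4) both come out as NaN
total = obj3 + obj4
print(total)
```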

          3>: Detecting missing data

l = pd.isnull(obj4)    # True where the value is missing
print(l)
l2 = pd.notnull(obj4)  # True where the value is present
print(l2)
>>>>>>>>>>>>
California True
Ohio False
Oregon False
Texas False
dtype: bool
California False
Ohio True
Oregon True
Texas True
dtype: bool
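
The same checks are also available as Series methods, and they are usually followed by dropping or filling the missing entries. A brief sketch, assuming `obj4` is built as above (`dropped` and `filled` are names chosen for this example):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Method forms of the top-level functions
print(obj4.isna())    # same result as pd.isnull(obj4)
print(obj4.notna())   # same result as pd.notnull(obj4)

# Typical follow-ups: drop the missing entries, or replace them with a value
dropped = obj4.dropna()
filled = obj4.fillna(0)
print(dropped)
print(filled)
```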

          4>: Naming the Series and its index

obj4.name = 'population'
obj4.index.name = 'state'
print(obj4)
>>>>>>>>>
state
California NaN
Ohio 35000.0
Oregon 16000.0
Texas 71000.0
Name: population, dtype: float64

          5>: Replacing the index by assignment

obj = pd.Series([4, 7, -6, 3])
print(obj)
obj.index = ['bob', 'Steve', 'jeff', 'Ryan']  # assign a new list of labels in place
print(obj)
>>>>>>>>>
0 4
1 7
2 -6
3 3
dtype: int64
bob 4
Steve 7
jeff -6
Ryan 3
dtype: int64
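
One thing to watch when assigning an index this way is that the new list needs exactly one label per value. A short sketch of what happens otherwise (the variable `mismatch_error` is only there to capture the exception for illustration):

```python
import pandas as pd

obj = pd.Series([4, 7, -6, 3])

# Four values, four labels: accepted
obj.index = ['bob', 'Steve', 'jeff', 'Ryan']

# Three labels for four values: pandas raises ValueError
mismatch_error = None
try:
    obj.index = ['bob', 'Steve', 'jeff']
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```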

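      III: DataFrame

    1: Definition

A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and it can be thought of as a dict of Series that all share the same index. Internally, the data is stored as one or more two-dimensional blocks rather than as a list, dict, or other collection of one-dimensional arrays.

    2: Creation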

data ={'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
'year':[2000,2001,2002,2001,2002,2003],
'pop':[1.5,1.7,3.6,2.4,2.8,3.2]
}
frame =pd.DataFrame(data)
print(frame)
>>>>>>>>>
state year pop
0 Ohio 2000 1.5
1 Ohio 2001 1.7
2 Ohio 2002 3.6
3 Nevada 2001 2.4
4 Nevada 2002 2.8
5 Nevada 2003 3.2
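
The "dict of Series sharing one index" view from the definition can be checked directly. A minimal sketch using the same `data` (the name `col` is illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame = pd.DataFrame(data)

# Selecting one column returns a Series
col = frame['state']
print(type(col))

# Every column shares the DataFrame's row index
print(col.index.equals(frame.index))

# Each column keeps its own dtype
print(frame.dtypes)
```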

    2.1: head() returns only the first five rows

print(frame.head())

>>>>>>>

    state  year  pop
0 Ohio 2000 1.5
1 Ohio 2001 1.7
2 Ohio 2002 3.6
3 Nevada 2001 2.4
4 Nevada 2002 2.8
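
Five is only the default row count. As a small sketch, `head` also accepts a number of rows, and `tail` does the same from the end of the frame:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame = pd.DataFrame(data)

first_three = frame.head(3)   # first 3 rows instead of the default 5
last_two = frame.tail(2)      # last 2 rows
print(first_three)
print(last_two)
```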

    

     2.2: Arranging the columns in a given order with the columns argument

print(pd.DataFrame(data,columns=['year','pop','state']))
>>>>>>>>
year pop state
0 2000 1.5 Ohio
1 2001 1.7 Ohio
2 2002 3.6 Ohio
3 2001 2.4 Nevada
4 2002 2.8 Nevada
5 2003 3.2 Nevada

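      2.3: A requested column that is not in the data is filled with missing values (NaN)

# a column that does not exist in data ('debt' here) is filled with NaN
# columns: column labels
# index: row labels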
frame2 =pd.DataFrame(data,columns=['year','state','pop','debt'],
index=['one','two','three','four','five','six']
)
print(frame2)
>>>>>>>>>>>>
year state pop debt
one 2000 Ohio 1.5 NaN
two 2001 Ohio 1.7 NaN
three 2002 Ohio 3.6 NaN
four 2001 Nevada 2.4 NaN
five 2002 Nevada 2.8 NaN
six 2003 Nevada 3.2 NaN

   2.4: Getting the column labels

    

print(frame2.columns)
>>>>>>>>
Index(['year', 'state', 'pop', 'debt'], dtype='object')

    2.5: Retrieving a column by dict-like key or by attribute

# retrieve a single column
print(frame2['state'])
print(frame2.year)
print('>>>>>>>>>>>>>>>>>>')
print(frame2['year'])
>>>>>>>>>>>>>>
one Ohio
two Ohio
three Ohio
four Nevada
five Nevada
six Nevada
Name: state, dtype: object
one 2000
two 2001
three 2002
four 2001
five 2002
six 2003
Name: year, dtype: int64
>>>>>>>>>>>>>>>>>>
one 2000
two 2001
three 2002
four 2001
five 2002
six 2003
Name: year, dtype: int64
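
Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The `pop` column in this very dataset is a handy illustration, since `pop` is also a DataFrame method. A sketch (the name `pop_col` is introduced for this example):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# frame2.pop is the DataFrame.pop method, not the 'pop' column
print(callable(frame2.pop))

# Bracket access always returns the column
pop_col = frame2['pop']
print(pop_col)
```

For this reason the bracket form is the safer default.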

    2.6: Retrieving a whole row with the loc attribute

print(frame2.loc['three'])
>>>>>>>>>>
year 2002
state Ohio
pop 3.6
debt NaN
Name: three, dtype: object
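
`loc` selects by label; its positional counterpart is `iloc`, and both accept a row and a column selector together. A short sketch on the same `frame2` (the variable names are illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

row_by_label = frame2.loc['three']      # row selected by its label
row_by_position = frame2.iloc[2]        # the same row selected by position

value = frame2.loc['three', 'pop']      # a single cell: row label, column label
subset = frame2.loc[['one', 'three'], ['year', 'state']]  # several rows and columns
print(value)
print(subset)
```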

    2.7: Modifying a column by assignment

frame2['debt']=16.5
print(frame2)
>>>>>>>>
year state pop debt
one 2000 Ohio 1.5 16.5
two 2001 Ohio 1.7 16.5
three 2002 Ohio 3.6 16.5
four 2001 Nevada 2.4 16.5
five 2002 Nevada 2.8 16.5
six 2003 Nevada 3.2 16.5

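   2.8: Assigning an array of values to a column

# note: 'dabt' is not an existing column (probably a typo for 'debt'),
# so this assignment adds a new column instead of overwriting 'debt'
frame2['dabt'] = np.arange(6.)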
print(frame2)
>>>>>>>>>>

year state pop debt dabt
one 2000 Ohio 1.5 NaN 0.0
two 2001 Ohio 1.7 NaN 1.0
three 2002 Ohio 3.6 NaN 2.0
four 2001 Nevada 2.4 NaN 3.0
five 2002 Nevada 2.8 NaN 4.0
six 2003 Nevada 3.2 NaN 5.0
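
When a list or array is assigned to a column, its length has to match the number of rows. A minimal sketch of both cases, writing to the existing `debt` column (the variable `length_error` only captures the exception for illustration):

```python
import numpy as np
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# Six rows, six values: the column is overwritten
frame2['debt'] = np.arange(6.)

# Five values for six rows: pandas raises ValueError
length_error = None
try:
    frame2['debt'] = np.arange(5.)
except ValueError as exc:
    length_error = exc
    print('ValueError:', exc)
```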

  2.9: Assigning a Series (values are aligned on the row index; unmatched rows get NaN)

print(frame2)
print(">>>>>>>>>>>>")
val =pd.Series([-1.2,-1.5,-1.7],index =['two','four','five'])
print(val)
print(">>>>>>>>>>>>>>")
frame2['debt'] =val
print(frame2)
>>>>>>>>>>>>>>>>>>>>>
year state pop debt
one 2000 Ohio 1.5 NaN
two 2001 Ohio 1.7 NaN
three 2002 Ohio 3.6 NaN
four 2001 Nevada 2.4 NaN
five 2002 Nevada 2.8 NaN
six 2003 Nevada 3.2 NaN
>>>>>>>>>>>>
two -1.2
four -1.5
five -1.7
dtype: float64
>>>>>>>>>>>>>>
year state pop debt
one 2000 Ohio 1.5 NaN
two 2001 Ohio 1.7 -1.2
three 2002 Ohio 3.6 NaN
four 2001 Nevada 2.4 -1.5
five 2002 Nevada 2.8 -1.7
six 2003 Nevada 3.2 NaN
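
The alignment also works in the other direction: a label in the Series that does not exist in the DataFrame is simply ignored. A sketch with a made-up label `'seven'` and a made-up value 9.9, both chosen only for this example:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# 'seven' is not a row label of frame2, so its value is not used
val = pd.Series([-1.2, -1.5, 9.9], index=['two', 'four', 'seven'])
frame2['debt'] = val
print(frame2)
```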

  2.10: Creating a column from a boolean expression

frame2['eastern'] =frame2.state =='Ohio'
print(frame2)
>>>>>>>>
year state pop debt eastern
one 2000 Ohio 1.5 NaN True
two 2001 Ohio 1.7 NaN True
three 2002 Ohio 3.6 NaN True
four 2001 Nevada 2.4 NaN False
five 2002 Nevada 2.8 NaN False
six 2003 Nevada 3.2 NaN False
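
A boolean column like this can then be used to filter rows, and it can be removed again with `del`. A brief sketch (the name `ohio_rows` is illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.8, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# New columns must be created with bracket syntax
frame2['eastern'] = frame2['state'] == 'Ohio'

# Use the boolean column as a row filter
ohio_rows = frame2[frame2['eastern']]
print(ohio_rows)

# Remove the column again
del frame2['eastern']
print(frame2.columns)
```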

  

  
