Data Manipulation with dplyr in R

select
The filter and arrange verbs
arrange
filter
fct_relevel {forcats}
Filtering and arranging
Mutate
The count verb
Summarizing
top_n
Selecting
rename
transmute
Grouped mutates
Window functions

select

select(data, variable_names)
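As a minimal sketch of the call shape above, assuming a small made-up tibble named `toy` (hypothetical, not the course's `counties` data), the direct and piped forms give the same result:

```r
library(dplyr)

# Hypothetical toy data; names and values are invented for illustration
toy <- tibble(
  state      = c("X", "X", "Y"),
  county     = c("A", "B", "C"),
  population = c(100, 200, 300),
  men        = c(40, 110, 150)
)

# First argument is the data, the remaining arguments are the columns to keep
selected <- select(toy, state, county, population)

# The same call written with the pipe
selected_piped <- toy %>% select(state, county, population)
```

Columns come back in the order they are listed in `select()`, not in the order they appear in the data.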

The filter and arrange verbs

arrange

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Add a verb to sort in descending order of public_work
counties_selected %>%
  arrange(desc(public_work))

filter

counties_selected <- counties %>%
  select(state, county, population)

# Filter for counties in the state of California that have a population above 1000000
counties_selected %>%
  filter(state == "California",
         population > 1000000)

# Filter on multiple values

filter(id %in% c("a","b","c"...))     # keep rows whose id is in the set

filter(!(id %in% c("a","b","c"...)))  # keep rows whose id is not in the set
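A small runnable sketch of the two membership filters, assuming a made-up tibble `toy` with an `id` column (both names are hypothetical). The only difference between keeping and dropping the matching rows is the `!` in front of the `%in%` test:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  id    = c("a", "b", "c", "d", "e"),
  value = c(10, 20, 30, 40, 50)
)

wanted <- c("a", "b", "c")

# Keep rows whose id is in the set
in_set <- toy %>% filter(id %in% wanted)

# Keep rows whose id is NOT in the set
not_in_set <- toy %>% filter(!(id %in% wanted))
```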

fct_relevel {forcats}

Reorder factor levels by hand

Use this for ordering when order() does not do the job.

library(forcats)

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)
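One place where hand-set levels matter is sorting. The sketch below assumes a made-up tibble `toy` with a `size` column (hypothetical names) and shows that `arrange()` sorts a character column alphabetically but a factor column by level order:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data
toy <- tibble(
  size  = c("large", "small", "medium", "small"),
  count = c(3, 1, 2, 4)
)

# Character column: sorted alphabetically, so "large" comes first
alphabetical <- toy %>% arrange(size)

# Factor column with levels set by hand: sorted small, medium, large
by_level <- toy %>%
  mutate(size = fct_relevel(factor(size), "small", "medium", "large")) %>%
  arrange(size)
```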

Filtering and arranging

counties_selected <- counties %>%
  select(state, county, population, private_work, public_work, self_employed)

# Filter for Texas and more than 10000 people; sort in descending order of private_work
counties_selected %>%
  filter(state == "Texas", population > 10000) %>%
  arrange(desc(private_work))

# A tibble: 169 x 6
   state county  population private_work public_work self_employed
   <chr> <chr>        <dbl>        <dbl>       <dbl>         <dbl>
 1 Texas Gregg       123178         84.7         9.8           5.4
 2 Texas Collin      862215         84.1        10             5.8
 3 Texas Dallas     2485003         83.9         9.5           6.4
 4 Texas Harris     4356362         83.4        10.1           6.3
 5 Texas Andrews      16775         83.1         9.6           6.8
 6 Texas Tarrant    1914526         83.1        11.4           5.4
 7 Texas Titus        32553         82.5        10             7.4
 8 Texas Denton      731851         82.2        11.9           5.7
 9 Texas Ector       149557         82          11.2           6.7
10 Texas Moore        22281         82          11.7           5.9
# ... with 159 more rows

Mutate

counties_selected <- counties %>%
  select(state, county, population, public_work)

# Sort in descending order of the public_workers column
counties_selected %>%
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))

counties %>%
  # Select the five columns
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order
  arrange(desc(proportion_men))
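The `public_workers` formula above turns a percentage into a head count. Here is a worked sketch with round numbers, assuming a made-up tibble `toy` (hypothetical data, with `public_work` read as a percentage of `population`):

```r
library(dplyr)

# Hypothetical toy data; public_work is a percentage of population
toy <- tibble(
  county      = c("A", "B", "C"),
  population  = c(1000, 5000, 200),
  public_work = c(10, 20, 25)
)

result <- toy %>%
  # 10% of 1000 = 100, 20% of 5000 = 1000, 25% of 200 = 50
  mutate(public_workers = public_work * population / 100) %>%
  arrange(desc(public_workers))
```

County B ends up first even though C has the highest percentage, which is why the notes sort on the derived count and not on `public_work` itself.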

The count verb

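# counties_selected must include the region, state and citizens columns here

# Number of counties in each region, most frequent first
counties_selected %>%
  count(region, sort = TRUE)

# Total citizens in each state (wt sums the column instead of counting rows)
counties_selected %>%
  count(state, wt = citizens, sort = TRUE)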
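To make the effect of `wt` concrete, here is a sketch on a made-up tibble `toy` (hypothetical data). It compares the plain row count, the weighted count, and the same total written out with `group_by()` and `summarize()`:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  state    = c("X", "X", "Y"),
  citizens = c(100, 250, 300)
)

# Unweighted: number of rows per state
rows_per_state <- toy %>% count(state, sort = TRUE)

# Weighted: sum of citizens per state
citizens_per_state <- toy %>% count(state, wt = citizens, sort = TRUE)

# The same totals computed by hand
by_hand <- toy %>%
  group_by(state) %>%
  summarize(n = sum(citizens)) %>%
  arrange(desc(n))
```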

Summarizing

# Summarize to find minimum population, maximum unemployment, and average income
counties_selected %>%
  summarize(
    min_population = min(population),
    max_unemployment = max(unemployment),
    average_income = mean(income)
  )

# Add a density column, then sort in descending order
counties_selected %>%
  group_by(state) %>%
  summarize(total_area = sum(land_area),
            total_population = sum(population),
            density = total_population / total_area) %>%
  arrange(desc(density))
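The density example relies on `summarize()` letting a later expression use a column created earlier in the same call. A minimal sketch on a made-up tibble `toy` (hypothetical data with round numbers):

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  state      = c("X", "X", "Y"),
  land_area  = c(10, 30, 50),
  population = c(100, 300, 100)
)

density_by_state <- toy %>%
  group_by(state) %>%
  summarize(
    total_area       = sum(land_area),    # X: 40,  Y: 50
    total_population = sum(population),   # X: 400, Y: 100
    # Uses the two totals defined just above, not the raw columns
    density          = total_population / total_area
  ) %>%
  arrange(desc(density))
```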

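What I've realized is that every one of these tasks comes down to a functional relationship, so the job is to find the simplest way to express that function. If you can't write it down, it is probably the same difficulty as not being able to solve word problems in elementary school.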

top_n

Keeps the top-ranked rows according to an ordering variable.

# Extract the most populated row for each state
counties_selected %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population)) %>%
  top_n(1, total_pop)
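A sketch of the same pattern on a made-up tibble `toy` (hypothetical data). After `summarize()` the result is still grouped by `state`, so the ranking happens within each state. Since dplyr 1.0.0, `top_n()` is marked as superseded by `slice_max()`, so both forms are shown. The `.groups = "drop_last"` argument only makes the default grouping behaviour explicit:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  state      = c("X", "X", "X", "Y", "Y"),
  metro      = c("Metro", "Metro", "Nonmetro", "Metro", "Nonmetro"),
  population = c(100, 200, 50, 30, 80)
)

# Totals per state and metro category; still grouped by state afterwards
totals <- toy %>%
  group_by(state, metro) %>%
  summarize(total_pop = sum(population), .groups = "drop_last")

# top_n() as used in the notes
with_top_n <- totals %>% top_n(1, total_pop)

# slice_max() as the newer equivalent
with_slice_max <- totals %>% slice_max(total_pop, n = 1)
```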

Selecting

Using the select verb, we can answer interesting questions about our dataset by focusing on related groups of variables.

The colon (
