Regular expressions

1. R (I strongly recommend this blog)

Some example table names from table_info:

du_mtime_cinema_showtime20190606

du_amap_shoppingmall_indoor_201903_d4

du_amap_shopping_mall_info_2017

du_amap_ship_201909

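I want the tables that start with "du_amap_" and end with a year and month (six digits), so of the tables above I only want the fourth one. Note that inside an R string the escape character (the backslash \) must be doubled: \\.

"^" marks the start of the string and "$" marks the end.

For these table names "^" could be dropped, but "$" cannot: grep matches the pattern anywhere in the string, so without "$" the second table (which has "201903" in the middle) would also match.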

keyword_all <- '^du_amap_.+2\\d{5}$'
keyword_table <- grep(keyword_all, table_info$Tables_in_risingdata, value = TRUE)
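For comparison, here is a rough Python sketch of the same filter. The list `tables` just reuses the four example names from above as a stand-in for `table_info$Tables_in_risingdata`, and the helper name `filter_tables` is made up for illustration.

```python
import re

# The four example table names from above (stand-in for the real table list).
tables = [
    "du_mtime_cinema_showtime20190606",
    "du_amap_shoppingmall_indoor_201903_d4",
    "du_amap_shopping_mall_info_2017",
    "du_amap_ship_201909",
]

def filter_tables(names, pattern=r"^du_amap_.+2\d{5}$"):
    """Keep names that start with du_amap_ and end with 2 plus five digits."""
    regex = re.compile(pattern)
    # re.search, like R's grep, matches anywhere, so the anchors do the work.
    return [name for name in names if regex.search(name)]

with_end_anchor = filter_tables(tables)
without_end_anchor = filter_tables(tables, pattern=r"^du_amap_.+2\d{5}")

print(with_end_anchor)     # ['du_amap_ship_201909']
print(without_end_anchor)  # the _201903_d4 table slips through as well
```

Dropping the trailing `$` lets the second table through, which is why the end anchor has to stay.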

str_extract_all (from the stringr package) returns only the parts of a string that match the pattern.

The code below extracts the last six digits (year and month); the pattern matches from the first digit to the end of the name.

library(stringr)  # provides str_extract_all and the %>% pipe
table_name_body <- 'amap_cvs_citycount'
month <- str_extract_all(string = keyword_table[p], pattern = '\\d.+') %>% as.character()
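A possible Python counterpart is sketched below. It is not a line-for-line port: it uses the stricter pattern `\d{6}$` (exactly six digits at the end) instead of `\d.+`, because `\d.+` starts at the first digit anywhere in the name. The function name `extract_year_month` is an assumption for this example.

```python
import re

def extract_year_month(table_name):
    """Return the trailing six digits (YYYYMM) of a table name, or None."""
    match = re.search(r"\d{6}$", table_name)
    return match.group(0) if match else None

print(extract_year_month("du_amap_ship_201909"))              # 201909
print(extract_year_month("du_amap_shopping_mall_info_2017"))  # None
```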

* means the pattern in front of it appears zero or more times (+ means one or more times), | means "or", and . matches any single character.
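A quick way to check these three operators is to try them on tiny strings. The snippet below is only an illustration with made-up inputs, written in Python.

```python
import re

# * allows zero repetitions of the preceding item; + requires at least one.
star_matches = bool(re.fullmatch(r"ab*", "a"))
plus_matches = bool(re.fullmatch(r"ab+", "a"))
print(star_matches, plus_matches)  # True False

# | is alternation: match either side.
either = re.findall(r"cat|dog", "cat, dog, cow")
print(either)  # ['cat', 'dog']

# . matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot c\nt")
print(dot)  # ['cat', 'cot']
```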

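The code below deletes everything up to and including "amap_", as well as "_201" and everything after it.

keyword <- gsub('.*amap_|_201.*', '', table_name_body)
# Remove any parenthesised text from the mall names
shoppingmall_amap$name <- gsub('(\\(.*\\))', "", shoppingmall_amap$name)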
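The same two substitutions can be sketched in Python with `re.sub`. Both input strings are hypothetical examples chosen to show the effect, not values from the real tables.

```python
import re

# Hypothetical full table name: strip the prefix and the date suffix.
table_name = "du_amap_cvs_citycount_201909"
keyword = re.sub(r".*amap_|_201.*", "", table_name)
print(keyword)  # cvs_citycount

# Hypothetical mall name: drop the parenthesised part.
mall_name = re.sub(r"\(.*\)", "", "Central Mall(North Gate)")
print(mall_name)  # Central Mall
```

Since `.*` is greedy, a name with two parenthesised groups would lose everything from the first "(" to the last ")".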

Latitude and longitude

\\d{2}[.]\\d+
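To see what this pattern picks up, here is a small Python check on a made-up coordinate string. Note that `\d{2}` only covers two digits before the decimal point, so a three-digit longitude loses its first digit; the `\d{2,3}` variant is a suggestion of this sketch, not part of the original pattern.

```python
import re

text = "lat: 39.9042, lon: 116.4074"  # made-up sample coordinates

two_digits = re.findall(r"\d{2}[.]\d+", text)
two_or_three = re.findall(r"\d{2,3}[.]\d+", text)

print(two_digits)    # ['39.9042', '16.4074']
print(two_or_three)  # ['39.9042', '116.4074']
```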

Find Chinese characters

[\u4E00-\u9FA5\\s]+ # one or more characters, including whitespace
[\u4E00-\u9FA5]+ # one or more characters, not including whitespace
[\u4E00-\u9FA5] # a single character
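The difference between the three patterns is easiest to see side by side. The sketch below runs them in Python on a made-up mixed string; in a Python raw string `\s` needs only one backslash.

```python
import re

text = "Hello 你好 世界 2019"  # made-up sample mixing English, Chinese and digits

with_space = re.findall(r"[\u4E00-\u9FA5\s]+", text)
without_space = re.findall(r"[\u4E00-\u9FA5]+", text)
single_chars = re.findall(r"[\u4E00-\u9FA5]", text)

print(with_space)     # [' 你好 世界 ']
print(without_space)  # ['你好', '世界']
print(single_chars)   # ['你', '好', '世', '界']
```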

2. Python

import re

Find digits. Note that in a Python raw string the escape needs only one backslash (\), whereas in R it needs two (\\).

pattern1 = re.compile(r'\d+')
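A short usage sketch, with a made-up input string, shows what this pattern returns and why the `r` prefix matters.

```python
import re

pattern1 = re.compile(r"\d+")
digits = pattern1.findall("order 66, room 101")
print(digits)  # ['66', '101']

# Without the r prefix the backslash has to be doubled, just like in R.
same_result = re.findall("\\d+", "order 66") == re.findall(r"\d+", "order 66")
print(same_result)  # True
```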

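This finds the numbers that start with (080) in each row of the table.

import pandas as pd

pattern1 = re.compile(r'\(080\)\d+')
fixed_line_all = set()  # unique (080) numbers found so far
for i in range(len(calls_pd[0])):
    fixed_line = pattern1.findall(calls_pd[0][i])
    fixed_line_all = fixed_line_all.union(fixed_line)
# A set is unordered, so sort it before building the DataFrame
fixed_line_all = pd.DataFrame(sorted(fixed_line_all))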
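Because `calls_pd` is not shown here, the following self-contained sketch uses a plain list of made-up call records instead of a DataFrame column. It illustrates the same idea: collect every `(080)` number and keep each one once.

```python
import re

# Made-up call records; each row is a string that may contain phone numbers.
rows = [
    "(080)47459867 calls (080)45291968",
    "90365 06212 calls (080)47459867",
    "78130 00821 calls 98453 94494",
]

pattern = re.compile(r"\(080\)\d+")
fixed_lines = set()
for row in rows:
    # update() adds every match; the set drops duplicates automatically.
    fixed_lines.update(pattern.findall(row))

unique_fixed_lines = sorted(fixed_lines)
print(unique_fixed_lines)  # ['(080)45291968', '(080)47459867']
```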

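This extracts the first four digits of numbers that start with 7, 8, or 9.

pattern2 = re.compile(r'^[789]\d{3}')
mobile_bang = set()  # unique four-digit prefixes
for i in range(len(fixed_line_bind[1])):
    mobile_line = pattern2.findall(fixed_line_bind[1][i])
    mobile_bang = mobile_bang.union(mobile_line)
# A set is unordered, so sort it before building the DataFrame
mobile_bang = pd.DataFrame(sorted(mobile_bang))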
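As a runnable illustration, the sketch below applies the same prefix pattern to a made-up list of numbers (standing in for `fixed_line_bind[1]`, which is not shown here).

```python
import re

# Made-up numbers; only those starting with 7, 8 or 9 should contribute a prefix.
numbers = [
    "90365 06212",
    "78130 00821",
    "(080)47459867",
    "81234 56789",
    "90365 11111",
]

pattern = re.compile(r"^[789]\d{3}")
prefixes = set()
for number in numbers:
    match = pattern.match(number)
    if match:
        prefixes.add(match.group(0))

unique_prefixes = sorted(prefixes)
print(unique_prefixes)  # ['7813', '8123', '9036']
```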
