The following example uses Hive to analyze an nginx access log, splitting each line into fields with a regular expression. The steps are as follows:

Log format:
192.168.5.139 - - [08/Jun/2017:17:09:12 +0800] "GET //oportal/static/ui/layer/skin/default/icon.png HTTP/1.1" 200 9905 http://192.168.100.126//oportal/static/ui/layer/skin/layer.css "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET //oportal/page/homepage/images/icon-02.png HTTP/1.1" 200 1322 http://192.168.100.126//dsfdsal/page/homepage/css/indet.css "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET /dsfdortal/page/waittodo/waittodo.jsp?registesfsdame=%25E7%25BB%25BC%25E5%sdf2590%2588%25E9%25A2%2584%25E7%25AE%25sdf97&registerAsdfsdppid=bsdfsdas,ssdfsdfpf,bsdfsdgt,insdfsddi,hqrsdfdseport,hqosdfa,hqsfdsbi&resdfgisterId=FD748AA3sd82851A37F1693D3880C844EF&allviewsdfnum=10&appSource=undefined&tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS&waittodoNums=0&showway=0 HTTP/1.1" 200 3121 http://192.168.100.126//fposdfsdrtal/page/homdsfdepage/homepage.jsp?tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -
192.168.5.139 - - [08/Jun/2017:17:09:25 +0800] "GET //fposdfgfrtal/page/waittodo/css/db_index.css HTTP/1.1" 200 6310 http://192.168.100.126/fpdsfdsfortal/page/waittodo/waittodo.jsp?registerName=%25E7%25BB%25BC%25E5%2590%2588%25E9%25A2%2584%25E7%25AE%2597&registerAppid=bas,spf,bgt,indi,hqreport,hqoa,hqbi&registerId=FD748AA382851A37F1693D3880C844EF&allviewnum=10&appSource=undefined&tokenid=5728A0ED7998CC84B88FE8717A33FAB8aK79UkfS&waittodoNums=0&showway=0 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" -

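Regular expression:
Test site: http://wpjam.qiniudn.com/tool/regexpal/
The pattern below expects the referer to be wrapped in double quotes, as nginx's default combined format writes it, and the last field to be a literal "-"; a line that differs in either respect will not match. The sample lines above show the referer without quotes, so check the raw log before relying on the pattern.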

([^ |^\n]*) ([^ ]*) ([^ ]*) (\[.*\]) (\".*?\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (-)
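
The pattern can also be checked from the shell before any table is created. The sketch below assumes GNU grep built with PCRE support (the -P option) and uses a made-up sample line whose referer is quoted; neither the sample line nor the variable names come from the original log. The -x option requires the whole line to match, which is meant to mirror a whole-line match in the SerDe.

```shell
#!/bin/sh
# Pattern copied from the article; single quotes keep the shell from altering it.
pattern='([^ |^\n]*) ([^ ]*) ([^ ]*) (\[.*\]) (".*?") (-|[0-9]*) (-|[0-9]*) (".*?") (".*?") (-)'

# Hypothetical sample line (not taken from the real log), referer in quotes.
sample='192.168.5.139 - - [08/Jun/2017:17:09:12 +0800] "GET /index.html HTTP/1.1" 200 9905 "http://192.168.100.126/" "Mozilla/5.0" -'

# -c counts matching lines, -x anchors the pattern to the full line, -P enables PCRE.
matched=$(printf '%s\n' "$sample" | grep -cxP -e "$pattern")
echo "sample lines matched: $matched"

# Against the real file, adding -v counts the lines that do NOT match:
# grep -cvxP -e "$pattern" access.log
```

A non-zero count from the commented-out command would point at lines that need cleaning or a pattern adjustment before loading.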

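Create the source table (the contrib RegexSerDe ships in the hive-contrib jar, which must be on Hive's classpath):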

drop table if exists chavin.nginx_access_log;
CREATE TABLE chavin.nginx_access_log(
   host STRING,
   identity STRING,
   `user` STRING,  -- user and time are backquoted: both are reserved keywords in some Hive versions
   `time` STRING,
   request STRING,
   status STRING,
   size STRING,
   referer STRING,
   agent STRING,
   other STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   "input.regex" = "([^ |^\n]*) ([^ ]*) ([^ ]*) (\\[.*\\]) (\".*?\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (-)",
   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s"
)
STORED AS TEXTFILE;

Remove blank lines from access.log:

sed -i '/^$/d' access.log
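
Note that '/^$/d' removes only completely empty lines; a line holding just spaces or tabs survives and would not match the regular expression. A minimal sketch of a stricter filter follows, run on a throwaway file so the real log is untouched. The file names are hypothetical.

```shell
#!/bin/sh
# Build a small test file: two records separated by an empty line
# and a line made only of spaces and a tab.
printf 'line one\n\n \t \nline two\n' > sample_access.log

# '/^$/d' drops only the truly empty line.
sed '/^$/d' sample_access.log > only_empty_removed.log

# '/^[[:space:]]*$/d' also drops lines made of spaces or tabs.
sed '/^[[:space:]]*$/d' sample_access.log > all_blank_removed.log

wc -l < only_empty_removed.log   # 3 lines remain
wc -l < all_blank_removed.log    # 2 lines remain
```

If the stricter form suits the log, the same expression can be used with sed -i on access.log.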

Load the access.log data into Hive:

load data local inpath '/opt/datas/access.log' overwrite into table chavin.nginx_access_log;

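Query the data to verify the load (a line that did not match the regular expression comes back with every column NULL):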

select * from chavin.nginx_access_log limit 5;

Next, business-specific sub-tables can be built on top of this table for targeted analysis.
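
One possible next step is sketched here as a shell snippet that writes a HiveQL script. The table name chavin.nginx_request_detail, the file name, and the derived columns method, url and hits are all hypothetical and not part of the original setup. The hive -f line is left commented out, so the snippet itself only creates the file.

```shell
#!/bin/sh
# Write a HiveQL script; the quoted 'EOF' keeps the shell from expanding anything inside.
cat > nginx_request_detail.hql <<'EOF'
-- Hypothetical sub-table: one row per parsed request, with the request line split up.
DROP TABLE IF EXISTS chavin.nginx_request_detail;
CREATE TABLE chavin.nginx_request_detail AS
SELECT
  host,
  regexp_extract(request, '^"([A-Z]+) ([^ ]+) ', 1) AS method,
  regexp_extract(request, '^"([A-Z]+) ([^ ]+) ', 2) AS url,
  CAST(status AS INT)  AS status,
  CAST(size AS BIGINT) AS size   -- a "-" size becomes NULL
FROM chavin.nginx_access_log
WHERE host IS NOT NULL;          -- skip lines the regex did not match

-- Example analysis: number of requests per HTTP status code.
SELECT status, count(*) AS hits
FROM chavin.nginx_request_detail
GROUP BY status
ORDER BY hits DESC;
EOF

# Run it on a host where the Hive CLI is available:
# hive -f nginx_request_detail.hql
```

Casting status and size once in the sub-table keeps later queries free of string-to-number conversions.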
