Counting GET requests per time period in a web log
2024-08-20 23:53:37
Log data:
```
0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
::::::: - - [/Nov/::: +] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1"
::::::: - - [/Nov/::: +] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1"
```
**Requirement: count the number of GET requests for each hour.**
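Before anything can be counted, each log line has to be reduced to an hour and an HTTP method. The following is a minimal sketch of that step in plain Scala, assuming every line follows the layout of the sample `0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET ... HTTP/1.1" 200 13821`. The names `LogLineParser` and `parseLine` are hypothetical, and the regular expression is an alternative to splitting on spaces rather than the method used in this post.

```scala
object LogLineParser {
  // Assumed layout: host - - [dd/Mon/yyyy:HH:mm:ss zone] "METHOD path protocol" ...
  // Group 1 captures the two-digit hour, group 2 the HTTP method.
  private val LinePattern =
    """^\S+ \S+ \S+ \[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\] "([A-Z]+) .*""".r

  /** Returns Some((hour, method)) for a line in the assumed layout, None otherwise. */
  def parseLine(line: String): Option[(String, String)] = line match {
    case LinePattern(hour, method) => Some((hour, method))
    case _                         => None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821"""
    println(parseLine(sample))          // Some((14,GET))
    println(parseLine("not a log line")) // None
  }
}
```

Returning an `Option` lets malformed lines be dropped with `flatMap` instead of failing on an out-of-range index.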
The first approach uses Spark SQL.
Scala code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by xiaopengpeng on 2016/12/15.
 */
object countget {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()
    import spark.implicits._

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logDF = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.\\logs\\localhost_access_log.--.txt")
      .map(line => line.split(" "))
      .filter(fields => fields.length > 5) // skip lines too short to index
      .map { fields =>
        // Indices and offsets follow from the sample line above:
        // fields(3) is "[11/Nov/2016:14:41:31", and the hour is the two characters
        // starting 6 positions after the last '/'. fields(5) is "\"GET".
        val slash = fields(3).lastIndexOf("/")
        (fields(3).substring(slash + 6, slash + 8), fields(5))
      }
      .toDF("time", "method")
    logDF.show()
    logDF.createOrReplaceTempView("log")
    // The method field keeps its leading double quote, so the filter matches "GET with the quote.
    spark.sql("SELECT time, COUNT(method) FROM log WHERE method = '\"GET' GROUP BY time").show()
    spark.stop()
  }
}
```
The second approach is implemented in plain Scala code, without SQL.
Code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by root on 2016/12/15.
 */
object CountGetByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    val logLine = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.\\logs\\localhost_access_log.--.txt")
      .map(line => line.split(" "))
      .filter(fields => fields.length > 5) // skip lines too short to index
      .map { fields =>
        // Indices and offsets follow from the sample line above:
        // fields(3) holds the timestamp, fields(5) holds the quoted method.
        val slash = fields(3).lastIndexOf("/")
        (fields(3).substring(slash + 6, slash + 8), fields(5))
      }
    // Keep only GET requests, group them by hour, then count each group.
    val filter = logLine.filter(y => y._2.equals("\"GET"))
    val group = filter.groupBy(line => line._1)
    val result = group.map(g => (g._1, g._2.size))
    result.foreach(x => println(x))
    spark.stop()
  }
}
```
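The filter, group, and count steps can be checked without a Spark installation by running the same logic on an in-memory `Seq`. The following is a minimal sketch, assuming field 3 holds the timestamp and field 5 the quoted method, as in the sample line. The names `LocalGetCounter` and `countGetsPerHour` are hypothetical, and every sample line except the first uses an invented timestamp and response fields.

```scala
object LocalGetCounter {
  /** Counts GET requests per hour with the same split-and-index steps as the Spark job. */
  def countGetsPerHour(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split(" "))
      .filter(_.length > 5) // skip lines too short to index
      .map { fields =>
        val ts = fields(3) // e.g. "[11/Nov/2016:14:41:31"
        val slash = ts.lastIndexOf("/")
        (ts.substring(slash + 6, slash + 8), fields(5))
      }
      .filter { case (_, method) => method == "\"GET" } // method keeps its leading quote
      .groupBy { case (hour, _) => hour }
      .map { case (hour, hits) => (hour, hits.size) }

  def main(args: Array[String]): Unit = {
    // Only the first line comes from the post; the others are invented for illustration.
    val lines = Seq(
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821""",
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:52:07 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821""",
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:15:03:10 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821""",
      """0:0:0:0:0:0:0:1 - - [11/Nov/2016:15:04:00 +0800] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1" 200 0"""
    )
    countGetsPerHour(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On a real cluster, `groupBy` followed by a size count moves every matching record across the network; mapping each record to `(hour, 1)` and calling `reduceByKey(_ + _)` on the RDD is the usual way to get the same counts with less shuffling.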