Log data:

```
::::::: - - [/Nov/::: +] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1"
::::::: - - [/Nov/::: +] "GET /CloudDocLib/xng/xngAction!listDeamons.action?page=0&count=10&sort=SYMBOL&order=asc&query=STYPE%3AEQA%3BCINDUSTRY.STYLE%3A009%3BCINDUSTRY.STYLECODE%3AZC7&jobListType=1&host=unknown HTTP/1.1"
::::::: - - [/Nov/::: +] "POST /CloudDocLib/xng/xngAction!startDeamon.action HTTP/1.1"
```
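Both programs below pick fields 3 and 5 out of `line.split(" ")` and cut the hour out of field 3 with fixed offsets. A minimal plain-Scala sketch (no Spark needed) makes that layout visible. It uses the sample line that appears as a comment in the Spark code. The object name `LogFields` and its methods are hypothetical helpers for illustration only.

```scala
// Hypothetical helper object, only for illustrating the field layout.
object LogFields {
  // Sample access-log line, taken from the comment in the Spark code
  val sample: String =
    "0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] \"GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1\" 200 13821"

  // Split on single spaces, exactly like line.split(" ") in the Spark code.
  // Field 3 is "[11/Nov/2016:14:41:31", field 5 is "\"GET" (opening quote included).
  def fields(line: String): Array[String] = line.split(" ")

  // The hour is the two characters starting 6 positions after the last '/'
  // in field 3, because "2016:" (year plus colon) is 5 characters long.
  def hour(line: String): String = {
    val ts = fields(line)(3)
    val slash = ts.lastIndexOf("/")
    ts.substring(slash + 6, slash + 8)
  }

  def main(args: Array[String]): Unit = {
    fields(sample).zipWithIndex.foreach { case (f, i) => println(s"$i: $f") }
    println(s"hour = ${hour(sample)}") // hour = 14
  }
}
```

Because field 5 still carries the opening `"`, the GET filters in both programs compare against `"\"GET"` rather than `"GET"`.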
**Requirement: using the timestamp, count the number of GET requests in each hour.**
The first approach uses SQL (Spark SQL).

Scala code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by xiaopengpeng on 2016/12/15.
 */
object countget {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()
    import spark.implicits._ // needed for .toDF on an RDD of tuples

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    // After split(" "): field 3 is [11/Nov/2016:14:41:31 and field 5 is "GET
    val logDF = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.\\logs\\localhost_access_log.--.txt") // backslashes must be escaped
      .map(line => line.split(" "))
      // hour = 2 chars starting 6 positions after the last '/' ("2016:" is 5 chars)
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 6, list(3).lastIndexOf("/") + 8), list(5)))
      .toDF("time", "method")
    logDF.show()

    logDF.createOrReplaceTempView("log")
    // The method column still contains the opening quote, hence '"GET'
    spark.sql("SELECT time, COUNT(method) FROM log WHERE method = '\"GET' GROUP BY time").show()

    spark.stop()
  }
}
```
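The programs take the hour from fixed character offsets and group only by hour of day, so requests from different days that fall in the same hour are counted together. A regular expression is a somewhat more defensive way to pull out the same values. This is a swapped-in technique, not what the original code does; the pattern and the names `AccessLogParser`, `Hit` and `parse` are assumptions for illustration. It also captures the date, in case you want to key on date plus hour.

```scala
import scala.util.matching.Regex

// Hypothetical regex-based parser: an alternative to the fixed substring offsets.
object AccessLogParser {
  final case class Hit(day: String, hour: String, method: String)

  // Captures the date (dd/Mon/yyyy), the hour (HH) and the HTTP method (without the quote).
  private val LogPattern: Regex =
    """.*\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]*\] "(\w+) .*""".r

  // Returns None for lines that do not look like access-log entries.
  def parse(line: String): Option[Hit] = line match {
    case LogPattern(day, hour, method) => Some(Hit(day, hour, method))
    case _                             => None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      "0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] \"GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1\" 200 13821"
    println(parse(sample))      // Some(Hit(11/Nov/2016,14,GET))
    println(parse("not a log")) // None
  }
}
```

In a Spark job this could replace the two `map` calls with something like `.flatMap(line => AccessLogParser.parse(line).toList)`, after which the filter compares against plain `"GET"`.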
The second approach is implemented in plain Scala, using RDD operations instead of SQL.

Code:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/**
 * Created by root on 2016/12/15.
 */
object CountGetByScala {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("countget").setMaster("local[*]")
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()

    // Sample line:
    // 0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] "GET /CloudDocLib/portal/deamon/manage.jsp HTTP/1.1" 200 13821
    // Parse each line into (hour, method), as in the SQL version
    val logLine = spark.sparkContext
      .textFile("D:\\Program\\apache-tomcat-7.0.\\logs\\localhost_access_log.--.txt")
      .map(line => line.split(" "))
      .map(list => (list(3).substring(list(3).lastIndexOf("/") + 6, list(3).lastIndexOf("/") + 8), list(5)))

    val filter = logLine.filter(y => y._2.equals("\"GET")) // keep GET requests only
    val group = filter.groupBy(line => line._1)            // group by hour
    val result = group.map(g => (g._1, g._2.toList.size))  // count requests per hour

    // collect() brings the results to the driver so println output appears in the console
    result.collect().foreach(println)

    spark.stop()
  }
}
```
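The filter, groupBy and count pipeline of the second approach can be tried without Spark on an ordinary Scala collection. This sketch mirrors that logic on a few in-memory lines. The object name `CountGetLocal` is hypothetical, and the sample lines are made up for illustration (built from the log line shown in the code comment, with the times, paths and methods changed).

```scala
// Hypothetical local mirror of CountGetByScala, using plain Scala collections.
object CountGetLocal {
  // Same parsing as the Spark code: (hour, method) from fields 3 and 5
  def parse(line: String): (String, String) = {
    val list = line.split(" ")
    val slash = list(3).lastIndexOf("/")
    (list(3).substring(slash + 6, slash + 8), list(5))
  }

  // Keep GET requests, group them by hour, and count each group
  def countGetByHour(lines: Seq[String]): Map[String, Int] =
    lines.map(parse)
      .filter(_._2 == "\"GET")
      .groupBy(_._1)
      .map { case (hour, hits) => (hour, hits.size) }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only
    val lines = Seq(
      "0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:41:31 +0800] \"GET /a HTTP/1.1\" 200 1",
      "0:0:0:0:0:0:0:1 - - [11/Nov/2016:14:59:02 +0800] \"GET /b HTTP/1.1\" 200 1",
      "0:0:0:0:0:0:0:1 - - [11/Nov/2016:15:00:10 +0800] \"POST /c HTTP/1.1\" 200 1",
      "0:0:0:0:0:0:0:1 - - [11/Nov/2016:15:20:45 +0800] \"GET /d HTTP/1.1\" 200 1"
    )
    countGetByHour(lines).toSeq.sortBy(_._1).foreach(println)
    // (14,2)
    // (15,1)
  }
}
```

On a real RDD, `groupBy` followed by taking the group size shuffles every record for a key to one place. `filter.map(x => (x._1, 1)).reduceByKey(_ + _)` is the usual alternative, since it combines partial counts before the shuffle.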
 
