Previously I imported the Google cluster data into a MySQL database on Azure. The next step is to analyze that data and mine the users' usage patterns.

First, to speed up the queries, add indexes on columns such as user and time.
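As a rough illustration of the indexing step, the sketch below creates two composite indexes that match the WHERE clauses used later. It is only a sketch under assumptions: it runs against an in-memory SQLite database as a stand-in for the MySQL server, the tables are reduced to the columns being filtered, and the index names are hypothetical. The same CREATE INDEX statements are also valid MySQL syntax.

```python
import sqlite3

# Stand-in for the MySQL database: in-memory SQLite with reduced tables
# (only the columns that the statistics queries filter on).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE job_events (time INTEGER, job_id INTEGER, user TEXT)")
cur.execute("CREATE TABLE task_events (time INTEGER, job_id INTEGER, task_index INTEGER)")

# Hypothetical index names; the column choices follow the WHERE clauses
# "user = ... and time >= ..." and "job_id = ... and time >= ...".
cur.execute("CREATE INDEX idx_job_user_time ON job_events (user, time)")
cur.execute("CREATE INDEX idx_task_job_time ON task_events (job_id, time)")
conn.commit()

# List the indexes that now exist.
index_names = sorted(
    row[0] for row in cur.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
conn.close()
```

Putting the equality column first (user or job_id) and the range column (time) second is a common choice for this kind of filter, though whether it helps on the real data would need to be checked with the database's own query plan.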

After that, the following code can be used to compute the statistics.

import MySQLdb


def use4ADay(day, users):
    conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="googleclusterdata", charset="utf8")
    cursor = conn.cursor()
    msAday = 24*60*60*1000000  # length of one day in the trace's time unit (microseconds)
    for user in users:
        user = user[0]
        print(user)
        use4ADay.user = user  # remembered so a failure can be logged with the user name
        print('day %s' % day)
        startTime = (day - 1) * msAday
        endTime = day * msAday
        dayCPUUse = 0
        dayMEMUse = 0
        dayDiskUse = 0
        # Parameterized query: the driver quotes the values itself
        order = "select job_id from job_events where time >= %s and time < %s and user = %s"
        print(order)
        cursor.execute(order, (startTime, endTime, user))
        job_ids = cursor.fetchall()
        for job_id in job_ids:
            job_id = job_id[0]
            print('day %s' % day)
            # Sort by time within each task so that a schedule event is
            # directly followed by the event that ended the task
            order = ("select task_index, event_type, cpu_request, memory_request, disk_space_request, time "
                     "from task_events where time >= %s and time < %s and job_id = %s "
                     "order by task_index, time")
            print(order)
            cursor.execute(order, (startTime, endTime, job_id))
            tasks = cursor.fetchall()
            print('tasks get')
            i = 0
            while i < len(tasks) - 1:
                task = tasks[i]
                if task[1] == 1:
                    task_index = task[0]
                    nextEvent = tasks[i+1]
                    if (nextEvent[1] == 4 or nextEvent[1] == 5) and nextEvent[0] == task_index:
                        # Task lifetime in seconds, weighted by the requested resources
                        taskLife = (nextEvent[5] - tasks[i][5]) / (10.0**6)
                        dayCPUUse += taskLife * task[2]
                        dayMEMUse += taskLife * task[3]
                        dayDiskUse += taskLife * task[4]
                        #print 'task: ', task_index, dayCPUUse, dayMEMUse, dayDiskUse
                i = i+1
            #print 'job: ', job_id, dayCPUUse, dayMEMUse, dayDiskUse
        # One line per user per day: user, CPU, memory, disk
        fOut = open('C:\\userUsageEachDay\\day%d.txt' % day, 'a')
        fOut.write('%s\t%f\t%f\t%f\n' % (user, dayCPUUse, dayMEMUse, dayDiskUse))
        fOut.close()
    print('day %d finish' % day)
    conn.close()


conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="googleclusterdata", charset="utf8")
cursor = conn.cursor()
#get all user_name
order = "select distinct user from job_events"
print(order)
cursor.execute(order)
users = cursor.fetchall()
conn.close()

for day in range(1, 30):
    try:
        use4ADay(day, users)
    except Exception:
        print('day %d failed!!' % day)
        fOut = open('C:\\failed.txt', 'a')
        # getattr guards against a failure before any user was processed
        fOut.write('%s\t%d\t\n' % (getattr(use4ADay, 'user', ''), day))
        fOut.close()
    #print 'starting thread for day %d' %day
    #thread.start_new_thread(use4ADay, (day, users, ) )#use4ADay(2, users)
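The core of the script above is the pairing of a task's start event with the event that ends it. The sketch below isolates that step as a pure function so it can be tried without a database. It assumes, as the script does, that event_type 1 marks a task being scheduled and 4 or 5 mark it finishing or being killed; the function name and the sample rows are made up for illustration.

```python
SCHEDULE, FINISH, KILL = 1, 4, 5  # event_type codes as used by the script (assumed meaning)


def resource_seconds(events):
    """Sum requested resources multiplied by task lifetime in seconds.

    events: rows of (task_index, event_type, cpu, mem, disk, time_us),
    sorted by task_index and then by time.
    """
    cpu = mem = disk = 0.0
    # Look at each row together with the row that follows it
    for cur, nxt in zip(events, events[1:]):
        same_task = cur[0] == nxt[0]
        if cur[1] == SCHEDULE and nxt[1] in (FINISH, KILL) and same_task:
            life = (nxt[5] - cur[5]) / 1e6  # microseconds to seconds
            cpu += life * cur[2]
            mem += life * cur[3]
            disk += life * cur[4]
    return cpu, mem, disk


# Made-up sample rows
sample = [
    (0, 1, 0.5, 0.25, 0.125, 0),
    (0, 4, 0.5, 0.25, 0.125, 2000000),  # task 0 ran for 2 s
    (1, 1, 1.0, 0.5, 0.0, 1000000),
    (1, 5, 1.0, 0.5, 0.0, 5000000),     # task 1 ran for 4 s
    (2, 1, 1.0, 1.0, 1.0, 0),           # no end event in the window: ignored
]
print(resource_seconds(sample))  # -> (5.0, 2.5, 0.25)
```

Note that a task still running at the end of the day window contributes nothing under this rule, which is also how the script behaves.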

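The next step is to compute, for each user over the whole month, how often they consumed resources and the average amount consumed each time.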

# Per-day files produced by the first script
# (note: that script writes to C:\userUsageEachDay)
fDay1 = open('C:\\Usage\\day1.txt')
users = []
for l in fDay1.readlines():
    l = l.split('\t')
    user = l[0]
    users.append(user)
fDay1.close()

fOut = open('C:\\UseTraceOfAllUsers.txt', 'w')
for user in users:
    useDays = 0
    allPrice = 0
    for day in range(1, 30):
        f = open('C:\\Usage\\day%d.txt' % day)
        isFind = False
        for l in f.readlines():
            l = l.strip().split('\t')
            # Compare the user column exactly instead of searching the whole line
            if l[0] == user:
                cpu = float(l[1])
                mem = float(l[2])
                disk = float(l[3])
                money = 1.92*cpu + 15.6*mem + 1.2*disk
                assert(money >= 0)
                isFind = True
                break
        if isFind and money != 0:
            useDays += 1
            allPrice += money
        f.close()
    if useDays != 0:
        # Column 1: fraction of the 29 days with usage; column 2: average cost per active day
        fOut.write('%s\t%s\n' % (str(useDays/29.0), str(allPrice/useDays)))
fOut.close()
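The per-user summary computed above can be written as a small function that is easy to check by hand. This is a minimal sketch: the price weights 1.92, 15.6 and 1.2 and the 29-day span come from the script, while the function names and the sample usage values are hypothetical.

```python
DAYS_IN_TRACE = 29  # the script iterates over days 1..29


def daily_cost(cpu, mem, disk):
    """Weighted cost of one day's usage, using the weights from the script."""
    return 1.92 * cpu + 15.6 * mem + 1.2 * disk


def summarize(daily_usage):
    """daily_usage: list of (cpu, mem, disk) tuples, one per day, for one user.

    Returns (fraction of days with usage, average cost per active day),
    or None when the user never consumed anything.
    """
    costs = [daily_cost(*usage) for usage in daily_usage]
    active = [c for c in costs if c != 0]
    if not active:
        return None
    return len(active) / float(DAYS_IN_TRACE), sum(active) / len(active)


# Made-up usage: active on two days, idle on one
example = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(summarize(example))
```

Users with no active day are skipped rather than written with an average of zero, which mirrors the `useDays != 0` check in the script.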

Finally, the results can be plotted with MATLAB.

x = load('C:\UseTraceOfAllUsers.txt')
plot(x(:,1), x(:,2), 'o');

The result is as follows:

Taking the logarithm of the average usage gives:

x = load('C:\UseTraceOfAllUsers.txt')
plot(x(:,1), log(x(:,2)), 'o');
