Load a file from HDFS and build a graph

scala> var graphs = GraphLoader.edgeListFile(sc,"/tmp/dataTest/graphTest.txt")
graphs: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@ab5670d

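 Only one task ran here, so with the default settings this file was loaded into a single partition. I'll set the number of edge partitions by hand
scala> val graphs = GraphLoader.edgeListFile(sc, "/tmp/dataTest/graphTest.txt",numEdgePartitions=4)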
graphs: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@409ea4d1

 Now it shows 4 tasks
 
Look at the first 10 vertices and edges (vertex and edge attributes both default to 1)
Now I'll modify the vertex values
scala> var verttmp = graphs.mapVertices((id,attr) => attr*)
verttmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@25d7eb44
scala> verttmp.vertices.take(10)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_37_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_37_1]
res4: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,), (,), (,), (,), (,), (,), (,))
It can also be written this way, which is a little tidier because the unused vertex id is replaced with _
scala> var verttmp = graphs.mapVertices((_,attr) => attr*)
verttmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@76828ce4
Modify the edge attributes
scala> var edgetmp=graphs.mapEdges(e => e.attr*)
edgetmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@42ce3be7
scala> edgetmp.edges.take(10)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_1]
res6: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,), Edge(,,))
Modify the edge attributes through the triplets (requirement: the new value is 2 times srcAttr plus 3 times dstAttr)
scala> var triptmp = graphs.mapTriplets(t => t.srcAttr*2 + t.dstAttr*3)
triptmp: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@318ec664
scala> triptmp.triplets.take(10)
[Stage :> ( + ) / ]// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_0]
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_26_1]
res7: Array[org.apache.spark.graphx.EdgeTriplet[Int,Int]] = Array(((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),), ((,),(,),))
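To compare the three map operators side by side, here is a minimal sketch on a tiny in-memory graph instead of the HDFS file. The three-edge graph, the value names and the factor 3 used with mapEdges are my own assumptions for illustration. It is meant to be pasted into spark-shell, where getOrCreate simply reuses the existing context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if there is one, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graphx-map-sketch"))

// Hypothetical graph 1->2, 2->3, 3->1; every vertex and edge attribute is 1,
// the same starting point that GraphLoader.edgeListFile produces.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))
val g: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 1)

// mapVertices: new vertex attribute computed from (id, old attribute).
val doubledVertices = g.mapVertices((_, attr) => attr * 2)

// mapEdges: new edge attribute computed from the edge alone.
val tripledEdges = g.mapEdges(e => e.attr * 3)

// mapTriplets: new edge attribute computed from the edge plus both endpoint attributes.
val viaTriplets = g.mapTriplets(t => t.srcAttr * 2 + t.dstAttr * 3)

doubledVertices.vertices.collect().foreach(println)
tripledEdges.edges.collect().foreach(println)
viaTriplets.edges.collect().foreach(println)
```

Since everything starts at 1, every vertex ends up as 2, every edge as 3, and every edge in the triplet version as 2*1 + 3*1 = 5. Note that mapTriplets still only changes edge attributes; the vertices are left as they were.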
 
The structural operators consist of the following functions
class Graph[VD, ED] {
def reverse: Graph[VD, ED]
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
def groupEdges(merge: (ED, ED) => ED): Graph[VD,ED]
}
The subgraph operation
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
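//This function returns the graph restricted to the vertices and edges that satisfy the given boolean predicates
//VD is the vertex attribute type; vpred is called with (vertexId, attr) for each vertex
Most use cases for subgraph: restricting a graph to the vertices and edges of interest, or removing broken links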
scala> var subg = graphs.subgraph(epred = e =>e.srcId>e.dstId)
subg: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@51483f93
Check the result
scala> subg.edges.take(10)
res12: Array[org.apache.spark.graphx.Edge[Int]] = Array(
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,),
Edge(,,))
Count the vertices and edges of the subgraph
scala> subg.vertices.count
res11: Long =
scala> subg.edges.count
res13: Long =
Count the vertices and edges of the original graphs
scala> graphs.vertices.count
res9: Long =
scala> graphs.edges.count
res10: Long =
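The effect on the counts is easier to see on a graph small enough to check by hand. The sketch below is only an illustration: the three-edge graph and the value names are assumptions of mine, not the data from the file above. It applies an edge predicate and a vertex predicate separately.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if there is one, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graphx-subgraph-sketch"))

// Hypothetical edge list 1->2, 2->1, 3->1; vertex attributes default to 1.
val g = Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (2L, 1L), (3L, 1L))), 1)

// Edge predicate only: keeps 2->1 and 3->1, and all three vertices stay.
val byEdge = g.subgraph(epred = t => t.srcId > t.dstId)

// Vertex predicate only: dropping vertex 3 also drops the edge 3->1 attached to it.
val byVertex = g.subgraph(vpred = (id, _) => id != 3L)

println((byEdge.vertices.count(), byEdge.edges.count()))     // (3,2)
println((byVertex.vertices.count(), byVertex.edges.count())) // (2,2)
```

An edge predicate on its own never removes vertices, which is why a subgraph filtered only by epred has the same vertex count as the original graph but fewer edges.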
 
Degrees come in three kinds (inDegrees, outDegrees, degrees)
 
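inDegrees: for each vertex, the number of edges pointing into it (the edges where it is the dstId); as I understand it, simply a count of incoming edges
scala> graphs.inDegrees.collect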
res15: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,),
(,), (,), (,), (,), (,),
(,))
outDegrees: for each vertex, the number of edges leaving it (the edges where it is the srcId)
scala> graphs.outDegrees.collect
[Stage :>( + ) / ]// :: WARN executor.Executor:
res18: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,),
(,), (,), (,), (,), (,),
(,), (,), (,), (,), (,))
degrees: the total degree (in-degree plus out-degree)
 
Find the maximum out-degree, in-degree and total degree
Define a helper function
scala> def max(a:(VertexId,Int),b:(VertexId,Int))={if(a._2>b._2) a else b }
max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))
(org.apache.spark.graphx.VertexId, Int)
inDegrees
scala> graphs.inDegrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res35: (org.apache.spark.graphx.VertexId, Int) = (,)
outDegrees
scala> graphs.outDegrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res36: (org.apache.spark.graphx.VertexId, Int) = (,)
degrees
scala> graphs.degrees.reduce(max)
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_14_0]
res38: (org.apache.spark.graphx.VertexId, Int) = (,)
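Here is a small sketch of the same reduce on a graph where the answers can be checked by hand. The four-edge graph and the names maxByDegree, maxIn, maxOut and maxTotal are assumptions for illustration; the helper has the same body as max above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if there is one, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graphx-degrees-sketch"))

// Hypothetical edge list 1->2, 1->3, 2->3, 4->3.
val g = Graph.fromEdgeTuples(
  sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L), (4L, 3L))), 1)

// Keeps whichever (vertexId, degree) pair has the larger degree.
def maxByDegree(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
  if (a._2 > b._2) a else b

val maxIn = g.inDegrees.reduce(maxByDegree)    // vertex 3 has 3 incoming edges
val maxOut = g.outDegrees.reduce(maxByDegree)  // vertex 1 has 2 outgoing edges
val maxTotal = g.degrees.reduce(maxByDegree)   // vertex 3 has 3 edges in total

// Vertices 1 and 4 have no incoming edges, so inDegrees has no entry for them.
val verticesWithInEdges = g.inDegrees.count()

println(s"maxIn=$maxIn maxOut=$maxOut maxTotal=$maxTotal")
```

Two details are worth keeping in mind: when two vertices tie for the maximum, which one the reduce returns depends on evaluation order, and a vertex with no incoming (or outgoing) edges does not appear in inDegrees (or outDegrees) at all.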
joinVertices: set each vertex's attribute to its in-degree
outerJoinVertices: set each vertex's attribute to its out-degree
Set the attribute of every vertex in graphs to 0
scala> var rawG=graphs.mapVertices((id,attr) => 0)
rawG: org.apache.spark.graphx.Graph[Int,String] = org.apache.spark.graphx.impl.GraphImpl@43d06473
Check the result
scala> rawG.vertices.collect
res47: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,))
Get the inDegrees dataset of rawG
scala> var ind=rawG.inDegrees;
ind: org.apache.spark.graphx.VertexRDD[Int] = VertexRDDImpl[] at RDD at VertexRDD.scala:
Check the result
scala> ind.collect
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_60_0]
res49: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,))
Use joinVertices
scala> var temp=rawG.joinVertices[Int](ind)((_,_,optdeg) => optdeg)
temp: org.apache.spark.graphx.Graph[Int,String] = org.apache.spark.graphx.impl.GraphImpl@af0e7ce
Check the result
scala> temp.vertices.take();
// :: WARN executor.Executor: block locks were not released by TID = :
[rdd_60_0, rdd_77_0]
res51: Array[(org.apache.spark.graphx.VertexId, Int)] = Array((,), (,), (,), (,))
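Taken literally, joinVertices merges two datasets by vertexId: each matching vertex takes the value computed from the right-hand dataset. The last vertex still has attribute 0 because inDegrees contains no entry for its vertexId (it has no incoming edges), so it keeps its original value.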
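The same behaviour can be reproduced on a graph small enough to verify by hand. This is only a sketch: the three-edge graph and the names joined and result are assumptions for illustration, while rawG and ind mirror the steps above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if there is one, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graphx-joinvertices-sketch"))

// Hypothetical edge list 1->2, 1->3, 2->3, so vertex 1 has no incoming edges.
val g = Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L))), 1)

// Reset every vertex attribute to 0, as in the steps above.
val rawG = g.mapVertices((_, _) => 0)

// inDegrees has entries for vertices 2 and 3 only.
val ind = rawG.inDegrees

// Matching vertices take their in-degree; vertex 1 has no match and keeps 0.
val joined = rawG.joinVertices(ind)((_, _, deg) => deg)

val result = joined.vertices.collect().sortBy(_._1).toList
println(result) // List((1,0), (2,1), (3,2))
```

Because the function passed to joinVertices must return the graph's existing vertex type, the join cannot change that type; it only overwrites values for the vertices that have a match.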
 
outerJoinVertices
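Unlike joinVertices, outerJoinVertices calls the function for every vertex and passes the joined value as an Option, so vertices without a match have to be handled explicitly. Below is a minimal sketch that sets each vertex to its out-degree; the three-edge graph and the names withOutDeg and result are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if there is one, otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graphx-outerjoin-sketch"))

// Hypothetical edge list 1->2, 1->3, 2->3, so vertex 3 has no outgoing edges.
val g = Graph.fromEdgeTuples(sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L))), 1)

// outDegrees has no entry for vertex 3, so degOpt is None there and we fall back to 0.
val withOutDeg = g.outerJoinVertices(g.outDegrees)((_, _, degOpt) => degOpt.getOrElse(0))

val result = withOutDeg.vertices.collect().sortBy(_._1).toList
println(result) // List((1,2), (2,1), (3,0))
```

The function may also return a different type from the original vertex attribute, in which case the resulting graph has the new vertex type.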
 
 
aggregateMessages
