鸡肋的JdbcRDD

今天准备将mysql的数据倒腾到RDD。非常早曾经就知道有一个JdbcRDD。就想着使用一下，结果发现却是鸡肋一个。

首先，看看JdbcRDD的定义：

 * An RDD that executes an SQL query on a JDBC connection and reads results.

 * For usage example, see test case JdbcRDDSuite.

 *

 * @param getConnection a function that returns an open Connection.

 *   The RDD takes care of closing the connection.

 * @param sql the text of the query.

 *   The query must contain two ? placeholders for parameters used to partition the results.

 *   E.g. "select title, author from books where ? <= id and id <= ?"

 * @param lowerBound the minimum value of the first placeholder

 * @param upperBound the maximum value of the second placeholder

 *   The lower and upper bounds are inclusive.

 * @param numPartitions the number of partitions.

 *   Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2,

 *   the query would be executed twice, once with (1, 10) and once with (11, 20)

 * @param mapRow a function from a ResultSet to a single row of the desired result type(s).

 *   This should only call getInt, getString, etc; the RDD takes care of calling next.

 *   The default maps a ResultSet to an array of Object.

 */

class JdbcRDD[T: ClassTag](

    sc: SparkContext,

    getConnection: () => Connection,

    sql: String,

    lowerBound: Long,

    upperBound: Long,

    numPartitions: Int,

    mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)

附上个样例：

package test

import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.rdd.JdbcRDD

import org.apache.spark.{SparkConf, SparkContext}

object spark_mysql {

  def main(args: Array[String]) {

    //val conf = new SparkConf().setAppName("spark_mysql").setMaster("local")

    val sc = new SparkContext("local","spark_mysql")

    def createConnection() = {

      Class.forName("com.mysql.jdbc.Driver").newInstance()

      DriverManager.getConnection("jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")

    }

    def extractValues(r: ResultSet) = {

      (r.getString(1), r.getString(2))

    }

    val data = new JdbcRDD(sc, createConnection, "SELECT id,aa FROM bbb where ?

<= ID AND ID <= ?", lowerBound = 3, upperBound =5, numPartitions = 1, mapRow = extractValues)

    println(data.collect().toList)

    sc.stop()

  }

}

使用的MySQL表的数据例如以下：

执行结果例如以下：

能够看出：JdbcRDD的sql參数要带有两个？的占位符，而这两个占位符是给參数lowerBound和參数upperBound定义where语句的边界的，假设不过这种话，还能够接受；但悲催的是參数lowerBound和參数upperBound都是Long类型的，

，不知道如今作为keyword或做查询的字段有多少long类型呢？不过參照JdbcRDD的源代码，用户还是能够写出符合自己需求的JdbcRDD，这算是不幸中之大幸了。

近期一直忙于炼数成金的spark课程。没多少时间整理博客。

特意给想深入了解spark的朋友推荐一位好友的博客http://www.cnblogs.com/cenyuhai/ 。里面有不少源代码博文，利于理解spark的内核。

巴特西

鸡肋的JdbcRDD

最新文章

热门文章