HBase operations

Video notes
Video link: HBase tutorial

1. Differences from traditional relational databases

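HBase                                           Traditional RDBMS
Distributed                                     Single machine
Columns added or removed dynamically            Columns fixed when the table is created
Only one data type: byte arrays                 Numeric, character and other types
Null values are not stored                      Null values are stored
No SQL support                                  Full SQL support
Rowkey, rowkey-range or full-scan queries only  Rich queries, including joins
Column-oriented                                 Row-oriented
Unstructured, e.g. JSON                         Structured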
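The three query paths listed above (a single rowkey, a rowkey range, a full table scan) can be sketched with an in-memory list of sorted rowkeys. This is a toy model, not the HBase client API: the class `SortedRows` and the sample rowkeys are hypothetical, and it assumes the usual HBase convention that a scan's start row is inclusive and its stop row exclusive.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedRows:
    """Toy table whose rows are kept sorted by rowkey, as HBase keeps them."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = dict(rows)
        self._keys = sorted(self._rows)  # lexicographic rowkey order

    def get(self, rowkey: str) -> Optional[dict]:
        """Point lookup by one rowkey."""
        return self._rows.get(rowkey)

    def scan(self, start_row: str = "",
             stop_row: Optional[str] = None) -> List[Tuple[str, dict]]:
        """Range scan over [start_row, stop_row); no bounds means a full table scan."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = len(self._keys) if stop_row is None else bisect.bisect_left(self._keys, stop_row)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


# hypothetical sample rows
table = SortedRows({
    "123": {"info:col1": "v1"},
    "124": {"info:col1": "v2"},
    "200": {"info:col2": "v3"},
})
print(table.get("123"))                          # {'info:col1': 'v1'}
print([k for k, _ in table.scan("123", "200")])  # ['123', '124']
print(len(table.scan()))                         # 3
```

Anything else, such as looking rows up by a column value, has to fall back to a full scan plus filtering.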

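2. HBase characteristics:
Distributed
Fast random writes and simple key-based reads; operations on a single row are atomic, but a row is "updated" by writing a new put, not in place
Hundreds of millions of rows and millions of columns (relational databases limit the number of columns)
Column-oriented storage
No SQL; access is through the Java API (SQL access needs an extra layer on top, e.g. Apache Phoenix)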

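3. Can HBase replace a relational database?
No multi-row transactions, so transactional data such as trades stays in MySQL
No rich queries such as joins
So HBase can only complement a relational database, not replace it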

4. Role of the HMaster
1. Manages the RegionServers
2. Handles DDL, i.e. metadata (schema) definition

5. Role of the RegionServer
1. Serves DML (reads and writes)
2. Maintains the WAL (write-ahead log)
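Why the RegionServer keeps a WAL can be shown with a toy write path: every edit is appended to the log before it is applied to the in-memory memstore, so a crash that wipes memory can be repaired by replaying the log. This is a simplified sketch; the class `ToyRegionServer` and the keys are hypothetical. In real HBase the WAL lives on HDFS, which is what lets it survive a RegionServer crash.

```python
from typing import Dict, List, Tuple


class ToyRegionServer:
    """Toy write path: append to the WAL first, then update the memstore."""

    def __init__(self) -> None:
        self.wal: List[Tuple[str, str]] = []  # stands in for the durable log
        self.memstore: Dict[str, str] = {}    # volatile, lost on a crash

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. make the edit durable
        self.memstore[key] = value     # 2. then apply it in memory

    def crash_and_recover(self) -> None:
        self.memstore = {}             # the crash wipes memory
        for key, value in self.wal:    # replaying the log in order rebuilds it
            self.memstore[key] = value


rs = ToyRegionServer()
rs.put("123/info:col1", "v1")  # hypothetical keys
rs.put("123/info:col2", "v3")
before = dict(rs.memstore)
rs.crash_and_recover()
print(rs.memstore == before)  # True
```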

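6. Basic concepts:
DML (Data Manipulation Language) statements let users query a database and manipulate the data already in it,
e.g. INSERT, DELETE, UPDATE and SELECT.

DDL (Data Definition Language) statements define and manage database objects, e.g. CREATE, ALTER and DROP.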

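7. HBase logical view:

It behaves like a SortedMap whose key is a three-dimensional coordinate (rowkey, column, version). A query must supply the rowkey; column and version are optional, depending on how fine-grained the query is.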
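A minimal Python sketch of this logical view, assuming versions are integer timestamps: cells live in a map keyed by (rowkey, column, version) and are read back with rowkey and column ascending and version descending. The class `LogicalView` and the sample data are hypothetical, and unlike a default HBase get (which returns only the newest version) this `get` returns every matching version.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (rowkey, column, version/timestamp)


class LogicalView:
    """Toy model of the logical view: a map from (rowkey, column, version) to value."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def put(self, rowkey: str, column: str, version: int, value: str) -> None:
        self._cells[(rowkey, column, version)] = value

    def _sorted_keys(self) -> List[Key]:
        # rowkey and column ascending, version descending (newest first);
        # a real store keeps cells sorted, this toy sorts on every read
        return sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))

    def get(self, rowkey: str, column: Optional[str] = None,
            version: Optional[int] = None) -> List[Tuple[str, int, str]]:
        """rowkey is required; column and version only narrow the result."""
        return [
            (c, v, self._cells[(r, c, v)])
            for r, c, v in self._sorted_keys()
            if r == rowkey
            and (column is None or c == column)
            and (version is None or v == version)
        ]


view = LogicalView()
view.put("123", "info:col1", 1, "old")
view.put("123", "info:col1", 2, "new")
view.put("123", "info:col2", 1, "x")
print(view.get("123"))
print(view.get("123", "info:col1", 1))  # [('info:col1', 1, 'old')]
```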

8. HBase physical storage:
1. table = n regions, split horizontally by rowkey range
2. region = n stores, one store per column family
3. store = 1 memstore (in memory) + n HFiles (files on HDFS); each memstore flush produces one HFile
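The layout above can be sketched in a few lines: a rowkey is routed to a region by comparing it with the sorted region start keys, and inside the region each column family has one store whose flush turns the memstore into one new immutable HFile. This is a toy model under stated assumptions: string comparison stands in for HBase's byte-wise rowkey comparison, and `find_region`, `Store` and the split keys are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple


def find_region(split_keys: List[str], rowkey: str) -> int:
    """Index of the region holding rowkey.

    split_keys are the sorted start keys of regions 1..n-1; region 0 starts
    at the empty key. A region's start key is inclusive, hence bisect_right.
    """
    return bisect.bisect_right(split_keys, rowkey)


class Store:
    """One Store per column family: a memstore plus a list of immutable HFiles."""

    def __init__(self) -> None:
        self.memstore: Dict[str, str] = {}
        self.hfiles: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        self.memstore[key] = value

    def flush(self) -> None:
        """Each flush writes the memstore out as one new sorted 'HFile'."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}


# a region with one column family 'info' therefore has exactly one Store
region = {"info": Store()}
splits = ["40000000", "80000000", "c0000000"]  # hypothetical split keys: 4 regions
print(find_region(splits, "3fffffff"), find_region(splits, "80000000"))  # 0 2
region["info"].put("123/info:col1", "v1")
region["info"].flush()
region["info"].put("124/info:col1", "v2")
region["info"].flush()
print(len(region["info"].hfiles))  # 2
```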

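9. HBase design recommendations
1. Define your own namespace (the equivalent of a database)
2. Define a sensible schema
3. Pre-split the table sensibly when creating it (splitting strategies: pre-split, auto-split, force-split)
4. Choose a suitable field as the rowkey, e.g. phone number or IMSI
5. Keep column family and column names short to save storage space, since they are stored with every cell
6. Set a suitable number of versions; keeping 3 is recommended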
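Point 5 can be checked with a little arithmetic. The sketch assumes the classic KeyValue layout from the HBase reference guide (4-byte key length, 4-byte value length, then a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value) and ignores tags, block encoding and compression. The function name, rowkey and column names are hypothetical.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate size of one cell in the classic KeyValue layout."""
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + len(value)


row = b"13800000000"  # hypothetical phone-number rowkey (11 bytes)
short = keyvalue_size(row, b"info", b"col1", b"v1")
verbose = keyvalue_size(row, b"information", b"column_one", b"v1")
print(short, verbose)  # 41 54
```

The longer names cost 13 extra bytes for every cell, so on a table with a billion cells that is roughly 13 GB before compression or encoding.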

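10. HBase operations
1. put: single or batch; there is no update method: as with a map, writing to an existing key stores a new value
2. delete: single or batch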

11. Hands-on walkthrough:
./hbase shell
1) Basic status queries

hbase(main)::> status

 active master,  backup masters,  servers,  dead, 1.0000 average load

Took 0.0175 seconds 

hbase(main)::> whoami

hadoop (auth:SIMPLE)

    groups: hadoop

Took 0.0006 seconds

2) Show the usage of a specific command

hbase(main)::> help "status"

Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The

default is 'summary'. Examples:

  hbase> status

  hbase> status 'simple'

  hbase> status 'summary'

  hbase> status 'detailed'

  hbase> status 'replication'

  hbase> status 'replication', 'source'

  hbase> status 'replication', 'sink'

hbase(main)::>

3) List namespaces (Tab completion works in the shell)

hbase(main)::> list_namespace

NAMESPACE

default

hbase

 row(s)

Took 0.1524 seconds

hbase(main)::>

4) Create a namespace

create             create_namespace

hbase(main)::> create_namespace 'gp'

Took 0.2463 seconds

hbase(main)::>

hbase(main)::> list_namespace

NAMESPACE

default

gp

hbase

 row(s)

Took 0.0270 seconds

5) Create a pre-split table:

create 'namespace:table', 'column_family', ...

hbase(main)::>  create 'gp:test','info',{NUMREGIONS => , SPLITALGO => 'HexStringSplit'}

Created table gp:test

Took 2.6835 seconds

=> Hbase::Table - gp:test

hbase(main)::> desc 'gp:test'

Table gp:test is ENABLED

gp:test

COLUMN FAMILIES DESCRIPTION

{NAME => 'info', VERSIONS => '', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
MIN_VERSIONS => '', REPLICATION_SCOPE => '', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true',
BLOCKSIZE => ''}

 row(s)

Took 0.3126 seconds

hbase(main)::>
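What `SPLITALGO => 'HexStringSplit'` does with `NUMREGIONS => n` can be approximated as cutting a hex key range into n equal slices, which yields n-1 split keys. The sketch below is a rough re-creation under assumptions, not the RegionSplitter source: it assumes the default range 00000000 to FFFFFFFF, lowercase 8-digit keys, and that any remainder goes to the last region; `hex_string_splits` is a hypothetical name.

```python
from typing import List


def hex_string_splits(num_regions: int, first: str = "00000000",
                      last: str = "ffffffff") -> List[str]:
    """Evenly spaced split keys over a hex key range, HexStringSplit-style."""
    lo, hi = int(first, 16), int(last, 16)
    step = (hi - lo + 1) // num_regions  # any remainder ends up in the last region
    width = len(first)
    return [format(lo + step * i, f"0{width}x") for i in range(1, num_regions)]


print(hex_string_splits(4))  # ['40000000', '80000000', 'c0000000']
```

HexStringSplit only spreads data evenly if rowkeys are themselves uniformly distributed hex strings, such as an MD5 hex digest. A raw phone number like 13800000000 sorts below 40000000, so with the four regions above every number starting with 1 would land in the first region.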

6) Change a table attribute: raise the number of stored versions from 1 to 3

hbase(main)::> alter 'gp:test',{NAME=>'info',VERSIONS=>'3'}

Updating all regions with the new schema...

/ regions updated.

Done.

Took 2.3734 seconds

hbase(main)::> desc 'gp:test'

Table gp:test is ENABLED

gp:test

COLUMN FAMILIES DESCRIPTION

{NAME => 'info', VERSIONS => '', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
MIN_VERSIONS => '', REPLICATION_SCOPE => '', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true',
BLOCKSIZE => ''}

 row(s)

Took 0.0597 seconds

hbase(main)::>
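The effect of VERSIONS can be sketched with a toy cell that keeps at most `max_versions` values and always reads the highest timestamp first, regardless of write order. This is a simplification and the class `VersionedCell` is hypothetical: real HBase hides surplus versions at read time and only removes them physically during flush and compaction, whereas this sketch trims them immediately.

```python
from typing import Dict, List, Tuple


class VersionedCell:
    """Toy cell that keeps at most max_versions values, newest timestamp first."""

    def __init__(self, max_versions: int = 1) -> None:
        self.max_versions = max_versions
        self._values: Dict[int, str] = {}  # timestamp -> value

    def put(self, value: str, timestamp: int) -> None:
        self._values[timestamp] = value  # same timestamp: the new value wins
        # drop everything older than the newest max_versions timestamps
        for ts in sorted(self._values, reverse=True)[self.max_versions:]:
            del self._values[ts]

    def get(self, versions: int = 1) -> List[Tuple[int, str]]:
        """Newest `versions` values, ordered by timestamp, not by write order."""
        newest = sorted(self._values, reverse=True)[:versions]
        return [(ts, self._values[ts]) for ts in newest]


cell = VersionedCell(max_versions=3)
for ts, v in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    cell.put(v, ts)
print(cell.get())            # [(4, 'd')]
print(cell.get(versions=5))  # [(4, 'd'), (3, 'c'), (2, 'b')]
```

In the shell, several versions of one column are requested with something like `get 'gp:test', 'rowkey', {COLUMN => 'info:col1', VERSIONS => 3}`, where 'rowkey' is a placeholder.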

7) Insert data:

Syntax: put 'namespace:tablename', 'rowkey', 'columnfamily:column', 'value', version (the version is optional and defaults to the current timestamp)

hbase(main)::>  put 'gp:test','','info:col1','v1'

Took 0.2623 seconds

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

                   column=info:col1, timestamp=, value=v1

 row(s)

Took 0.1840 seconds

8) Query data with get:

hbase(main)::>  put 'gp:test','','info:col1','v2',

Took 0.0188 seconds

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

                   column=info:col1, timestamp=, value=v1

                   column=info:col1, timestamp=, value=v2

 row(s)

Took 0.0526 seconds

hbase(main)::> get 'gp:test',''

COLUMN                CELL

 info:col1            timestamp=, value=v1

 row(s)

Took 0.0783 seconds

hbase(main)::>

9) get a specific column of rowkey '123'

hbase(main)::>  put 'gp:test','','info:col2','v3'

Took 0.0487 seconds

hbase(main)::> get 'gp:test','','info:col1'

COLUMN                CELL

 info:col1            timestamp=, value=v1

 row(s)

Took 0.0104 seconds

hbase(main)::>

10) Delete a specific column of a row:

hbase(main)::> delete 'gp:test','','info:col1'

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

                   column=info:col2, timestamp=, value=v3

                   column=info:col1, timestamp=, value=v2

 row(s)

Took 0.0606 seconds

hbase(main)::>

11) Delete an entire row:

hbase(main)::> deleteall 'gp:test',''

Took 0.0225 seconds

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

                   column=info:col2, timestamp=, value=v3

 row(s)

Took 0.0687 seconds

hbase(main)::> 

A delete does not remove the data immediately; it only writes a delete marker.

The markers can be seen with a raw scan:

hbase(main)::> scan 'gp:test', {RAW => true, VERSIONS => }

ROW                   COLUMN+CELL

                   column=info:col1, timestamp=, type=Delete

                   column=info:col1, timestamp=, value=v1

                   column=info:col2, timestamp=, value=v3

                   column=info:, timestamp=, type=DeleteFamily

                   column=info:col1, timestamp=, value=v2

 row(s)

Took 0.1143 seconds

hbase(main)::>

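A delete is really a put: it inserts a marker cell of a Delete* type (type=Delete and type=DeleteFamily in the output above).

At this point the data is still in the memstore and has not been flushed to an HFile.

12) Run flush and then major_compact: the flush drops the masked values, and the major compaction also removes the markers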

hbase(main)::> flush 'gp:test'

Took 0.8562 seconds

hbase(main)::> scan 'gp:test', {RAW => true, VERSIONS => }

ROW                   COLUMN+CELL

                   column=info:col1, timestamp=, type=Delete

                   column=info:col2, timestamp=, value=v3

                   column=info:, timestamp=, type=DeleteFamily

 row(s)

Took 0.0718 seconds

hbase(main)::> major_compact 'gp:test'

Took 0.3532 seconds

hbase(main)::> scan 'gp:test', {RAW => true, VERSIONS => }

ROW                   COLUMN+CELL

                   column=info:col2, timestamp=, value=v3

 row(s)

Took 0.8065 seconds

hbase(main)::>

In production, major compactions are rarely triggered by hand, because they are heavy and can block reads and writes.
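The delete behaviour in steps 10 to 12 can be modelled as a store where deletes are just more cells. In this sketch a Delete marker masks one exact version, a DeleteColumn marker masks all versions of a column up to its timestamp, and a DeleteFamily marker masks the whole row up to its timestamp; a normal scan hides masked values, a raw scan shows everything, flush drops masked values but keeps the markers (as the raw scan after flush shows), and major compaction drops the markers too. The class `ToyStore`, the rowkeys and the timestamps are hypothetical, it assumes a single column family, and it ignores version limits and KEEP_DELETED_CELLS.

```python
from typing import List, Optional, Tuple

PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY = "Put", "Delete", "DeleteColumn", "DeleteFamily"
Cell = Tuple[str, str, int, str, Optional[str]]  # (row, column, timestamp, type, value)


class ToyStore:
    """Single-column-family toy store where deletes are written as marker cells."""

    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        self.cells.append((row, column, ts, PUT, value))

    def delete(self, row: str, column: str, ts: int, kind: str = DELETE) -> None:
        # a delete is just another write: a marker cell without a value
        self.cells.append((row, column, ts, kind, None))

    def _masked(self, cell: Cell) -> bool:
        row, col, ts = cell[0], cell[1], cell[2]
        for r, c, mts, kind, _ in self.cells:
            if r != row:
                continue
            if kind == DELETE and c == col and ts == mts:         # one exact version
                return True
            if kind == DELETE_COLUMN and c == col and ts <= mts:  # older versions of a column
                return True
            if kind == DELETE_FAMILY and ts <= mts:               # the whole row
                return True
        return False

    def scan(self, raw: bool = False) -> List[Cell]:
        """raw=True shows markers and masked values, like {RAW => true}."""
        if raw:
            return list(self.cells)
        return [c for c in self.cells if c[3] == PUT and not self._masked(c)]

    def flush(self) -> None:
        # masked values disappear, the markers survive
        markers = [c for c in self.cells if c[3] != PUT]
        self.cells = self.scan() + markers

    def major_compact(self) -> None:
        # rewriting everything drops the markers as well
        self.cells = self.scan()


store = ToyStore()
store.put("123", "info:col1", 1, "v1")
store.put("123", "info:col2", 2, "v3")
store.put("456", "info:col1", 1, "v2")
store.delete("123", "info:col1", 1)             # like delete on one column
store.delete("456", "info:", 3, DELETE_FAMILY)  # like deleteall on a row
print(len(store.scan()), len(store.scan(raw=True)))  # 1 5
store.flush()
print(len(store.scan(raw=True)))  # 3
store.major_compact()
print(store.scan(raw=True))  # [('123', 'info:col2', 2, 'Put', 'v3')]
```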

13) Truncate and drop the table, then drop the namespace

hbase(main)::> truncate 'gp:test'

Truncating 'gp:test' table (it may take a while):

Disabling table...

Truncating table...

Took 2.1177 seconds

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

 row(s)

Took 1.1058 seconds

hbase(main)::> disable 'gp:test'

Took 0.5193 seconds

hbase(main)::> scan 'gp:test'

ROW                   COLUMN+CELL

org.apache.hadoop.hbase.TableNotEnabledException: gp:test is disabled.

 at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:)

 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:)

 at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:)

 at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:)

 at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:)

 at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:)

 at java.lang.Thread.run(Thread.java:)

ERROR: Table gp:test is disabled!

For usage try 'help "scan"'

Took 0.1323 seconds

hbase(main)::> drop 'gp:test'

Took 0.3581 seconds

hbase(main)::> drop

drop             drop_all         drop_namespace

hbase(main)::> list

list                         list_deadservers

list_labels                  list_locks

list_namespace               list_namespace_tables

list_peer_configs            list_peers

list_procedures              list_quota_snapshots

list_quota_table_sizes       list_quotas

list_regions                 list_replicated_tables

list_rsgroups                list_security_capabilities

list_snapshot_sizes          list_snapshots

list_table_snapshots

hbase(main)::> list_namespace

list_namespace          list_namespace_tables

hbase(main)::> list_namespace 'gp'

NAMESPACE

gp

 row(s)

Took 0.1517 seconds

hbase(main)::> drop_namespace 'gp'

Took 0.2719 seconds

hbase(main)::> list_namespace

NAMESPACE

default

hbase

 row(s)

Took 0.0322 seconds

hbase(main)::>
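The order of commands in step 13 follows a few rules: a table must be disabled before it can be dropped, a disabled table can no longer be scanned, and a namespace can only be dropped once it is empty. The toy catalog below encodes just those rules. It is not the HBase Admin API; the class `ToyCatalog` is hypothetical, and apart from the "is disabled" message seen in the transcript, its error messages are made up for this sketch.

```python
from typing import Dict


class ToyCatalog:
    """Toy model of the DDL ordering rules seen in step 13."""

    def __init__(self) -> None:
        # namespace -> {table name -> enabled?}
        self.namespaces: Dict[str, Dict[str, bool]] = {"default": {}, "hbase": {}}

    def create_namespace(self, ns: str) -> None:
        self.namespaces[ns] = {}

    def create(self, ns: str, table: str) -> None:
        self.namespaces[ns][table] = True  # new tables start out enabled

    def disable(self, ns: str, table: str) -> None:
        self.namespaces[ns][table] = False

    def scan(self, ns: str, table: str) -> None:
        if not self.namespaces[ns][table]:
            raise RuntimeError(f"Table {ns}:{table} is disabled!")

    def drop(self, ns: str, table: str) -> None:
        if self.namespaces[ns][table]:
            raise RuntimeError(f"Table {ns}:{table} must be disabled before drop")
        del self.namespaces[ns][table]

    def drop_namespace(self, ns: str) -> None:
        if self.namespaces[ns]:
            raise RuntimeError(f"Namespace {ns} still has tables")
        del self.namespaces[ns]


catalog = ToyCatalog()
catalog.create_namespace("gp")
catalog.create("gp", "test")
catalog.disable("gp", "test")
catalog.drop("gp", "test")
catalog.drop_namespace("gp")
print(sorted(catalog.namespaces))  # ['default', 'hbase']
```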

巴特西
