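About the Book
This book combines widely used big data storage tools with real-world case studies in a project- and task-oriented format, giving a fairly comprehensive introduction to the HBase distributed database and the Hive distributed data warehouse. It comprises nine projects: understanding databases and data warehouses; installing and configuring the HBase column-oriented database; building a blog database system with HBase Shell; developing an application for the blog database system with the HBase Java API; installing and configuring the Hive structured data warehouse; performing data definition operations with Hive; analyzing and processing user coupon data with Hive Shell; developing a user coupon analysis application with the Hive Java API; and combining Hive and HBase storage technologies to analyze customer churn at a telecom operator. Most projects include hands-on task practice and end-of-project exercises, so that readers can consolidate what they have learned and quickly master the HBase and Hive operations introduced in the book. The book can serve as a textbook for university programs in big data technology and as a self-study guide for big data or database enthusiasts. Beyond strengthening readers' ability to apply big data storage technologies, the book aims to foster self-directed learning, improve readers' ability to identify, analyze, and solve problems, develop sound analytical and independent thinking, and cultivate a craftsman's spirit of dedication, refinement, and focus.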
Table of Contents
Project 1 Understanding Databases and Data Warehouses
[Teaching Objectives]
[Background]
Task 1 Understanding Big Data
[Task Description]
[Task Requirements]
[Related Knowledge]
1.1.1 The Concept and History of Big Data
1.1.2 Data Types in Big Data
1.1.3 Characteristics of Big Data
1.1.4 Industry Applications of Big Data
1.1.5 The Big Data Technology Stack
Task 2 Understanding Big Data Storage Technologies
[Task Description]
[Task Requirements]
[Related Knowledge]
1.2.1 Introduction to Big Data Storage
1.2.2 File-System-Based Data Storage
1.2.3 Database-Based Data Storage
1.2.4 Data-Warehouse-Based Data Storage
Project Summary
Exercises
Project 2 Installing and Deploying HBase
[Teaching Objectives]
[Background]
Task 1 Building a Fully Distributed Hadoop Cluster
[Task Description]
[Task Requirements]
[Related Knowledge]
2.1.1 Introduction to Hadoop
2.1.2 Core Components of Hadoop
2.1.3 The Hadoop Ecosystem
2.1.4 Preparations Before Building a Hadoop Cluster
[Task Implementation]
[Task Practice]
Task 2 Installing a ZooKeeper Cluster
[Task Description]
[Task Requirements]
[Related Knowledge]
2.2.1 Introduction to ZooKeeper
2.2.2 ZooKeeper Architecture
[Task Implementation]
Task 3 Installing and Configuring an HBase Cluster
[Task Description]
[Task Requirements]
[Related Knowledge]
2.3.1 Introduction to HBase
2.3.2 Core Functional Modules of HBase
2.3.3 The HBase Read/Write Flow
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 3 Building a Blog Database System with HBase Shell
[Teaching Objectives]
[Background]
Task 1 Designing HBase Tables
[Task Description]
[Task Requirements]
[Related Knowledge]
3.1.1 The HBase Data Model
3.1.2 Structural Design Principles for HBase Tables
3.1.3 HBase Retrieval Methods
3.1.4 RowKey Design Principles
3.1.5 Hotspotting
3.1.6 Column Family Design Principles
[Task Implementation]
Task 2 Creating HBase Tables
[Task Description]
[Task Requirements]
[Related Knowledge]
3.2.1 Namespaces
3.2.2 Creating Tables
3.2.3 Viewing Table Structure
3.2.4 Altering Tables
3.2.5 Dropping Tables
[Task Implementation]
[Task Practice]
Task 3 Querying HBase Table Data
[Task Description]
[Task Requirements]
[Related Knowledge]
3.3.1 Inserting Data
3.3.2 Querying Data
3.3.3 Scanning a Full Table
3.3.4 Deleting Data
3.3.5 Clearing Data
[Task Implementation]
[Task Practice]
Task 4 Querying HBase Table Data That Matches Specified Conditions
[Task Description]
[Task Requirements]
[Related Knowledge]
3.4.1 Advanced HBase Queries
3.4.2 HBase Abstract Operators
3.4.3 HBase Comparators
3.4.4 HBase Filters
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 4 Developing a Blog Database System with the HBase Java API
[Teaching Objectives]
[Background]
Task 1 Setting Up the HBase Development Environment
[Task Description]
[Task Requirements]
[Task Implementation]
Task 2 Inserting and Querying Data
[Task Description]
[Task Requirements]
[Related Knowledge]
4.2.1 Main Interfaces and Classes of the HBase Java API
4.2.2 Creating Namespaces and Tables with the HBase Java API
4.2.3 Inserting Data with the HBase Java API
4.2.4 Querying Data with the HBase Java API
4.2.5 Full-Table Queries with the HBase Java API
[Task Implementation]
[Task Practice]
Task 3 Querying Data That Matches Specified Conditions
[Task Description]
[Task Requirements]
[Related Knowledge]
4.3.1 The HBase Filter API
[Task Implementation]
[Task Practice]
Task 4 Integrating MapReduce with HBase Tables
[Task Description]
[Task Requirements]
[Related Knowledge]
4.4.1 Running MapReduce Programs on a Hadoop Cluster
4.4.2 Importing Data into HBase Tables
4.4.3 Exporting Data from HBase Tables
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 5 Installing and Configuring the Hive Structured Data Warehouse
[Teaching Objectives]
[Background]
Task 1 Installing and Configuring Hive
[Task Description]
[Task Requirements]
[Related Knowledge]
5.1.1 The Origin and Development of Hive
5.1.2 Hive Compared with Traditional Databases
5.1.3 Hive System Architecture
5.1.4 How Hive Works
5.1.5 Preparations Before Installation
[Task Implementation]
[Task Practice]
Task 2 Running Shell Commands and dfs Commands in the Hive CLI
[Task Description]
[Task Requirements]
[Related Knowledge]
5.2.1 Running Hive Queries from a File
5.2.2 Running Linux Shell Commands in Hive
5.2.3 Using Hadoop dfs Commands in Hive
5.2.4 Writing Comments in Hive Scripts
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 6 Defining Coupon Data with Hive
[Teaching Objectives]
[Background]
Task 1 Creating Hive Tables
[Task Description]
[Task Requirements]
[Related Knowledge]
6.1.1 Hive Data Types
6.1.2 Creating and Managing Data Warehouses
6.1.3 Creating Tables
6.1.4 Altering Tables
[Task Implementation]
[Task Practice]
Task 2 Importing Data into Hive Tables
[Task Description]
[Task Requirements]
[Related Knowledge]
6.2.1 Importing Data
6.2.2 Exporting Data
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 7 Analyzing and Processing Coupon Consumption Data with Hive Shell
[Teaching Objectives]
[Background]
Task 1 Querying Information on Users Who Claimed Coupons
[Task Description]
[Task Requirements]
[Related Knowledge]
7.1.1 Basic Queries with select
7.1.2 Limiting Results with limit
7.1.3 Deduplicated Queries with distinct
7.1.4 Conditional Queries with where
7.1.5 Hive Built-In Operators
7.1.6 Regular Expressions
[Task Implementation]
[Task Practice]
Task 2 Building a User Label Column
[Task Description]
[Task Requirements]
[Related Knowledge]
7.2.1 Using case ... when ... Statements
7.2.2 Grouped Queries with group by
7.2.3 Conditional Filtering with having
[Task Implementation]
[Task Practice]
Task 3 Building User Feature Fields
[Task Description]
[Task Requirements]
[Related Knowledge]
7.3.1 Hive Built-In Functions
7.3.2 Sorted Queries
[Task Implementation]
[Task Practice]
Task 4 Joining User Feature Fields
[Task Description]
[Task Requirements]
[Related Knowledge]
7.4.1 Merging Result Sets with union
7.4.2 Joining Table Data with join
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 8 Developing a Coupon Consumption Data Analysis Application with the Hive Java API
[Teaching Objectives]
[Background]
Task 1 Setting Up the Hive Development Environment
[Task Description]
[Task Requirements]
[Task Implementation]
Task 2 Writing User-Defined Functions to Compute Coupon Discount Statistics
[Task Description]
[Task Requirements]
[Related Knowledge]
8.2.1 Hive User-Defined Functions
8.2.2 UDFs
8.2.3 UDAFs
8.2.4 UDTFs
[Task Implementation]
[Task Practice]
Task 3 Building and Merging Feature Fields
[Task Description]
[Task Requirements]
[Related Knowledge]
8.3.1 Main Classes of the Hive Java API
8.3.2 Methods for Executing SQL Statements
[Task Implementation]
[Task Practice]
Project Summary
Exercises
Project 9 Hands-On Analysis of Telecom Operator User Data with HBase and Hive
[Teaching Objectives]
[Background]
Task 1 Case Background and Requirements Analysis
[Task Description]
[Task Requirements]
[Task Implementation]
Task 2 Data Preprocessing
[Task Description]
[Task Requirements]
[Task Implementation]
Task 3 Basic Queries on User Data
[Task Description]
[Task Requirements]
[Task Implementation]
Task 4 Analyzing User Call Activity
[Task Description]
[Task Requirements]
[Task Implementation]
Task 5 Importing Hive Data into HBase
[Task Description]
[Task Requirements]
[Task Implementation]
Project Summary
Appendix: Common Ports of Big Data Components and Their Descriptions
References