Building the Data Lakehouse (構建數據湖倉)


Building the Data Lakehouse (《構建數據湖倉》) is a book published by Tsinghua University Press in 2023. Its authors are Bill Inmon, Mary Levins, and Ranjeet Srivastava, all of the United States.

Basic information

  • Chinese title: 構建數據湖倉
  • Authors: Bill Inmon (USA), Mary Levins (USA), Ranjeet Srivastava (USA)
  • Publication date: March 1, 2023
  • Publisher: Tsinghua University Press
  • ISBN: 9787302624479
  • List price: 68 yuan

Synopsis

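Of all the new elements that the data lakehouse introduces, the most important is an analytical infrastructure that supports data analysis and machine learning. This infrastructure includes many familiar components. It also includes some concepts that may be unfamiliar or relatively new to readers, such as metadata, data lineage, measurements of data volume, a historical record of how data was created, and descriptions of data transformations.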
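The second new element of the data lakehouse is the identification and use of universal connectors. Universal connectors allow data from all of the different sources to be combined and compared. Without them, it is difficult (in practice, nearly impossible) to relate the different kinds of data in the data lakehouse to one another. With them, data of any type can be related. The data lakehouse therefore makes possible a degree of data analysis and machine learning that no earlier approach could achieve. As with any architecture, however, we need to understand the data lakehouse's structure and capabilities before we can use it as the basis for a data analysis blueprint and an analytics plan.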

Table of contents

Introduction
Chapter 1: Evolution to the Data Lakehouse
1. The evolution of technology
2. All of the data in the organization
3. Where is the business value?
4. The data lake
5. Challenges of the current data architecture
6. The emergence of the data lakehouse
Chapter 2: Data Scientists and End Users
1. The data lake
2. The analytical infrastructure
3. Different audiences
4. Different analytical tools
5. Different analytical goals
6. Different analytical approaches
7. Different types of data
Chapter 3: The Different Types of Data in the Data Lakehouse
1. Types of data
2. The different volumes of data
3. Relating data across the different types
4. Segmenting data by probability of access
5. Relating data in the analog and IoT environment
6. The analytical infrastructure
Chapter 4: The Open Lakehouse Environment
1. The evolution of open systems
2. Innovation that keeps pace with the times
3. An unstructured lakehouse built on open, standard file formats
4. Open-source data lakehouse software
5. The data lakehouse provides open APIs beyond SQL
6. The data lakehouse supports open data sharing
7. The data lakehouse supports open data exploration
8. The data lakehouse simplifies data discovery through open data catalogs
9. A data lakehouse that leverages cloud-native architecture
10. Evolving toward the open data lakehouse
Chapter 5: Machine Learning and the Data Lakehouse
1. Machine learning
2. What does machine learning need from a lakehouse?
3. Extracting new value from data
4. Solving the problem
5. The unstructured data problem
6. The importance of open source
7. Taking advantage of cloud elasticity
8. Designing "MLOps" for a data platform
9. Case study: Using machine learning to classify chest X-rays
10. The evolution of the unstructured component of the data lakehouse
Chapter 6: The Analytical Infrastructure in the Data Lakehouse
1. Metadata
2. The data model
3. Data quality
4. ETL
5. Textual ETL
6. Taxonomies
7. Data volume
8. Data lineage
9. KPIs
10. Granularity of data
11. Transactions
12. Keys
13. Processing schedules
14. Summarized data
15. Minimum requirements
Chapter 7: Blending Data in the Data Lakehouse
1. The lakehouse and the data lakehouse
2. The origins of data
3. Different types of analysis
4. Universal identifiers
5. Structured identifiers
6. Repetitive data
7. Identifiers in the textual environment
8. Blending textual data and structured data
9. The importance of matching
Chapter 8: Types of Analysis Across the Data Lakehouse Architecture
1. Known queries
2. Heuristic analysis
Chapter 9: Data Lakehouse Housekeeping
1. Data integration and interoperability
2. Master data and reference data in the data lakehouse
3. Privacy, confidentiality, and data protection in the data lakehouse
4. Future-proofing data in the data lakehouse
5. The five phases of future-proofing data
6. Routine maintenance of the data lakehouse
Chapter 10: Visualization
1. Turning data into information
2. What is data visualization, and why is it important?
3. The differences between data visualization, data analysis, and data interpretation
4. The advantages of data visualization
Chapter 11: Data Lineage in the Data Lakehouse Architecture
1. The chain of calculations
2. Data selection
3. Algorithmic differences
4. Lineage of textual data
5. Lineage in other unstructured environments
6. Data lineage
Chapter 12: Probability of Access in the Data Lakehouse Architecture
1. Efficient arrangement of data
2. Probability of access of data
3. Different types of data in the data lakehouse
4. Relative differences in data volume
5. The advantages of segmenting data
6. Using bulk storage
7. Additional indexes
Chapter 13: Crossing the Chasm
1. Merging data
2. Different kinds of data
3. Different business needs
4. Crossing the chasm
Chapter 14: Massive Volumes of Data in the Data Lakehouse
1. The distribution of massive volumes of data
2. High-performance, high-capacity data storage
3. Additional indexes and summaries
4. Periodic filtering of data
5. Data tokenization
6. Separating text from databases
7. Archival storage
8. Monitoring activity
9. Parallel processing
Chapter 15: Data Governance and the Data Lakehouse
1. The purpose of data governance
2. Data lifecycle management
3. Data quality management
4. The importance of metadata management
5. Data governance over time
6. Types of data governance
7. Data governance across the data lakehouse
8. Considerations for data governance
Chapter 16: The Modern Data Warehouse
1. The proliferation of applications
2. Information silos
3. The complex network environment
4. The data warehouse
5. Definition of the data warehouse
6. Historical data
7. The relational model
8. The native form of data
9. The need for integrated data
10. Times have changed
11. The world today
12. Different volumes of data
13. The relationship between data and the business
14. Bringing data into the data warehouse
15. The modern data warehouse
16. When will we no longer need a data warehouse?
17. The data lake
18. The data warehouse as a foundation
19. The data stack
