Basic Tutorial on Big Data Applications (大數據應用基礎教程)

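Basic Tutorial on Big Data Applications (《大數據應用基礎教程》) is a book published in 2023 by Tsinghua University Press. Its authors are 佀同光, 張仲妹, 王文, 徐秀傑, 陳佳麗, 劉夏, 盧文鋒, 呂長遠, and 李永鵬.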

Basic Information

  • Chinese title: 大數據應用基礎教程
  • Authors: 佀同光, 張仲妹, 王文, 徐秀傑, 陳佳麗, 劉夏, 盧文鋒, 呂長遠, 李永鵬
  • Publication date: August 1, 2023
  • Publisher: Tsinghua University Press
  • ISBN: 9787302633211
  • List price: 79 yuan

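Summary

The book aims to develop big data application skills in lower-division undergraduates and to help them begin building a data-oriented way of thinking, in response to the new requirements for cultivating students' data literacy under China's "New Engineering," "New Medicine," "New Agriculture," and "New Liberal Arts" initiatives in higher education.

The book is divided into three parts. The Fundamentals Part (Chapters 1 and 2) gives an overview of big data and introduces Python and its common libraries. The Data Analysis Part (Chapters 3 to 7) focuses on data acquisition, storage, preprocessing, visualization, and analysis methods. The Big Data Platform Part (Chapters 8 to 11) introduces Linux operating system basics, big data management platforms, distributed storage, and distributed processing.

The book includes many application examples, and each chapter ends with exercises. To help readers build a distributed environment on a single machine, the appendices describe how to install a virtual-machine-based Linux system, Hadoop, and Spark.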

Table of Contents

Fundamentals Part
Chapter 1 Overview of Big Data
1.1 Data and Big Data
1.1.1 The Rapid Growth of Data
1.1.2 Big Data
1.1.3 Paradigms of Science
1.2 Where Big Data Comes From
1.3 Application Scenarios of Big Data
1.4 The Impact of Big Data on Ways of Thinking
1.5 Data Mining and Machine Learning
1.6 The Basic Workflow of a Data Science Project
1.7 Data Security and Big Data Ethics
1.7.1 Data Security
1.7.2 Big Data Ethics
1.8 Big Data Issues at the National Level
1.8.1 Data Sovereignty
1.8.2 Big Data and National Governance
1.8.3 Big Data Reshaping the World Order
1.8.4 China's National Big Data Strategy
1.9 Cloud Computing
1.9.1 Characteristics of Cloud Computing
1.9.2 Typical Service Models of Cloud Computing
1.9.3 Deployment Environments for Cloud Computing Services
1.9.4 The Relationship Between Cloud Computing and Big Data
1.10 Internet of Things
1.11 Digital Economy
1.11.1 Big Data and the Digital Economy
1.11.2 Further Advancing China's Digital Economy
Chapter Summary
Exercises
Chapter 2 Python and Common Libraries
2.1 Introduction to Python
2.1.1 The Birth of Python
2.1.2 The Python Community
2.1.3 Python Versions
2.1.4 Reasons to Use Python for Data Analysis
2.2 Installing and Running Python
2.2.1 Introduction to and Installation of Anaconda
2.2.2 Running Python
2.2.3 Summary
2.3 Python Language Basics
2.3.1 Data Structures
2.3.2 Code Structures
2.3.3 Summary
2.4 Common Python Libraries for Data Analysis
2.4.1 Introduction to NumPy
2.4.2 Introduction to pandas
2.4.3 Summary
Chapter Summary
Exercises
Data Analysis Part
Chapter 3 Data Acquisition
3.1 Data Sources
3.2 Web Data Crawling
3.2.1 Overview of Web Crawlers
3.2.2 Basics of Web Page Access
3.2.3 Crawling Web Page Data
3.2.4 Parsing Web Page Content
3.2.5 Common "Crawling and Anti-Crawling" Attack and Defense Strategies
3.3 Web Data Collectors
3.3.1 Common Collectors
3.3.2 八爪魚 (Bazhuayu) Collection Case Study
3.4 Acquiring Data with Selenium
3.4.1 Installing Selenium
3.4.2 Getting Page Elements with Selenium
3.4.3 Selenium Application: Acquiring Second-Hand Housing Data from 鏈家 (Lianjia)
Chapter Summary
Exercises
Chapter 4 Data Storage
4.1 Files
4.2 Traditional Database Technology
4.2.1 Database Management Systems
4.2.2 Conceptual Models of Databases
4.2.3 Relational Databases
4.2.4 Structured Query Language (SQL)
4.2.5 MySQL Database Management
4.2.6 Basic Database Operations with MySQL monitor
4.2.7 Basic Database Operations with HeidiSQL
4.3 NoSQL Databases
4.3.1 Background of NoSQL
4.3.2 Types of NoSQL Databases
Chapter Summary
Exercises
Chapter 5 Data Preprocessing
5.1 Data Quality Issues
5.1.1 "Dirty" Data in the Real World
5.1.2 Causes of Data Quality Issues
5.1.3 Data Quality Auditing
5.2 Data Preprocessing Techniques
5.2.1 Data Cleaning
5.2.2 Data Integration
5.2.3 Data Transformation
5.2.4 Data Reduction
5.3 Preprocessing Case Study
Chapter Summary
Exercises
Chapter 6 Data Visualization
6.1 Overview of Data Visualization
6.1.1 What Is Data Visualization
6.1.2 Common Data Visualization Tools
6.1.3 Python Visualization Libraries
6.2 Data Visualization with Matplotlib
6.2.1 Matplotlib Plotting Basics
6.2.2 Common Matplotlib Plots
6.2.3 Drawing 3D Graphics with mplot3d
6.3 Data Visualization with pandas
6.3.1 pandas Plotting Basics
6.3.2 Common pandas Plots
6.4 Data Visualization with seaborn
6.4.1 seaborn Plotting Basics
6.4.2 Common seaborn Plots
6.5 Data Visualization with pyecharts
6.5.1 pyecharts Plotting Basics
6.5.2 Common pyecharts Plots
Chapter Summary
Exercises
Chapter 7 Data Analysis Methods
7.1 Mathematical Foundations of Data Analysis Methods
7.1.1 Understanding Derivatives of Composite Functions
7.1.2 Understanding Partial Derivatives of Multivariable Functions
7.1.3 Understanding the Least Squares Method
7.1.4 Understanding Gradients
7.1.5 Understanding Probability
7.1.6 Understanding Conditional Probability
7.1.7 Understanding Bayes' Formula
7.2 Regression
7.2.1 Basic Concepts and Methods of Regression
7.2.2 Performance Metrics for Regression Prediction
7.2.3 Linear Regression
7.3 Classification
7.3.1 Basic Methods of Classification
7.3.2 Performance Metrics for Classification Tasks
7.3.3 Logistic Regression
7.3.4 Support Vector Machines
7.3.5 Decision Tree Theory
7.3.6 Naive Bayes
7.3.7 The k-Nearest Neighbors (k-NN) Algorithm
7.4 Clustering
7.4.1 Clustering Algorithms
7.4.2 The K-means Clustering Algorithm
7.4.3 K-means Clustering Case Study
7.5 Text Analysis
7.5.1 Basic Steps of Text Analysis
7.5.2 Basic Concepts of Text Analysis
7.5.3 Text Analysis Case Study
Chapter Summary
Exercises
Big Data Platform Part
Chapter 8 Linux Operating System Basics
8.1 Introduction to the Linux Operating System
8.1.1 Operating Systems
8.1.2 The Linux Operating System
8.1.3 Why Big Data Platforms Are Based on the Linux Operating System
8.2 Basic Linux Commands
8.2.1 Directory and File Operation Commands
8.2.2 Text Filtering and Processing
8.2.3 Shell Input and Output Commands
8.2.4 Process Management Commands
8.2.5 Everyday Operation Commands
Chapter Summary
Exercises
Chapter 9 Big Data Management Platforms
9.1 Application Scenarios
9.2 Development History
9.3 Technical Architecture
9.3.1 Data Collection Layer
9.3.2 Data Storage Layer
9.3.3 Resource Management Layer
9.3.4 Compute Engine Layer
9.3.5 Data Analysis Layer
9.3.6 Data Visualization Layer
9.3.7 Technology Stack of Big Data Management Platforms
Chapter Summary
Exercises
Chapter 10 Distributed Storage
10.1 Introduction to HDFS
10.2 Basic Architecture of HDFS
10.3 Accessing HDFS via the Shell
Chapter Summary
Exercises
Chapter 11 Distributed Processing
11.1 The Idea of Distributed Computing
11.2 MapReduce
11.2.1 Introduction to MapReduce
11.2.2 The MapReduce Programming Model
11.2.3 MapReduce Program Case Study
11.3 Spark
11.3.1 Introduction to Spark
11.3.2 The Spark Programming Model
11.3.3 Spark Program Case Study
11.4 Advantages of Spark over Hadoop
Chapter Summary
Exercises
References
Appendix A Installing a Virtual-Machine-Based Linux System
A.1 Overview of Virtual Machine Technology
A.2 Installing the Virtual Machine Hosting Software
A.3 Installing Linux in a Virtual Machine
Appendix B Installing Hadoop and Spark
B.1 Basic Cluster Configuration
B.2 Installing Hadoop
B.3 Installing Spark
