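Data Preparation and Feature Engineering: Essential Skills for Data Engineers is a book by Qi Wei (齊偉), published by Publishing House of Electronics Industry (電子工業出版社) in March 2020.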
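Basic information
- Chinese title: 數據準備和特徵工程——數據工程師必知必會技能
- Author: Qi Wei (齊偉)
- ISBN: 9787121382635
- Pages: 208
- Price: 45 yuan
- Publisher: Publishing House of Electronics Industry (電子工業出版社)
- Publication date: March 2020
- Format: 16-kai (16開)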
Summary
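This book covers data preparation and feature engineering in detail, two stages that are indispensable in big data and artificial intelligence projects. Each section first introduces the basic concepts concisely, then demonstrates how they are applied through practical cases and provides targeted exercises, combining knowledge, cases, and practice. Each section ends with an "Extended Exploration" part that guides readers toward broader and deeper topics. The book is suitable both as a textbook for related university programs and as a reference for developers working in big data, artificial intelligence, and similar fields.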
Table of contents
Chapter 1 Getting to Know the Data
1.0 Understanding data science projects
1.1 Data in files
1.1.1 CSV files
1.1.2 Excel files
1.1.3 Image files
1.2 Data in databases
1.3 Data on web pages
1.4 Data from APIs
Chapter 2 Data Cleaning
2.0 Basic concepts
2.1 Converting data types
2.2 Handling duplicate data
2.3 Handling missing data
2.3.1 Checking for missing data
2.3.2 Filling with specified values
2.3.3 Filling according to patterns in the data
2.4 Handling outliers
Chapter 3 Feature Transformation
3.0 Types of features
3.1 Converting features to numeric values
3.2 Feature binarization
3.3 One-hot encoding
3.4 Data transformation
3.5 Feature discretization
3.5.1 Unsupervised discretization
3.5.2 Supervised discretization
3.6 Data normalization
Chapter 4 Feature Selection
4.0 Overview of feature selection
4.1 Wrapper methods
4.1.1 Sequential feature selection
4.1.2 Exhaustive feature selection
4.1.3 Recursive feature elimination
4.2 Filter methods
4.3 Embedded methods
Chapter 5 Feature Extraction
5.1 Unsupervised feature extraction
5.1.1 Principal component analysis
5.1.2 Factor analysis
5.2 Supervised feature extraction
Appendix A Introduction to Jupyter
Appendix B Introduction to NumPy
Appendix C Introduction to Pandas
Appendix D Introduction to Matplotlib
Afterword