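Doing Data Science (reprint edition, English edition) tells you what you need to know. This insightful book is compiled from the lecture notes of Columbia University's data science course. Now that people have realized that data can make the difference in an election or a business model, data science is growing steadily as a profession. But how do you get started working in such a broad, intricate, interdisciplinary field?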
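Basic Information
- Title: Data Science (數據科學)
- Authors: Rachel Schutt, Cathy O'Neil
- Publisher: Southeast University Press (東南大學出版社)
- Pages: 375
- Format: 16mo
- Original title: Doing Data Science
- Category: Science and technology
- Publication date: September 1, 2014
- Languages: Simplified Chinese, English
- Brand: Nanjing Southeast University Press (南京東南大學出版社)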
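About the Authors
Rachel Schutt is Senior Vice President of Data Science at News Corp. She is an adjunct professor of statistics at Columbia University and a founding member of the education committee of the Institute for Data Sciences and Engineering.
Cathy O'Neil is a senior data scientist at Johnson Research Labs. She holds a PhD in mathematics from Harvard University, was a postdoc in the mathematics department at MIT, and was formerly a professor at Barnard College.
Table of Contents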
1.Introduction: What Is Data Science?
Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape (with a Little History)
Data Science Jobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
2.Statistical Inference, Exploratory Data Analysis, and the Data Science Process
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
3.Algorithms
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
4.Spam Filters, Naive Bayes, and Wrangling
Thought Experiment: Learning by Example
Why Won't Linear Regression Work for Filtering Spam?
How About k-nearest Neighbors?
Naive Bayes
Bayes Law
A Spam Filter for Individual Words
A Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake's Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
5.Logistic Regression
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Click Models
The Underlying Math
Estimating α and β
Newton's Method
Stochastic Gradient Descent
Implementation
Evaluation
Media 6 Degrees Exercise
Sample R Code
6.Time Stamps and Financial Modeling
Kyle Teague and GetGlue
Timestamps
Exploratory Data Analysis (EDA)
Metrics and New Variables or Features
What's Next?
Cathy O'Neil
Thought Experiment
Financial Modeling
In-Sample, Out-of-Sample, and Causality
Preparing Financial Data
Log Returns
Example: The S&P Index
Working out a Volatility Measurement
Exponential Downweighting
The Financial Modeling Feedback Loop
Why Regression?
Adding Priors
A Baby Model
Exercise: GetGlue and Timestamped Event Data
Exercise: Financial Data
7.Extracting Meaning from Data
William Cukierski
Background: Data Science Competitions
Background: Crowdsourcing
The Kaggle Model
A Single Contestant
Their Customers
Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
Feature Selection
Example: User Retention
Filters
Wrappers
Embedded Methods: Decision Trees
Entropy
The Decision Tree Algorithm
Handling Continuous Variables in Decision Trees
Random Forests
User Retention: Interpretability Versus Predictive Power
David Huffaker: Google's Hybrid Approach to Social Research
Moving from Descriptive to Predictive
Social at Google
Privacy
Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
8.Recommendation Engines: Building a User-Facing Data Product at Scale
A Real-World Recommendation Engine
Nearest Neighbor Algorithm Review
Some Problems with Nearest Neighbors
Beyond Nearest Neighbor: Machine Learning Classification
The Dimensionality Problem
Singular Value Decomposition (SVD)
Important Properties of SVD
Principal Component Analysis (PCA)
Alternating Least Squares
Fix V and Update U
Last Thoughts on These Algorithms
Thought Experiment: Filter Bubbles
Exercise: Build Your Own Recommendation System
Sample Code in Python
9.Data Visualization and Fraud Detection
Data Visualization History
Gabriel Tarde
Mark's Thought Experiment
What Is Data Science, Redux?
Processing
Franco Moretti
A Sample of Data Visualization Projects
Mark's Data Visualization Projects
New York Times Lobby: Moveable Type
Project Cascade: Lives on a Screen
Cronkite Plaza
eBay Transactions and Books
Public Theater Shakespeare Machine
Goals of These Exhibits
Data Science and Risk
About Square
The Risk Challenge
The Trouble with Performance Estimation
Model Building Tips
Data Visualization at Square
Ian's Thought Experiment
Data Visualization for the Rest of Us
Data Visualization Exercise
……
10.Social Networks and Data Journalism
11.Causality
12.Epidemiology
13.Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
14.Data Engineering: MapReduce, Pregel, and Hadoop
15.The Students Speak
16.Next-Generation Data Scientists, Hubris, and Ethics
Index