Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage is a book by Zdravko Markov, published by Wiley Blackwell in 2007.
Basic information
- Title: Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
- Author: Zdravko Markov
- Publisher: Wiley Blackwell
- Publication date: April 1, 2007
Book information
Table of contents
PREFACE
PART I: WEB STRUCTURE MINING
1 INFORMATION RETRIEVAL AND WEB SEARCH
Web Challenges
Web Search Engines
Topic Directories
Semantic Web
Crawling the Web
Web Basics
Web Crawlers
Indexing and Keyword Search
Document Representation
Implementation Considerations
Relevance Ranking
Advanced Text Search
Using the HTML Structure in Keyword Search
Evaluating Search Quality
Similarity Search
Cosine Similarity
Jaccard Similarity
Document Resemblance
References
Exercises
2 HYPERLINK-BASED RANKING
Introduction
Social Networks Analysis
PageRank
Authorities and Hubs
Link-Based Similarity Search
Enhanced Techniques for Page Ranking
References
Exercises
PART II: WEB CONTENT MINING
3 CLUSTERING
Introduction
Hierarchical Agglomerative Clustering
k-Means Clustering
Probability-Based Clustering
Finite Mixture Problem
Classification Problem
Clustering Problem
Collaborative Filtering (Recommender Systems)
References
Exercises
4 EVALUATING CLUSTERING
Approaches to Evaluating Clustering
Similarity-Based Criterion Functions
Probabilistic Criterion Functions
MDL-Based Model and Feature Evaluation
Minimum Description Length Principle
MDL-Based Model Evaluation
Feature Selection
Classes-to-Clusters Evaluation
Precision, Recall, and F-Measure
Entropy
References
Exercises
5 CLASSIFICATION
General Setting and Evaluation Techniques
Nearest-Neighbor Algorithm
Feature Selection
Naive Bayes Algorithm
Numerical Approaches
Relational Learning
References
Exercises
PART III: WEB USAGE MINING
6 INTRODUCTION TO WEB USAGE MINING
Definition of Web Usage Mining
Cross-Industry Standard Process for Data Mining
Clickstream Analysis
Web Server Log Files
Remote Host Field
Date/Time Field
HTTP Request Field
Status Code Field
Transfer Volume (Bytes) Field
Common Log Format
Identification Field
Authuser Field
Extended Common Log Format
Referrer Field
User Agent Field
Example of a Web Log Record
Microsoft IIS Log Format
Auxiliary Information
References
Exercises
7 PREPROCESSING FOR WEB USAGE MINING
Need for Preprocessing the Data
Data Cleaning and Filtering
Page Extension Exploration and Filtering
De-Spidering the Web Log File
User Identification
Session Identification
Path Completion
Directories and the Basket Transformation
Further Data Preprocessing Steps
References
Exercises
8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING
Introduction
Number of Visit Actions
Session Duration
Relationship between Visit Actions and Session Duration
Average Time per Page
Duration for Individual Pages
References
Exercises
9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION
Introduction
Modeling Methodology
Definition of Clustering
The BIRCH Clustering Algorithm
Affinity Analysis and the A Priori Algorithm
Discretizing the Numerical Variables: Binning
Applying the A Priori Algorithm to the CCSU Web Log Data
Classification and Regression Trees
The C4.5 Algorithm
References
Exercises
INDEX