Natural Language Annotation for Machine Learning

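Natural Language Annotation for Machine Learning (Chinese title: 自然語言標註:用於機器學習) is a book by James Pustejovsky and Amber Stubbs, published by Southeast University Press in 2013.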

Basic Information

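  • Title: 自然語言標註:用於機器學習
  • Authors: James Pustejovsky (普斯特若夫斯基), Amber Stubbs (斯塔布斯)
  • Publisher: Southeast University Press
  • Publication date: June 1, 2013
  • Pages: 324
  • Format: 16開 (16-kai)
  • ISBN: 9787564142810
  • Original title: Natural Language Annotation for Machine Learning
  • Category: Computers and the Internet
  • Language: Simplified Chinese, English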

Table of Contents

Preface
1. The Basics
The Importance of Language Annotation
The Layers of Linguistic Description
What Is Natural Language Processing?
A Brief History of Corpus Linguistics
What Is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Classification
Clustering
Structured Pattern Induction
The Annotation Development Cycle
Model the Phenomenon
Annotate with the Specification
Train and Test the Algorithms over the Corpus
Evaluate the Results
Revise the Model and Algorithms
Summary
2. Defining Your Goal and Dataset
Defining Your Goal
The Statement of Purpose
Refining Your Goal: Informativity Versus Correctness
Background Research
Language Resources
Organizations and Conferences
NLP Challenges
Assembling Your Dataset
The Ideal Corpus: Representative and Balanced
Collecting Data from the Internet
Eliciting Data from People
The Size of Your Corpus
Existing Corpora
Distributions Within Corpora
Summary
3. Corpus Analytics
Basic Probability for Corpus Analytics
Joint Probability Distributions
Bayes Rule
Counting Occurrences
Zipf's Law
N-grams
Language Models
Summary
4. Building Your Model and Specification
Some Example Models and Specs
Film Genre Classification
Adding Named Entities
Semantic Roles
Adopting (or Not Adopting) Existing Models
Creating Your Own Model and Specification: Generality Versus Specificity
Using Existing Models and Specifications
Using Models Without Specifications
Different Kinds of Standards
ISO Standards
Community-Driven Standards
Other Standards Affecting Annotation
Summary
5. Applying and Adopting Annotation Standards
Metadata Annotation: Document Classification
Unique Labels: Movie Reviews
Multiple Labels: Film Genres
Text Extent Annotation: Named Entities
Inline Annotation
Stand-off Annotation by Tokens
Stand-off Annotation by Character Location
Linked Extent Annotation: Semantic Roles
ISO Standards and You
Summary
6. Annotation and Adjudication
The Infrastructure of an Annotation Project
Specification Versus Guidelines
Be Prepared to Revise
Preparing Your Data for Annotation
Metadata
Preprocessed Data
Splitting Up the Files for Annotation
Writing the Annotation Guidelines
Example 1: Single Labels (Movie Reviews)
Example 2: Multiple Labels (Film Genres)
Example 3: Extent Annotations (Named Entities)
Example 4: Link Tags (Semantic Roles)
Annotators
Choosing an Annotation Environment
Evaluating the Annotations
Cohen's Kappa (κ)
Fleiss's Kappa (κ)
Interpreting Kappa Coefficients
Calculating κ in Other Contexts
Creating the Gold Standard (Adjudication)
Summary
7. Training: Machine Learning
What Is Learning?
Defining Our Learning Task
Classifier Algorithms
Decision Tree Learning
Gender Identification
Naive Bayes Learning
Maximum Entropy Classifiers
Other Classifiers to Know About
Sequence Induction Algorithms
Clustering and Unsupervised Learning
Semi-Supervised Learning
Matching Annotation to Algorithms
Summary
8. Testing and Evaluation
9. Revising and Reporting
10. Annotation: TimeML
11. Automatic Annotation: Generating TimeML
A. List of Available Corpora and Specifications
B. List of Software Resources
C. MAE User Guide
D. MAI User Guide
E. Bibliography
Index
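Chapter 6 lists Cohen's kappa among the measures for evaluating agreement between annotators. As a rough illustration only (this is not code from the book), here is a minimal Python sketch of the standard two-annotator formula κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label proportions. The function name `cohens_kappa` and the sample movie-review labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement p_o: share of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement p_e: for each label, multiply the two annotators'
    # marginal proportions, then sum over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1:
        # Both annotators used one and the same label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentiment labels from two annotators for six movie reviews.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.667
```

With these sample labels the annotators agree on 5 of 6 reviews (p_o ≈ 0.83), while chance agreement from their label proportions is 0.5, so κ ≈ 0.67.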

Synopsis

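This book guides you step by step through a proven annotation development cycle: the process of adding metadata to your training corpus to help machine learning algorithms work more effectively. You do not need any programming or linguistics experience to get started. Through detailed examples at each step, this reprint edition of Natural Language Annotation for Machine Learning shows how the annotation development process helps you model, annotate, train, test, evaluate, and revise your training corpus. It also walks you through a complete, real-world annotation project.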

About the Authors

Authors: James Pustejovsky (USA) and Amber Stubbs (USA)
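James Pustejovsky is a professor at Brandeis University, where he teaches and conducts research in artificial intelligence and computational linguistics in the Department of Computer Science. Amber Stubbs recently earned her PhD at Brandeis University with work on annotation methodology; she is now a postdoctoral researcher at SUNY Albany.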

Praise

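"Language annotation is a key part of natural language processing, yet it is rarely covered in computational linguistics courses. This is the first book to give a hands-on treatment of annotation, covering everything from specification and design to the application of machine learning algorithms. It is bound to become a model text for undergraduate and graduate courses in computational linguistics."
Nancy Ide, Professor of Computer Science, Vassar College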
