Natural Language Annotation for Machine Learning

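Natural Language Annotation for Machine Learning (Chinese title: 自然語言標註：用於機器學習) is a book by James Pustejovsky and Amber Stubbs, published by Southeast University Press in 2013.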

Basic Information

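Chinese title: 自然語言標註：用於機器學習
Authors: James Pustejovsky, Amber Stubbs
Publisher: Southeast University Press
Publication date: June 1, 2013
Pages: 324
Format: 16K
ISBN: 9787564142810
Original title: Natural Language Annotation for Machine Learning
Category: Computers and Internet
Languages: Simplified Chinese, English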

Table of Contents

Preface

1. The Basics

The Importance of Language Annotation

The Layers of Linguistic Description

What Is Natural Language Processing?

A Brief History of Corpus Linguistics

What Is a Corpus?

Early Use of Corpora

Corpora Today

Kinds of Annotation

Language Data and Machine Learning

Classification

Clustering

Structured Pattern Induction

The Annotation Development Cycle

Model the Phenomenon

Annotate with the Specification

Train and Test the Algorithms over the Corpus

Evaluate the Results

Revise the Model and Algorithms

Summary

2. Defining Your Goal and Dataset

Defining Your Goal

The Statement of Purpose

Refining Your Goal: Informativity Versus Correctness

Background Research

Language Resources

Organizations and Conferences

NLP Challenges

Assembling Your Dataset

The Ideal Corpus: Representative and Balanced

Collecting Data from the Internet

Eliciting Data from People

The Size of Your Corpus

Existing Corpora

Distributions Within Corpora

Summary

3. Corpus Analytics

Basic Probability for Corpus Analytics

Joint Probability Distributions

Bayes Rule

Counting Occurrences

Zipf's Law

N-grams

Language Models

Summary

4. Building Your Model and Specification

Some Example Models and Specs

Film Genre Classification

Adding Named Entities

Semantic Roles

Adopting (or Not Adopting) Existing Models

Creating Your Own Model and Specification: Generality Versus Specificity

Using Existing Models and Specifications

Using Models Without Specifications

Different Kinds of Standards

ISO Standards

Community-Driven Standards

Other Standards Affecting Annotation

Summary

5. Applying and Adopting Annotation Standards

Metadata Annotation: Document Classification

Unique Labels: Movie Reviews

Multiple Labels: Film Genres

Text Extent Annotation: Named Entities

Inline Annotation

Stand-off Annotation by Tokens

Stand-off Annotation by Character Location

Linked Extent Annotation: Semantic Roles

ISO Standards and You

Summary

6. Annotation and Adjudication

The Infrastructure of an Annotation Project

Specification Versus Guidelines

Be Prepared to Revise

Preparing Your Data for Annotation

Metadata

Preprocessed Data

Splitting Up the Files for Annotation

Writing the Annotation Guidelines

Example 1: Single Labels – Movie Reviews

Example 2: Multiple Labels – Film Genres

Example 3: Extent Annotations – Named Entities

Example 4: Link Tags – Semantic Roles

Annotators

Choosing an Annotation Environment

Evaluating the Annotations

Cohen's Kappa (κ)

Fleiss's Kappa (κ)

Interpreting Kappa Coefficients

Calculating κ in Other Contexts

Creating the Gold Standard (Adjudication)

Summary

7. Training: Machine Learning

What Is Learning？

Defining Our Learning Task

Classifier Algorithms

Decision Tree Learning

Gender Identification

Naive Bayes Learning

Maximum Entropy Classifiers

Other Classifiers to Know About

Sequence Induction Algorithms

Clustering and Unsupervised Learning

Semi-Supervised Learning

Matching Annotation to Algorithms

Summary

8. Testing and Evaluation

9. Revising and Reporting

10. Annotation: TimeML

11. Automatic Annotation: Generating TimeML

A. List of Available Corpora and Specifications

B. List of Software Resources

C. MAE User Guide

D. MAI User Guide

E. Bibliography

Index

Description

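The book walks you step by step through a proven annotation development cycle: the process of adding metadata to your training corpus to help machine learning algorithms work more effectively. You can get started without any programming or linguistics experience. Through detailed examples at every step, Natural Language Annotation for Machine Learning (reprint edition) shows how the annotation development process helps you model, annotate, train, test, evaluate, and revise your training corpus. It also includes a complete walkthrough of a real-world annotation project.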

About the Authors

Authors: James Pustejovsky (USA), Amber Stubbs (USA)

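James Pustejovsky is a professor at Brandeis University, where he teaches and conducts research in artificial intelligence and computational linguistics in the Computer Science Department. Amber Stubbs recently earned her Ph.D. at Brandeis University, with research on annotation methodology. She is currently a postdoctoral researcher at SUNY Albany.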

Endorsements

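"Language annotation is a critical component of natural language processing, yet it is rarely covered in computational linguistics courses. This is the first hands-on book on annotation, covering everything from specification and design to applying machine learning algorithms. It is sure to become a standard text for undergraduate and graduate courses in computational linguistics."

Nancy Ide, Professor of Computer Science, Vassar College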
