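Corpus linguistics and computational linguistics are two foundational disciplines behind the rapid development of natural language processing. 《英語語料庫與自動語法分析》 (English Corpora and Automated Grammatical Analysis) is a monograph spanning both fields. Set against the background of the International Corpus of English (ICE), it focuses on the grammatical analysis of large corpora, and in particular on the difficulties that spoken English material poses for automatic processing by computer.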
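Overview
- Title: 英語語料庫與自動語法分析 (English Corpora and Automated Grammatical Analysis), hardcover edition
- Author: Fang Chengyu (方稱宇)
- Publisher: The Commercial Press (商務印書館)
- Publication date: 2007-11-1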
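Basic Information
ISBN: 10-digit [7100056594], 13-digit [9787100056595]
Price: ¥38.00
Synopsis
The book covers two core techniques: probability-based automatic word class identification and example-based automatic syntactic parsing. Dedicated chapters address the evaluation of syntactic parsing and give an in-depth quantitative evaluation of the actual performance of two software systems, AUTASYS and The Survey Parser. The book also examines the automatic analysis of prepositional phrases, especially the automatic determination of their syntactic functions, and describes applications of automatic grammatical analysis in speech synthesis and speech recognition.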
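Editor's Recommendation
The central idea of the book is to turn an already analysed corpus into a syntactic knowledge base, extract phrase structure grammar rules from it, and then use example-based methods to retrieve from that knowledge base the best syntactic tree for each sentence to be analysed. The book describes the research on each of these components in detail, gives an in-depth quantitative evaluation of the system's actual performance, and devotes dedicated chapters to the evaluation of syntactic parsing. It also examines the automatic analysis of prepositional phrases, especially the automatic determination of their syntactic functions, since this research is closely related to the analysis of syntactic similarity. In addition, the book introduces and explains applications of automatic grammatical analysis in speech synthesis and speech recognition, in the hope that this will be of help to readers.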
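The general idea described above (an analysed corpus used as a knowledge base, phrase structure rules read off its trees, and a best-matching tree retrieved for a new sentence) can be illustrated with a small sketch. This is not the book's algorithm or the Survey Parser's implementation: the toy treebank, the labels and tags, the function names, and the use of `difflib.SequenceMatcher` as a similarity measure are all assumptions made purely for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy "analysed corpus". Each tree is a nested tuple
# (label, child, ...); a leaf is a (tag, word) pair. Labels are invented.
TREEBANK = [
    ("S",
     ("NP", ("PRON", "she")),
     ("VP", ("V", "reads"), ("NP", ("ART", "the"), ("N", "book")))),
    ("S",
     ("NP", ("ART", "the"), ("N", "dog")),
     ("VP", ("V", "barks"))),
]


def is_leaf(node):
    """A leaf pairs a word class tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def rules(node):
    """Yield (parent, child_labels) rules; words are ignored, so the
    terminal nodes of every rule are tags rather than lexical items."""
    if is_leaf(node):
        return
    label, *children = node
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)


def tags(node):
    """Return the left-to-right tag sequence of a tree."""
    if is_leaf(node):
        return [node[0]]
    sequence = []
    for child in node[1:]:
        sequence.extend(tags(child))
    return sequence


def grammar(treebank):
    """Count how often each phrase structure rule occurs in the treebank."""
    counts = Counter()
    for tree in treebank:
        counts.update(rules(tree))
    return counts


def closest_tree(tag_sequence, treebank):
    """Return the stored tree whose tag sequence is most similar to the
    input (stand-in similarity measure, assumed for this sketch only)."""
    return max(
        treebank,
        key=lambda tree: SequenceMatcher(None, tags(tree), tag_sequence).ratio(),
    )


if __name__ == "__main__":
    for (parent, children), count in grammar(TREEBANK).items():
        print(parent, "->", " ".join(children), count)
    print(closest_tree(["ART", "N", "V"], TREEBANK))
```

In this sketch the input is a sequence of word class tags rather than words, which is why a tagging step would need to come before retrieval. A realistic system would need a far larger corpus and a more principled similarity measure.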
Table of Contents
Preface
Preface (in Chinese)
List of Figures
List of Tables
Abstract
1. Introduction
1.1. What is Parsing?
1.2. The Introspective View
1.3. The Retrospective View
1.4. Data-Oriented Parsing
1.5. General Problems
1.6. The Proposed Research
1.6.1. Background to the Proposed Research
1.6.2. The Basic Approach of the Proposed Research
1.6.3. The Strengths and Novelties of the Proposed Approach
1.6.3.1. Automated Grammar Generation
1.6.3.2. De-Lexicalised Terminal Nodes
1.6.3.3. Global Parse with Subcategorisation Features
1.6.3.4. High-Quality Partial Parse
1.6.3.5. Intrinsic Ability to Learn
1.7. The Organisation of the Book
2. The Automatic Analysis of English Word Classes
2.1. An Overview of Word Class Tagging
2.2. Major Word Class Tagging Schemes
2.2.1. The Lancaster-Oslo/Bergen Tagging Scheme
2.2.1.1. The Lancaster-Oslo-Bergen Corpus
2.2.1.2. The Lancaster-Oslo-Bergen Tag Set
2.2.1.3. Summary
2.2.2. The International Corpus of English Tagging Scheme
2.2.2.1. The International Corpus of English
2.2.2.2. The International Corpus of English Tag Set
2.2.3. A Comparison of LOB and ICE
2.3. Word Class Tagging Methodologies
2.3.1. The Rule-Based Approach
2.3.2. The Probabilistic Approach
2.4. AUTASYS: A Hybrid Tagging System
2.4.1. A Probabilistic Approach Using the LOB Tag Set
2.4.1.1. The Tag Assignment Module
2.4.1.1.1. Tokenisation
2.4.1.1.2. The treatment of "."
2.4.1.1.3. The treatment of "'"
2.4.1.1.4. Sentence boundary markers
2.4.1.2. Orthographic Analysis
2.4.1.3. Lexicon Lookup
2.4.1.3.1. The lexicon
2.4.1.3.2. The coverage of the lexicon
2.4.1.4. Morphological Analysis
2.4.2. The Idiom Identification Module
2.4.3. The Probabilistic Tag Selection Module
2.4.3.1. The Bigram Probabilistic Matrix
2.4.3.2. Implementing Probabilistic Tag Selection
2.4.4. The Rule-Based Refinement Module
2.4.5. Empirical Evaluation
2.4.6. Permissive AUTASYS-LOB Disagreements
2.4.6.1. NNP-NPT
2.4.6.2. JJ-JJB
2.4.6.3. NNP-NPL
2.4.6.4. RB-NN
2.4.7. Summary
2.5. A Rule-Based Approach towards LOB to ICE Translation
2.5.1. Solutions for Verbs
2.5.1.1. Auxiliary vs. Lexical
2.5.1.2. Monotransitive vs. Complex Transitive
2.5.1.3. Finite vs. Nonfinite
2.5.2. Closed Sets
2.5.3. Initial Results
2.5.4. Problems
2.5.5. Summary
3. The Automatic Induction of a Formal Grammar
4. Robust Practical Analogy-Based Parsing
5. Extensive Evaluations of the Survey Parser
6. The Resolution of Prepositional Phrases
7. Conclusions and Further Work
References
Appendix A: A List of LOB Tags
Appendix B: A List of ICE Tags
Appendix C: A List of AUTASYS Idioms
Appendix D: A List of ICE Parsing Symbols
Appendix E: A List of ICE Prepositions in Descending Frequency Order
Appendix F: A Distributional Profile of ICE-GB Prepositions
Index