Book Details
System Parameter Identification: Information Criteria and Algorithms(系統參數辨識的信息準則及算法)
Authors: Badong Chen, Yu Zhu, Jinchun Hu, Jose C. Principe (USA)
ISBN: 9787302359418
Price: 120 yuan
Printing: 1-1
Binding: Hardcover
Print date: 2014-4-21
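Book Description
This book systematically presents the latest research results on information criteria and algorithms for system parameter identification. Its main topics include basic concepts of information theory, information theoretic parameter estimation, system identification under the minimum error entropy criterion, system identification based on minimum information divergence, and system identification based on mutual information criteria.
This book is a revised and expanded edition of the Chinese-language book of the same title published in 2011. It is intended as a reference for faculty and students at universities or research institutes who work on system identification, signal processing, machine learning, and related areas, as well as for other scientists and engineers.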
Preface
System identification is a common method for building a mathematical model of a physical plant and is widely used in engineering practice. In general, system identification involves three key elements: the data, the model, and the criterion. The goal of identification is to choose, from a set of candidate models, the one that best fits the data according to a given criterion. The criterion function is a key factor in system identification: it evaluates the consistency of the model with the actual plant and, in general, serves as the objective function from which identification algorithms are developed. Identification performance, including convergence speed, steady-state accuracy, robustness, and computational complexity, is directly related to the criterion function.
Well-known identification criteria include the least squares (LS) criterion, the minimum mean square error (MMSE) criterion, and the maximum likelihood (ML) criterion. These criteria provide successful engineering solutions to most practical problems and remain prevalent in system identification today. However, they have shortcomings that limit their general use. For example, the LS and MMSE criteria consider only the second-order moment of the error, so identification performance degrades when the data are non-Gaussian (e.g., multimodal, heavy-tailed, or of finite range). The ML criterion requires knowledge of the conditional probability density function of the observed samples, which is unavailable in many practical situations. In addition, the computational complexity of ML estimation is usually high. Selecting a new criterion that goes beyond second-order statistics and the likelihood function is therefore attractive in system identification problems.
In recent years, criteria based on information theoretic descriptors of entropy and dissimilarity (divergence, mutual information) have attracted a great deal of attention and have become an emerging area of study in signal processing and machine learning. Information theoretic criteria (or, briefly, information criteria) can capture higher-order statistics and the information content of signals rather than simply their energy. Many studies suggest that information criteria do not suffer from the limitation of the Gaussian assumption and can improve performance in many realistic scenarios. Combined with nonparametric estimators of entropy and divergence, many adaptive identification algorithms have been developed, including practical gradient-based batch or recursive algorithms, fixed-point algorithms (which need no step size), and other advanced search algorithms. Although many elegant results and techniques have been developed over the past few years, no book has so far been devoted to a systematic study of system identification under information theoretic criteria. The primary focus of this book is to provide an overview of these developments, with emphasis on nonparametric estimators of information criteria and gradient-based identification algorithms. Most of the content of this book originally appeared in recent papers by the authors.
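As a rough illustration of how an entropy-based criterion can see what a second-order criterion cannot, the sketch below compares two hypothetical error samples that share the same mean square error. It assumes Renyi's quadratic entropy estimated with a Gaussian Parzen window, which is one common nonparametric choice and not necessarily the estimator used in the book. The function names, the sample values, and the kernel width `sigma` are all illustrative assumptions.

```python
import math


def mean_square_error(errors):
    """Second-order criterion: the average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


def quadratic_information_potential(errors, sigma=0.5):
    """Parzen-window estimate of the integral of p(e)^2 (assumed Gaussian kernel).

    Convolving two Gaussian kernels of width sigma gives a Gaussian of
    variance 2 * sigma^2, evaluated at every pairwise error difference.
    """
    n = len(errors)
    var = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = 0.0
    for ei in errors:
        for ej in errors:
            total += norm * math.exp(-((ei - ej) ** 2) / (2.0 * var))
    return total / (n * n)


def renyi_quadratic_entropy(errors, sigma=0.5):
    """Renyi's quadratic entropy estimate: minus the log of the potential."""
    return -math.log(quadratic_information_potential(errors, sigma))


# Two hypothetical error samples with the same mean square error (about 1.0)
two_point = [-1.0, 1.0, -1.0, 1.0]
three_point = [0.0, 0.0, -math.sqrt(2.0), math.sqrt(2.0)]

for name, sample in (("two_point", two_point), ("three_point", three_point)):
    print(name, mean_square_error(sample), renyi_quadratic_entropy(sample))
```

Both samples are indistinguishable under the mean square error, while the entropy estimate differs because it depends on the shape of the error distribution and not only on its energy. The kernel width is a free parameter here, and the estimate changes with it.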
The book is divided into six chapters. The first chapter introduces the information theoretic criteria and the state-of-the-art techniques; the second presents the definitions and properties of several important information measures; the third gives an overview of information theoretic approaches to parameter estimation; the fourth discusses system identification under the minimum error entropy criterion; the fifth focuses on minimum information divergence criteria; and the sixth turns to mutual information-based criteria.
It is worth noting that information criteria can be used not only for system parameter identification but also for system structure identification (e.g., model selection). Akaike’s information criterion (AIC) and the minimum description length (MDL) are two well-known information criteria for model selection. Several books on AIC and MDL already exist, so we do not discuss them in detail here. Although most of the methods in this book were developed specifically for system parameter identification, the basic principles behind them are universal. With little modification, some of the methods can be applied to blind source separation, independent component analysis, time series prediction, classification, and pattern recognition.
This book will be of interest to graduate students, professionals, and researchers who want to improve the performance of traditional identification algorithms and explore new approaches to system identification, as well as to those interested in adaptive filtering, neural networks, kernel methods, and online machine learning.
The authors are grateful to the National Natural Science Foundation of China and the National Basic Research Program of China (973 Program), which funded this book. We are also grateful to Elsevier for their patience with us over the past year as we worked on this book. We also acknowledge the support and encouragement of our colleagues and friends.
Xi’an, P.R. China
March 2013
Contents
About the Authors i
Preface iii
Symbols and Abbreviations v
1 Introduction 1
1.1 Elements of System Identification 1
1.2 Traditional Identification Criteria 3
1.3 Information Theoretic Criteria 4
1.3.1 MEE Criteria 6
1.3.2 Minimum Information Divergence Criteria 7
1.3.3 Mutual Information-Based Criteria 7
1.4 Organization of This Book 8
Appendix A: Unifying Framework of ITL 9
2 Information Measures 13
2.1 Entropy 13
2.2 Mutual Information 19
2.3 Information Divergence 21
2.4 Fisher Information 23
2.5 Information Rate 24
Appendix B: α-Stable Distribution 26
Appendix C: Proof of (2.17) 26
Appendix D: Proof of Cramer-Rao Inequality 27
3 Information Theoretic Parameter Estimation 29
3.1 Traditional Methods for Parameter Estimation 29
3.1.1 Classical Estimation 29
3.1.2 Bayes Estimation 31
3.2 Information Theoretic Approaches to Classical Estimation 34
3.2.1 Entropy Matching Method 34
3.2.2 Maximum Entropy Method 35
3.2.3 Minimum Divergence Estimation 37
3.3 Information Theoretic Approaches to Bayes Estimation 40
3.3.1 Minimum Error Entropy Estimation 40
3.3.2 MC Estimation 51
3.4 Information Criteria for Model Selection 56
Appendix E: EM Algorithm 57
Appendix F: Minimum MSE Estimation 58
Appendix G: Derivation of AIC Criterion 58
4 System Identification Under Minimum Error Entropy Criteria 61
4.1 Brief Sketch of System Parameter Identification 61
4.1.1 Model Structure 62
4.1.2 Criterion Function 65
4.1.3 Identification Algorithm 65
4.2 MEE Identification Criterion 72
4.2.1 Common Approaches to Entropy Estimation 73
4.2.2 Empirical Error Entropies Based on KDE 76
4.3 Identification Algorithms Under MEE Criterion 82
4.3.1 Nonparametric Information Gradient Algorithms 82
4.3.2 Parametric IG Algorithms 86
4.3.3 Fixed-Point Minimum Error Entropy Algorithm 91
4.3.4 Kernel Minimum Error Entropy Algorithm 93
4.3.5 Simulation Examples 95
4.4 Convergence Analysis 104
4.4.1 Convergence Analysis Based on Approximate Linearization 104
4.4.2 Energy Conservation Relation 106
4.4.3 Mean Square Convergence Analysis Based on Energy Conservation Relation 111
4.5 Optimization of φ-Entropy Criterion 122
4.6 Survival Information Potential Criterion 129
4.6.1 Definition of SIP 129
4.6.2 Properties of the SIP 131
4.6.3 Empirical SIP 136
4.6.4 Application to System Identification 139
4.7 Δ-Entropy Criterion 143
4.7.1 Definition of Δ-Entropy 145
4.7.2 Some Properties of the Δ-Entropy 148
4.7.3 Estimation of Δ-Entropy 152
4.7.4 Application to System Identification 157
4.8 System Identification with MCC 161
Appendix H: Vector Gradient and Matrix Gradient 164
5 System Identification Under Information Divergence Criteria 167
5.1 Parameter Identifiability Under KLID Criterion 167
5.1.1 Definitions and Assumptions 168
5.1.2 Relations with Fisher Information 169
5.1.3 Gaussian Process Case 173
5.1.4 Markov Process Case 176
5.1.5 Asymptotic KLID-Identifiability 180
5.2 Minimum Information Divergence Identification with Reference PDF 186
5.2.1 Some Properties 188
5.2.2 Identification Algorithm 196
5.2.3 Simulation Examples 198
5.2.4 Adaptive Infinite Impulsive Response Filter with Euclidean Distance Criterion 201
6 System Identification Based on Mutual Information Criteria 205
6.1 System Identification Under the MinMI Criterion 205
6.1.1 Properties of MinMI Criterion 207
6.1.2 Relationship with Independent Component Analysis 211
6.1.3 ICA-Based Stochastic Gradient Identification Algorithm 212
6.1.4 Numerical Simulation Example 214
6.2 System Identification Under the MaxMI Criterion 216
6.2.1 Properties of the MaxMI Criterion 217
6.2.2 Stochastic Mutual Information Gradient Identification Algorithm 222
6.2.3 Double-Criterion Identification Method 227
Appendix I: MinMI Rate Criterion 238
References 239