Kernel Optimization for Support Vector Machines: Application to Speaker Verification
Hatch, Andrew Oliver
Technical Report Identifier: EECS-2006-187
December 18, 2006
Abstract: In this dissertation, we examine the problem of kernel optimization for binary classification tasks where the training data are partitioned into multiple, disjoint classes. The dissertation focuses specifically on the field of speaker verification, which can be framed as a one-versus-all (OVA) decision task involving a target speaker and a set of impostor speakers.
The main result of this dissertation is a new framework for optimizing generalized linear kernels of the form $k(x_1, x_2) = x_1^{T} R x_2$, where $x_1$ and $x_2$ are input feature vectors and $R$ is a positive semidefinite parameter matrix. Our framework uses first- and second-order statistics from each class (i.e., speaker) in the data to construct an upper bound on classification error in a linear classifier. Minimizing this bound leads directly to a new, modified formulation of the 1-norm, soft-margin support vector machine (SVM). This modified formulation is identical to the conventional SVM developed by Vapnik, except that it implicitly prescribes a solution for the parameter matrix $R$ in a generalized linear kernel. We refer to this new formulation as the "adaptive, multicluster SVM" (AMC-SVM). Unlike most other kernel learning techniques in the literature, the AMC-SVM uses information about clusters that reside within the given target and impostor data to obtain tighter bounds on classification error than those obtained in conventional SVM-based approaches. This use of cluster information makes the AMC-SVM particularly well suited to tasks that involve binary classification of multiclass data, such as speaker verification, where each class (i.e., speaker) can be treated as a separate cluster.
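Because $R$ is positive semidefinite, it factors as $R = A^{T} A$ for some matrix $A$, so $k(x_1, x_2) = (A x_1)^{T} (A x_2)$. In other words, a generalized linear kernel is an ordinary dot product computed after a linear feature map. The NumPy sketch below checks this numerically. The function names, the random test data, and the use of an eigendecomposition to obtain $A$ (a Cholesky factor would also work when $R$ is strictly positive definite) are illustrative assumptions, not code from the dissertation.

```python
import numpy as np

def generalized_linear_kernel(x1, x2, R):
    """Hypothetical helper: k(x1, x2) = x1^T R x2 for a PSD matrix R."""
    return float(x1 @ R @ x2)

def feature_map_from_R(R):
    """Hypothetical helper: return A with A.T @ A == R.

    Uses the symmetric eigendecomposition R = V diag(lam) V^T and sets
    A = diag(sqrt(lam)) V^T. Tiny negative eigenvalues caused by
    round-off are clipped to zero, which is valid because R is PSD.
    """
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals = np.clip(eigvals, 0.0, None)
    return np.diag(np.sqrt(eigvals)) @ eigvecs.T

# Toy data: any matrix of the form B^T B is positive semidefinite.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
R = B.T @ B
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)

A = feature_map_from_R(R)
k_direct = generalized_linear_kernel(x1, x2, R)   # x1^T R x2
k_mapped = float((A @ x1) @ (A @ x2))              # (A x1) . (A x2)
print(np.isclose(k_direct, k_mapped))              # True
```

This equivalence is why a fixed generalized linear kernel can be implemented as a preprocessing step: transform every input vector by $A$ once, then train a standard linear-kernel SVM on the transformed vectors.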
In OVA training settings, we show that the AMC-SVM can, under certain conditions, be formulated to yield a single, fixed kernel function that applies universally to any choice of target speaker. Since this kernel function is linear, we can implement it by applying a single linear feature transformation to the input feature space. This feature transformation performs what we refer to as "within-class covariance normalization" (WCCN) on the input feature vectors. We describe a set of experiments where WCCN yields large reductions in classification error over other normalization techniques on a state-of-the-art SVM-based speaker verification system.
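The abstract does not spell out how WCCN is estimated. The following is a minimal sketch, assuming the commonly used form: the within-class covariance $W$ is the average of per-speaker covariance matrices (each speaker weighted equally), the kernel parameter is $R = W^{-1}$, and the feature transformation is $x \mapsto B^{T} x$, where $W^{-1} = B B^{T}$ is a Cholesky factorization. All function names, the `ridge` regularizer, and the toy data are hypothetical.

```python
import numpy as np

def within_class_covariance(features, labels):
    """Average of per-class (per-speaker) covariance matrices.

    features: (n, d) array; labels: length-n array of class ids.
    Each class is weighted equally, regardless of its sample count.
    """
    classes = np.unique(labels)
    d = features.shape[1]
    W = np.zeros((d, d))
    for c in classes:
        Xc = features[labels == c]
        centered = Xc - Xc.mean(axis=0)
        W += centered.T @ centered / len(Xc)
    return W / len(classes)

def wccn_transform(features, labels, ridge=1e-6):
    """Return lower-triangular B with B @ B.T == inv(W + ridge * I).

    Mapping x -> B.T @ x turns k(x1, x2) = x1^T W^{-1} x2 into a plain
    dot product. `ridge` is a hypothetical regularizer that keeps the
    inverse well defined when W is (near-)singular.
    """
    W = within_class_covariance(features, labels)
    W_inv = np.linalg.inv(W + ridge * np.eye(W.shape[0]))
    return np.linalg.cholesky(W_inv)

# Hypothetical toy data: 3 "speakers", 5 utterance vectors each, 4 dims,
# with very different within-speaker spread per dimension.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 5)
scales = np.array([1.0, 0.1, 3.0, 1.0])
features = rng.standard_normal((15, 4)) * scales + labels[:, None]

B = wccn_transform(features, labels)
normalized = features @ B  # row i equals B.T @ features[i]
```

After the transform, the within-class covariance of the normalized vectors is approximately the identity (exactly so when `ridge` is zero), which is one way to see what "normalization" means here: directions with large within-speaker variability are down-weighted before the linear SVM is trained.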