Bayesian Optimization of Machine Learning Models

by Max Kuhn: Director, Nonclinical Statistics, Pfizer

Many predictive and machine learning models have structural or tuning parameters that cannot be directly estimated from the data. For example, when using a K-nearest neighbor model, there is no analytical estimator for K (the number of neighbors). Typically, resampling is used to estimate the model's performance for each value in a set of candidate values for K, and the value associated with the best results is used. This is essentially a grid search procedure. However, there are other approaches. I’ll demonstrate how Bayesian optimization and Gaussian process models can be used as an alternative. To demonstrate, I’ll use the regression simulation system of Sapp et al. (2014) where the predictors (i.e. x’s) are independent Gaussian random variables with mean…
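The grid search procedure described above can be sketched in a few lines. This is a minimal illustration, not the code from the article: it is written in plain Python, it uses a hypothetical one-predictor toy data set (y = sin(x) plus noise) rather than the Sapp et al. simulation, and the helper names `knn_predict`, `cv_rmse`, and `grid_search` are made up for this example. Resampling here is assumed to be 5-fold cross-validation, with RMSE as the performance measure.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the outcomes of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(data, k, n_folds=5):
    """Estimate the RMSE for a given k using n_folds-fold cross-validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    sq_errors = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        sq_errors.extend((knn_predict(train, x, k) - y) ** 2 for x, y in held_out)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def grid_search(data, candidate_ks):
    """Return (best_k, results); results maps each k to its resampled RMSE."""
    results = {k: cv_rmse(data, k) for k in candidate_ks}
    best_k = min(results, key=results.get)
    return best_k, results


# Hypothetical toy data (an assumption, not the Sapp et al. system).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))

best_k, results = grid_search(data, candidate_ks=range(1, 31, 2))
print("best K:", best_k, "resampled RMSE:", round(results[best_k], 3))
```

Every candidate value of K is evaluated, whether or not earlier results suggest it is promising. That exhaustive evaluation is the cost that a model-based search such as Bayesian optimization tries to reduce.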

