Machine Learning Project in Practice: Face Recognition


Technology | 09-06 | Source: Zhihu

Original title: Machine Learning Project in Practice: Face Recognition. Author: an undergraduate student in mathematics.

(Material adapted from 《数据科学手册》 (Data Science Handbook) and 《机器学习经典实例》 (Classic Examples of Machine Learning).)

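scikit-learn provides access to the Labeled Faces in the Wild (LFW) dataset, a public collection of more than 13,000 labeled face photos. The data is downloaded on the first call:

```python
from sklearn.datasets import fetch_lfw_people

# Keep only people who have at least 60 photos in the dataset
faces = fetch_lfw_people(min_faces_per_person=60)
print(faces.target_names)
print(faces.images.shape)
```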
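The min_faces_per_person=60 argument is what reduces thousands of people to a handful of names: anyone with fewer than 60 photos is dropped. As a rough illustration, the sketch below mimics that filtering on a toy list of labels with NumPy. The helper filter_min_samples and the labels are hypothetical; this is not scikit-learn's internal code.

```python
import numpy as np

def filter_min_samples(labels, min_count):
    """Hypothetical helper: return a boolean mask of samples whose label occurs
    at least min_count times, plus the sorted labels that survive."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)  # values come back sorted
    kept = values[counts >= min_count]
    return np.isin(labels, kept), kept

# Toy labels: 5 photos of one person, 3 of another, 1 of a third
labels = ['Bush'] * 5 + ['Blair'] * 3 + ['Chavez'] * 1
mask, kept = filter_min_samples(labels, min_count=3)
print(kept)        # ['Blair' 'Bush']
print(mask.sum())  # 8
```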

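The output shows 1277 photos of eight people, each 62×47 pixels. Let's plot a few of them:

```python
import matplotlib.pyplot as plt

# Show the first 15 faces in a 3x5 grid, labeled with the person's name
fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='gray')
    axi.set(xticks=[], yticks=[],
            xlabel=faces.target_names[faces.target[i]])
plt.show()
```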
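The returned object stores each photo twice: faces.images has shape (n, 62, 47) for plotting, while faces.data holds the same pixels flattened to one row of 62 × 47 = 2914 features per photo, which is what the classifier consumes. faces.target holds integer indices into faces.target_names. The sketch below reproduces that layout with a small random array standing in for the real photos (the array contents and the names are made up for illustration).

```python
import numpy as np

# Synthetic stand-in for the LFW photos: 4 random "images" of 62x47 pixels
n, h, w = 4, 62, 47
rng = np.random.default_rng(0)
images = rng.random((n, h, w))
target_names = np.array(['Person A', 'Person B'])  # hypothetical names
target = np.array([0, 1, 1, 0])                    # class index of each photo

# The flat feature matrix: one row of h*w pixels per photo
data = images.reshape(n, -1)
print(data.shape)  # (4, 2914)

# Reshaping a row back to (62, 47) recovers the image, which is how a flat
# test sample can be displayed with imshow later on
restored = data[2].reshape(h, w)
print(np.array_equal(restored, images[2]))  # True

# Integer targets index into the array of names
print(target_names[target])  # ['Person A' 'Person B' 'Person B' 'Person A']
```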

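In general, many pixels in an image carry little useful information, so we first reduce the dimensionality with PCA and then classify with an SVM. As usual, we combine a pipeline with grid search:

```python
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline  # shortcut for Pipeline; steps are named after their lowercased class ('pca', 'svc')
from sklearn.model_selection import train_test_split, GridSearchCV

# Project the 2914 pixel features onto 150 principal components.
# whiten=True rescales each component to unit variance, because the
# RBF-kernel SVM is sensitive to the scale of its input features.
pca = PCA(n_components=150, whiten=True)
svc = SVC(kernel='rbf')
model = make_pipeline(pca, svc)

# random_state fixes the split so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target,
                                                    random_state=42)

# Parameters of a pipeline step are addressed as '<step name>__<parameter>'
param = {'svc__C': [1, 5, 10, 50, 75, 90],
         'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param, cv=5)
grid.fit(X_train, y_train)
```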
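One way to check how redundant the pixels are is PCA's cumulative explained variance. Because LFW requires a download, this sketch swaps in scikit-learn's bundled digits dataset (8×8 images, 64 features) as an assumption, and the 90% threshold is an arbitrary illustrative choice, not a rule from the article. It also prints the step names that make_pipeline generates, which is where the svc__ prefix in the parameter grid comes from.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): 1797 digit images, 64 pixel features, no download needed
X, y = load_digits(return_X_y=True)

# Fit PCA with all components and accumulate the variance they explain
pca_full = PCA().fit(X)
cumvar = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 90%
n_90 = int(np.searchsorted(cumvar, 0.90) + 1)
print(f"{n_90} of {X.shape[1]} components explain 90% of the variance")

# make_pipeline names each step after its lowercased class name
model = make_pipeline(PCA(n_components=n_90, whiten=True), SVC(kernel='rbf'))
print(list(model.named_steps))  # ['pca', 'svc']
```

The same inspection on faces.data would show how much of the variance the article's 150 components retain.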

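Note: the SVC penalty parameter C and the RBF kernel parameter gamma (which sets the kernel width: a larger gamma means a narrower kernel) are strongly coupled, so always tune them together rather than one at a time!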
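To see the coupling between C and gamma, it helps to lay the cross-validated scores out as a C × gamma table. This is a sketch on scikit-learn's bundled digits dataset (an assumption standing in for LFW), with a coarse, illustrative grid and 30 PCA components chosen only to keep it fast. The best C typically depends on which gamma it is paired with, which is why the two are searched jointly.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): bundled digits instead of LFW
X, y = load_digits(return_X_y=True)
model = make_pipeline(PCA(n_components=30, whiten=True, random_state=0),
                      SVC(kernel='rbf'))

# Illustrative coarse grid over both parameters
Cs = [0.1, 1, 10, 100]
gammas = [0.001, 0.01, 0.1, 1]
grid = GridSearchCV(model, {'svc__C': Cs, 'svc__gamma': gammas}, cv=3)
grid.fit(X, y)

# Place each candidate's mean CV accuracy at its (C, gamma) cell
table = np.zeros((len(Cs), len(gammas)))
for params, score in zip(grid.cv_results_['params'],
                         grid.cv_results_['mean_test_score']):
    table[Cs.index(params['svc__C']), gammas.index(params['svc__gamma'])] = score

# Rows: C, columns: gamma
print("C \\ gamma " + " ".join(f"{g:>7}" for g in gammas))
for C, row in zip(Cs, table):
    print(f"{C:>9} " + " ".join(f"{s:7.3f}" for s in row))
```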

Print the best parameters and the best mean cross-validated score:

```python
print(grid.best_params_)
print("Best score: {:.3f}".format(grid.best_score_))
```

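Note: if a best parameter lands on the edge of the grid (its smallest or largest searched value), extend the search range in that direction and search again.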
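The edge check can be automated. Below is a minimal sketch with a hypothetical helper, params_on_edge, that reports which parameters' best values sit at the smallest or largest value searched; it assumes every grid entry is a list of comparable (e.g. numeric) values. The best_params dictionaries in the example are made up, not results from the article.

```python
def params_on_edge(best_params, param_grid):
    """Hypothetical helper: names of parameters whose best value is the
    smallest or largest value in the searched list."""
    edge = []
    for name, values in param_grid.items():
        ordered = sorted(values)
        if best_params[name] in (ordered[0], ordered[-1]):
            edge.append(name)
    return edge

param = {'svc__C': [1, 5, 10, 50, 75, 90],
         'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}

# Made-up outcome: C is interior, gamma is at the lower bound
print(params_on_edge({'svc__C': 10, 'svc__gamma': 0.0001}, param))  # ['svc__gamma']
```

With a fitted search, params_on_edge(grid.best_params_, param) flags which ranges to widen; here the gamma range would be extended below 0.0001.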

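Now use the best model to predict on the test set and visualize the results. Correct predictions are labeled in black, mistakes in red:

```python
# GridSearchCV refits the best parameter combination on the full training set
y_pre = grid.predict(X_test)

fig, ax = plt.subplots(4, 6)
for i, axi in enumerate(ax.flat):
    # Test samples are flat rows of 2914 pixels; reshape to 62x47 to display
    axi.imshow(X_test[i].reshape(62, 47), cmap='gray')
    axi.set(xticks=[], yticks=[])
    axi.set_xlabel(faces.target_names[y_pre[i]],
                   color='k' if y_pre[i] == y_test[i] else 'r')
plt.show()
```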
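The image grid only shows 24 samples. For a quantitative view of the whole test set, scikit-learn's classification_report and confusion_matrix summarize per-class precision and recall and which classes get confused. The sketch below runs on the bundled digits dataset as a self-contained stand-in (an assumption), with fixed, illustrative hyperparameters C=10 and gamma=0.01 rather than tuned ones.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data (assumption): bundled digits instead of LFW
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative, untuned hyperparameters
model = make_pipeline(PCA(n_components=30, whiten=True, random_state=0),
                      SVC(kernel='rbf', C=10, gamma=0.01))
model.fit(X_train, y_train)
y_pre = model.predict(X_test)

# Per-class precision, recall and F1
print(classification_report(y_test, y_pre))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pre)
print(cm)
```

In the face-recognition setting, classification_report(y_test, y_pre, target_names=faces.target_names) would label the rows with the people's names.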