🖼️ Practicum 2
SVM Classification on Image Data
The dataset used is Labeled Faces in the Wild (LFW), a collection of thousands of face images of public figures. The code below loads it through scikit-learn's fetch_lfw_people, which downloads the data on first use.
Dataset link: https://www.kaggle.com/datasets/jessicali9530/lfw-dataset
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)
print(faces.target_names)
print(len(faces.target_names))
print(faces.images.shape)
['Ariel Sharon' 'Colin Powell' 'Donald Rumsfeld' 'George W Bush'
'Gerhard Schroeder' 'Hugo Chavez' 'Junichiro Koizumi' 'Tony Blair']
8
(1348, 62, 47)
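The shape printed above means 1348 images of 62 x 47 pixels each. The classifier later works on `faces.data`, where every image is flattened into a single row of pixel values. A minimal sketch of that relationship, using a stand-in array of zeros rather than the real images (the array and variable names here are illustrative assumptions):

```python
import numpy as np

# Shape reported above: 1348 images, each 62 x 47 pixels.
n_samples, height, width = 1348, 62, 47

# Stand-in array with the same shape (zeros, not real pixel values).
images = np.zeros((n_samples, height, width), dtype=np.float32)

# Flatten each image into one row, analogous to what faces.data holds.
data = images.reshape(n_samples, -1)
n_features = data.shape[1]

print(data.shape)  # (1348, 2914)
```

Each face therefore becomes a vector of 2914 features, which is why the dimensionality is reduced with PCA before the SVM is trained.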
# sample faces from the dataset
import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 5)
for i, axi in enumerate(ax.flat):
    axi.imshow(faces.images[i], cmap='bone')
    axi.set(xticks=[], yticks=[],
            xlabel=faces.target_names[faces.target[i]])
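from sklearn.svm import SVC
# PCA is imported under the alias RandomizedPCA; current scikit-learn
# no longer ships a separate RandomizedPCA class
from sklearn.decomposition import PCA as RandomizedPCA
from sklearn.pipeline import make_pipeline
# reduce each image to 150 principal components, then classify with an RBF SVM
pca = RandomizedPCA(n_components=150, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)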
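`make_pipeline` names each step after its lowercased class name, which is where parameter names such as `svc__C` come from when the pipeline is tuned. A small sketch that rebuilds the same pipeline and inspects those names; no data is fitted, and the values passed to `set_params` are arbitrary illustrations, not recommended settings:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Same two steps as the pipeline above, built inline.
model = make_pipeline(PCA(n_components=150, whiten=True, random_state=42),
                      SVC(kernel='rbf', class_weight='balanced'))

# Step names are generated from the class names.
step_names = [name for name, _ in model.steps]
print(step_names)  # ['pca', 'svc']

# "<step>__<parameter>" addresses a parameter of one step.
# The values here are arbitrary and only demonstrate the syntax.
model.set_params(svc__C=10, svc__gamma=0.001)
print(model.named_steps['svc'].C)
```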
# split the data into training and test sets
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(faces.data, faces.target,
                                                random_state=42)
from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
%time grid.fit(Xtrain, ytrain)
print(grid.best_params_)
print(grid.best_score_)
CPU times: user 1min 6s, sys: 1.29 s, total: 1min 7s
Wall time: 38.4 s
{'svc__C': 50, 'svc__gamma': 0.005}
0.7448080768668
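Both selected values, C = 50 and gamma = 0.005, are the largest candidates in their lists. When the best value sits on the boundary of a search grid, the true optimum may lie outside it, so widening the grid is usually worth a try. A minimal sketch of that check, using the grid and result printed above (`on_grid_edge` is a hypothetical helper, not part of scikit-learn):

```python
# Grid and best parameters copied from the search above.
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
best_params = {'svc__C': 50, 'svc__gamma': 0.005}


def on_grid_edge(grid, best):
    """Return parameter names whose best value is the smallest or largest candidate."""
    return [name for name, values in grid.items()
            if best[name] in (min(values), max(values))]


edge_params = on_grid_edge(param_grid, best_params)
print(edge_params)  # ['svc__C', 'svc__gamma']
```

With a live search, `grid.best_params_` could be passed in place of the hard-coded dictionary.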
Predict labels for the test data:
model = grid.best_estimator_
yfit = model.predict(Xtest)
# predicted labels on the test data
fig, ax = plt.subplots(4, 6)
for i, axi in enumerate(ax.flat):
    axi.imshow(Xtest[i].reshape(62, 47), cmap='bone')
    axi.set(xticks=[], yticks=[])
    axi.set_ylabel(faces.target_names[yfit[i]].split()[-1],
                   color='black' if yfit[i] == ytest[i] else 'red')
fig.suptitle('Predicted Names; Incorrect Labels in Red', size=14)
Text(0.5, 0.98, 'Predicted Names; Incorrect Labels in Red')
Among the 24 test images shown above, only one has an incorrect label. Classification performance over the whole test set can be measured as follows:
from sklearn.metrics import classification_report
print(classification_report(ytest, yfit,
                            target_names=faces.target_names))
                   precision    recall  f1-score   support

     Ariel Sharon       0.91      0.67      0.77        15
     Colin Powell       0.85      0.85      0.85        68
  Donald Rumsfeld       0.72      0.58      0.64        31
    George W Bush       0.76      0.92      0.83       126
Gerhard Schroeder       0.75      0.65      0.70        23
      Hugo Chavez       1.00      0.45      0.62        20
Junichiro Koizumi       0.91      0.83      0.87        12
       Tony Blair       0.80      0.76      0.78        42

         accuracy                           0.80       337
        macro avg       0.84      0.71      0.76       337
     weighted avg       0.80      0.80      0.79       337
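The two average rows summarise the per-class rows in different ways: the macro average is a plain mean over the eight classes, while the weighted average weights each class by its support. A short sketch that recomputes both for recall from the values in the report (the lists are copied by hand from the table, so they carry its two-decimal rounding):

```python
# Per-class recall and support, copied from the report above.
recall = [0.67, 0.85, 0.58, 0.92, 0.65, 0.45, 0.83, 0.76]
support = [15, 68, 31, 126, 23, 20, 12, 42]

total_support = sum(support)

# Macro average: every class counts equally.
macro_recall = sum(recall) / len(recall)

# Weighted average: classes count in proportion to their support.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / total_support

print(total_support)
print(round(macro_recall, 2))
print(round(weighted_recall, 2))
```

The macro value reproduces the 0.71 in the report. The weighted value comes out near 0.79 rather than the reported 0.80, presumably because the inputs here are already rounded. The large gap between macro and weighted recall reflects the class imbalance: George W Bush accounts for 126 of the 337 test images and has the highest recall.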
# confusion matrix
import seaborn as sns
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(ytest, yfit)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=faces.target_names,
            yticklabels=faces.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label')
Text(91.68, 0.5, 'predicted label')
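The matrix returned by `confusion_matrix` has true labels along the rows and predicted labels along the columns (the heatmap above plots its transpose). Per-class recall and precision can be read off it directly: the diagonal holds the correct predictions, row sums give the number of true samples per class, and column sums give the number of predictions per class. A minimal sketch on a made-up 3-class matrix, since the actual counts are not listed in the text:

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
mat = np.array([[5, 1, 0],
                [2, 6, 2],
                [0, 1, 3]])

correct = np.diag(mat)

# Recall: correct predictions divided by the number of true samples per class.
recall = correct / mat.sum(axis=1)

# Precision: correct predictions divided by the number of predictions per class.
precision = correct / mat.sum(axis=0)

print(recall)
print(precision)
```

Applied to the `mat` computed in the practicum, the same two lines should agree with the recall and precision columns of the classification report.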