Practicum 3
Splitting Data
Random Split
Step 1 - Load Data
import pandas as pd
df = pd.read_csv('data/Titanic-Dataset-selected.csv')
df.head()
Step 2 - Split Data
# Split data
from sklearn.model_selection import train_test_split
# First split the data into a training set and a held-out remainder.
# The remainder is then split again into validation and testing sets.
# The ratio we use is 8:1:1
df_train, df_unseen = train_test_split(df, test_size=0.2, random_state=0)
# Split the remainder in half: validation and testing
df_val, df_test = train_test_split(df_unseen, test_size=0.5, random_state=0)
# Check the size of each split
print(f'Original data size: {df.shape[0]}')
print(f'Train data size: {df_train.shape[0]}')
print(f'Validation data size: {df_val.shape[0]}')
print(f'Test data size: {df_test.shape[0]}')
# Check the label counts in each split
print('=========')
print(f'Label counts, original data:\n{df.Survived.value_counts()}')
print(f'Label counts, train data:\n{df_train.Survived.value_counts()}')
print(f'Label counts, validation data:\n{df_val.Survived.value_counts()}')
print(f'Label counts, test data:\n{df_test.Survived.value_counts()}')
Stratified Split
Step 1 - Load Data
Step 2 - Split Data
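The code for this section is not shown on the page. As a minimal sketch, a stratified version of an 8:1:1 split can be written by passing the label column to the `stratify` parameter of `train_test_split`. The small synthetic DataFrame below is an assumption used only to keep the example self-contained; in the practicum the data would come from the Titanic CSV, with `Survived` as the label.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic data (assumption): 100 rows, 40% survivors.
df = pd.DataFrame({
    'Age': range(100),
    'Survived': [1] * 40 + [0] * 60,
})

# First split: 80% train, 20% held out, preserving the Survived ratio.
df_train, df_unseen = train_test_split(
    df, test_size=0.2, random_state=0, stratify=df['Survived']
)

# Second split: halve the held-out part into validation and test,
# stratifying again on the held-out labels.
df_val, df_test = train_test_split(
    df_unseen, test_size=0.5, random_state=0, stratify=df_unseen['Survived']
)

# Report the size and survivor ratio of each split.
for name, part in [('train', df_train), ('val', df_val), ('test', df_test)]:
    print(f"{name}: rows={part.shape[0]}, survived ratio={part['Survived'].mean():.2f}")
```

Unlike a purely random split, each subset here keeps roughly the same proportion of each `Survived` class as the full dataset, which matters most when the classes are imbalanced.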
Cross Validation 1
Step 1 - Load Data
Step 2 - Split Data
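The code for this section is not shown on the page, so which cross-validation scheme it uses is unknown. One common option, sketched here as an assumption, is plain k-fold cross validation with scikit-learn's `KFold`. The synthetic DataFrame and the choice of 5 folds are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for the Titanic data (assumption).
df = pd.DataFrame({
    'Age': range(100),
    'Survived': [1] * 40 + [0] * 60,
})

# 5 folds (assumed): each fold serves once as the validation set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf.split(df), start=1):
    # kf.split yields positional indices, so use iloc to select rows.
    df_train = df.iloc[train_idx]
    df_val = df.iloc[val_idx]
    fold_sizes.append((df_train.shape[0], df_val.shape[0]))
    print(f'Fold {fold}: train={df_train.shape[0]}, val={df_val.shape[0]}')
```

Every row appears in exactly one validation fold, so the whole dataset is used for both training and validation across the 5 rounds.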
Cross Validation 2
Step 1 - Load Data
Step 2 - Split Data
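The code for this section is not shown on the page either. A plausible variant, offered here only as an assumption, is stratified k-fold cross validation with scikit-learn's `StratifiedKFold`, which keeps the label ratio roughly constant in every fold. The synthetic DataFrame and the 5 folds are illustrative.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the Titanic data (assumption): 40% survivors.
df = pd.DataFrame({
    'Age': range(100),
    'Survived': [1] * 40 + [0] * 60,
})

# 5 stratified folds (assumed); the labels are passed as the second argument.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

val_ratios = []
for fold, (train_idx, val_idx) in enumerate(skf.split(df, df['Survived']), start=1):
    df_train = df.iloc[train_idx]
    df_val = df.iloc[val_idx]
    # Share of survivors in this validation fold.
    ratio = df_val['Survived'].mean()
    val_ratios.append(ratio)
    print(f'Fold {fold}: val rows={df_val.shape[0]}, survived ratio={ratio:.2f}')
```

Note that `split` needs the labels in addition to the data, because the fold assignment depends on the class of each row.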
Full Code