ML&DL: Splitting Data into Train, Validation, and Test Sets

code - 1

import sklearn.model_selection  # import the submodule explicitly; a bare "import sklearn" may not load it

def data_split(examples, labels, train_frac, random_state=None):
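    ''' https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
    param examples:     Feature data to be split
    param labels:       Labels corresponding to examples
    param train_frac:   Ratio of train set to whole dataset
    param random_state: Seed passed to both splits for reproducibility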

    Randomly split dataset, based on these ratios:
        'train': train_frac
        'valid': (1-train_frac) / 2
        'test':  (1-train_frac) / 2

    E.g. passing train_frac=0.8 gives an 80% / 10% / 10% split
    '''

    # train_test_split requires a float train_size strictly between 0 and 1
    assert 0 < train_frac < 1, "Invalid training set fraction"

    X_train, X_tmp, Y_train, Y_tmp = sklearn.model_selection.train_test_split(
                                        examples, labels, train_size=train_frac, random_state=random_state)

    X_val, X_test, Y_val, Y_test   = sklearn.model_selection.train_test_split(
                                        X_tmp, Y_tmp, train_size=0.5, random_state=random_state)

    return X_train, X_val, X_test,  Y_train, Y_val, Y_test
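
As a quick check of the 80% / 10% / 10% claim in the docstring, here is a minimal sketch that runs the function on toy data. The function body is repeated in compact form so the snippet runs on its own; the toy arrays (100 samples, 2 features) are an assumption made purely for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def data_split(examples, labels, train_frac, random_state=None):
    """Compact restatement of the code-1 function."""
    # First cut: train vs. everything else.
    X_train, X_tmp, Y_train, Y_tmp = train_test_split(
        examples, labels, train_size=train_frac, random_state=random_state)
    # Second cut: halve the remainder into validation and test.
    X_val, X_test, Y_val, Y_test = train_test_split(
        X_tmp, Y_tmp, train_size=0.5, random_state=random_state)
    return X_train, X_val, X_test, Y_train, Y_val, Y_test


# Toy data (hypothetical): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

X_train, X_val, X_test, Y_train, Y_val, Y_test = data_split(
    X, y, train_frac=0.8, random_state=0)

sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)
```

With 100 samples and train_frac=0.8, the three sets should contain 80, 10, and 10 samples, and every sample should land in exactly one of them.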

code - 2

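# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import train_test_split

# First split off the test set (20% of all data).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Then split the remaining training data into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=1)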

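There are two main approaches.

The first, as in code-1, is to define a new function.

The second, as in code-2, is to call scikit-learn's train_test_split directly:

first split the data into train and test sets, then split the resulting train set again into train and validation sets. With test_size=0.2 in both calls, this yields a 64% / 16% / 20% train / validation / test split.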
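
The two-stage approach can be sketched on toy data as follows. Note that the second test_size is a fraction of what remains after the first split, not of the whole dataset. The toy arrays, the variable names X_rest / y_rest, and the 60% / 20% / 20% target are assumptions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# code-2 as written: hold out 20% for test, then 20% of the remaining 80%
# for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, random_state=1)
sizes_as_written = (len(X_train), len(X_val), len(X_test))

# To get 20% of the WHOLE dataset as validation instead, rescale the second
# fraction: val_frac / (1 - test_frac) = 0.2 / 0.8 = 0.25.
val_of_rest = 0.25
X_train2, X_val2, y_train2, y_val2 = train_test_split(
    X_rest, y_rest, test_size=val_of_rest, random_state=1)
sizes_rescaled = (len(X_train2), len(X_val2), len(X_test))

print(sizes_as_written, sizes_rescaled)
```

On 100 samples the first variant should give 64 / 16 / 20 and the rescaled variant 60 / 20 / 20. Which one is preferable depends on how much data you can afford to hold out for validation.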

License: Attribution, NonCommercial, ShareAlike

월곡동 로봇팔의 대학원일지
