A Quick Look at Keras

1. Preprocessing

from tensorflow.keras.preprocessing.text import Tokenizer
t = Tokenizer()
fit_text = "The earth is an awesome place live"
t.fit_on_texts([fit_text])

test_text = "The earth is an great place live"
sequences = t.texts_to_sequences([test_text])[0]

print("sequences : ",sequences) # 'great' is not in the vocabulary, so it is dropped from the output.
print("word_index : ",t.word_index) # print the vocabulary

sequences :  [1, 2, 3, 4, 6, 7]
word_index :  {'the': 1, 'earth': 2, 'is': 3, 'an': 4, 'awesome': 5, 'place': 6, 'live': 7}
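To see why 'great' disappears, here is a rough pure-Python approximation of the behavior shown above. It is not the Keras implementation (the real Tokenizer also strips punctuation, among other things), and the helper names build_word_index and to_sequence are made up for this sketch.

```python
from collections import Counter

def build_word_index(texts):
    """Give every word an index starting at 1, most frequent word first."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() sorts by count; words with equal counts keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(text, word_index):
    """Map words to indices and silently skip words that are not in the vocabulary."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

word_index = build_word_index(["The earth is an awesome place live"])
sequence = to_sequence("The earth is an great place live", word_index)

print(word_index)
print(sequence)  # 'great' has no index, so only six numbers come out
```

Index 5 ('awesome') never shows up in the sequence because that word is not in the test sentence, and 'great' is skipped because it was never seen during fitting.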

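pad_sequences(sequences, maxlen=, padding=)

First argument: the data to pad
maxlen: the length every sequence is normalized to (shorter sequences are padded, longer ones are truncated)
padding: 'pre' fills zeros at the front; 'post' fills zeros at the back.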

from tensorflow.keras.preprocessing.sequence import pad_sequences
pad_sequences([[1,2,3],[3,4,5,6],[7,8]], maxlen=3, padding='pre')

array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]])
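Note that [3, 4, 5, 6] came out as [4, 5, 6]: pad_sequences also truncates, and by default it cuts from the front (truncating='pre'). The function below is a hand-written sketch of that padding and truncating logic using plain lists, assuming a positive maxlen. The name pad is hypothetical and this is not the library code.

```python
def pad(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Pad or truncate every sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops items from the front, 'post' drops them from the back.
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        padded.append(fill + seq if padding == "pre" else seq + fill)
    return padded

data = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
pre = pad(data, maxlen=3, padding="pre")
post = pad(data, maxlen=3, padding="post")

print(pre)
print(post)
```

With padding='post' only the last row changes, since it is the only sequence shorter than maxlen.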

2. Word Embedding

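	One-hot vector	Embedding vector
Dimension	High (the size of the vocabulary)	Low
Also known as	A kind of sparse vector	A kind of dense vector
How it is obtained	Set manually	Learned from training data
Value type	1 and 0	Real numbers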
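The contrast in the table can be made concrete with plain lists. This is only an illustration: the random numbers stand in for values that a real embedding layer would learn during training, and vocab_size, embedding_dim, and word_id are example values chosen for this sketch.

```python
import random

vocab_size, embedding_dim = 7, 2

def one_hot(index, size):
    """A sparse vector: all zeros except a single 1 at the word's index."""
    return [1 if i == index else 0 for i in range(size)]

# Stand-in for a trained embedding table: one small real-valued vector per word.
random.seed(0)
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]

word_id = 3
sparse = one_hot(word_id, vocab_size)   # length grows with the vocabulary
dense = embedding_table[word_id]        # length stays at embedding_dim

print(sparse)
print(dense)
```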

from tensorflow.keras.layers import Embedding
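# Sentence tokenization and word tokenization
text = [['Hope', 'to', 'see', 'you', 'soon'],['Nice', 'to', 'see', 'you', 'again']]

# Integer encoding of each word
text = [[0,1,2,3,4],[5,1,2,3,6]]

# The data above becomes the input to the embedding layer below.
"""
7 is the number of words, i.e. the size of the vocabulary.
2 is the size of each vector after embedding.
5 is the length of each input sequence, i.e. the length of each input list.
"""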
Embedding(7,2, input_length = 5)

<tensorflow.python.keras.layers.embeddings.Embedding at 0x268d5ec9288>
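Conceptually, the layer is a lookup table with 7 rows and 2 columns. The sketch below mimics the lookup with nested lists to show the resulting shape and the number of weights. The table values are placeholders rather than trained weights, and nothing here calls Keras.

```python
vocab_size, embedding_dim = 7, 2

# Placeholder table: row i is the vector for word index i.
table = [
    [float(row * embedding_dim + col) for col in range(embedding_dim)]
    for row in range(vocab_size)
]

text = [[0, 1, 2, 3, 4], [5, 1, 2, 3, 6]]

# Replace every integer by its row in the table.
embedded = [[table[idx] for idx in seq] for seq in text]

shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
num_params = vocab_size * embedding_dim

print(shape)       # (batch size, sequence length, embedding size)
print(num_params)  # one weight per table cell
```

The two sentences share the words at indices 1, 2, and 3, so those positions receive identical vectors in both rows.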

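3. Modeling

Sequential: the class used to stack the input, hidden, and output layers into a model.
Dense: a fully connected layer.

First argument = number of output neurons.
input_dim = number of input neurons (the dimension of the input).
activation = the activation function.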
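What a fully connected layer computes is activation(x · W + b). Below is a minimal hand-rolled sketch of that formula with 4 inputs and 2 output neurons. The weights are arbitrary numbers picked for illustration, and the function dense is a stand-in, not the Keras class.

```python
import math

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(x, weights, bias, activation):
    """weights has one row per input and one column per output neuron."""
    units = len(bias)
    z = [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(units)
    ]
    return [activation(v) for v in z]

x = [1.0, -2.0, 0.5, 3.0]                                # input_dim = 4
w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]  # 4 inputs x 2 neurons
b = [0.0, 0.1]

hidden = dense(x, w, b, relu)                            # negative sums become 0.0
output = dense(hidden, [[1.0], [1.0]], [0.0], sigmoid)   # squashed into (0, 1)

print(hidden, output)
```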

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(8,input_dim = 4, activation='relu'))
# Subsequent layers added with add() do not need input_dim.
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 8)                 40        
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 9         
=================================================================
Total params: 49
Trainable params: 49
Non-trainable params: 0
_________________________________________________________________
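The Param # column can be checked by hand: a Dense layer has one weight per (input, output) pair plus one bias per output. The helper name dense_params is made up for this check.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

layer1 = dense_params(4, 8)   # Dense(8, input_dim=4)
layer2 = dense_params(8, 1)   # Dense(1) fed by the 8 outputs above
total = layer1 + layer2

print(layer1, layer2, total)
```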

4. Compile & Training

optimizer: sets the optimizer used during training. It can also be given as a string such as 'adam' or 'sgd'.
loss: sets the loss function used during training.
metrics: selects the metrics used to monitor training.
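For a binary classifier, a typical pairing is binary cross-entropy as the loss and accuracy as the metric. The sketch below computes both by hand on a few made-up predictions. It follows the textbook formulas and is not the Keras code, and the 0.5 threshold is an assumption of this example.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(t*log(p) + (1-t)*log(1-p)), with p clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p > threshold) == bool(t))
    return hits / len(y_true)

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.4, 0.7]   # made-up sigmoid outputs

loss = binary_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)

print(round(loss, 4), acc)
```

The third prediction (0.4 for a true label of 1) is the only miss, and it also contributes the largest share of the loss.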

from tensorflow.keras.layers import SimpleRNN, Embedding, Dense
from tensorflow.keras.models import Sequential

max_features = 10000

model2 = Sequential()
# Takes a vocabulary of 10,000 words and outputs 32-dimensional vectors; input_length is None
model2.add(Embedding(max_features, 32))
model2.add(SimpleRNN(32))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
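As a sanity check on model2, the number of trainable weights can be estimated by hand. This assumes the standard layouts: an embedding table of vocabulary size times dimension, a SimpleRNN with input weights, recurrent weights, and biases, and a Dense layer with weights and biases.

```python
max_features, embed_dim, rnn_units = 10000, 32, 32

# Embedding: one 32-dimensional vector per word.
embedding = max_features * embed_dim

# SimpleRNN: input-to-hidden weights + hidden-to-hidden weights + biases.
simple_rnn = embed_dim * rnn_units + rnn_units * rnn_units + rnn_units

# Dense(1): one weight per RNN unit plus one bias.
dense = rnn_units * 1 + 1

total = embedding + simple_rnn + dense
print(embedding, simple_rnn, dense, total)
```

Almost all of the weights sit in the embedding table, which is why the vocabulary size dominates the size of this kind of model.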

validation_data=(X_val, y_val): uses the given validation data.
With validation data, the validation accuracy is also printed at every epoch. This accuracy only shows how well training is going; the model does not actually learn from the validation data. If the validation loss falls and then starts to rise, that is a sign of overfitting.
validation_split: splits off a given fraction of X_train and y_train and uses it as validation data.
verbose: sets what is printed during training.
  0: prints nothing.
  1: shows a progress bar that tracks training.
  2: prints one line of loss information per epoch.
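The overfitting signal described above (validation loss falling, then rising) can be spotted from the per-epoch validation losses. The sketch uses a made-up loss history and a hypothetical helper, epochs_since_best. In a real run the numbers would come from the history that fit returns.

```python
def epochs_since_best(val_losses):
    """Return the epoch with the lowest validation loss and how many epochs have passed since."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_epoch, len(val_losses) - 1 - best_epoch

# Made-up validation losses: improving until epoch 3, then getting worse.
val_losses = [0.69, 0.55, 0.48, 0.45, 0.47, 0.52, 0.58]

best_epoch, stale = epochs_since_best(val_losses)
print(best_epoch, stale)

if stale >= 3:
    print("validation loss has not improved for", stale, "epochs: likely overfitting")
```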

model2.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0, validation_data=(X_val, y_val))

# model2.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0, validation_split=0.2)
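With validation_split, Keras holds out the last part of the arrays it is given, before any shuffling, so the order of the data matters. Below is a rough sketch of that split on plain lists. The function name split_validation and the toy data are assumptions of this example.

```python
def split_validation(x, y, validation_split):
    """Keep the first part for training and hold out the last fraction for validation."""
    n_val = int(len(x) * validation_split)
    split_at = len(x) - n_val
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = list(range(10))          # toy inputs 0..9
y = [i % 2 for i in x]       # toy binary labels

(x_train, y_train), (x_val, y_val) = split_validation(x, y, 0.2)

print(x_train, x_val)
```

If the data is sorted by label, this kind of split can leave the validation set with only one class, so shuffling beforehand is usually a good idea.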

5. Evaluation & Prediction

# Evaluate the trained model on the test data
model2.evaluate(X_test, y_test, batch_size=32)
# Model output for arbitrary input
model2.predict(X_input, batch_size=32)
# Save the model
model2.save("model_name.h5")

# Load the model
from tensorflow.keras.models import load_model
model = load_model('model_name.h5')
model

<tensorflow.python.keras.engine.sequential.Sequential at 0x268d623fc08>


월곡동 로봇팔의 대학원일지

DL : Keras Basics
