What? - Where is self-attention defined?

A high-level look

The Transformer also has an encoder-decoder structure.

The encoder is a stack of several encoder layers, and the decoder is a stack of several decoder layers.

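Each encoder/decoder layer consists of self-attention + FFN (feed-forward network). A decoder layer additionally has an encoder-decoder attention sub-layer between the two.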
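The layer/stack structure can be sketched in a few lines of plain Python. This is only a structural sketch under loud assumptions: `mix_tokens` is a hypothetical stand-in for self-attention (it just averages over positions, it is not real attention), `position_wise_ffn` is a toy stand-in for the FFN, `NUM_LAYERS = 2` is an arbitrary choice, and residual connections and layer normalization are left out.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]                   # one vector per token position
SubLayer = Callable[[Sequence], Sequence]

NUM_LAYERS = 2  # arbitrary for this sketch


def mix_tokens(x: Sequence) -> Sequence:
    """Stand-in for self-attention (NOT real attention).

    Every position receives the mean of all positions, which only marks
    the place in the layer where tokens exchange information.
    """
    n = len(x)
    mean = [sum(column) / n for column in zip(*x)]
    return [list(mean) for _ in x]


def position_wise_ffn(x: Sequence) -> Sequence:
    """Toy stand-in for the FFN: the same function applied to each position."""
    return [[max(0.0, 2.0 * v) for v in vec] for vec in x]


def make_layer(attn: SubLayer, ffn: SubLayer) -> SubLayer:
    """One encoder layer: self-attention first, then the FFN."""
    def layer(x: Sequence) -> Sequence:
        return ffn(attn(x))
    return layer


def make_stack(layers: List[SubLayer]) -> SubLayer:
    """An encoder: the output of each layer feeds the next one."""
    def stack(x: Sequence) -> Sequence:
        for layer in layers:
            x = layer(x)
        return x
    return stack


encoder = make_stack(
    [make_layer(mix_tokens, position_wise_ffn) for _ in range(NUM_LAYERS)]
)
out = encoder([[1.0, 0.0], [3.0, 2.0]])
print(out)  # [[8.0, 4.0], [8.0, 4.0]]
```

The input and output have the same shape (one vector per token), which is what allows identical layers to be stacked.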

So, how is the self-attention layer defined?