Thursday, May 28, 2020

Encoder Decoder Explained

Neural Machine Translation

x_1, x_2, ..., x_Tx - input sentence
y_1, y_2, ..., y_Ty - output sentence
Tx != Ty in general, i.e., the input sentence and the output sentence can have different lengths.
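Here is a minimal sketch of what such an encoder-decoder could look like in PyTorch. The class names, GRU layers, and layer sizes are illustrative assumptions rather than a reference implementation; the point is that the encoder compresses x_1..x_Tx into one fixed-size vector, and the decoder generates y_1..y_Ty from it one word at a time.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads the source sentence x_1..x_Tx and compresses it into a single context vector."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, src):                   # src: (batch, Tx) token ids
        embedded = self.embed(src)             # (batch, Tx, embed_dim)
        _, hidden = self.rnn(embedded)         # hidden: (1, batch, hidden_dim)
        return hidden                          # fixed-size encoding of the whole sentence

class Decoder(nn.Module):
    """Generates the target sentence y_1..y_Ty one token at a time, seeded with the encoder's context."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, hidden):     # prev_token: (batch, 1), previously generated word
        embedded = self.embed(prev_token)      # (batch, 1, embed_dim)
        output, hidden = self.rnn(embedded, hidden)
        logits = self.out(output.squeeze(1))   # (batch, vocab_size) scores for the next word
        return logits, hidden

Notice that the decoder only ever sees the single context vector produced by the encoder, no matter how long the input sentence is.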


The problem with regular encoder-decoder architectures arises with long sentences, because RNNs struggle to compress a long input into a single fixed-size encoding. For example, when translating a long sentence, we probably don't read the whole sentence and then translate it; the human mind reads parts of the sentence and processes the translation as it goes. This leads us to attention models: while translating a word, the model weights the input words differently, focusing on the parts most relevant to that word. We will cover attention models in a separate post. Next we explore another encoder-decoder architecture where the input is an image, so the encoder produces an image encoding.



Image Caption Generation

Encoder : AlexNet or any other computer vision model can generate the image encoding
Decoder : An RNN-like architecture can decode the encoding to generate the image caption
y_1, y_2, ..., y_T - generated caption, one word per time step
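A rough sketch of how this could be wired up in PyTorch, assuming torchvision's pretrained AlexNet as the encoder backbone (the class names and layer sizes are illustrative assumptions): the CNN produces a fixed-size image encoding, which is then used as the initial hidden state of the RNN decoder that emits the caption words.

import torch.nn as nn
from torchvision import models

class ImageEncoder(nn.Module):
    """Turns an image into a fixed-size feature vector using a pretrained AlexNet backbone."""
    def __init__(self, encoding_dim=512):
        super().__init__()
        alexnet = models.alexnet(pretrained=True)
        self.features = alexnet.features               # convolutional layers only
        self.pool = nn.AdaptiveAvgPool2d((6, 6))
        self.fc = nn.Linear(256 * 6 * 6, encoding_dim) # project to the desired encoding size

    def forward(self, images):                          # images: (batch, 3, H, W)
        x = self.pool(self.features(images)).flatten(1)
        return self.fc(x)                               # (batch, encoding_dim)

class CaptionDecoder(nn.Module):
    """RNN that starts from the image encoding and emits caption words y_1..y_T."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # hidden_dim must match the encoder's encoding_dim, since the encoding seeds the hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, captions, image_encoding):        # captions: (batch, T) token ids
        hidden = image_encoding.unsqueeze(0)            # (1, batch, hidden_dim) initial hidden state
        outputs, _ = self.rnn(self.embed(captions), hidden)
        return self.out(outputs)                        # (batch, T, vocab_size) word scores

Compared with translation, only the encoder changes: the decoder still generates a word sequence, but its starting context comes from an image instead of a source sentence.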

