Thursday, May 28, 2020

Attention Explained


Screenshot of Andrew Ng's explanation of the attention model from the deeplearning.ai course




The problem with regular encoder-decoder architectures arises with long sentences, because RNNs struggle to carry information across many time steps. For example, when translating a long sentence, a human translator probably doesn't read the whole sentence and only then translate it. The mind more likely reads a part of the sentence and translates that part before moving on. This leads us to attention models: while generating each output word, the model weighs the input words differently, focusing on the parts of the input most relevant to that word.
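To make this concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention, one common way to compute those weights. The function and parameter names (attention_context, Wa, Ua, va) are my own illustrative choices, not from the course; the parameters would normally be learned during training.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(encoder_states, decoder_state, Wa, Ua, va):
    # encoder_states: (T, H) hidden state for each of the T input words
    # decoder_state:  (H,)   decoder state before emitting the next word
    # Wa, Ua, va:     parameters of a small alignment network (hypothetical names)
    #
    # Score how relevant each input position is to the next output word.
    scores = np.tanh(encoder_states @ Ua.T + decoder_state @ Wa.T) @ va  # (T,)
    # Normalize the scores into attention weights that sum to 1.
    alphas = softmax(scores)                                             # (T,)
    # Context vector: weighted sum of the encoder states.
    context = alphas @ encoder_states                                    # (H,)
    return context, alphas

# Toy usage: 5 input words, hidden size 4, random (untrained) parameters.
rng = np.random.default_rng(0)
T, H = 5, 4
enc = rng.normal(size=(T, H))
dec = rng.normal(size=(H,))
Wa, Ua, va = rng.normal(size=(H, H)), rng.normal(size=(H, H)), rng.normal(size=(H,))
ctx, alphas = attention_context(enc, dec, Wa, Ua, va)
print("attention weights:", np.round(alphas, 3))

The attention weights (alphas) are exactly the "different weights on the inputs" described above: large weights mark the input words the decoder attends to while producing the current output word.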


