Attention & Transformers — Interview Q&A
Self-Attention
What is self-attention?
Self-attention allows each token in a sequence to weigh the relevance of every other token (including itself) when computing its own representation; a minimal sketch follows below.
One-line:
Self-attention lets each token dynamically focus on other tokens to model global context.
Common pitfall: Confusing self-attention (queries, keys, and values all come from the same sequence) with cross-attention (queries come from one sequence, keys and values from another, e.g. the decoder attending to encoder outputs).
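A minimal NumPy sketch of single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)·V. The function and variable names, shapes, and random weights are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:             (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    Returns:       (seq_len, d_k) context-aware token representations
    """
    Q = X @ W_q                          # queries
    K = X @ W_k                          # keys
    V = X @ W_v                          # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len): each token's relevance to every other
    weights = softmax(scores, axis=-1)   # each row is a probability distribution over tokens
    return weights @ V                   # weighted sum of values = new token representations

# Toy usage with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)   # shape (4, 8)
```

Note how every output row mixes information from all positions at once, which is what "global context" means here; in cross-attention the only change would be that K and V are projected from a different sequence than Q.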