Attention & Transformers — Interview Q&A

Self-Attention

What is self-attention?

Self-attention allows each token in a sequence to weigh the relevance of every token (including itself) when computing its own representation; the queries, keys, and values are all derived from the same input sequence.
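
For concreteness, here is a minimal NumPy sketch of single-head scaled dot-product attention in which queries, keys, and values are all projected from the same sequence. Function and matrix names are illustrative; masking and multiple heads are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking).
    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices.
    """
    Q = X @ Wq            # queries, keys, and values all come from the same X
    K = X @ Wk
    V = X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len): every token scores every token
    weights = softmax(scores, axis=-1)
    return weights @ V                # each output row is a weighted mix of all value vectors

# Toy usage: 4 tokens, d_model = d_k = 8 (sizes chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The attention weight matrix has shape (seq_len, seq_len), so every token's output is a mixture of value vectors drawn from the whole sequence, which is what "modeling global context" means in practice.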

One-line:
Self-attention lets each token dynamically focus on other tokens to model global context.

Trap:
Confusing self-attention with cross-attention: in self-attention, queries, keys, and values all come from the same sequence, whereas in cross-attention the queries come from one sequence (e.g., the decoder) while the keys and values come from another (e.g., the encoder output).
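
To make the distinction concrete, here is a hedged sketch of cross-attention, reusing `softmax`, `rng`, and the projection matrices from the code above; the only structural change is that queries are projected from one sequence while keys and values are projected from another.

```python
def cross_attention(X_dec, X_enc, Wq, Wk, Wv):
    """Queries come from the decoder sequence; keys and values from the encoder output."""
    Q = X_dec @ Wq
    K = X_enc @ Wk
    V = X_enc @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (dec_len, enc_len): decoder tokens score encoder tokens
    return softmax(scores, axis=-1) @ V

# Hypothetical shapes: 6 source tokens from the encoder, 4 target tokens in the decoder.
X_enc = rng.normal(size=(6, 8))
X_dec = rng.normal(size=(4, 8))
print(cross_attention(X_dec, X_enc, Wq, Wk, Wv).shape)  # (4, 8), one row per query token
```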