Transformers

Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads

Visual perception tasks are predominantly solved by Vision Transformer (ViT) architectures, which, despite their effectiveness, encounter a computational bottleneck due to the quadratic complexity of computing self-attention. This inefficiency is …
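To make the bottleneck concrete, here is a minimal sketch of standard scaled dot-product self-attention (not the Fibottention mechanism itself): the score matrix has one entry per token pair, so time and memory grow as O(n²) in the sequence length n. All names and shapes below are illustrative.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention (single head).

    x: (n, d) token embeddings. The (n, n) score matrix is the
    source of the quadratic cost in sequence length n.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # each (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n): O(n^2) time and memory
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # (n, d)

n, d = 196, 64                                       # e.g. 14x14 ViT patch tokens
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)               # doubling n quadruples the score matrix
```

Doubling the number of patches quadruples the score matrix, which is why high-resolution ViT inputs hit this wall quickly.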

UnitNorm: Rethinking Normalization for Transformers in Time Series

Normalization techniques are crucial for enhancing Transformer models’ performance and stability in time series analysis tasks, yet traditional methods like batch and layer normalization often lead to issues such as token shift, attention shift, and …
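For context on what the paper rethinks, the sketch below implements plain layer normalization over the feature axis of a (batch, time, features) tensor, i.e. the traditional baseline the abstract critiques; it is not the UnitNorm method itself, and all names are illustrative.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Traditional layer normalization over the feature axis.

    x: (batch, time, features) time-series tokens. Each token is
    re-centered and re-scaled independently; per the abstract, this
    kind of per-token statistic is what can induce token shift and
    attention shift in time-series Transformers.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per token
    return gamma * x_hat + beta               # learned affine transform

batch, time, feat = 8, 96, 16                 # hypothetical time-series batch
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, time, feat))
gamma, beta = np.ones(feat), np.zeros(feat)
y = layer_norm(x, gamma, beta)                # per-token stats: mean ~ 0, var ~ 1
```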