We introduce the Momentum Transformer, an attention-based deep learning architecture that outperforms benchmark momentum and mean-reversion trading strategies. Unlike state-of-the-art Long Short-Term Memory (LSTM) architectures, which are sequential in nature, the attention mechanism gives our architecture a direct connection to all previous time-steps. This allows the architecture to learn longer-term dependencies, improves performance on returns net of transaction costs, and lets it adapt naturally to new market regimes, such as during the SARS-CoV-2 crisis. The Momentum Transformer is inherently interpretable, giving us greater insight into our deep learning momentum trading strategy, including how it blends different classical strategies and which past time-steps are most significant to the model.
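To make the "direct connection to all previous time-steps" concrete, the following is a minimal sketch of generic single-head scaled dot-product self-attention with a causal mask, written in NumPy. It is not the Momentum Transformer's actual architecture: the function name `causal_self_attention`, the random projection matrices `w_q`, `w_k`, `w_v`, and the toy input sizes are all hypothetical and chosen only for illustration. The point it shows is that the output at step t is a weighted sum over every step s ≤ t computed in one operation, whereas an LSTM must pass information from step s through t − s recurrent updates. The returned weight matrix is also the kind of quantity one can inspect to see which past time-steps the model relies on.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask.

    x: array of shape (T, d_model), one row per time-step (illustrative input).
    Returns (outputs, weights); weights[t, s] is how much step t attends to step s.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (T, T) pairwise similarities
    t = x.shape[0]
    # True strictly above the diagonal marks future steps, which must be hidden.
    mask = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Subtract the row max for numerical stability; the diagonal is always finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)  # exp(-inf) == 0, so future steps get zero weight
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage with hypothetical random data: 5 time-steps, 4 features.
rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, attn = causal_self_attention(x, w_q, w_k, w_v)

# The last step attends directly to every earlier step in a single operation.
print(np.round(attn[-1], 3))
```

Each row of `attn` sums to one and is zero above the diagonal, so no step sees the future. In the last row, every entry is non-zero: the final time-step draws on all earlier steps at once rather than through a chain of recurrent states.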