AI This AI Paper Unveils the Cached Transformer: A Transformer Model with GRC (Gated Recurrent Cached) Attention for Enhanced Language and Vision Tasks