AI Researchers from McGill University Present the Pythia 70M Model for Distilling Transformers into Long Convolution Models