A simple vision-encoder text-decoder architecture for multimodal tasks

Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Research