AI Meet Marlin: A FP16xINT4 LLM Inference Kernel that can Achieve Near-Ideal ~4x Speedups up to Medium Batch Sizes of 16-32 Tokens
AI Sakana AI Introduces KAME: A Tandem Speech-to-Speech Architecture That Injects LLM Knowledge in Real Time