AI How Can We Efficiently Deploy Large Language Models in Streaming Applications? This AI Paper Introduces the StreamingLLM Framework for Infinite Sequence Lengths