AI Researchers from the University of Oregon and Adobe Introduce CulturaX: A Multilingual Dataset with 6.3T Tokens in 167 Languages Tailored for Large Language Model (LLM) Development