This post recaps a session at the SingleStore NOW 2024 Conference, held at the Chase Center on October 3, 2024. To view the entire session, check out the video at the bottom of the blog.

Overview
Jose Menendez, a software engineer at Groq, brought an engaging and down-to-earth perspective to the discussion of AI trends. His talk focused on the current state of AI technologies, the role of Large Language Models (LLMs) and the critical importance of speed and inference in modern AI applications.
About the speaker
Jose Menendez is a software engineer at Groq, where he specializes in optimizing the performance of AI inference. Groq is known for pushing the boundaries of AI speed, particularly with its custom-designed Language Processing Units (LPUs). Jose’s insights are drawn from hands-on experience with both open-source and proprietary models, making him a leading voice in the AI community.
Key takeaways
Jose's presentation shed light on several emerging trends and technical concepts in AI, with a focus on performance and efficiency:
- Understanding tokens and inference. Jose explained the concept of tokens in the context of LLMs: tokens are the units of text a model reads and generates during interactions with models like ChatGPT. He emphasized the importance of managing tokens efficiently, since many AI services both price their usage and measure their performance in tokens. This is especially relevant for developers working to optimize costs and response times in AI applications.
- The role of LPUs in AI speed. Groq has developed specialized hardware, the LPU, designed specifically for AI inference tasks. Jose contrasted LPUs with GPUs, which were originally designed for graphics and have since been repurposed for AI. LPUs, by design, focus solely on language tasks, enabling Groq to achieve significant speed improvements, with its latest benchmarks surpassing 3,000 tokens per second.
- Retrieval Augmented Generation (RAG) techniques. Jose provided a simple yet effective explanation of RAG, a method that retrieves data relevant to a question and supplies it to the model alongside that question, so the answer is grounded in the retrieved data rather than in the model's training alone. By using RAG, companies can help AI models produce more accurate and contextually relevant responses, reducing the risk of "hallucinations" in which models generate incorrect or irrelevant information.
- AI use cases beyond chatbots. Jose highlighted the diversity of AI applications, which are moving beyond chatbots to encompass more sophisticated tools like study plan generators and customized learning applications. He stressed that while conversational AI is prominent, there are countless other ways LLMs and similar models can be leveraged to create impactful solutions.
- Community-driven development. The Groq developer community has grown to over half a million registered users. This community-driven approach allows Groq to innovate rapidly, with ideas and feedback from developers worldwide. Jose encouraged enterprises to tap into their own developer communities for similar benefits.
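To make the RAG takeaway above more concrete, here is a minimal sketch of the retrieve-then-prompt idea in Python. It is not code from the talk and not Groq's or SingleStore's API: the `DOCUMENTS` list, the function names and the bag-of-words similarity are all assumptions chosen for illustration. A production system would typically use an embedding model and a database with vector search, then send the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base (illustrative only).
# A real deployment would query a database or vector store instead.
DOCUMENTS = [
    "LPUs are chips designed for AI inference on language tasks.",
    "GPUs were originally designed for rendering graphics.",
    "Tokens are the units of text a language model reads and writes.",
]


def tokenize(text):
    """Lowercase word split; a crude stand-in for a real tokenizer or embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Combine retrieved context with the question into a single prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    # The printed prompt is what would be sent to an LLM of your choice.
    print(build_prompt("What were GPUs originally designed for?", DOCUMENTS))
```

The key design point is that the model only sees context selected for the current question, which is what helps keep its answer tied to known data. Swapping the word-count similarity for real embeddings changes the quality of retrieval but not the overall shape of the flow.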
Take it to the next level
Are you curious to see how cutting-edge AI tools can transform your business? Explore the latest insights from the conference and get started with a free SingleStore Helios® trial today. Discover how Groq’s high-speed AI solutions and SingleStore’s real-time data capabilities can power your next AI project.









