As Artificial Intelligence systems increasingly integrate into our daily routines, from personal assistants to enterprise solutions, the stakes are higher than ever for these systems to not only comprehend but also retain information. However, efficiently managing vast amounts of data while ensuring rapid response times remains a significant challenge. Enter Zep, a groundbreaking innovation that aims to revolutionize how AI interacts with data.
The Challenge
Today’s AI faces a paradox: the need for vast amounts of knowledge and the computational burden of processing this information. Increasing the data context window—essentially the amount of information an AI can consider at once—has been one approach. However, this method quickly becomes inefficient, leading to high costs and slowed performance. In real-world applications where split-second decisions are critical, latency and power consumption are serious concerns.
Introducing Zep: A New Paradigm in AI Memory
Zep offers a radical solution by creating a temporal knowledge graph: a memory system that continuously updates based on new interactions. Inspired by GraphRAG, Zep incorporates changes in user interactions and business data, maintaining a fluid understanding of information over time. This means that instead of processing all data all the time, AI can focus on what’s important, when it matters.
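To make the idea of a temporal knowledge graph more concrete, here is a minimal Python sketch. It is a toy illustration and not Zep's implementation or API: the `Fact` and `TemporalFactStore` names, the validity-interval fields, and the sample data are all assumptions made for this example. The point it shows is that an update closes out the old version of a fact instead of overwriting it, so earlier states stay queryable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """One edge in a toy temporal graph: subject -[relation]-> value."""
    subject: str
    relation: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still current"


class TemporalFactStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def update(self, subject: str, relation: str, value: str, at: datetime) -> None:
        # Close the current version rather than overwriting it,
        # so the earlier state remains available for later queries.
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.valid_to is None:
                fact.valid_to = at
        self.facts.append(Fact(subject, relation, value, valid_from=at))

    def as_of(self, subject: str, relation: str, at: datetime) -> Optional[str]:
        # Return the value that was valid at the given moment, if any.
        for fact in self.facts:
            if (fact.subject, fact.relation) != (subject, relation):
                continue
            if fact.valid_from <= at and (fact.valid_to is None or at < fact.valid_to):
                return fact.value
        return None

    def history(self, subject: str, relation: str) -> list[str]:
        # All recorded values for this fact, oldest first.
        matching = [f for f in self.facts if (f.subject, f.relation) == (subject, relation)]
        return [f.value for f in sorted(matching, key=lambda f: f.valid_from)]


# Made-up sample data: a user changes employer between sessions.
store = TemporalFactStore()
store.update("alice", "works_at", "Acme", datetime(2023, 1, 1))
store.update("alice", "works_at", "Globex", datetime(2024, 6, 1))
```

With this structure, a question about mid-2023 can be answered from the older version of the fact, while a question about today uses only the current one.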
Evaluation Methodology
To measure Zep’s effectiveness, the Zep team’s recently published paper ran an evaluation using a simplified version of the LongMemEval team’s proposed test methodology. As in their previous work with the DMR benchmark, the authors used the full chat history as a baseline for comparison, placing that history directly into the context window.
For Zep, the evaluation began by ingesting chat history datasets through its built-in message load capabilities, utilizing gpt-4o-mini internally to construct knowledge graphs. These were subsequently queried using unmodified test questions, with Zep providing relevant memory context that supplemented the LLM’s input to enhance its responses.
An LLM-as-judge then evaluated these results against the established “Golden” standard, as detailed in the paper.
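The overall shape of this evaluation can be sketched in a few lines of Python. This is a simplified outline under stated assumptions, not the paper's harness: `Example`, `evaluate`, `full_context`, `toy_answer`, and `toy_judge` are hypothetical names, and the answer and judge functions below are trivial stand-ins for what would be LLM calls in a real run.

```python
from typing import Callable, Iterable, NamedTuple


class Example(NamedTuple):
    question: str
    golden: str        # reference ("Golden") answer
    chat_history: str  # full transcript associated with the question


def evaluate(
    examples: Iterable[Example],
    build_context: Callable[[Example], str],
    answer: Callable[[str, str], str],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Return the fraction of examples the judge marks as correct."""
    examples = list(examples)
    correct = 0
    for ex in examples:
        context = build_context(ex)              # full history or retrieved memory
        response = answer(context, ex.question)  # model under test
        if judge(ex.question, ex.golden, response):
            correct += 1
    return correct / len(examples) if examples else 0.0


def full_context(ex: Example) -> str:
    # Baseline: the entire chat history goes into the context window.
    return ex.chat_history


def toy_answer(context: str, question: str) -> str:
    # Stand-in for an LLM call: just echoes the last word of the context.
    return context.split()[-1]


def toy_judge(question: str, golden: str, response: str) -> bool:
    # Stand-in for an LLM-as-judge: case-insensitive substring match.
    return golden.lower() in response.lower()


examples = [
    Example("Where does the user live?", "Paris", "I recently moved to Paris"),
    Example("What pet does the user have?", "cat", "I just adopted a dog"),
]
accuracy = evaluate(examples, full_context, toy_answer, toy_judge)
```

Swapping `full_context` for a function that returns retrieved memory instead of the whole transcript is the only change needed to compare the two conditions on the same questions and the same judge.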
Excelling at Enterprise-Critical Tasks
Zep’s ability to maintain multiple temporal versions of facts, trace the lineage of information changes, and automatically construct a historical narrative sets it apart from existing memory systems in AI. Going beyond simple fact retrieval, Zep enables agents to reason about causality, track idea evolution, and comprehend the context of changes.
The evaluation reported gains for Zep on most enterprise-critical question types, with declines on single-session-assistant for both models and on knowledge-update for gpt-4o-mini:
| QUESTION TYPE | MODEL | FULL-CONTEXT | ZEP | DELTA |
|---|---|---|---|---|
| single-session-preference | gpt-4o-mini | 30.0% | 53.3% | 77.7%↑ |
| single-session-assistant | gpt-4o-mini | 81.8% | 75.0% | 9.06%↓ |
| temporal-reasoning | gpt-4o-mini | 36.5% | 54.1% | 48.2%↑ |
| multi-session | gpt-4o-mini | 40.6% | 47.4% | 16.7%↑ |
| knowledge-update | gpt-4o-mini | 76.9% | 74.4% | 3.36%↓ |
| single-session-user | gpt-4o-mini | 81.4% | 92.9% | 14.1%↑ |
| single-session-preference | gpt-4o | 20.0% | 56.7% | 184%↑ |
| single-session-assistant | gpt-4o | 94.6% | 80.4% | 17.7%↓ |
| temporal-reasoning | gpt-4o | 45.1% | 62.4% | 38.4%↑ |
| multi-session | gpt-4o | 44.3% | 57.9% | 30.7%↑ |
| knowledge-update | gpt-4o | 78.2% | 83.3% | 6.52%↑ |
| single-session-user | gpt-4o | 81.4% | 92.9% | 14.1%↑ |
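The DELTA column is a relative change, not a difference in percentage points. A short Python check, assuming the usual definition of relative change against the full-context baseline (the `relative_change` helper is a name introduced here for illustration), reproduces the ↑ rows. The ↓ rows come out slightly smaller under this formula, for example about 8.3% rather than 9.06% for single-session-assistant with gpt-4o-mini, and land closer to the table's figures when Zep's score is used as the denominator. That is an observation from the arithmetic, not something the source states.

```python
def relative_change(baseline: float, score: float) -> float:
    """Percentage change of score relative to baseline."""
    return (score - baseline) / baseline * 100


# gpt-4o-mini, single-session-preference: 30.0% -> 53.3%
print(round(relative_change(30.0, 53.3), 1))  # 77.7

# gpt-4o, temporal-reasoning: 45.1% -> 62.4%
print(round(relative_change(45.1, 62.4), 1))  # 38.4

# gpt-4o-mini, single-session-assistant: 81.8% -> 75.0%
print(round(relative_change(81.8, 75.0), 1))  # -8.3
```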
The results are especially significant for tasks requiring cross-session information synthesis and long-term context maintenance, such as temporal-reasoning and multi-session questions, which supports Zep’s effectiveness in real-world applications.
A Look at Zep’s Future
Zep’s improvement over baseline results appears to increase with model capability. When paired with gpt-4o, an aggregate improvement of 18.5% was observed, compared to a 15.2% improvement with gpt-4o-mini. This suggests that larger models can make better use of the richness and temporal complexity of Zep’s memory.
As AI models continue to evolve, we anticipate further enhancements in graph construction accuracy and relationship mapping, allowing us to leverage more sophisticated capabilities. At One AI, we are excited about these advancements and remain committed to integrating cutting-edge technologies like Zep to drive higher efficiency and smarter AI systems.
Join the Revolution
The introduction of Zep marks a new chapter in AI memory technology. For businesses and developers looking to stay at the forefront of innovation, Zep offers a pathway to more intelligent and responsive systems. Explore what Zep can do for your operations and elevate the impact of your AI interactions.