GyriQAI Research Note · Findings of ACL 2026

SYNAPSE: Episodic-Semantic Memory for Long-Horizon LLM Agents

Your RAG agent may not be forgetting. It may simply fail to connect the right memory to the current question. This note summarizes SYNAPSE, our ACL 2026 Findings paper on long-term memory for LLM agents. The central claim is modest but important: memory-augmented agents should retrieve not only semantically similar text, but also structurally related evidence across time, entities, and events. SYNAPSE has already been used in EZCollegeApp, GyriQAI’s product for long-horizon U.S. undergraduate application planning, where an AI counselor needs to preserve evolving student context, document status, deadlines, and the reasoning behind earlier recommendations.

Query → Graph → Activation → Context

Lexical and semantic anchors inject signal; activation then propagates through the memory graph before final context selection.

40.5 Weighted F1

Best weighted average on LoCoMo among compared memory systems.

+7.2 F1 over A-Mem

Improvement over A-Mem on the same GPT-4o-mini setting.

95% Fewer tokens

Compared with full-context methods on the LoCoMo evaluation.

About the author

Hanqi Jiang is co-founder of GyriQAI and a Computer Science Ph.D. candidate at the University of Georgia, advised by Distinguished Research Professor Tianming Liu. His research focuses on long-horizon agent memory, multimodal LLMs, quantum AI, medical image analysis, and brain-inspired AI. He is also affiliated with the Center for Advanced Medical Computing and Analysis at Mass General Brigham / Harvard Medical School, where he works on radiology-related AI research. Through GyriQAI, he is interested in translating memory-augmented agent research into practical decision-support products, starting with EZCollegeApp for U.S. undergraduate application planning.

Core thesis

Long-term agents need memory that preserves relationships, not just similar text.

Vector search asks: “What past text looks like this question?” SYNAPSE asks a complementary question: “What past evidence is connected to this situation?”

Similarity-based retrieval

  1. Embed the user query.
  2. Retrieve nearby chunks.
  3. Stuff them into context.
  4. Rely on the model to infer the missing bridge.

This is effective for direct lookup, but weaker when the answer depends on causal, temporal, or transitive links.

SYNAPSE retrieval

  1. Build an episodic-semantic graph.
  2. Inject activation from lexical and semantic anchors.
  3. Let energy spread through temporal, abstraction, and association edges.
  4. Retrieve the subgraph that is structurally relevant.

The memory still uses semantic similarity, but it also lets graph structure contribute to relevance.

A common failure mode: contextual isolation

Imagine an assistant that has worked with a user for weeks. The user asks:

Why am I feeling anxious today?

A vector memory will retrieve messages close to “anxious”: stress, sleep, pressure, maybe recent complaints. But suppose the real cause was a scheduling conflict mentioned three weeks ago. That note never used the word “anxiety.” It is not close in embedding space. The agent misses it and gives a plausible, incomplete answer.

That is contextual isolation: the memory exists, but the retrieval system cannot connect it to the moment where it matters. SYNAPSE is designed for this class of failure.
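
The failure can be reproduced with a toy similarity search. The sketch below is illustrative only: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine` and `top_k` are hypothetical helper names, not part of SYNAPSE.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, memory, k=2):
    """Return the k memory texts closest to the query in embedding space."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Assumption: toy 3-d vectors chosen by hand to mimic the scenario above.
memory = [
    ("I slept badly and feel stressed.", [0.9, 0.1, 0.0]),
    ("Work pressure is getting to me.", [0.8, 0.2, 0.1]),
    ("My two appointments overlap on the same day.", [0.1, 0.2, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for "Why am I feeling anxious today?"

retrieved = top_k(query_vec, memory, k=2)
print(retrieved)
```

Under these assumed vectors, the scheduling-conflict note ranks last and never reaches the context window, even though it is stored in memory.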

Three mechanisms behind SYNAPSE

01

Unified graph

Dialogue turns become episodic nodes; extracted entities, goals, events, and preferences become semantic nodes. Temporal and semantic structure are represented in one graph.
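
One way to picture such a graph is the minimal sketch below. It is not the paper's implementation: the class names, the `kind` field, and the choice to treat edges as undirected are all assumptions made for illustration. Only the two node kinds and the three edge types (temporal, abstraction, association) come from the description in this note.

```python
from dataclasses import dataclass, field

EDGE_TYPES = {"temporal", "abstraction", "association"}
NODE_KINDS = {"episodic", "semantic"}

@dataclass
class Node:
    node_id: str
    kind: str   # "episodic" (a dialogue turn) or "semantic" (entity, goal, event, preference)
    text: str

@dataclass
class MemoryGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst, edge_type) triples

    def add_node(self, node_id, kind, text):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, src, dst, edge_type):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, edge_type))

    def neighbors(self, node_id):
        """Neighbors of a node; edges are treated as undirected (an assumption)."""
        out = []
        for src, dst, edge_type in self.edges:
            if src == node_id:
                out.append((dst, edge_type))
            elif dst == node_id:
                out.append((src, edge_type))
        return out

graph = MemoryGraph()
graph.add_node("turn_1", "episodic", "I want to study biology.")
graph.add_node("turn_2", "episodic", "My transcript is still missing.")
graph.add_node("transcript", "semantic", "Transcript (document)")
graph.add_edge("turn_1", "turn_2", "temporal")         # consecutive turns
graph.add_edge("turn_2", "transcript", "abstraction")  # turn -> extracted concept
print(graph.neighbors("turn_2"))
```

Keeping both node kinds in one structure is what lets a later traversal move from a turn to a concept and back to a different turn.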

02

Spreading activation

A query anchors the graph through BM25 and dense retrieval. Activation then propagates along temporal, abstraction, and association edges.
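
A simplified version of this propagation step might look like the following. The update rule, the `decay` factor, the fixed number of `steps`, and the edge weights are assumptions for illustration; the note does not specify the exact dynamics. The anchor scores stand in for whatever BM25 and dense retrieval would return.

```python
def spread_activation(adjacency, anchors, decay=0.5, steps=2):
    """Propagate activation from anchor nodes along weighted edges.

    adjacency: {node: [(neighbor, edge_weight), ...]}
    anchors:   {node: initial_activation}, e.g. from lexical/dense retrieval
    """
    activation = dict(anchors)
    frontier = dict(anchors)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in adjacency.get(node, []):
                passed = energy * weight * decay  # energy shrinks with each hop
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation

# Hypothetical graph: two turns linked only through a shared semantic node.
adjacency = {
    "turn_1": [("goal_x", 1.0)],
    "goal_x": [("turn_1", 1.0), ("turn_9", 1.0)],
    "turn_9": [("goal_x", 1.0)],
}
anchors = {"turn_1": 1.0}  # assumed anchor score from the query

activation = spread_activation(adjacency, anchors)
print(activation)
```

In this toy run `turn_9` ends with nonzero activation although the query never anchored on it, which is the behavior the mechanism is meant to provide.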

03

Uncertainty gating

If activation is too weak, the system can reject the memory claim before generation. This helps reduce memory hallucination.
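
As a rough sketch, gating can be expressed as a threshold check on the activation scores before any text is sent to the generator. The `gate` function and the threshold value of 0.3 are hypothetical; the note does not state how the actual gate is computed or calibrated.

```python
def gate(activation, threshold=0.3):
    """Decide whether memory supports an answer.

    Returns (should_answer, supporting_nodes). If no node reaches the
    threshold, the caller can respond that the fact is not in memory.
    """
    supported = {node: score for node, score in activation.items() if score >= threshold}
    if not supported:
        return False, []
    ranked = sorted(supported, key=supported.get, reverse=True)
    return True, ranked

weak = gate({"turn_3": 0.10, "turn_7": 0.05})
strong = gate({"turn_1": 0.6, "turn_4": 0.1, "goal_x": 0.4})
print(weak, strong)
```

The design choice here is to reject before generation rather than filter afterwards, so the model is never handed weakly supported context in the first place.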

Figure 1. Overview of SYNAPSE. A query anchors the memory graph through lexical and semantic triggers, activation propagates across episodic and semantic nodes, and triple hybrid scoring selects the final context.

Ski trip → Mark → Later breakup

The bridge-node effect

If “Mark” appears in both a ski-trip conversation and a later dating conversation, Mark becomes the bridge. A question about “the guy from the ski trip” can activate the trip, then Mark, then the later relationship outcome. Pure vector search often misses that path.
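
The path in this example can be made explicit with a plain breadth-first search over a toy graph. This is only an illustration of why the bridge node matters, not SYNAPSE's retrieval procedure; the node names and the `find_path` helper are hypothetical.

```python
from collections import deque

def find_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# The two conversations share no direct edge; "Mark" is their only link.
adjacency = {
    "ski_trip_turn": ["Mark"],
    "Mark": ["ski_trip_turn", "later_dating_turn"],
    "later_dating_turn": ["Mark"],
}

path = find_path(adjacency, "ski_trip_turn", "later_dating_turn")
without_bridge = find_path({"ski_trip_turn": [], "later_dating_turn": []},
                           "ski_trip_turn", "later_dating_turn")
print(path, without_bridge)
```

With the entity node removed, no path exists, which mirrors what happens when the two conversations are stored only as unrelated text chunks.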

Results on long-horizon conversational memory

Method     Multi-Hop  Temporal  Open-Domain  Single-Hop  Weighted F1
A-Mem      27.0       45.9      12.1         44.7        33.3
AriGraph   28.5       43.2      14.5         45.1        33.7
MemoryOS   35.3       41.2      20.0         48.6        38.0
Zep        35.5       48.5      23.1         48.0        39.7
SYNAPSE    35.7       50.1      25.9         48.9        40.5
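
For readers who want to check the headline margin, the short calculation below recomputes SYNAPSE's lead over each baseline from the Weighted F1 column. It uses only the numbers reported in the table; the variable names are illustrative.

```python
# Weighted F1 values copied from the results table.
weighted_f1 = {
    "A-Mem": 33.3,
    "AriGraph": 33.7,
    "MemoryOS": 38.0,
    "Zep": 39.7,
    "SYNAPSE": 40.5,
}

# Absolute margin of SYNAPSE over each compared system, in F1 points.
margins = {
    name: round(weighted_f1["SYNAPSE"] - score, 1)
    for name, score in weighted_f1.items()
    if name != "SYNAPSE"
}

# Relative gain over A-Mem, in percent.
relative_gain_a_mem = round(margins["A-Mem"] / weighted_f1["A-Mem"] * 100, 1)

print(margins)
print(relative_gain_a_mem)
```

The A-Mem margin matches the +7.2 F1 figure quoted at the top of this note; the lead over the strongest baseline, Zep, is 0.8 points.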

Low-similarity subset: when evidence is deliberately far from the question in embedding space, A-Mem drops by more than 50%; SYNAPSE drops by less than 8%. This supports the hypothesis that graph structure provides a complementary retrieval signal.

96.6

Adversarial F1. The gating layer helps the system reject questions about absent memories.

1.9s

Average latency on GPT-4o-mini in the reported efficiency evaluation.

$0.24

Estimated cost per 1K queries under the paper’s GPT-4o-mini cost profile.

Why this matters for GyriQAI

GyriQAI is building AI products for long-horizon decision support. Our first product direction, EZCollegeApp, focuses on U.S. undergraduate application planning for international students.

Admissions guidance is a memory-heavy workflow. A useful AI counselor must track a student’s academic profile, school list, document status, essay revisions, deadlines, constraints, and the reasoning behind prior recommendations. It also needs restraint: it should not fabricate experiences, pretend forecasts are guarantees, or replace official admissions requirements.

This is where SYNAPSE informs our product thinking. We are interested in memory-native AI products: systems that preserve evolving user context, retrieve grounded evidence at the right time, and know when the evidence is not available.

Design implications

01 Relationship-aware retrieval: Model temporal, causal, and entity links instead of relying only on semantic similarity.
02 Grounded personalization: Use explicit user records and workflow state, not generic profile assumptions.
03 Memory-aware uncertainty: Let the system say when a fact is not in memory.