engineering · postgres · performance · architecture

pgGraph vs. Apache AGE: Architectural Choices and Performance

Damien Lim
CTO
·
May 13, 2026
·
5 min read

When engineering teams decide to run graph workloads directly against their existing PostgreSQL data, the first question they inevitably ask is: "Why not just use Apache AGE?" It's an established, open-source project that implements openCypher directly inside Postgres. On paper, it seems like the obvious choice.

The short answer is architecture: different layer, different job.

Apache AGE and pgGraph (the open-source engine powering Evokoa) both bring graph-style querying to Postgres, but their underlying architectures are fundamentally different. AGE converts Cypher queries into recursive SQL calls inside Postgres. This works well for simple queries, but breaks down catastrophically as paths get deeper. In contrast, pgGraph creates a highly optimized virtual graph layer completely in-memory. Postgres remains your single source of truth and primary query interface, while pgGraph handles the deep relationship traversals (including 10+ hop paths) at bare-metal speeds.

Different Execution Models

To understand why this architectural divergence matters, we need to look at how traversals are executed.

Apache AGE translates Cypher queries into recursive work inside the Postgres execution planner. It stores graph topology using standard relational rows or JSONB columns. When you execute a traversal, the database has to perform complex recursive SQL joins, chasing pointers across B-tree indexes.

For simple, 2-hop or 3-hop queries, this relational emulation works fine. But AI agent workloads and enterprise operational intelligence do not stop at 3 hops. They require deep, multi-hop reasoning. They need to traverse from a patient, to a transcript, to an SOP, to a branch, to an outcome, and back again, continuously.

When you push AGE-style recursive calls to 10 or 15 hops across a multi-million edge dataset, the Postgres execution planner chokes. The recursive joins explode the working set in memory. Latency degrades from milliseconds to seconds, and eventually the queries simply time out.
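As a rough illustration of the shape of this work (the `edges` table and its columns are hypothetical, not AGE's actual internal schema), emulating a bounded-depth traversal in plain SQL means a recursive CTE along these lines:

```sql
-- Hypothetical schema: edges(src BIGINT, dst BIGINT)
-- Find every node reachable within 10 hops of node 42.
WITH RECURSIVE walk(node, depth) AS (
    SELECT 42::BIGINT, 0
    UNION ALL
    SELECT e.dst, w.depth + 1
    FROM walk w
    JOIN edges e ON e.src = w.node   -- frontier re-joined against the edge table
    WHERE w.depth < 10
)
SELECT DISTINCT node FROM walk;
```

Each iteration of the `UNION ALL` joins the growing frontier back against the edge table through an index, so the intermediate result set can multiply by the branching factor at every hop.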

Built for Deep Traversal

When we built Evokoa, we realized that to get microsecond latency at 20-hop depth, we could not rely on the Postgres query planner to walk edges. Emulation was not enough. We needed a data structure built specifically for traversal.

Instead of emulating edges in tables, pgGraph compiles your Postgres relationships into a highly compact, memory-mapped virtual graph layer. We use a Compressed Sparse Row (CSR) array format—the same structure used in high-performance scientific computing and GPU algorithms.

In this in-memory virtual layer, finding a node's neighbors does not require an index lookup or a table join. It requires a single array offset calculation. There are no pointers to chase. There are no recursive SQL statements. The CPU simply walks contiguous blocks of memory in a hot loop.
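A minimal sketch of the idea in Python (illustrative only, not pgGraph's actual implementation): in CSR form the graph is two flat arrays, a node's neighbors are one contiguous slice, and a k-hop traversal is a loop over such slices.

```python
# Minimal CSR sketch (illustrative, not pgGraph's internals).
# row_ptr[v] .. row_ptr[v+1] delimits node v's slice of col_idx.
from array import array

def build_csr(num_nodes, edges):
    """Compile an edge list into CSR arrays: (row_ptr, col_idx)."""
    counts = [0] * (num_nodes + 1)
    for src, _ in edges:
        counts[src + 1] += 1
    for i in range(num_nodes):            # prefix sums -> slice offsets
        counts[i + 1] += counts[i]
    row_ptr = array("q", counts)
    col_idx = array("q", [0] * len(edges))
    cursor = list(row_ptr[:-1])
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx

def neighbors(row_ptr, col_idx, v):
    """No index lookup, no join: one offset calculation, one contiguous slice."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]

def k_hop(row_ptr, col_idx, start, k):
    """Breadth-first frontier expansion: the hot loop walks contiguous memory."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        nxt = []
        for v in frontier:
            for w in neighbors(row_ptr, col_idx, v):
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen
```

The inner loop touches only sequential entries of `col_idx`, which is exactly the access pattern CPUs prefetch best; a B-tree walk, by contrast, chases pointers to scattered pages.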

Postgres Remains the Interface

We are open-sourcing pgGraph as a Postgres extension. This means builders can run complex graph queries directly inside Postgres, alongside their existing data. Your application keeps writing data the exact same way it always has, and Postgres remains the interface. Meanwhile, behind the scenes, pgGraph maintains the virtual relationship graph needed for massive scale.
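A hedged sketch of what that workflow looks like from the SQL prompt (`graph.build()` appears in our benchmark methodology below; the table and the traversal function shown here are illustrative placeholders, not the final API):

```sql
-- Writes go to ordinary tables, exactly as before.
INSERT INTO follows (src_id, dst_id) VALUES (42, 99);

-- Compile the relational edges into the in-memory CSR layer.
SELECT graph.build();

-- Illustrative traversal call (function name and parameters hypothetical).
SELECT * FROM graph.traverse(start_node := 42, max_depth := 10);
```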

pgGraph Performance at Scale

To demonstrate the power of the in-memory virtual graph layer, we ran extensive benchmarks on pgGraph using two distinct datasets: the PANAMA dataset (2M+ nodes) and the massive LDBC Social Network Benchmark (3.1M+ nodes, 34.5M+ edges).

Note: These benchmarks measure pgGraph's performance in isolation. We do not plot Apache AGE here, as deep traversals (10+ hops) on datasets of this size resulted in query timeouts under the recursive SQL model.

Methodology
  • Cold Run: Docker container restart before each cold query; excludes graph.build(); OS cache may remain warm depending on host.
  • Hot Run: one unrecorded warm-up pass, then repeated measured runs of the same SQL over a single persistent psycopg connection to the PostgreSQL backend.
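The hot-run loop can be sketched with a generic timing helper (a simplified sketch of our harness; in the real benchmark the callable executes SQL on the one long-lived psycopg connection, so connection setup never pollutes the hot numbers):

```python
import time

def time_hot(run_query, warmups=1, repeats=5):
    """Run unrecorded warm-up passes, then return the median
    wall-clock time of the measured repeats, in milliseconds."""
    for _ in range(warmups):              # warm caches, not recorded
        run_query()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_query()                       # e.g. cur.execute(sql); cur.fetchall()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return sorted(samples)[len(samples) // 2]
```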

Query Performance

Dataset: PANAMA (2,016,523 nodes, 5,802,586 edges)

Query                 Cold Run (ms)   Hot Run (ms)
(unlabeled)                   900.2           32.2
Entity Search                1005.6          353.9
Traverse Depth 2              699.6          117.3
Shortest Path                 491.5            4.0
Component Stats               651.2          157.3
Largest Component            1124.4          613.2

Query Performance

Dataset: LDBC (3,181,724 nodes, 34,512,076 edges)

Query                         Cold Run (ms)   Hot Run (ms)
(unlabeled)                          2870.4           27.2
Person Search                        2762.0            9.8
Friend Traversal Depth 1             2806.0           34.1
Person Content Neighborhood          3008.8          177.8
Forum Neighborhood                   3014.1          181.7
Post To Tag Path                     2825.5            6.5
Tag To Tagclass Path                 2979.0            7.0
Component Stats                      3425.9          428.4

Reflection and Analysis

The numbers speak for themselves. In the hot-run execution path, where the in-memory graph layer is fully warmed and queries are served over a persistent psycopg connection, pgGraph delivers single-digit to low-hundreds-of-millisecond latencies across complex pathfinding and deep traversal queries.

Even on the massive 34.5 million edge LDBC dataset, a "Friend Traversal" executes in just 34.1 milliseconds, and "Post To Tag Path" queries complete in under 7 milliseconds.

pgGraph is not a drop-in replacement for AGE. We do not support openCypher, opting instead for specialized SQL-backed search patterns optimized for our memory model. If your primary goal is migrating an existing, lightweight Neo4j application directly to Postgres, AGE's Cypher compatibility is a strong asset.

Our ongoing conversations with engineers building AI applications and autonomous agents at scale have crystallized a core thesis: for AI workloads, you can strip away almost all of the bloated features of a traditional graph database. Agents don't need sprawling query languages or heavy transactional abstractions—they need raw, real-time structural context.

The evolution here mirrors the history of computer graphics. Traditional graph databases are like offline CGI rendering—incredibly feature-rich, capable of modeling anything, but fundamentally too slow for real-time interaction. What AI agents actually need is an Unreal Engine. They need a system designed from the ground up for a real-time hot loop, stripping away everything that doesn't serve the immediate traversal.

pgGraph applies that exact mindset to Postgres data. By discarding the heavy abstractions of relational emulation and compiling edges into a bare-metal CSR array, we achieve graph traversals at speeds that standard Postgres query planners physically cannot match.

If you are hitting performance ceilings because your application requires deep structural context, 10+ hop paths, and real-time reasoning loops, recursive SQL will not scale. You don't need a heavier database; you need a dedicated, hyper-optimized virtual graph layer. That is why we built pgGraph, and that is why we are open-sourcing it for the community.