Back to blog
postgresgraphengineeringarchitecture

Postgres as a Graph Database: Four Approaches Compared

Dale Everett
Dale Everett
COO
·
May 17, 2026
·
12 min read

Every relational database is a graph database in disguise. Foreign keys are edges. Tables are node collections. Join tables are hyperedges connecting multiple entities at once. The relationships are already there. The question is how you traverse them.

If you have searched for "Postgres as a graph database" you have probably found advice ranging from recursive CTEs to full graph database migrations. Most of that advice is fine for small datasets. It falls apart when your graph has millions of edges, your traversals go deeper than three hops, or your application needs answers in milliseconds.

This post compares the four main approaches to graph querying in Postgres, explains when each one works, and shows where each one breaks. We built pgGraph after hitting the ceiling of the first three, so we will be direct about why.

Your Postgres Schema is Already a Graph

Before we compare approaches, it is worth understanding what you already have. Consider a typical SaaS schema:

-- Users belong to organizations
CREATE TABLE organizations (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE users (id SERIAL PRIMARY KEY, org_id INT REFERENCES organizations(id), name TEXT);

-- Users create tickets
CREATE TABLE tickets (id SERIAL PRIMARY KEY, creator_id INT REFERENCES users(id), assignee_id INT REFERENCES users(id), title TEXT);

-- Tickets have comments
CREATE TABLE comments (id SERIAL PRIMARY KEY, ticket_id INT REFERENCES tickets(id), author_id INT REFERENCES users(id), body TEXT);

This schema encodes a graph. Organizations connect to users. Users connect to tickets as creators and assignees. Tickets connect to comments. Comments connect back to users. You can draw this as a network diagram and it makes immediate sense.

The problem is that SQL was not designed to walk this network. It was designed to join two tables at a time. When you need to answer "show me every entity within five hops of this user," SQL starts fighting you.

Approach 1: Recursive CTEs

The simplest way to do graph traversal in Postgres is a recursive common table expression. No extensions. No additional software. Pure SQL.

WITH RECURSIVE connected AS (
  -- Base case: start from a specific user
  SELECT u.id, u.name, 0 AS depth
  FROM users u
  WHERE u.id = 42

  UNION ALL

  -- Recursive case: find connected users through tickets
  SELECT u2.id, u2.name, c.depth + 1
  FROM connected c
  JOIN tickets t ON t.creator_id = c.id OR t.assignee_id = c.id
  JOIN users u2 ON u2.id = t.creator_id OR u2.id = t.assignee_id
  WHERE c.depth < 3
    AND u2.id != c.id
)
SELECT DISTINCT id, name, MIN(depth) AS shortest_distance
FROM connected
GROUP BY id, name
ORDER BY shortest_distance;

When recursive CTEs work

  • Your graph has fewer than ~100,000 edges
  • You need traversals of 2 to 3 hops at most
  • Query latency in the hundreds of milliseconds is acceptable
  • You do not want to install any extensions

When recursive CTEs break

Recursive CTEs execute inside the Postgres query planner. Every hop generates a new intermediate result set that Postgres joins against the base tables. At depth 5 or beyond, the planner produces nested loop joins over exponentially growing working sets. On a table with a few million rows, a 5-hop recursive CTE can take seconds. At 10 hops, it typically times out or exhausts working memory.

The fundamental issue is that the query planner was not designed for this access pattern. It optimizes for set operations, not graph walks. Increasing depth does not increase query complexity linearly. It increases it exponentially.

Approach 2: pgRouting

pgRouting is a PostGIS extension that provides graph algorithms like Dijkstra shortest path and A* search. It was built for geospatial routing (finding the fastest path between two GPS coordinates) but can be repurposed for non-spatial graph problems.

-- Using pgRouting for shortest path
SELECT * FROM pgr_dijkstra(
  'SELECT id, source, target, cost FROM edges',
  1,   -- start node
  100  -- end node
);

When pgRouting works

  • You need shortest path or minimum cost algorithms specifically
  • Your graph is relatively small (thousands to low millions of edges)
  • You are already using PostGIS
  • Your use case maps cleanly to weighted edges with a cost function

When pgRouting breaks

pgRouting was designed for road networks, not arbitrary entity graphs. It expects a specific edge table format with source, target, and cost columns. Modeling complex enterprise relationships (where a single join table connects six entities simultaneously) requires decomposing those relationships into pairwise edges, which inflates the graph and distorts path semantics.

Performance is acceptable for GIS-scale problems but degrades on large, dense graphs. pgRouting still executes within the Postgres process, so it shares the same memory and CPU constraints as the database itself. There is no dedicated graph index structure. Deep traversals on million-edge graphs will hit the same planner bottlenecks as recursive CTEs, just with better algorithm implementations on top.

Approach 3: Apache AGE

Apache AGE brings the openCypher query language to Postgres. It lets you write graph queries using Cypher syntax while storing data in Postgres.

-- Apache AGE: Cypher inside Postgres
SELECT * FROM cypher('my_graph', $$
  MATCH (u:User {id: 42})-[:CREATED]->(t:Ticket)-[:ASSIGNED_TO]->(u2:User)
  RETURN u2.name, t.title
$$) AS (name agtype, title agtype);

When Apache AGE works

  • You are migrating from Neo4j and want to keep using Cypher
  • Your traversals stay within 2 to 4 hops
  • You value query language expressiveness over raw performance
  • Your dataset is in the low millions of edges

When Apache AGE breaks

Apache AGE translates Cypher into recursive SQL operations inside Postgres. The graph data is stored in relational tables. When you execute a multi-hop traversal, AGE generates recursive joins against those tables, which means you inherit the same exponential blowup as raw recursive CTEs, just wrapped in nicer syntax.

We have tested AGE extensively. At 3 to 4 hops on a dataset with a few million edges, performance is reasonable. At 10 hops, queries start timing out. At 15 or 20 hops, they do not return. The underlying problem is architectural: AGE does not build a dedicated traversal index. It emulates graph operations using the relational engine, and the relational engine was not built for deep walks.

For teams whose queries stay shallow, AGE is a solid choice. For teams building AI agents that need to reason across 10 or more relationship hops in real time, AGE hits a hard ceiling.

Approach 4: In-Memory Graph Layer (pgGraph)

This is the approach we built. pgGraph is an open-source Postgres extension that compiles your existing foreign key relationships into a Compressed Sparse Row (CSR) array held entirely in memory.

The key insight: separate topology from data. Postgres stores and serves rows. pgGraph stores and traverses relationships. When a traversal query arrives, pgGraph walks the in-memory integer array (not the Postgres tables) to find matching paths. Only after the path is resolved does it go back to Postgres to hydrate the actual row data.

How it works

  1. Schema scan. pgGraph reads your Postgres schema and identifies foreign key relationships automatically.
  2. Graph compilation. It builds a CSR array from those relationships. This is a flat integer array where finding a node's neighbors requires a single array offset calculation. No B-tree index lookups. No pointer chasing.
  3. In-memory traversal. When a query asks for paths, neighbors, or connected components, pgGraph walks the CSR array in a tight Rust loop. No garbage collection. No query planner overhead.
  4. Hydration. The traversal returns a set of node IDs. pgGraph fetches the corresponding rows from Postgres in a single batch query.

What the numbers look like

On the Panama Papers dataset (2 million nodes, 5.8 million edges):

  • Shortest path: 4ms (hot) vs 491ms (cold)
  • Depth-2 traversal: 117ms (hot)
  • Component statistics: 157ms (hot)

On the LDBC Social Network Benchmark (3.1 million nodes, 34.5 million edges):

  • Person search: 9.8ms (hot)
  • Friend traversal: 34.1ms (hot)
  • Post-to-tag path: 6.5ms (hot)

The memory footprint is roughly 34x smaller than loading the same dataset into Neo4j, because pgGraph stores only topology (integer IDs and edge types), not row payloads.

When pgGraph works

  • You need traversals deeper than 5 hops
  • Your dataset has millions of edges
  • Your application (or AI agent) needs sub-50ms response times
  • You want graph capabilities without migrating data out of Postgres
  • You are building agentic workflows that require real-time structural context

When pgGraph is not the right fit

  • You need openCypher compatibility (use AGE)
  • Your use case is strictly shortest-path on a road network (use pgRouting)
  • Your graph is tiny and a recursive CTE is fast enough

Comparison Summary

Recursive CTEspgRoutingApache AGEpgGraph
InstallNoneExtensionExtensionExtension
Query languageSQLSQL + pgr functionsCypherSQL
Max practical depth3-4 hops5-6 hops4-5 hops20+ hops
Million-edge performanceSeconds to timeoutSecondsSeconds to timeoutMilliseconds
Memory modelPostgres buffer poolPostgres buffer poolPostgres tablesDedicated CSR array
Data movementNoneNoneSchema requiredNone
Best forSimple queriesGIS routingCypher migrationDeep traversal at scale
LicensePostgreSQLGPL-2.0Apache 2.0Apache 2.0
PriceFreeFreeFreeFree

Why Depth Matters More Than It Used To

Two years ago, most graph queries in production were 2 to 3 hops. Find a user's friends. Find a product's related items. Find an account's recent transactions. Recursive CTEs handled this fine.

AI agents changed the equation. When an agent is trying to determine why a supply chain failed, it needs to traverse from the delay event, to the warehouse, to the shift manager, to the staffing policy, to the exception approval, to the budget owner. That is 6 hops minimum, and the agent might need to explore multiple branches at each step.

When an agent is building context for a support ticket, it needs to walk from the ticket to the customer, to their organization, to their contract, to the SLA terms, to similar past violations, to the resolution that worked. That is 7 or more hops.

At this depth, the difference between approaches is not incremental. It is binary. Either your infrastructure can walk the path in milliseconds, or the agent stalls and the user waits.

Getting Started

pgGraph is free, open source, and licensed under Apache 2.0. There is no paid tier, no usage cap, and no vendor lock-in. It is a Postgres extension you install and own.

If you are just exploring graph queries in Postgres and your dataset is small, start with recursive CTEs. They require nothing beyond standard SQL and will teach you how your data connects.

If you are hitting performance limits or building applications that need real-time traversal at depth, try pgGraph:

  1. Read the documentation at pggraph.com
  2. Star the repo on GitHub
  3. Point it at your Postgres schema and run a traversal

Your data is already a graph. The question is whether your infrastructure can traverse it fast enough to matter.