Knowledge Graphs
When you need one, when you just need graph thinking, and how to capture most of the value in Postgres.
A schema gives the edges meaning. It is a model, not a product: you can adopt it in Postgres in an afternoon without migrating data or standing up a new database.
- Best for: domains where the path between entities is the query — fraud, identity resolution, supply chain, agent memory, recommendation
- Not for: workloads that look like "rows where joined columns satisfy a predicate"; that is a relational problem dressed up as a graph problem
- Adoptable in Postgres without buying a platform
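To make "a schema gives the edges meaning" concrete, here is a minimal Python sketch: a hypothetical whitelist of edge types, each constrained to a from-entity type and a to-entity type, checked before an edge is written. The type names, `EDGE_SCHEMA`, and `validate_edge` are illustrative assumptions; in Postgres the same rule could live in a lookup table plus a check in your write path.

```python
from dataclasses import dataclass

# Hypothetical edge-type schema: edge_type -> (allowed from-entity type, allowed to-entity type).
# The type names are illustrative assumptions, not taken from any standard vocabulary.
EDGE_SCHEMA: dict[str, tuple[str, str]] = {
    "works_at": ("person", "company"),
    "supplies": ("company", "company"),
    "owns_account": ("person", "account"),
}


@dataclass(frozen=True)
class Edge:
    from_id: str
    to_id: str
    edge_type: str


def validate_edge(edge: Edge, entity_types: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the edge fits the schema."""
    allowed = EDGE_SCHEMA.get(edge.edge_type)
    if allowed is None:
        return [f"unknown edge_type {edge.edge_type!r}"]
    expected_from, expected_to = allowed
    actual_from = entity_types.get(edge.from_id)
    actual_to = entity_types.get(edge.to_id)
    problems = []
    if actual_from != expected_from:
        problems.append(f"{edge.from_id} is {actual_from!r}, expected {expected_from!r}")
    if actual_to != expected_to:
        problems.append(f"{edge.to_id} is {actual_to!r}, expected {expected_to!r}")
    return problems


entity_types = {"alice": "person", "acme": "company", "acct-1": "account"}
print(validate_edge(Edge("alice", "acme", "works_at"), entity_types))  # → []
print(validate_edge(Edge("acme", "alice", "works_at"), entity_types))  # reversed edge: two problems
```

Rejecting a reversed or untyped edge at write time is cheap; discovering it later during a multi-hop query is not.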
Native graph storage with index-free adjacency. Cypher or Gremlin as query languages. Strong for deep traversals and graph algorithms.
- Best for: 5+ hop hot-path queries on millions of edges, real-time recommendation walks, fraud rings, investigative platforms with unpredictable multi-hop queries
- Not for: small consumer apps, 1-3 hop application workloads, anywhere ops simplicity beats traversal speed
- Adds a second operational surface, a second query language, and a vendor or self-host relationship
W3C standards. Triples (subject, predicate, object), URIs, OWL inference, SPARQL. Built for ontology-driven reasoning and federation.
- Best for: life sciences, healthcare, government linked data, regulated domains needing SHACL validation or provable inference chains
- Not for: application development. The ergonomics are painful, the modeling overhead is significant, and the community and its tooling assume librarian-style ontology expertise
- If you are not sure you need OWL reasoning, you do not need it
Columns: from_id, to_id, edge_type, properties JSONB, valid_from, valid_to, source_doc_id, extracted_at, confidence. Recursive CTEs handle traversal.
- Best for: most application workloads. Handles 1-3 hops comfortably and stretches to 4-6 hops at meaningful scale before the traversal cost becomes noticeable
- Not for: variable-length paths past 6 hops on hot data, native graph algorithms (PageRank, Louvain) at scale
- The single most important discipline: bake in provenance and temporal columns from day one. Backfilling them later is brutal
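A recursive-CTE traversal over such an edge table might look like the sketch below. It is a minimal illustration under stated assumptions: Python's built-in `sqlite3` stands in for Postgres so it runs without a server (`WITH RECURSIVE` works the same way in both), the supplier data and the `reachable` helper are hypothetical, and dates are ISO strings rather than `timestamptz`. The query bounds depth, only follows edges valid at a given instant, and carries the visited path to avoid looping on cycles; the `(from_id, edge_type)` index is what makes each hop cheap.

```python
import sqlite3

# Assumption: sqlite3 stands in for Postgres so this runs anywhere. In Postgres,
# properties would be JSONB and the validity columns timestamptz.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edges (
        from_id TEXT NOT NULL, to_id TEXT NOT NULL, edge_type TEXT NOT NULL,
        properties TEXT, valid_from TEXT NOT NULL, valid_to TEXT,
        source_doc_id TEXT, extracted_at TEXT, confidence REAL
    )
""")
conn.executemany(
    "INSERT INTO edges (from_id, to_id, edge_type, valid_from, valid_to) VALUES (?, ?, ?, ?, ?)",
    [
        ("acme", "widgetco", "supplies", "2023-01-01", None),
        ("widgetco", "gadgetinc", "supplies", "2023-06-01", None),
        ("gadgetinc", "retailer", "supplies", "2022-01-01", "2023-03-01"),  # expired edge
        ("gadgetinc", "acme", "supplies", "2023-01-01", None),              # closes a cycle
    ],
)

# Bounded-depth traversal over edges valid at :as_of. The comma-delimited path string
# doubles as a cycle guard (Postgres code more often carries an array of visited ids).
TRAVERSAL_SQL = """
WITH RECURSIVE reach(node, depth, path) AS (
    SELECT :start, 0, ',' || :start || ','
    UNION ALL
    SELECT e.to_id, r.depth + 1, r.path || e.to_id || ','
    FROM reach AS r
    JOIN edges AS e ON e.from_id = r.node
    WHERE e.edge_type = :edge_type
      AND r.depth < :max_depth
      AND e.valid_from <= :as_of
      AND (e.valid_to IS NULL OR e.valid_to > :as_of)
      AND instr(r.path, ',' || e.to_id || ',') = 0
)
SELECT node, MIN(depth) AS hops
FROM reach
WHERE depth > 0
GROUP BY node
ORDER BY hops, node
"""


def reachable(start, edge_type, as_of, max_depth=6):
    """Nodes reachable from start via edge_type edges valid at as_of, with minimum hop count."""
    params = {"start": start, "edge_type": edge_type, "as_of": as_of, "max_depth": max_depth}
    return conn.execute(TRAVERSAL_SQL, params).fetchall()


print(reachable("acme", "supplies", "2024-01-01"))  # → [('widgetco', 1), ('gadgetinc', 2)]
```

Note how the temporal filter changes the answer: as of 2023-02-01 the widgetco → gadgetinc edge does not exist yet, so only widgetco is reachable from acme.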
Apache AGE: a Postgres extension that adds Cypher. Same database, same connection pool, same backup story, so you can write graph queries without leaving Postgres.
- Best for: teams who want Cypher ergonomics but cannot justify a second operational surface — a migration ramp toward Neo4j if you ever need it
- Not for: workloads that genuinely need native graph algorithms or 10+ hop traversals — AGE inherits Postgres performance characteristics
- Useful middle path before going full property graph DB
pgvector: HNSW and IVFFlat indexes over entity and edge embeddings stored alongside the graph, for similarity search.
- Best for: semantic similarity search on entities, "find related things," and entity resolution that needs more than fuzzy string matching
- Not for: >10M-vector search at sub-50ms p99 — at that scale a dedicated vector store may pull ahead
- Most apps never reach that scale. Start with pgvector and only graduate when measured
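What pgvector computes for an entity-resolution lookup can be sketched in a few lines of Python. This is an illustration, not pgvector code: `cosine_distance` is the quantity the `<=>` operator returns (1 minus cosine similarity), and `nearest` is the exact scan Postgres performs for `ORDER BY embedding <=> $1 LIMIT k` when no vector index is used. HNSW and IVFFlat approximate this ranking so they do not have to score every row. The 3-dimensional embeddings are hypothetical; real models produce hundreds of dimensions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def nearest(query, embeddings, k=2):
    """Exact k-nearest-neighbour search: score every entity, keep the k closest."""
    ranked = sorted(embeddings, key=lambda name: cosine_distance(query, embeddings[name]))
    return ranked[:k]


# Hypothetical toy embeddings for existing entities.
embeddings = {
    "Acme Corp": [0.90, 0.10, 0.00],
    "ACME Corporation": [0.88, 0.12, 0.01],
    "Globex": [0.10, 0.90, 0.20],
    "Initech": [0.00, 0.20, 0.95],
}
query = [0.92, 0.08, 0.00]  # pretend embedding of a new mention, "Acme Inc."
print(nearest(query, embeddings))  # → ['Acme Corp', 'ACME Corporation']
```

Both candidates are merge suggestions for "Acme Inc.", which fuzzy string matching on the raw names would handle less gracefully once spellings diverge further.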
LLM structured output (JSON schema and function calling) to extract entities and (subject, predicate, object) triples from unstructured text: PDFs, web content, transcripts, support tickets.
- Best for: bootstrapping a knowledge graph from unstructured sources — the 2026 replacement for Diffbot-style services
- Not for: production extraction without an eval harness — drift is real; pin a small labeled set and watch it
- Extraction costs on the order of pennies per document with a small model, so the historical "ingestion is the dominant cost" argument for buying a KG platform has mostly evaporated
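The two halves of this card, a JSON schema the model must fill and a small labeled set to catch drift, can be sketched in Python as follows. Assumptions: the field names in `TRIPLE_SCHEMA` are illustrative, the model call itself is omitted (`raw` is a hard-coded, hypothetical response), and lowercasing is a deliberately simple normalization. Malformed items are dropped rather than written to the graph, and precision/recall against the pinned set is the number to watch over time.

```python
import json

# Hypothetical JSON schema to hand the model as its structured-output / function-calling
# contract. Field names are assumptions; align them with your own edge table.
TRIPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject": {"type": "string"},
                    "predicate": {"type": "string"},
                    "object": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["subject", "predicate", "object"],
            },
        }
    },
    "required": ["triples"],
}


def parse_triples(raw):
    """Parse model output into normalized triples, dropping malformed items."""
    data = json.loads(raw)
    triples = set()
    for item in data.get("triples", []):
        if not isinstance(item, dict):
            continue
        fields = [item.get(key) for key in ("subject", "predicate", "object")]
        if all(isinstance(f, str) and f.strip() for f in fields):
            triples.add(tuple(f.strip().lower() for f in fields))
    return triples


def precision_recall(predicted, gold):
    """Compare extracted triples against a pinned, hand-labeled set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical model response: the second item is missing its object.
raw = (
    '{"triples": ['
    '{"subject": "Acme", "predicate": "supplies", "object": "WidgetCo", "confidence": 0.9},'
    '{"subject": "Acme", "predicate": "founded_in"}'
    ']}'
)
gold = {("acme", "supplies", "widgetco"), ("widgetco", "supplies", "gadgetinc")}
print(precision_recall(parse_triples(raw), gold))  # → (1.0, 0.5)
```

Running the same labeled documents after every prompt or model change turns "drift is real" into a number you can alert on.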
Treating "we have a graph" as the goal
| Capability | Postgres + edge tables | Property graph DB | RDF triple store |
|---|---|---|---|
| 1-3 hop traversals | Excellent | Excellent | Excellent |
| 5+ hop traversals at scale | Limited | Excellent | Good |
| Graph algorithms (PageRank, Louvain) | Batch only | Native | Limited |
| OWL inference / SHACL | Not supported | Not supported | Native |
| Temporal validity on edges | Trivial | Possible, manual | Possible, manual |
| Provenance on edges | Trivial | Possible, manual | Native via reification |
| Operational overhead | One database | Second database | Second database |
| Team ramp-up cost | Low (SQL) | Medium (Cypher) | High (SPARQL/OWL) |
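"Batch only" for graph algorithms on Postgres usually means exporting `(from_id, to_id)` pairs, computing scores outside the query path, and writing them back. The sketch below is a plain power-iteration PageRank in Python to show the shape of that batch job; it is an illustration, not a tuned implementation, and at scale a graph library such as NetworkX (`pagerank`) or igraph would be the usual choice. Rank held by nodes with no outgoing edges is spread evenly so the scores keep summing to 1.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Batch PageRank by power iteration over (from_id, to_id) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass from dangling nodes (no outgoing edges) is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical edges, e.g. the result of SELECT from_id, to_id FROM edges WHERE ...
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # → c
```

The scores can then be written to a column or side table and read on the hot path like any other attribute.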
Edges as first-class statements with provenance and time
The single most underrated KG-flavored pattern, and the one teams most often skip. Make your edges table look like this from day one: from_id, to_id, edge_type, properties JSONB, valid_from, valid_to, source_doc_id, extracted_at, confidence. Index on (from_id, edge_type), (to_id, edge_type), and (valid_from, valid_to). Backfilling these columns onto an existing schema is one of the most painful migrations you can do; baking them in costs nothing.
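A minimal sketch of that table, its three indexes, and the two operations the extra columns buy you: closing an edge instead of deleting it, and asking "what was true at time T, and who said so?". Assumptions: Python's `sqlite3` stands in for Postgres so it runs without a server (in Postgres, `properties` would be `JSONB` and the time columns `timestamptz`), validity intervals are half-open, and `supersede`, `edges_as_of`, and the sample data are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for Postgres so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (
    id            INTEGER PRIMARY KEY,
    from_id       TEXT NOT NULL,
    to_id         TEXT NOT NULL,
    edge_type     TEXT NOT NULL,
    properties    TEXT NOT NULL DEFAULT '{}',   -- JSONB in Postgres
    valid_from    TEXT NOT NULL,                -- when the fact became true
    valid_to      TEXT,                         -- NULL = still true; half-open interval
    source_doc_id TEXT NOT NULL,                -- provenance: which document asserted it
    extracted_at  TEXT NOT NULL,                -- when we ingested it
    confidence    REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1)
);
CREATE INDEX edges_from_type ON edges (from_id, edge_type);
CREATE INDEX edges_to_type   ON edges (to_id, edge_type);
CREATE INDEX edges_validity  ON edges (valid_from, valid_to);
""")

INSERT_SQL = """
INSERT INTO edges (from_id, to_id, edge_type, valid_from, valid_to,
                   source_doc_id, extracted_at, confidence)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
"""


def supersede(from_id, edge_type, new_to_id, effective, source_doc_id, extracted_at, confidence):
    """Close the currently valid edge instead of deleting it, then insert its successor."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE edges SET valid_to = ? "
            "WHERE from_id = ? AND edge_type = ? AND valid_to IS NULL",
            (effective, from_id, edge_type),
        )
        conn.execute(INSERT_SQL, (from_id, new_to_id, edge_type, effective, None,
                                  source_doc_id, extracted_at, confidence))


def edges_as_of(from_id, edge_type, as_of):
    """Edges of one type leaving from_id that were valid at as_of, with provenance."""
    return conn.execute(
        "SELECT to_id, source_doc_id, confidence FROM edges "
        "WHERE from_id = ? AND edge_type = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY confidence DESC",
        (from_id, edge_type, as_of, as_of),
    ).fetchall()


with conn:
    conn.execute(INSERT_SQL, ("alice", "acme", "works_at", "2021-03-01", None,
                              "doc-17", "2024-01-05", 0.92))
supersede("alice", "works_at", "globex", "2023-07-01", "doc-42", "2024-02-10", 0.88)

print(edges_as_of("alice", "works_at", "2022-01-01"))  # → [('acme', 'doc-17', 0.92)]
print(edges_as_of("alice", "works_at", "2024-01-01"))  # → [('globex', 'doc-42', 0.88)]
```

Because nothing is deleted, the old answer stays queryable and every edge can still be traced back to the document and extraction run that produced it.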