Reference Guide

Knowledge Graphs

When you need one, when you just need graph thinking, and how to capture most of the value in Postgres.


A schema gives the edges meaning. This is a model, not a product: you can adopt it in Postgres in an afternoon, with no migration and no new database.

  • Best for: domains where the path between entities is the query — fraud, identity resolution, supply chain, agent memory, recommendation
  • Not for: workloads that look like "rows where joined columns satisfy a predicate" — that is a relational problem dressed up
  • Adoptable in Postgres without buying a platform
Pattern · Always Applicable

Native graph storage with index-free adjacency. Cypher or Gremlin as query languages. Strong for deep traversals and graph algorithms.

  • Best for: 5+ hop hot-path queries on millions of edges, real-time recommendation walks, fraud rings, investigative platforms with unpredictable multi-hop queries
  • Not for: small consumer apps, 1-3 hop application workloads, anywhere ops simplicity beats traversal speed
  • Adds a second operational surface, a second query language, and a vendor or self-host relationship
Platform · Deep Traversals

W3C standards. Triples (subject, predicate, object), URIs, OWL inference, SPARQL. Built for ontology-driven reasoning and federation.

  • Best for: life sciences, healthcare, government linked data, regulated domains needing SHACL validation or provable inference chains
  • Not for: application development — ergonomics are painful, modeling overhead is significant, the community pipeline assumes a librarian
  • If you are not sure you need OWL reasoning, you do not need it
Standards · Specialist
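To make the triple model concrete, here is a minimal sketch in plain Python of (subject, predicate, object) triples and the kind of wildcard pattern match that a basic SPARQL query expresses. It is not a triple store and performs no OWL inference; the `http://example.org/` namespace, the sample facts, and the `match` helper are all hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace for illustration

# Every fact is one (subject, predicate, object) triple identified by URIs.
TRIPLES: set[Triple] = {
    (EX + "aspirin", EX + "treats", EX + "headache"),
    (EX + "aspirin", EX + "interactsWith", EX + "warfarin"),
    (EX + "ibuprofen", EX + "treats", EX + "headache"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly "SELECT ?drug WHERE { ?drug ex:treats ex:headache }".
drugs = [t[0] for t in match(p=EX + "treats", o=EX + "headache")]
```

A real store indexes the triples and adds inference and federation on top; the sketch only shows the shape of the data and of a single-pattern query.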

A single edge table with columns from_id, to_id, edge_type, properties JSONB, valid_from, valid_to, source_doc_id, confidence. Recursive CTEs handle traversal.

  • Best for: most application workloads — handles 1-3 hops comfortably and 4-6 hops at meaningful scale before you start to feel it
  • Not for: variable-length paths past 6 hops on hot data, native graph algorithms (PageRank, Louvain) at scale
  • The single most important discipline: bake in provenance and temporal columns from day one. Backfilling them later is brutal
Recommended Default · Postgres
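The edge-table columns and recursive-CTE traversal described above can be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for Postgres, which is an assumption of this sketch: `properties` would be JSONB and the validity columns TIMESTAMPTZ in Postgres, and the cycle check would more naturally use an array there. The sample rows, the `TRANSFERRED_TO` edge type, and the `reachable` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edges (
        from_id       TEXT NOT NULL,
        to_id         TEXT NOT NULL,
        edge_type     TEXT NOT NULL,
        properties    TEXT DEFAULT '{}',  -- JSONB in Postgres
        valid_from    TEXT NOT NULL,      -- ISO date; TIMESTAMPTZ in Postgres
        valid_to      TEXT,               -- NULL means the edge is still valid
        source_doc_id TEXT,               -- provenance: where the fact came from
        confidence    REAL
    )
""")
conn.execute("CREATE INDEX edges_from_idx ON edges (from_id, edge_type)")

conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("acct:1", "acct:2", "TRANSFERRED_TO", "{}", "2024-01-01", None, "doc:17", 0.95),
        ("acct:2", "acct:3", "TRANSFERRED_TO", "{}", "2024-02-01", None, "doc:18", 0.90),
        ("acct:3", "acct:4", "TRANSFERRED_TO", "{}", "2024-03-01", "2024-06-01", "doc:19", 0.80),
        ("acct:3", "acct:1", "TRANSFERRED_TO", "{}", "2024-03-15", None, "doc:20", 0.70),
    ],
)

# Bounded-depth walk that only follows edges valid at :as_of.
# The path column is a comma-delimited string used to avoid revisiting nodes.
TRAVERSE = """
WITH RECURSIVE walk(node, depth, path) AS (
    SELECT :start, 0, ',' || :start || ','
    UNION ALL
    SELECT e.to_id, w.depth + 1, w.path || e.to_id || ','
    FROM walk AS w
    JOIN edges AS e ON e.from_id = w.node
    WHERE w.depth < :max_hops
      AND e.edge_type = :edge_type
      AND e.valid_from <= :as_of
      AND (e.valid_to IS NULL OR e.valid_to > :as_of)
      AND instr(w.path, ',' || e.to_id || ',') = 0
)
SELECT node, MIN(depth) AS hops
FROM walk
WHERE depth > 0
GROUP BY node
ORDER BY hops, node
"""


def reachable(start, edge_type, as_of, max_hops=3):
    """Nodes reachable from start within max_hops, as of a given date."""
    params = {
        "start": start,
        "edge_type": edge_type,
        "as_of": as_of,
        "max_hops": max_hops,
    }
    return conn.execute(TRAVERSE, params).fetchall()


print(reachable("acct:1", "TRANSFERRED_TO", "2024-04-01"))
# [('acct:2', 1), ('acct:3', 2), ('acct:4', 3)]
print(reachable("acct:1", "TRANSFERRED_TO", "2024-07-01"))
# [('acct:2', 1), ('acct:3', 2)]
```

The second call shows why the temporal columns matter: the edge to `acct:4` expired on 2024-06-01, so an "as of July" traversal no longer follows it, and no data had to be deleted to get that answer.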

Apache AGE runs in the same database, with the same connection pool and the same backup story. It lets you write Cypher graph queries without leaving Postgres.

  • Best for: teams who want Cypher ergonomics but cannot justify a second operational surface — a migration ramp toward Neo4j if you ever need it
  • Not for: workloads that genuinely need native graph algorithms or 10+ hop traversals — AGE inherits Postgres performance characteristics
  • Useful middle path before going full property graph DB
Postgres Extension · Middle Path

pgvector provides HNSW and IVFFlat indexes and stores entity and edge embeddings for similarity search.

  • Best for: semantic similarity search on entities, "find related things," and entity resolution that needs more than fuzzy string matching
  • Not for: >10M-vector search at sub-50ms p99 — at that scale a dedicated vector store may pull ahead
  • Most apps never reach that scale. Start with pgvector and only graduate when measured
Postgres Extension · Similarity
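As a rough illustration of what embedding similarity search computes, the sketch below ranks entities by cosine distance in plain Python. This is the same metric pgvector exposes through its `<=>` operator, but here it is an exact brute-force scan rather than an approximate HNSW or IVFFlat index lookup. The three-dimensional vectors, the entity names, and the `nearest` helper are made up for illustration; real embeddings have hundreds of dimensions or more.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical entity embeddings (toy 3-d vectors).
EMBEDDINGS = {
    "Acme Corp": [0.90, 0.10, 0.00],
    "ACME Corporation": [0.88, 0.12, 0.01],
    "Globex": [0.10, 0.90, 0.20],
}


def nearest(query: list[float], k: int = 2) -> list[str]:
    """Exact k-nearest entities by cosine distance (full scan)."""
    ranked = sorted(
        EMBEDDINGS.items(),
        key=lambda item: cosine_distance(query, item[1]),
    )
    return [name for name, _ in ranked[:k]]


print(nearest([0.90, 0.10, 0.00]))
# ['Acme Corp', 'ACME Corporation']
```

The two Acme spellings land next to each other even though they are different strings, which is the entity-resolution case where embeddings help beyond fuzzy string matching.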

JSON schema and function calling to extract entities and (subject, predicate, object) triples from unstructured text — PDFs, web content, transcripts, support tickets.

  • Best for: bootstrapping a knowledge graph from unstructured sources — the 2026 replacement for Diffbot-style services
  • Not for: production extraction without an eval harness — drift is real; pin a small labeled set and watch it
  • Costs on the order of pennies per document with a small model. The historical "ingestion is the dominant cost" excuse for buying a KG platform has mostly evaporated
Ingestion · 2026 Default
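The extraction step and its eval harness might look roughly like the sketch below. It deliberately makes no model call: `TRIPLE_SCHEMA` is a hypothetical JSON schema of the kind you would hand to a function-calling API, `parse_triples` turns the model's JSON reply into edge rows carrying provenance, and `precision_recall` scores those rows against a small labeled set. All names, fields, and sample data are assumptions for illustration.

```python
import json

# Hypothetical schema to pass to a model's structured-output / function-calling API.
TRIPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject": {"type": "string"},
                    "predicate": {"type": "string"},
                    "object": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["subject", "predicate", "object"],
            },
        }
    },
    "required": ["triples"],
}


def parse_triples(raw_json: str, source_doc_id: str) -> list[dict]:
    """Convert a model reply into edge rows, dropping malformed items."""
    payload = json.loads(raw_json)
    edges = []
    for item in payload.get("triples", []):
        if not isinstance(item, dict):
            continue
        fields = [item.get(k) for k in ("subject", "predicate", "object")]
        if not all(isinstance(f, str) and f.strip() for f in fields):
            continue  # never trust the model to honor the schema
        edges.append({
            "from_id": fields[0].strip(),
            "edge_type": fields[1].strip(),
            "to_id": fields[2].strip(),
            "confidence": item.get("confidence"),
            "source_doc_id": source_doc_id,  # provenance from day one
        })
    return edges


def precision_recall(predicted: list[dict], gold: set[tuple]) -> tuple[float, float]:
    """Score extracted edges against a labeled set of (s, p, o) triples."""
    pred = {(e["from_id"], e["edge_type"], e["to_id"]) for e in predicted}
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Simulated model reply: two usable triples and one missing its object.
reply = json.dumps({"triples": [
    {"subject": "Acme", "predicate": "ACQUIRED", "object": "Globex", "confidence": 0.9},
    {"subject": "Acme", "predicate": "LOCATED_IN", "object": "Berlin"},
    {"subject": "Acme", "predicate": "EMPLOYS"},
]})
edges = parse_triples(reply, source_doc_id="doc:42")
gold = {("Acme", "ACQUIRED", "Globex"), ("Acme", "LOCATED_IN", "Paris")}
precision, recall = precision_recall(edges, gold)  # 0.5, 0.5
```

Re-running `precision_recall` on the same pinned documents after every prompt or model change is one simple way to notice drift before it reaches the graph.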

Treating "we have a graph" as the goal

A graph is not a deliverable. The deliverable is the question your graph answers — the multi-hop fraud query you cannot currently run, the temporal lookup you cannot currently express. Start from the query, not from the architectural diagram. Most teams who adopt a graph DB before having that query in hand discover, six months later, that they paid for capabilities they never use.