JanusGraph
JanusGraph is an open-source distributed graph database that delegates persistence to a pluggable storage backend (Apache Cassandra, ScyllaDB, Apache HBase, or, for single-machine deployments, embedded BerkeleyDB) and delegates full-text and geo queries to an optional index backend (Elasticsearch, Solr, or embedded Lucene). It is the successor to Titan, forked in 2017 by a group that included IBM, Google, and Hortonworks. It uses Apache TinkerPop’s Gremlin as its query language and is a common choice when a graph outgrows what a single Neo4j server can hold.
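The split between storage backend and index backend shows up directly in the configuration file. The sketch below is only an illustration: the helper function and the hostnames are hypothetical, while the option keys (storage.backend, storage.hostname, index.<name>.backend, index.<name>.hostname) are the ones JanusGraph reads from its properties file, with "search" as an arbitrary, user-chosen index name.

```python
# Hypothetical helper that renders a JanusGraph-style .properties file.
# Only the option keys come from JanusGraph; the function name, arguments,
# and hostnames are assumptions made for this example.

def render_janusgraph_properties(storage_backend, storage_host,
                                 index_backend=None, index_host=None,
                                 index_name="search"):
    """Return properties text for one storage backend and an optional index backend."""
    props = {
        "storage.backend": storage_backend,    # e.g. "cql" for Cassandra/ScyllaDB, "hbase"
        "storage.hostname": storage_host,
    }
    if index_backend is not None:              # the index backend is optional
        props[f"index.{index_name}.backend"] = index_backend   # e.g. "elasticsearch", "solr"
        if index_host is not None:
            props[f"index.{index_name}.hostname"] = index_host
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Example: Cassandra for storage, Elasticsearch for full-text and geo indexes.
config_text = render_janusgraph_properties(
    "cql", "cassandra.example.internal",
    index_backend="elasticsearch", index_host="es.example.internal",
)
print(config_text)
```

Swapping the backend then means changing values rather than application code; embedded backends such as BerkeleyDB or Lucene take a directory setting instead of a hostname, which this sketch does not cover.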
Key Features:
- Pluggable Backends. Storage on Cassandra/Scylla (recommended), HBase, or BerkeleyDB; indexes on Elasticsearch, Solr, or Lucene.
- Gremlin / TinkerPop. Queries are written in Gremlin, the traversal language of the open Apache TinkerPop framework, so they are largely portable across TinkerPop-compatible graph engines.
- OLTP + OLAP. Real-time traversals plus Spark / Hadoop integration for whole-graph analytics on stored data.
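- Horizontal Scalability. Inherits the scalability of the underlying storage cluster, so capacity grows by adding nodes; tested at billions of vertices and tens of billions of edges.
- Vertex-Centric Indexes. Local indexes, built per vertex over its incident edges, let a traversal select the relevant edges of a supernode (a vertex with millions of edges) without scanning all of them.
- Schema Optional. Labels and property keys can be defined explicitly or created automatically on first use. With automatic creation disabled (schema.default=none), writes that use undefined schema elements are rejected; with it enabled, freeform graphs are allowed.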
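The idea behind vertex-centric indexes can be sketched with a toy in-memory model. This is not JanusGraph's API (there, such indexes are defined through the management interface, for example with buildEdgeIndex); the class, method names, and data below are assumptions chosen for illustration. The point is that keeping each vertex's edges sorted by a key turns "find the matching edges of a supernode" into a binary search instead of a full scan.

```python
import bisect
from collections import defaultdict


class ToyVertexCentricIndex:
    """Toy model: for each (vertex, edge label), edges are kept sorted by a sort key."""

    def __init__(self):
        # (vertex_id, edge_label) -> sorted list of (sort_key, neighbor_id)
        self._edges = defaultdict(list)

    def add_edge(self, src, label, dst, sort_key):
        # insort keeps the per-vertex edge list ordered by (sort_key, neighbor_id).
        bisect.insort(self._edges[(src, label)], (sort_key, dst))

    def neighbors_in_range(self, src, label, low, high):
        """Return neighbors whose edge sort key lies in [low, high)."""
        edges = self._edges[(src, label)]
        # A 1-tuple (k,) sorts before every (k, neighbor), so bisect_left
        # finds the first edge whose sort key is >= k.
        start = bisect.bisect_left(edges, (low,))
        end = bisect.bisect_left(edges, (high,))
        return [dst for _, dst in edges[start:end]]


# A "supernode" with 1000 followers; the sort key stands in for a timestamp.
index = ToyVertexCentricIndex()
for t in range(1000):
    index.add_edge("celebrity", "followedBy", f"user{t}", sort_key=t)

recent = index.neighbors_in_range("celebrity", "followedBy", 990, 1000)
print(len(recent), recent[0], recent[-1])
```

Only the slice between the two binary-search positions is read, so the cost depends on the number of matching edges rather than on the vertex's total degree.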
JanusGraph vs. Neo4j:
- JanusGraph. Distributed across a Cassandra cluster, scales to billions of edges and beyond, Gremlin query language. Operating it means running and tuning three systems: Cassandra, Elasticsearch, and JanusGraph itself.
- Neo4j. Optimized for deep traversals on a single server, Cypher query language, simpler to run, lower scale ceiling.
Use Cases:
- Massive knowledge graphs: tens of billions of edges across thousands of vertex labels.
- Identity-resolution graphs at carrier or social-network scale.
- Workloads where Gremlin / TinkerPop portability across engines is a hard requirement.
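- Organizations already running Cassandra that want to add a graph database on the same cluster and operational tooling. JanusGraph stores graph data in its own keyspace; it does not query existing Cassandra tables.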