Netflix’s Graph Abstraction: Powering Personalized Experiences at Scale
Netflix has developed a high-throughput graph data platform, known as Graph Abstraction, to manage and analyze complex relationships across its vast ecosystem of services and users. This system, capable of handling 650 TB of graph data with millisecond latency, is crucial for powering features like personalized recommendations in Netflix Gaming and improving operational efficiency through real-time service dependency analysis.
The Challenge of Scale and Speed
Traditional graph databases often struggle to balance the need for expressive queries with predictable performance at scale. Although they excel at flexible data traversal, many operational workloads at Netflix demand extremely fast response times and high throughput. Netflix’s Graph Abstraction addresses this challenge by prioritizing speed and consistent performance, even if it means sacrificing some query flexibility. The system restricts traversal depth and typically requires a defined starting node for queries.
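The two restrictions described above, a required starting node and a capped traversal depth, can be illustrated with a small sketch. This is not Netflix's implementation: the function name `traverse`, the in-memory adjacency dictionary, and the `MAX_DEPTH` value of 2 are all assumptions made for illustration.

```python
from collections import deque

MAX_DEPTH = 2  # assumed cap; the article does not state the real limit


def traverse(adjacency, start, depth):
    """Breadth-first walk from a required start node, limited to `depth` hops."""
    if start is None:
        raise ValueError("a starting node is required")
    if not 1 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be between 1 and {MAX_DEPTH}")

    seen = {start}
    frontier = deque([(start, 0)])
    reached = []  # (node, hops from start) in discovery order
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append((neighbor, hops + 1))
                frontier.append((neighbor, hops + 1))
    return reached


# Hypothetical toy graph used only for this example.
graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
print(traverse(graph, "a", 1))  # [('b', 1), ('c', 1)]
print(traverse(graph, "a", 2))  # [('b', 1), ('c', 1), ('d', 2)]
```

Because the work per query is bounded by the depth cap and the fan-out of the starting node, latency is easier to predict than with open-ended traversals.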
Key Components of Graph Abstraction
Graph Abstraction isn’t a standalone database but rather a layer built on top of Netflix’s existing data infrastructure. Here’s a breakdown of its core components:
- Key-Value Abstraction: The latest graph state is stored using a Key-Value store.
- TimeSeries Abstraction: Historical graph changes are recorded, enabling analytics, auditing and temporal queries.
- EVCache Integration: Leverages Netflix’s distributed caching layer, EVCache, to reduce latency.
- Schema Enforcement: Graph schemas are loaded into memory and strictly enforced to ensure data integrity and optimize query planning.
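The split between latest state, history, and schema enforcement can be sketched in a few lines. Everything here is a stand-in: the `GraphStore` class, the `SCHEMA` triples, and the plain dictionary and list that play the roles of the key-value and time-series stores are assumptions for illustration, not Netflix's actual interfaces.

```python
import time

# Hypothetical schema: allowed (source type, edge label, target type) triples.
SCHEMA = {
    ("user", "friend_of", "user"),
    ("service", "calls", "service"),
}


class GraphStore:
    def __init__(self, schema):
        self.schema = schema  # held in memory and checked on every write
        self.latest = {}      # stands in for the key-value store (latest state)
        self.history = []     # stands in for the time-series store (change log)

    def put_edge(self, src, label, dst, props=None, now=None):
        """Write an edge; nodes are (type, id) tuples."""
        if (src[0], label, dst[0]) not in self.schema:
            raise ValueError(f"edge {src[0]}-[{label}]->{dst[0]} violates schema")
        ts = time.time() if now is None else now
        key = (src, label, dst)
        self.latest[key] = dict(props or {})               # overwrite latest state
        self.history.append((ts, key, dict(props or {})))  # append-only record


store = GraphStore(SCHEMA)
store.put_edge(("user", "u1"), "friend_of", ("user", "u2"), {"since": 2023}, now=1.0)
store.put_edge(("user", "u1"), "friend_of", ("user", "u2"), {"since": 2024}, now=2.0)
print(len(store.latest), len(store.history))  # 1 2
```

Reads of the current graph touch only `latest`, while auditing and temporal queries would scan `history`.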
Optimizing for Performance
Several techniques contribute to Graph Abstraction’s high performance:
- Write-Aside Caching: Prevents duplicate edge writes.
- Read-Aside Caching: Accelerates access to node and edge properties.
- gRPC Traversal API: Exposes a gRPC API inspired by Gremlin, allowing services to chain traversal steps and apply filters.
- Asynchronous Replication: Replicates data across regions asynchronously for global availability, with eventual consistency between regions.
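The two caching patterns in the list above can be sketched as follows. This is one plausible reading, not the documented design: the `CachedGraph` class is hypothetical, plain dictionaries stand in for EVCache and the durable store, and a real cache would also need expiry and eviction, which are omitted here.

```python
class CachedGraph:
    def __init__(self):
        self.store = {}  # stand-in for the durable key-value store
        self.cache = {}  # stand-in for a distributed cache such as EVCache
        self.store_reads = 0
        self.store_writes = 0

    def write_edge(self, src, label, dst):
        """Write-aside: skip the store write if the cache already saw this edge."""
        key = ("edge", src, label, dst)
        if key in self.cache:
            return False  # duplicate write suppressed
        self.store[key] = True
        self.store_writes += 1
        self.cache[key] = True
        return True

    def put_properties(self, node, props):
        """Update properties in the store and invalidate any cached copy."""
        self.store[("props", node)] = dict(props)
        self.cache.pop(("props", node), None)

    def read_properties(self, node):
        """Read-aside: serve from cache, fall back to the store on a miss."""
        key = ("props", node)
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value


g = CachedGraph()
g.write_edge("u1", "friend_of", "u2")
g.write_edge("u1", "friend_of", "u2")  # suppressed by the cache
g.put_properties("u1", {"region": "us-east"})
g.read_properties("u1")  # miss, reads the store
g.read_properties("u1")  # hit, served from the cache
print(g.store_writes, g.store_reads)  # 1 1
```

In both cases the cache absorbs repeated work, so the backing store sees one write and one read instead of two of each.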
Use Cases Across Netflix
Graph Abstraction supports a variety of critical use cases within Netflix:
- Real-Time Distributed Graph: Captures interactions across all services within the Netflix ecosystem.
- Social Graph (Netflix Gaming): Models user relationships to enhance engagement within Netflix Gaming.
- Service Topology Graph: Analyzes dependencies between services to accelerate root cause analysis during incidents.
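To make the service topology use case concrete, here is a minimal sketch of a chained, Gremlin-style traversal over a toy dependency graph. The `Traversal` class, its `out`, `has`, and `to_list` steps, and the service names are all hypothetical; they only imitate the general shape of such queries and do not reproduce Netflix's gRPC API.

```python
class Traversal:
    """Tiny in-memory traversal that chains steps from a fixed start node."""

    def __init__(self, edges, props, start):
        self.edges = edges      # {(source, label): [target, ...]}
        self.props = props      # {node: {property: value}}
        self.current = [start]  # nodes reached so far

    def out(self, label):
        """Follow outgoing edges with the given label (one hop)."""
        reached = []
        for node in self.current:
            for target in self.edges.get((node, label), ()):
                if target not in reached:
                    reached.append(target)
        self.current = reached
        return self

    def has(self, key, value):
        """Keep only nodes whose property `key` equals `value`."""
        self.current = [
            n for n in self.current if self.props.get(n, {}).get(key) == value
        ]
        return self

    def to_list(self):
        return list(self.current)


# Hypothetical service dependency data.
edges = {
    ("checkout", "calls"): ["payments", "cart"],
    ("payments", "calls"): ["ledger-db"],
    ("cart", "calls"): ["ledger-db", "cache"],
}
props = {
    "payments": {"healthy": True},
    "cart": {"healthy": True},
    "ledger-db": {"healthy": False},
    "cache": {"healthy": True},
}

# Which unhealthy services sit two hops downstream of "checkout"?
suspects = (
    Traversal(edges, props, "checkout")
    .out("calls")
    .out("calls")
    .has("healthy", False)
    .to_list()
)
print(suspects)  # ['ledger-db']
```

During an incident, a query of this shape narrows the search from every dependency to the few that are both reachable and unhealthy.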
Performance Metrics
In production, Graph Abstraction delivers single-digit millisecond latency for single-hop traversals and under 50 milliseconds for two-hop queries at the 90th percentile. This predictable performance is achieved through careful balancing of traversal planning and execution. According to LinkedIn posts about the system, it sustains 10 million operations per second.
The Future of Graph Abstraction
As Netflix expands into new areas like live content, gaming, and advertising, Graph Abstraction is poised to become even more vital. Its ability to model complex relationships between users, services, and content, while maintaining high throughput and low latency, will be essential for delivering personalized and engaging experiences at scale.