Reference Architecture

This guide describes recommended best practices for infrastructure architects and operators to follow when deploying VOR Stream in a production environment. This guide includes general guidance as well as specific recommendations for popular cloud infrastructure platforms.

Description of Services

The diagram below highlights the services that are part of an installation of VOR Stream.

[Reference Architecture diagram]

Identity Provider

VOR Stream leverages OpenID Connect (OIDC) with the authorization code flow, a secure and widely adopted authentication method. This approach enhances security by requiring the client to exchange an authorization code for an access token, significantly reducing the risk of token interception. It's compatible with a variety of identity providers, ensuring a secure and user-friendly login process. Notable supported identity providers include:

  • Microsoft Entra ID
  • GitHub
  • Google
  • Okta

This enables single sign-on (SSO), allowing users to access VOR Stream with a single set of credentials; credentials are never shared with VOR Stream. It also supports multi-factor authentication (MFA) and other advanced security measures as defined by the identity provider, providing robust protection for user data.

To ensure seamless integration, identity providers must comply with the OpenID Connect (OIDC) specification, specifically supporting the authorization code flow.
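
As a concrete illustration, the sketch below shows the token-exchange step of the authorization code flow in Python. The endpoint URL, client credentials, and redirect URI are placeholders for whatever your identity provider issues; VOR Stream performs this exchange internally, so this is purely illustrative of the protocol.

```python
import requests

# Placeholder values; substitute those issued by your identity provider.
TOKEN_ENDPOINT = "https://idp.example.com/oauth2/token"
CLIENT_ID = "vor-stream"
CLIENT_SECRET = "change-me"  # stored in Vault in a real deployment
REDIRECT_URI = "https://vor.example.com/auth/callback"

def exchange_code_for_tokens(authorization_code: str) -> dict:
    """Exchange the one-time authorization code for ID/access tokens."""
    response = requests.post(
        TOKEN_ENDPOINT,
        data={
            "grant_type": "authorization_code",
            "code": authorization_code,
            "redirect_uri": REDIRECT_URI,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # contains id_token, access_token, etc.
```

Because the code is exchanged server-to-server along with the client secret, an attacker who intercepts the redirect cannot obtain tokens with the code alone.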

Tip

If you require LDAP integration, VOR Stream offers the flexibility to use Vault as an OIDC provider for authenticating users against an LDAP server. This OIDC provider is configured automatically if the necessary variables are set in the inventory file.

For guidance on integrating VOR Stream with an identity provider using the authorization code flow, please refer to the Registering VOR with an Authentication Provider section for comprehensive instructions and further details.

HashiCorp Vault

VOR Stream uses HashiCorp Vault for the following purposes:

  • Secrets Management: various Vault secrets engines are leveraged to securely store secrets for the application.
    • Database: Static secrets (with password rotation) and dynamically generated secrets for database connections.
    • Key/Value: OIDC client secrets for identity providers (used by the VOR Stream Midtier and Django Midtier), static database secrets, etc.
    • RabbitMQ: Dynamic credentials for RabbitMQ.
    • SSH Secrets Engine: Signed SSH key generation for access to compute hosts during scaling.
  • Authentication: Vault can also serve as an OIDC identity provider for authenticating users against an LDAP server, as described in the Identity Provider section above. A minimal example of reading secrets from Vault follows this list.
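
The sketch below shows, assuming the Python hvac client is available, how a service might read a static KV secret and request dynamic database credentials. The mount points, secret paths, and the vor-app role name are illustrative, not fixed VOR Stream conventions.

```python
import hvac

# Connect to Vault; the address and token come from the environment in practice.
client = hvac.Client(url="https://vault.example.com:8200")

# Static secret from the Key/Value (v2) engine, e.g. an OIDC client secret.
kv = client.secrets.kv.v2.read_secret_version(path="vor/oidc")
oidc_client_secret = kv["data"]["data"]["client_secret"]

# Dynamic, short-lived database credentials from the Database engine.
db_creds = client.secrets.database.generate_credentials(name="vor-app")
username = db_creds["data"]["username"]
password = db_creds["data"]["password"]
```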

A Vault installation can be deployed as part of the Ansible deployment. Alternatively, if Vault Enterprise features are desired, VOR Stream can be configured to connect to an existing Vault Enterprise cluster, which can be either self-managed or cloud-managed via HCP Vault Dedicated (other HCP Vault offerings are not supported because they lack capabilities required by VOR Stream).

For detailed instructions on deploying VOR Stream with an existing Vault Enterprise cluster, see the Deploying with Vault Enterprise guide.

HashiCorp Consul

HashiCorp Consul is used for service discovery, service health monitoring, configuration, and persistent storage for the VOR Stream Midtier. At a minimum, a single Consul server agent is deployed in a VOR Stream environment; however, multiple dedicated Consul servers are recommended so that Consul's high availability can be leveraged. This deployment is supported via Ansible.
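
As a hedged illustration of the service-discovery role, the snippet below queries Consul's standard HTTP health API for passing instances of a service. The service name vor-midtier is a placeholder; check your deployment's Consul catalog for the actual registered names.

```python
import requests

CONSUL_ADDR = "http://127.0.0.1:8500"  # local Consul agent

def healthy_instances(service: str) -> list[tuple[str, int]]:
    """Return (address, port) pairs for passing instances of a service."""
    resp = requests.get(
        f"{CONSUL_ADDR}/v1/health/service/{service}",
        params={"passing": "true"},
        timeout=5,
    )
    resp.raise_for_status()
    return [
        (entry["Service"]["Address"] or entry["Node"]["Address"],
         entry["Service"]["Port"])
        for entry in resp.json()
    ]

print(healthy_instances("vor-midtier"))  # placeholder service name
```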

VOR Stream Midtier

The VOR Stream Midtier acts as a controller for all VOR Stream processes. When a run is requested, it sets up the required RabbitMQ queues and executes the run on the VOR Stream Compute Server. While the run is executing, it monitors the status, which is available via the midtier's API.
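
A monitoring client might poll that status endpoint as sketched below. The URL path and response field are hypothetical stand-ins, since the actual Midtier API routes are documented elsewhere; only the polling pattern is the point.

```python
import time
import requests

MIDTIER = "https://vor-midtier.example.com"  # placeholder host

def wait_for_run(run_id: str, poll_seconds: int = 10) -> str:
    """Poll a hypothetical run-status endpoint until the run finishes."""
    while True:
        resp = requests.get(f"{MIDTIER}/api/runs/{run_id}", timeout=10)
        resp.raise_for_status()
        status = resp.json().get("status")  # hypothetical field name
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
```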

Note

If the compute, SDK, and/or midtier services are deployed on separate hosts, all playpen directories must be mounted at identical paths across the midtier and compute hosts. For example, if a playpen directory is mounted at /data/playpen on the midtier, it must also be mounted at /data/playpen on all compute hosts.

VOR Stream Compute Server

The VOR Stream Compute Server is where VOR Stream nodes are executed. These can be Go, Python, or SAS nodes. Compute hosts communicate directly with the VOR Stream Midtier for status reporting and with RabbitMQ for sending and receiving records to and from queues.

Note

If the compute, SDK, and/or midtier services are deployed on separate hosts, all playpen directories must be mounted at identical paths across the midtier and compute hosts. For example, if a playpen directory is mounted at /data/playpen on the midtier, it must also be mounted at /data/playpen on all compute hosts.

Tip

If deploying in AWS or Azure, consider separating the compute service from other VOR Stream services by deploying onto dedicated hosts to enable the compute resources to be powered off when not in use, which can significantly reduce costs. See the Cloud Integrations page for more details. The SDK service can still be deployed alongside compute services without impacting the ability to power off unused compute resources.
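
For example, on AWS the compute hosts could be stopped with a short boto3 call like the sketch below. The region and tag scheme are assumptions for illustration; the Cloud Integrations page describes the supported mechanism.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Assumed tagging convention for identifying VOR Stream compute hosts.
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Role", "Values": ["vor-compute"]},
             {"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

instance_ids = [
    inst["InstanceId"] for r in reservations for inst in r["Instances"]
]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)  # power off idle compute
```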

VOR Stream SDK gRPC Server

The VOR Stream SDK gRPC server is used by developers when building VOR Stream nodes. The primary function of the SDK is retrieval of risk objects from the Django API. The SDK libraries are available in both Go and Python.

Note

If the compute, SDK, and/or midtier services are deployed on separate hosts, all playpen directories must be mounted at identical paths across the midtier and compute hosts. For example, if a playpen directory is mounted at /data/playpen on the midtier, it must also be mounted at /data/playpen on all compute hosts.

RabbitMQ

RabbitMQ is the queueing system leveraged by VOR Stream.

A RabbitMQ installation can be deployed as part of the deployment process, or VOR Stream can be configured to connect to a managed or existing RabbitMQ cluster.
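
As a minimal, self-contained illustration of the queueing layer (not VOR Stream's internal wire format), the sketch below publishes and consumes a message with the pika client. The host, credentials, and queue name are placeholders; in a real deployment, RabbitMQ credentials are issued dynamically by Vault.

```python
import pika

# Placeholder connection details; real credentials are issued by Vault.
params = pika.ConnectionParameters(
    host="rabbitmq.example.com",
    credentials=pika.PlainCredentials("vor", "secret"),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# Declare a durable queue, publish one record, and read it back.
channel.queue_declare(queue="demo", durable=True)
channel.basic_publish(exchange="", routing_key="demo", body=b"record-1")

method, properties, body = channel.basic_get(queue="demo", auto_ack=True)
print(body)  # b'record-1'
connection.close()
```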

Django Midtier

Django is deployed as the midtier for the workbench. Uploads and workbench risk objects are stored using its object-relational mapper. VOR Stream application security is controlled using its admin interface. Django also acts as a proxy to the VOR Stream midtier for the workbench, which allows users to execute and monitor VOR Stream processes from the workbench.
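
As an illustration of the ORM-backed storage, a workbench object could be modeled roughly as below. The model name and fields are hypothetical, not VOR Stream's actual schema.

```python
from django.db import models

class RiskObject(models.Model):  # hypothetical model, not the real schema
    name = models.CharField(max_length=255)
    definition = models.JSONField()
    uploaded = models.FileField(upload_to="uploads/", null=True, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self) -> str:
        return self.name
```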

Caddy

A Caddy web server is deployed to serve the workbench web client and Django API service.

Database

A relational database management system (RDBMS) is used as the backend for the Django midtier. The following are supported:

  • PostgreSQL: PostgreSQL is the default database backend for VOR Stream. An installation can be deployed as part of the deployment process, or VOR Stream can be configured to connect to an externally-managed PostgreSQL instance.
  • Microsoft SQL Server: VOR Stream can be configured to connect to an externally-managed Microsoft SQL Server instance. Example settings for both backends are sketched after this list.
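
For reference, Django selects its database backend in settings.py. The PostgreSQL entry below uses Django's built-in backend; the SQL Server entry assumes the third-party mssql-django package, and all hostnames and credentials are placeholders (in a VOR Stream deployment these are injected from Vault).

```python
# settings.py (illustrative values only)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "vor",
        "USER": "vor_app",       # in practice, issued by Vault
        "PASSWORD": "change-me",
        "HOST": "db.example.com",
        "PORT": "5432",
    }
}

# Externally-managed Microsoft SQL Server via the mssql-django package:
# DATABASES["default"] = {
#     "ENGINE": "mssql",
#     "NAME": "vor",
#     "HOST": "mssql.example.com",
#     "PORT": "1433",
#     "USER": "vor_app",
#     "PASSWORD": "change-me",
#     "OPTIONS": {"driver": "ODBC Driver 18 for SQL Server"},
# }
```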

Warning

Django officially supports other database backends, but VOR Stream does not support them.

System Requirements

This section contains specific hardware capacity recommendations, network requirements, and additional infrastructure considerations. Because every hosting environment and every organization's VOR Stream usage profile is different, these recommendations are only a starting point; operations staff should observe actual usage and adjust to meet the unique needs of each deployment.

Hardware Sizing

Sizing recommendations have been divided into two common deployment sizes.

  • Small deployments would be appropriate for most initial production deployments or for development and testing environments.

  • Large deployments are production environments with high compute and data processing requirements.

Service assignments to nodes have been grouped into the following categories:

  • Compute: Stream Compute Server, Stream Midtier, gRPC Server
  • RabbitMQ: RabbitMQ
  • Midtier: Caddy, Django, PostgreSQL
  • Service Catalog and Secrets: Consul, Vault

Compute

The VOR Stream Compute Server is the most resource-intensive service in the VOR Stream architecture. It is recommended to deploy the Compute Server on a dedicated host or virtual machine. The following table provides recommended hardware specifications for the Compute Server.

Size  | CPU         | Memory         | Disk Capacity | Disk IO    | Disk Throughput
Small | 4-16 cores  | 8-64 GB RAM    | 100+ GB       | 3000+ IOPS | 100+ MB/s
Large | 32-64 cores | 128-256 GB RAM | 500+ GB       | 3000+ IOPS | 1000+ MB/s

For each deployment size, the following table gives recommended hardware specs for each major cloud infrastructure provider.

Provider | Size  | Instance/VM Types                                                         | Disk Volume Specs
AWS      | Small | c8g.xlarge, c8g.2xlarge, c8g.4xlarge, m8g.large, m8g.2xlarge, m8g.4xlarge | 100+ GB gp3, 3000 IOPS, 125 MB/s
AWS      | Large | m8g.8xlarge, m8g.16xlarge, c8g.8xlarge, c8g.16xlarge                      | 500+ GB gp3, 3000 IOPS, 1000 MB/s
Azure    | Small |                                                                           |
Azure    | Large |                                                                           |

Note

For AWS servers, AWS Graviton (denoted by "g" in the family name) instance types are recommended for their better price-performance ratio over x86-based instances. If x86 is preferred, the equivalent x86 instance family, e.g., c7a or c7i, can be used.

Tip

Disk sizing and performance requirements will vary based on the specific needs of the organization. For example, if the organization has a large number of nodes that require a large amount of data to be processed, the disk capacity and throughput requirements will be higher. Additionally, groups of volumes can be configured in a RAID 0 array to increase disk capacity and throughput. For more information, see the AWS documentation on RAID configuration.

RabbitMQ

Because VOR Stream relies on RabbitMQ for passing data between nodes, it is important to ensure that RabbitMQ has sufficient resources to handle the volume of data being processed. Memory-optimized instances are recommended because RabbitMQ holds queue data in memory first; paging to disk occurs only when the configured memory limit is reached, which significantly impacts performance. In addition to scaling up, organizations may also consider scaling out by deploying a RabbitMQ cluster, which the VOR Stream deployment supports. The following table provides recommended hardware specifications for RabbitMQ.

Size  | CPU       | Memory       | Disk Capacity | Disk IO    | Disk Throughput
Small | 1-2 cores | 8-16 GB RAM  | 30+ GB        | 3000+ IOPS | 100+ MB/s
Large | 4-8 cores | 32-64 GB RAM | 100+ GB       | 3000+ IOPS | 1000+ MB/s

For each deployment size, the following table gives recommended hardware specs for each major cloud infrastructure provider.

Provider | Size  | Instance/VM Types       | Disk Volume Specs
AWS      | Small | r8g.medium, r8g.large   | 30+ GB gp3, 3000 IOPS, 125 MB/s
AWS      | Large | r8g.xlarge, r8g.2xlarge | 100+ GB gp3, 3000 IOPS, 1000 MB/s
Azure    | Small |                         |
Azure    | Large |                         |

Midtier

The VOR Stream Midtier is the second-least resource-intensive group of services in the VOR Stream architecture. The following table provides recommended hardware specifications for the Midtier.

Warning

If the PostgreSQL database is deployed on the same host as the Midtier, these sizing estimates assume the database will be used only as backend storage for the VOR Stream Django application. If the database will be used for other purposes, such as a data warehouse, its resource requirements can be significantly higher, and deploying the database on a separate host or virtual machine is recommended.

Size  | CPU       | Memory       | Disk Capacity | Disk IO    | Disk Throughput
Small | 2-4 cores | 4-8 GB RAM   | 100+ GB       | 3000+ IOPS | 100+ MB/s
Large | 4-8 cores | 16-32 GB RAM | 100+ GB       | 3000+ IOPS | 100+ MB/s

For each deployment size, the following table gives recommended hardware specs for each major cloud infrastructure provider.

Provider | Size  | Instance/VM Types                    | Disk Volume Specs
AWS      | Small | c8g.medium, c8g.large                | 100+ GB gp3, 3000 IOPS, 125 MB/s
AWS      | Large | m8g.large, m8g.2xlarge, c8g.2xlarge  | 100+ GB gp3, 3000 IOPS, 125 MB/s
Azure    | Small |                                      |
Azure    | Large |                                      |

Service Catalog and Secrets

VOR Stream's Consul and Vault services are the least resource-intensive services. The small deployment size will be sufficient for most organizations, particularly if deploying multiple nodes in a Consul/Vault cluster.

Size  | CPU       | Memory      | Disk Capacity | Disk IO    | Disk Throughput
Small | 1-2 cores | 2-4 GB RAM  | 30+ GB        | 3000+ IOPS | 75+ MB/s
Large | 2-4 cores | 8-16 GB RAM | 30+ GB        | 3000+ IOPS | 75+ MB/s

For each deployment size, the following table gives recommended hardware specs for each major cloud infrastructure provider.

Provider | Size  | Instance/VM Types                              | Disk Volume Specs
AWS      | Small | m8g.medium, c8g.medium, t4g.small, t4g.medium  | 30+ GB gp3, 3000 IOPS, 125 MB/s
AWS      | Large | m8g.large, m8g.xlarge                          | 30+ GB gp3, 3000 IOPS, 125 MB/s
Azure    | Small |                                                |
Azure    | Large |                                                |

Deployment Scenarios

Single-Tier

Multi-Tier

Cloud Multi-Tier with High-Availability and Managed Services