Deployment Troubleshooting
This section covers common issues encountered during VOR Stream deployment and their solutions.
RabbitMQ Connection Failures with TLS Certificate Errors
Problem Description
After completing a VOR Stream deployment, you may see the following error when running a job, typically the first job you run after deployment:
FATAL Process "example_model_run" failed to run: Job Failed: unable to generate credentials: cannot read secret "rabbitmq/creds/vor-midtier-role" in Vault--secret does not exist or you do not have permission to view it: Error making API request.
URL: GET https://<hostname>:8200/v1/rabbitmq/creds/vor-midtier-role
Code: 500. Errors:
* 1 error occurred:
* failed to create a new user with the generated credentials
Additionally, the RabbitMQ logs will show repeated TLS certificate errors:
TLS server: In state certify received CLIENT ALERT: Fatal - Bad Certificate
The Super service logs will contain:
level=error msg="unable to generate credentials: cannot read secret \"rabbitmq/creds/vor-midtier-role\" in Vault--secret does not exist or you do not have permission to view it"
Root Cause
This issue occurs when reverse DNS (PTR) records are not properly configured
for the RabbitMQ server. Correct reverse DNS is a required system
configuration. During deployment, Ansible uses the ansible_fqdn
variable (which performs a reverse DNS lookup) to determine the RabbitMQ
hostname. This hostname is then statically configured in:
- Consul service registration - The RabbitMQ service address registered in Consul
- Vault secrets engine - The RabbitMQ connection URL configured in Vault for dynamic credential generation
If the reverse DNS lookup returns an incorrect hostname or fails, the configured hostname won't match the hostname in the RabbitMQ server's TLS certificate, causing certificate validation failures.
The failure sequence is:
- Ansible deployment uses ansible_fqdn (which internally calls Python's socket.getfqdn()) to get the RabbitMQ hostname
- This hostname is configured in Consul and Vault during deployment
- When VOR attempts to generate RabbitMQ credentials through Vault, Vault uses the configured hostname to connect to RabbitMQ
- The TLS handshake fails because the certificate's Common Name (CN) or Subject Alternative Name (SAN) doesn't match the configured hostname
- Vault cannot establish a connection to RabbitMQ to create the dynamic user
- The job fails with the credential generation error
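The hostname mismatch in the handshake step can be illustrated with a small sketch. This is a simplified model of hostname verification, not the actual logic used by Vault or RabbitMQ; the function name `hostname_matches_cert` and the sample hostnames are hypothetical, and only exact matches and single-label wildcards are modeled.

```python
def hostname_matches_cert(hostname: str, cert_names: list[str]) -> bool:
    """Return True if hostname matches any name taken from the certificate.

    Simplified model: case-insensitive exact match, plus a wildcard that
    covers exactly one left-most label (e.g. *.example.com).
    """
    host = hostname.rstrip(".").lower()
    for name in cert_names:
        pattern = name.rstrip(".").lower()
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # "*.example.com" matches "mq.example.com" but not
            # "example.com" or "a.mq.example.com".
            suffix = pattern[1:]  # ".example.com"
            head, sep, tail = host.partition(".")
            if sep and head and "." + tail == suffix:
                return True
    return False


# Hypothetical names: the certificate was issued for the real FQDN, but a
# broken PTR record made the deployment configure a different hostname.
cert_names = ["rabbitmq-server.example.com"]
print(hostname_matches_cert("rabbitmq-server.example.com", cert_names))  # True
print(hostname_matches_cert("localhost.localdomain", cert_names))        # False
```

When the second case applies, the client rejects the certificate, which is consistent with the "Bad Certificate" alert in the RabbitMQ logs.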
Solutions
There are two approaches to resolve this issue:
Option 1: Configure Reverse DNS (Recommended for Production)
Ensure that proper reverse DNS (PTR) records are configured for all servers in your deployment, particularly the RabbitMQ server. This should be done at the network/DNS infrastructure level. See the System Requirements section for DNS configuration requirements.
To verify reverse DNS is working correctly on the RabbitMQ server:
# Check what hostname Ansible will use (this is exactly what the deployment uses)
# Note: Replace /opt/vor with your actual VOR installation path if different
/opt/vor/venv/bin/python -c "import socket; print(socket.getfqdn())"
The returned hostname must match the hostname used in the RabbitMQ server's TLS certificate.
After configuring proper reverse DNS, you must re-run the Ansible deployment playbook for the configuration changes to take effect. This will update the RabbitMQ hostname configuration in both Consul and Vault.
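The lookup above can be extended into a small check that compares the result with the hostname you expect to find in the certificate. This is a minimal sketch, not part of VOR Stream: `normalize` and `fqdn_matches` are hypothetical helpers, and the `resolver` parameter exists only so the check can be exercised without real DNS. It compares names exactly and does not handle wildcard certificates.

```python
import socket
from typing import Callable


def normalize(name: str) -> str:
    """Lower-case a hostname and drop any trailing dot for comparison."""
    return name.strip().rstrip(".").lower()


def fqdn_matches(expected: str,
                 resolver: Callable[[], str] = socket.getfqdn) -> tuple[bool, str]:
    """Compare the locally resolved FQDN with the expected certificate name.

    Returns (matches, resolved_name) so the caller can report a mismatch.
    """
    resolved = resolver()
    return normalize(resolved) == normalize(expected), resolved
```

Run on the RabbitMQ server, `fqdn_matches("rabbitmq-server.example.com")` (with your own certificate hostname substituted) returns the match result together with the name that was actually resolved. A short name or a `localhost` variant in the second element suggests the PTR record is missing or wrong.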
Note
The paths shown in this document use /opt/vor as the installation
directory, which is the default value for vor_root. Your actual
installation path may differ based on your hosts.ini configuration.
Option 2: Explicitly Set RabbitMQ Hostname in Inventory File
If configuring reverse DNS is not immediately possible, you can override the
default behavior by explicitly setting the rabbitmq_host variable in your
Ansible inventory file (hosts.ini). This bypasses the reverse DNS lookup and
uses the specified hostname directly.
- Edit your hosts.ini file.
- In the [all:vars] section, add or modify the rabbitmq_host variable:
      [all:vars]
      # ... other variables ...
      # Explicitly set the RabbitMQ hostname to bypass reverse DNS lookup
      # This hostname must match the CN or SAN in the RabbitMQ TLS certificate
      rabbitmq_host=rabbitmq-server.example.com
      # ... other variables ...
  Replace rabbitmq-server.example.com with the actual hostname that matches your RabbitMQ server's TLS certificate.
- Re-run the Ansible deployment to apply the configuration change. This will update the RabbitMQ hostname configuration in both Consul and Vault.
See the Inventory File documentation for
more details on the rabbitmq_host variable.
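Before re-running the deployment, it can help to confirm that the override is really in the [all:vars] section and not in some other group. The following is a rough sketch under stated assumptions: `find_all_vars` is a hypothetical helper, the `[rabbitmq]` group and host in the sample are invented for illustration, and the parser is deliberately line-based and does not follow Ansible's full inventory rules.

```python
from typing import Optional


def find_all_vars(inventory_text: str, variable: str) -> Optional[str]:
    """Return the value of `variable` in the [all:vars] section, or None.

    Line-based on purpose: INI inventories contain bare host lines that
    are not plain key=value pairs.
    """
    in_all_vars = False
    value = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            in_all_vars = line[1:-1].strip() == "all:vars"
            continue
        if in_all_vars and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == variable:
                value = val.strip()  # a later assignment overrides in this sketch
    return value


# Hypothetical inventory content for illustration only.
sample = """\
[rabbitmq]
mq1.example.com

[all:vars]
# Explicitly set the RabbitMQ hostname to bypass reverse DNS lookup
rabbitmq_host=rabbitmq-server.example.com
"""
print(find_all_vars(sample, "rabbitmq_host"))  # rabbitmq-server.example.com
```

A result of `None` means the variable is absent from [all:vars], in which case the deployment falls back to the reverse DNS lookup described above.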
Verification
After applying either solution, verify the fix by:
- Inspecting the RabbitMQ certificate to see what hostnames it contains:
      # Note: Port 5672 is used here as an example, but the actual port may differ
      # based on your installation configuration. Check your hosts.ini or deployment
      # documentation for the correct RabbitMQ port.
      openssl s_client -connect <rabbitmq-host>:5672 -showcerts < /dev/null 2>/dev/null | openssl x509 -text | grep -A1 "Subject:"
  This will show you the Common Name (CN) in the certificate's Subject field, which should match the hostname you're using to connect to RabbitMQ. Any Subject Alternative Names are listed under "X509v3 Subject Alternative Name" in the full openssl x509 -text output.
- Checking the configured RabbitMQ hostname in Consul:
      # View the RabbitMQ hostname as registered in Consul
      vor show connections --dashboard --static
  In the output, find the rabbitmq service and confirm that its address matches the correct hostname. For more details about the Consul dashboard, see Service Monitoring.
- Testing that Vault can successfully connect to RabbitMQ (requires Vault authentication):
  Authentication Required
  This command requires authentication to Vault as the root user or a user with appropriate permissions to read from the RabbitMQ secrets engine. The authentication method will vary based on your deployment configuration.
      # Test credential generation
      vault read rabbitmq/creds/vor-midtier-role
  A successful command will output a new lease ID, username, and password. A failure will likely result in an error message similar to the one described in the 'Problem Description' section.
- Running a test job through VOR:
      vor run <job-name>
- Checking the Super service logs for successful RabbitMQ connections:
      # Replace /opt/vor with your actual VOR installation path if different
      tail -f /opt/vor/log/super.log
You should no longer see TLS certificate errors or credential generation failures.
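The certificate check can also be scripted with Python's standard ssl module, which performs chain and hostname verification during the handshake. Treat this as a sketch under several assumptions: `names_from_peercert` and `verify_rabbitmq_tls` are hypothetical helpers, the CA file path must come from your own deployment, and the RabbitMQ listener is assumed not to require a client certificate. Python's hostname check may also be stricter or looser than the one Vault applies.

```python
import socket
import ssl


def names_from_peercert(cert: dict) -> list[str]:
    """Collect DNS SAN entries and the subject CN from a getpeercert() dict."""
    names = [value for kind, value in cert.get("subjectAltName", ())
             if kind == "DNS"]
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value not in names:
                names.append(value)
    return names


def verify_rabbitmq_tls(host: str, port: int, ca_file: str) -> list[str]:
    """Open a verified TLS connection and return the certificate's names.

    Raises ssl.SSLCertVerificationError if the chain or hostname check fails.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return names_from_peercert(tls.getpeercert())
```

Calling `verify_rabbitmq_tls("<rabbitmq-host>", 5672, "/path/to/ca.pem")` with the hostname configured in Consul and Vault should return the certificate's names once the fix is in place. A verification error for that hostname indicates the mismatch is still present.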
Prevention
To prevent this issue in future deployments:
- Always verify DNS configuration before deployment:
  - Ensure both forward (A/AAAA) and reverse (PTR) DNS records are configured
  - Test DNS resolution from all servers in the deployment using Python's socket.getfqdn() function
- Document your infrastructure: Keep records of:
  - All server hostnames and IP addresses
  - DNS configuration requirements
  - TLS certificate details (CN, SANs, expiration dates)
- Use configuration management: If reverse DNS cannot be guaranteed in your environment, standardize on using the rabbitmq_host variable in your inventory files
- Test connectivity during deployment by including verification steps in your deployment runbook
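As one possible runbook step, the forward and reverse records for a host can be cross-checked from any server in the deployment. This sketch complements the socket.getfqdn() test rather than replacing it. `forward_reverse_consistent` is a hypothetical helper, it only checks IPv4 (A) records, and the `forward` and `reverse` parameters are there so the logic can be tested without real DNS.

```python
import socket
from typing import Callable


def forward_reverse_consistent(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> tuple[bool, str]:
    """Resolve hostname to an IPv4 address, then look the address up again.

    Returns (consistent, detail), where detail is the PTR name or the error.
    """
    try:
        address = forward(hostname)
        ptr_name = reverse(address)[0]
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return False, f"lookup failed: {exc}"
    same = ptr_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return same, ptr_name
```

Looping this over every hostname in the inventory before deployment flags hosts whose PTR record is missing or points to a different name.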