
Deployment Troubleshooting

This section covers common issues encountered during VOR Stream deployment and their solutions.

RabbitMQ Connection Failures with TLS Certificate Errors

Problem Description

After completing a VOR Stream deployment, you may encounter the following error the first time you try to run a job:

FATAL Process "example_model_run" failed to run: Job Failed: unable to generate credentials: cannot read secret "rabbitmq/creds/vor-midtier-role" in Vault--secret does not exist or you do not have permission to view it: Error making API request.

URL: GET https://<hostname>:8200/v1/rabbitmq/creds/vor-midtier-role
Code: 500. Errors:

* 1 error occurred:
    * failed to create a new user with the generated credentials

Additionally, the RabbitMQ logs will show repeated TLS certificate errors:

TLS server: In state certify received CLIENT ALERT: Fatal - Bad Certificate

The Super service logs will contain:

level=error msg="unable to generate credentials: cannot read secret \"rabbitmq/creds/vor-midtier-role\" in Vault--secret does not exist or you do not have permission to view it"
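You can check a node for both symptoms at once with a short Python script that counts the two log signatures quoted above. This is a minimal sketch under a few assumptions:

- The signature strings are copied from the messages in this section.
- The script name `scan_logs.py` is hypothetical.
- The RabbitMQ log location depends on your installation.

```python
"""scan_logs.py: count the log signatures of the RabbitMQ/Vault TLS failure (sketch)."""
from __future__ import annotations

import sys
from typing import Iterable

# Substrings copied from the RabbitMQ and Super service error messages in this section.
SIGNATURES = {
    "rabbitmq_bad_certificate": "CLIENT ALERT: Fatal - Bad Certificate",
    "vault_credential_failure": "unable to generate credentials",
}


def scan(lines: Iterable[str]) -> dict[str, int]:
    """Return how many lines contain each signature."""
    counts = {key: 0 for key in SIGNATURES}
    for line in lines:
        for key, needle in SIGNATURES.items():
            if needle in line:
                counts[key] += 1
    return counts


def main(paths: list[str]) -> None:
    if not paths:
        print("usage: scan_logs.py LOGFILE [LOGFILE ...]")
        return
    for path in paths:
        # errors="replace" keeps the scan going if a log contains stray bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts = scan(handle)
        print(f"{path}: {counts}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, run `python3 scan_logs.py /opt/vor/log/super.log <path-to-rabbitmq-log>`. Non-zero counts for both signatures point to the issue described here.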

Root Cause

This issue occurs when the reverse DNS (PTR) records for the RabbitMQ server, which are a required part of the system configuration, are missing or incorrect. During deployment, Ansible uses the ansible_fqdn fact (which depends on a reverse DNS lookup) to determine the RabbitMQ hostname. This hostname is then statically configured in:

  1. Consul service registration - The RabbitMQ service address registered in Consul
  2. Vault secrets engine - The RabbitMQ connection URL configured in Vault for dynamic credential generation

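If the reverse lookup fails or returns the wrong name, the configured hostname will not match the Common Name (CN) or any Subject Alternative Name (SAN) in the RabbitMQ server's TLS certificate. When the lookup fails, socket.getfqdn() falls back to the local hostname. In either case, certificate validation fails.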

The failure sequence is:

  1. The Ansible deployment uses ansible_fqdn (which internally calls Python's socket.getfqdn()) to obtain the RabbitMQ hostname
  2. This hostname is configured in Consul and Vault during deployment
  3. When VOR attempts to generate RabbitMQ credentials through Vault, Vault uses the configured hostname to connect to RabbitMQ
  4. The TLS handshake fails because the certificate's Common Name (CN) or Subject Alternative Name (SAN) doesn't match the configured hostname
  5. Vault cannot establish a connection to RabbitMQ to create the dynamic user
  6. The job fails with the credential generation error
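Step 4 is an ordinary TLS hostname check. The Python sketch below illustrates the comparison under simplified rules:

- Matching is case-insensitive.
- A wildcard is allowed only as the entire left-most label, where it covers exactly one label.

This is not the code Vault or RabbitMQ actually runs, since real TLS libraries apply additional rules. The function names are hypothetical.

```python
"""Simplified TLS hostname matching, illustrating step 4 (not Vault's actual code)."""
from __future__ import annotations

from typing import Iterable


def _labels(name: str) -> list[str]:
    # Compare case-insensitively and ignore a trailing root dot.
    return name.rstrip(".").lower().split(".")


def name_matches(hostname: str, cert_name: str) -> bool:
    """Check one certificate DNS name (CN or SAN entry) against a hostname."""
    host, pattern = _labels(hostname), _labels(cert_name)
    if len(host) != len(pattern):
        return False
    if pattern[0] == "*":
        # Wildcard only as the whole left-most label, covering exactly one label;
        # reject overly broad patterns such as "*.com" (simplified rule).
        return len(pattern) > 2 and host[1:] == pattern[1:]
    return host == pattern


def certificate_matches(hostname: str, cert_names: Iterable[str]) -> bool:
    """True if the hostname matches any name the certificate carries."""
    return any(name_matches(hostname, name) for name in cert_names)
```

For example, suppose reverse DNS yields the short name `rabbitmq-server` while the certificate lists only `rabbitmq-server.example.com`. Then `certificate_matches("rabbitmq-server", ["rabbitmq-server.example.com"])` is False, which is exactly the mismatch in step 4.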

Solutions

There are two approaches to resolve this issue:

Option 1: Configure Reverse DNS (PTR) Records

Ensure that proper reverse DNS (PTR) records are configured for all servers in your deployment, particularly the RabbitMQ server. This should be done at the network/DNS infrastructure level. See the System Requirements section for DNS configuration requirements.

To verify reverse DNS is working correctly on the RabbitMQ server:

# Check what hostname Ansible will use (this is exactly what the deployment uses)
# Note: Replace /opt/vor with your actual VOR installation path if different
/opt/vor/venv/bin/python -c "import socket; print(socket.getfqdn())"

The returned hostname must match the Common Name (CN) or a Subject Alternative Name (SAN) in the RabbitMQ server's TLS certificate.
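socket.getfqdn() works in two steps:

1. It passes the local hostname to gethostbyaddr(). That call resolves the name to an address and then reverse-resolves the address, using PTR records or local sources such as /etc/hosts depending on the resolver configuration.
2. It returns the first of the resulting primary name and aliases that contains a dot. If the lookup fails, it returns the original name.

The sketch below prints each step so you can see where an unexpected name comes from. `select_fqdn` is a hypothetical helper that mirrors CPython's selection rule, and the script name `fqdn_check.py` is illustrative.

```python
"""fqdn_check.py: show how socket.getfqdn() arrives at its answer (sketch)."""
from __future__ import annotations

import socket


def select_fqdn(primary: str, aliases: list[str]) -> str:
    """Mirror CPython's getfqdn() selection: first name containing a dot, else primary."""
    for name in [primary, *aliases]:
        if "." in name:
            return name
    return primary


def main() -> None:
    local = socket.gethostname()
    print(f"gethostname():    {local}")
    try:
        # Resolves the name to an address, then reverse-resolves that address.
        primary, aliases, addresses = socket.gethostbyaddr(local)
    except OSError as exc:
        # getfqdn() silently falls back to the unresolved name in this case.
        print(f"gethostbyaddr() failed ({exc}); getfqdn() falls back to {local!r}")
    else:
        print(f"gethostbyaddr():  primary={primary!r} aliases={aliases} addresses={addresses}")
        print(f"selected name:    {select_fqdn(primary, aliases)}")
    print(f"socket.getfqdn(): {socket.getfqdn()}")


if __name__ == "__main__":
    main()
```

Run it on the RabbitMQ server, for example with the same interpreter as the command above: `/opt/vor/venv/bin/python fqdn_check.py`.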

After configuring proper reverse DNS, you must re-run the Ansible deployment playbook for the configuration changes to take effect. This will update the RabbitMQ hostname configuration in both Consul and Vault.

Note

The paths shown in this document use /opt/vor as the installation directory, which is the default value for vor_root. Your actual installation path may differ based on your hosts.ini configuration.

Option 2: Explicitly Set RabbitMQ Hostname in Inventory File

If configuring reverse DNS is not immediately possible, you can override the default behavior by explicitly setting the rabbitmq_host variable in your Ansible inventory file (hosts.ini). This bypasses the reverse DNS lookup and uses the specified hostname directly.

  1. Edit your hosts.ini file
  2. In the [all:vars] section, add or modify the rabbitmq_host variable:

    [all:vars]
    # ... other variables ...
    
    # Explicitly set the RabbitMQ hostname to bypass reverse DNS lookup
    # This hostname must match the CN or SAN in the RabbitMQ TLS certificate
    rabbitmq_host=rabbitmq-server.example.com
    
    # ... other variables ...
    

    Replace rabbitmq-server.example.com with the actual hostname that matches your RabbitMQ server's TLS certificate.

  3. Re-run the Ansible deployment to apply the configuration change. This will update the RabbitMQ hostname configuration in both Consul and Vault.

See the Inventory File documentation for more details on the rabbitmq_host variable.
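Before re-running the deployment, you may want to confirm that the override is actually present in [all:vars]. The following is a minimal line-based reader. It assumes a simple INI inventory with key=value lines and does not implement Ansible's full inventory parsing (no quoting, inline comments, or group inheritance). The script and function names are hypothetical.

```python
"""check_inventory.py: report rabbitmq_host from [all:vars] in hosts.ini (sketch)."""
from __future__ import annotations

import sys


def read_all_vars(text: str) -> dict[str, str]:
    """Collect key=value pairs from the [all:vars] section of a simple INI inventory."""
    values: dict[str, str] = {}
    in_all_vars = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and full-line comments
        if line.startswith("[") and line.endswith("]"):
            in_all_vars = line == "[all:vars]"
            continue
        if in_all_vars and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def main(path: str) -> None:
    try:
        with open(path, encoding="utf-8") as handle:
            values = read_all_vars(handle.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    host = values.get("rabbitmq_host")
    if host:
        print(f"rabbitmq_host = {host}")
    else:
        print("rabbitmq_host is not set; the deployment will use ansible_fqdn (reverse DNS)")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "hosts.ini")
```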

Verification

After applying either solution, verify the fix by:

  1. Inspect the RabbitMQ certificate to see what hostnames it contains:

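    # Note: 5671 is RabbitMQ's default TLS (AMQPS) port; the plain AMQP port 5672
    # does not speak TLS. Vault's RabbitMQ secrets engine connects to the
    # management HTTP API instead (commonly 15671 when TLS is enabled), so that
    # port is worth checking too. Confirm the actual ports in your hosts.ini or
    # deployment documentation.
    openssl s_client -connect <rabbitmq-host>:5671 -servername <rabbitmq-host> -showcerts < /dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 -E "Subject:|Subject Alternative Name"
    

    This shows the Common Name (CN) from the certificate's Subject field and, on the line after the "Subject Alternative Name" header, any Subject Alternative Names (SANs). The hostname used to connect to RabbitMQ must match one of these names.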

  2. Check the configured RabbitMQ hostname in Consul:

    # View the RabbitMQ hostname as registered in Consul
    vor show connections --dashboard --static
    

    In the output, find the rabbitmq service and confirm that its address matches the correct hostname. For more details about the Consul dashboard, see Service Monitoring.

  3. Test that Vault can connect to RabbitMQ and create credentials (requires privileged Vault authentication):

    Authentication Required

    This command requires authentication to Vault as the root user or a user with appropriate permissions to read from the RabbitMQ secrets engine. The authentication method will vary based on your deployment configuration.

    # Test credential generation
    vault read rabbitmq/creds/vor-midtier-role
    

    A successful command will output a new lease ID, username, and password. A failure will likely result in an error message similar to the one described in the 'Problem Description' section.

  4. Run a test job through VOR:

    vor run <job-name>
    
  5. Check the Super service logs for successful RabbitMQ connections:

    # Replace /opt/vor with your actual VOR installation path if different
    tail -f /opt/vor/log/super.log
    

You should no longer see TLS certificate errors or credential generation failures.
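As a scripted alternative to the openssl command in step 1, Python's ssl module can list the names in the certificate. The sketch still verifies the certificate chain against a CA file but disables the hostname check, so it reports the names even when they do not match.

Several details are assumptions that depend on your deployment: the CA file path, the port you pass, and the script name `cert_names.py`. The final comparison is exact and does not evaluate wildcard entries.

```python
"""cert_names.py: list the DNS names in a TLS server certificate (sketch)."""
from __future__ import annotations

import socket
import ssl
import sys


def names_from_peercert(cert: dict) -> list[str]:
    """Return SAN DNS entries followed by the subject CN, without duplicates."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return list(dict.fromkeys(names))  # preserve order, drop repeats


def fetch_cert_names(host: str, port: int, cafile: str) -> list[str]:
    context = ssl.create_default_context(cafile=cafile)
    # Keep chain verification (CERT_REQUIRED) so getpeercert() returns the parsed
    # certificate, but skip the hostname check so a mismatch is reported, not raised.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return names_from_peercert(tls.getpeercert())


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: cert_names.py RABBITMQ_HOST PORT CA_FILE")
        return
    host, port, cafile = argv[0], int(argv[1]), argv[2]
    names = fetch_cert_names(host, port, cafile)
    print("certificate names:", ", ".join(names) or "(none)")
    if host.lower() not in (name.lower() for name in names):
        print(f"WARNING: {host} is not listed verbatim (wildcard entries are not evaluated)")


if __name__ == "__main__":
    main(sys.argv[1:])
```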

Prevention

To prevent this issue in future deployments:

  1. Always verify DNS configuration before deployment:

    • Ensure both forward (A/AAAA) and reverse (PTR) DNS records are configured
    • On every server in the deployment, confirm that Python's socket.getfqdn() returns the expected fully qualified hostname
  2. Document your infrastructure: Keep records of:

    • All server hostnames and IP addresses
    • DNS configuration requirements
    • TLS certificate details (CN, SANs, expiration dates)
  3. Use configuration management: if reverse DNS cannot be guaranteed in your environment, standardize on setting the rabbitmq_host variable in your inventory files.

  4. Test connectivity during deployment: include the steps from the Verification section in your deployment runbook.
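For the DNS check in item 1, you can script a forward-and-reverse round trip for every hostname in the deployment. Run from any one server, this sketch does the following for each name:

- resolves the name to its addresses;
- looks up the PTR name for each address;
- reports any address whose primary PTR name does not match.

It is stricter than socket.getfqdn(), which also considers aliases. The script name and the hostnames in the usage line are hypothetical.

```python
"""dns_round_trip.py: check that forward and reverse DNS agree for each host (sketch)."""
from __future__ import annotations

import socket
import sys
from typing import Callable


def round_trip(name: str, forward: Callable[[str], list[str]],
               reverse: Callable[[str], str]) -> list[str]:
    """Return problems found for one hostname; an empty list means it round-trips."""
    try:
        addresses = forward(name)
    except OSError as exc:
        return [f"{name}: forward lookup failed ({exc})"]
    problems = []
    for address in addresses:
        try:
            ptr_name = reverse(address)
        except OSError as exc:
            problems.append(f"{name}: no reverse entry for {address} ({exc})")
            continue
        if ptr_name.rstrip(".").lower() != name.rstrip(".").lower():
            problems.append(f"{name}: {address} reverses to {ptr_name}")
    return problems


def system_forward(name: str) -> list[str]:
    """All addresses the resolver returns for name (A and AAAA)."""
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def system_reverse(address: str) -> str:
    """Primary name the resolver returns for address."""
    return socket.gethostbyaddr(address)[0]


def main(names: list[str]) -> int:
    if not names:
        print("usage: dns_round_trip.py HOSTNAME [HOSTNAME ...]")
        return 0
    failures = 0
    for name in names:
        problems = round_trip(name, system_forward, system_reverse)
        failures += bool(problems)
        print("\n".join(problems) if problems else f"{name}: OK")
    return 1 if failures else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

For example, `python3 dns_round_trip.py rabbitmq-server.example.com vault.example.com` prints one line per host and exits with status 1 if any host fails the round trip.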