Troubleshooting Guide Overview

This section provides comprehensive troubleshooting guides for common issues encountered when deploying and running OpenDSO applications.

Troubleshooting Sections

Docker & Containerization

Troubleshooting for Docker and Docker Compose issues, including:

  • Container build failures
  • Runtime errors
  • Network connectivity between containers
  • Volume and storage issues
  • Image management
  • Resource constraints

Use this guide when:

  • Building custom Docker images
  • Containers won't start, or crash after starting
  • Services can't communicate
  • Docker Compose commands fail

Local Deployment

Troubleshooting for local test deployments on development machines, including:

  • Certificate generation issues (mkcert)
  • Host file configuration
  • Port conflicts
  • Browser certificate warnings
  • Service accessibility on localhost
  • Resource allocation

Use this guide when:

  • Setting up a local test environment
  • Running OpenDSO on your laptop/workstation
  • Testing before production deployment
  • Preparing training and demonstration setups

Production Deployment (AWS/Lab/Field)

Troubleshooting for production deployments on AWS EC2, including:

  • Terraform deployment failures
  • EC2 instance access issues
  • Security group configuration
  • Certificate generation (Let's Encrypt)
  • DNS configuration
  • VPN connectivity
  • Infrastructure drift
  • Multi-environment management

Use this guide when:

  • Deploying to AWS with Terraform
  • Managing Lab or Field environments
  • Diagnosing production infrastructure issues
  • Resolving certificate renewal problems

General Troubleshooting Approach

When encountering issues, follow this systematic approach:

1. Identify the Problem Layer

Determine which layer is causing the issue:

User/Browser

DNS/Network

Reverse Proxy/Load Balancer

Docker Containers

Application Code

Message Bus (NATS)

Database (MongoDB, PostgreSQL, or SQLite)

Infrastructure (AWS/Local)
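
One way to work through the layers listed above is to check them in order and stop at the first one that fails. The following is a minimal sketch, not part of OpenDSO: `first_failing_layer` is a hypothetical helper, and the hostname `opendso.example` in the commented example is a placeholder you would replace with your own.

```shell
# first_failing_layer LAYER CHECK [LAYER CHECK ...]
# Runs each CHECK (a shell command string) in order and prints the name of
# the first LAYER whose check fails. Hypothetical helper for illustration.
first_failing_layer() {
  while [ "$#" -ge 2 ]; do
    layer=$1
    check=$2
    shift 2
    if ! sh -c "$check" >/dev/null 2>&1; then
      printf '%s\n' "$layer"
      return 1
    fi
  done
  printf 'all layers passed\n'
}

# Example (hostname is a placeholder, adjust to your deployment):
#   first_failing_layer \
#     "DNS/Network"       "getent hosts opendso.example" \
#     "Reverse Proxy"     "curl -fsS -o /dev/null https://opendso.example/" \
#     "Docker Containers" "docker compose ps --status running -q | grep -q ."
```

Ordering the checks from the user-facing side down means the reported layer is the outermost one that is broken, which is usually the best place to start reading logs.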

2. Gather Information

Collect relevant diagnostic information:

# Check container status
docker ps -a

# View container logs
docker compose logs <service-name> --tail=100

# Check system resources
docker stats
df -h
free -h

# Check network connectivity
docker network ls
docker network inspect opendso_test

# View Terraform state (production)
terraform show

# Check DNS resolution
dig <domain-name>
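
When the same information has to be collected repeatedly, or handed to someone else, it can help to save each command's output to a file. The sketch below is one possible approach and an assumption on our part, not an OpenDSO tool: `capture` and `collect_diagnostics` are hypothetical names, and the default directory name is arbitrary.

```shell
# capture NAME CMD...: run CMD and save stdout and stderr to $DIAG_DIR/NAME.txt.
# A missing tool is recorded instead of aborting the whole collection.
capture() {
  name=$1
  shift
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$DIAG_DIR/$name.txt" 2>&1 || true
  else
    printf 'skipped: %s not installed\n' "$1" >"$DIAG_DIR/$name.txt"
  fi
}

# collect_diagnostics [DIR]: gather the outputs listed above into DIR
# and print the directory name.
collect_diagnostics() {
  DIAG_DIR=${1:-opendso-diag-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$DIAG_DIR" || return 1
  capture containers docker ps -a
  capture stats docker stats --no-stream
  capture disk df -h
  capture memory free -h
  capture networks docker network ls
  capture network-inspect docker network inspect opendso_test
  printf '%s\n' "$DIAG_DIR"
}

# Example:
#   collect_diagnostics /tmp/opendso-diag
```

`docker stats --no-stream` is used here because plain `docker stats` keeps refreshing and would never return inside a script.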

3. Common Diagnostic Commands

Container Diagnostics

# Inspect container
docker inspect <container-name>

# Execute command in container
docker exec -it <container-name> /bin/sh

# Check container logs
docker logs <container-name> -f

# Restart container
docker restart <container-name>
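
For a container that has already exited, the exit code reported by `docker inspect` often narrows down the cause. The helper below is an illustrative sketch (the function name is hypothetical); the meanings of 125, 126, and 127 follow Docker's documented conventions, and codes above 128 follow the usual shell convention of 128 plus the signal number.

```shell
# explain_exit_code CODE: print a short interpretation of a container exit code.
explain_exit_code() {
  case $1 in
    0)   echo "exited normally" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but could not be invoked" ;;
    127) echo "command not found in the image" ;;
    137) echo "killed by SIGKILL (signal 9), often the OOM killer" ;;
    143) echo "terminated by SIGTERM (signal 15), for example by docker stop" ;;
    *)
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "application-specific exit code"
      fi
      ;;
  esac
}

# Example (substitute a real container name):
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' CONTAINER)"
```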

Network Diagnostics

# Test connectivity between containers
docker exec <container-name> ping -c 3 nats
docker exec <container-name> curl http://api:3000/health

# Check listening ports
netstat -tulpn | grep LISTEN
ss -tulpn | grep LISTEN
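
To check for one specific port rather than scanning the whole list, the output can be filtered on the local-address column. This is a small sketch assuming the column layout of `ss -tln` (TCP only, local address in the fourth column); `listening_on` is a hypothetical helper name.

```shell
# listening_on PORT: reads `ss -tln` output on stdin and succeeds if any
# socket is listening on PORT. Handles 0.0.0.0:443, *:443 and [::]:443.
listening_on() {
  awk -v port="$1" '
    { n = split($4, part, ":"); if (part[n] == port) found = 1 }
    END { exit !found }
  '
}

# Example:
#   ss -tln | listening_on 443 && echo "port 443 is already in use"
```

Note that `ss -tuln` adds a leading Netid column, which shifts the local address to the fifth column, so this filter should only be fed the TCP-only form.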

System Diagnostics

# Check disk space
df -h
du -sh /var/lib/docker/*

# Check memory
free -h
cat /proc/meminfo

# Check CPU
top
htop
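
Disk exhaustion is easy to miss in a long `df` listing. The following sketch flags only the filesystems above a chosen usage percentage; it assumes the POSIX output format of `df -P` and mount points without spaces, and the function name is hypothetical.

```shell
# disk_over_threshold LIMIT: reads `df -P` output on stdin and prints each
# mount point whose usage is at or above LIMIT percent.
disk_over_threshold() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6, use "%"
    }
  '
}

# Example:
#   df -P | disk_over_threshold 90
```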

4. Review Logs

Logs are your best source of information:

Container Logs:

# All services
docker compose logs -f

# Specific service
docker compose logs -f api

# With timestamps
docker compose logs -f --timestamps api

# Last N lines
docker compose logs --tail=50 api

System Logs:

# Docker service
sudo journalctl -u docker -f

# System messages
sudo journalctl -f

# Kernel messages
dmesg -T

Application Logs:

# Output directory (production)
ls -la ~/output/
tail -f ~/output/<service-name>/app.log
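
When many services log at once, a per-service count of error lines can show where to look first. This is a rough sketch, assuming the `name | message` line prefix that `docker compose logs` prints and a simple keyword match; `error_counts` is a hypothetical helper, and the keyword list is only a starting point.

```shell
# error_counts: reads `docker compose logs` output on stdin and prints the
# number of lines containing error keywords per service, highest first.
error_counts() {
  grep -iE 'error|fatal|panic' |
    awk -F'|' '
      { gsub(/[ \t]+$/, "", $1); count[$1]++ }
      END { for (svc in count) print count[svc], svc }
    ' |
    sort -rn
}

# Example:
#   docker compose logs --no-color --since 1h | error_counts
```

The `--no-color` flag keeps terminal color codes out of the service names so that lines from the same service are grouped together.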

5. Isolate the Issue

Test components individually:

# Test specific profile
./run.sh -p nats -c
./run.sh -p api -c

# Test service independently
docker run --rm -it <image-name> /bin/sh

# Test network connectivity
docker run --network opendso_test --rm alpine ping -c 3 nats

6. Check Configuration

Verify configuration files:

# Environment variables
cat config/docker/.env

# Docker Compose syntax
docker compose config

# OpenFMB adapter configs
cat models/circuit_name/recloser/Z492R.yaml
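
Reading a `.env` file by eye makes it easy to overlook a missing or commented-out variable. The sketch below reports which expected keys are not set in a file. The function name is hypothetical, and the key names in the example (`NATS_URL`, `API_PORT`) are placeholders rather than a list of variables OpenDSO actually requires.

```shell
# missing_env_keys FILE KEY...: print each KEY that has no uncommented
# KEY=... assignment in FILE.
missing_env_keys() {
  file=$1
  shift
  for key in "$@"; do
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=" "$file" ||
      printf '%s\n' "$key"
  done
}

# Example (key names are placeholders):
#   missing_env_keys config/docker/.env NATS_URL API_PORT
```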

Getting Help

If you can't resolve the issue:

  1. Check existing documentation - Review relevant troubleshooting sections
  2. Gather diagnostics - Collect logs, configuration, and error messages
  3. Search for similar issues - Check if others have encountered this
  4. Contact support - Reach out to the OpenDSO software team with:
    • Detailed description of the issue
    • Steps to reproduce
    • Relevant logs and error messages
    • Environment information (local/lab/field)
    • What you've already tried
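
Configuration files shared with support may contain credentials. As a precaution, values can be masked before sending. This is a minimal sketch under the assumption that sensitive variables have upper-case names containing PASSWORD, SECRET, TOKEN, or KEY; `redact_secrets` is a hypothetical helper, and it will miss secrets stored under other names.

```shell
# redact_secrets: reads KEY=VALUE lines on stdin and replaces the value of
# any key whose name contains PASSWORD, SECRET, TOKEN or KEY.
redact_secrets() {
  sed -E 's/^([A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Za-z0-9_]*)=.*/\1=<redacted>/'
}

# Example:
#   redact_secrets < config/docker/.env > env-for-support.txt
```

Review the result before sending it, since a pattern match cannot guarantee that every sensitive value has been caught.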

Common Issues Quick Reference

Symptom                     Possible Cause                              Guide
Containers won't start      Docker resource limits, port conflicts      Docker
Can't access web UI         Certificate issues, hosts file, DNS         Local
Terraform fails             AWS permissions, network issues             Production
Services can't communicate  Docker network, NATS connection             Docker
Certificate errors          mkcert not installed, Let's Encrypt issues  Local / Production
SSH connection refused      Security groups, VPN, key permissions       Production
Out of disk space           Docker images/volumes, logs                 Docker
Port already in use         Conflicting services, previous deployment   Local

Prevention Best Practices

Avoid common issues by following these practices:

  1. Version Control - Track configuration changes in git
  2. Documentation - Document custom configurations and modifications
  3. Regular Backups - Back up MongoDB and critical data regularly
  4. Monitoring - Monitor resource usage and logs
  5. Testing - Test in local/lab before deploying to field/production
  6. State Management - Keep Terraform state synchronized with actual infrastructure
  7. Clean Up - Regularly clean up unused Docker resources
  8. Updates - Keep Docker, Docker Compose, and system packages updated
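
For the clean-up practice above, a dry-run wrapper makes it possible to see what would be executed before anything is deleted. This is a sketch rather than a recommended policy: `cleanup_docker` is a hypothetical helper, and it deliberately leaves volumes alone because they may hold database data.

```shell
# cleanup_docker [--apply]: print the prune commands, or run them when
# --apply is given. Volumes are intentionally not pruned.
cleanup_docker() {
  run=echo
  if [ "${1:-}" = "--apply" ]; then
    run=
  fi
  $run docker container prune -f
  $run docker image prune -f
  $run docker builder prune -f
}

# Example:
#   docker system df          # see how much space Docker is using
#   cleanup_docker            # dry run, prints the commands
#   cleanup_docker --apply    # actually remove stopped containers,
#                             # dangling images and build cache
```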

Next Steps

Choose the appropriate troubleshooting guide based on your deployment type:

  • Docker & Containerization
  • Local Deployment
  • Production Deployment (AWS/Lab/Field)