Troubleshooting Guide Overview
This section provides comprehensive troubleshooting guides for common issues encountered when deploying and running OpenDSO applications.
Troubleshooting Sections
Docker & Containerization
Troubleshooting for Docker and Docker Compose issues, including:
- Container build failures
- Runtime errors
- Network connectivity between containers
- Volume and storage issues
- Image management
- Resource constraints
Use this guide when:
- Building custom Docker images
- Containers won't start or crash
- Services can't communicate
- Docker Compose commands fail
Local Deployment
Troubleshooting for local test deployments on development machines, including:
- Certificate generation issues (mkcert)
- Host file configuration
- Port conflicts
- Browser certificate warnings
- Service accessibility on localhost
- Resource allocation
Use this guide when:
- Setting up a local test environment
- Running OpenDSO on your laptop/workstation
- Testing before production deployment
- Preparing training or demonstration setups
Production Deployment (AWS/Lab/Field)
Troubleshooting for production deployments on AWS EC2, including:
- Terraform deployment failures
- EC2 instance access issues
- Security group configuration
- Certificate generation (Let's Encrypt)
- DNS configuration
- VPN connectivity
- Infrastructure drift
- Multi-environment management
Use this guide when:
- Deploying to AWS with Terraform
- Managing Lab or Field environments
- Diagnosing production infrastructure issues
- Certificates fail to renew
General Troubleshooting Approach
When encountering issues, follow this systematic approach:
1. Identify the Problem Layer
Determine which layer is causing the issue by working down the stack, from the browser to the infrastructure:
```text
User/Browser
    ↓
DNS/Network
    ↓
Reverse Proxy/Load Balancer
    ↓
Docker Containers
    ↓
Application Code
    ↓
Message Bus (NATS)
    ↓
Database (MongoDB, PostgreSQL, or SQLite)
    ↓
Infrastructure (AWS/Local)
```
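The top-down walk can be partly scripted. The sketch below is not an OpenDSO tool: `check_layer` and `walk_layers` are hypothetical helpers, only the DNS and reverse-proxy layers are probed, and it assumes a Linux host (for `getent`) and that the web UI answers HTTPS on the given host name. The first `FAIL` line marks the layer to investigate.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not part of OpenDSO): probe the stack
# from the top down and stop at the first layer that fails.

# check_layer NAME CMD [ARGS...]: run CMD quietly and print PASS or FAIL.
check_layer() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# walk_layers HOST: covers only the first layers of the stack; extend it with
# container, NATS, and database checks that fit your deployment.
walk_layers() {
  local host=$1
  # getent resolves through /etc/hosts and DNS (Linux/glibc assumption).
  check_layer "DNS/Network (resolve $host)" getent hosts "$host" &&
    # -f fails on HTTP errors; certificate problems also make curl fail.
    check_layer "Reverse proxy (HTTPS)" \
      curl -sSf -o /dev/null --max-time 10 "https://$host/"
}
```

Usage: `walk_layers <domain-name>`. Because the checks are chained with `&&`, output stops at the first failing layer instead of reporting cascading failures further down.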
2. Gather Information
Collect relevant diagnostic information:
```bash
# Check container status
docker ps -a
# View container logs
docker compose logs --tail=100 <service-name>
# Check system resources (--no-stream prints one snapshot instead of a live view)
docker stats --no-stream
df -h
free -h
# Check network connectivity
docker network ls
docker network inspect opendso_test
# View Terraform state (production; run from the Terraform working directory)
terraform show
# Check DNS resolution
dig <domain-name>
```
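To keep this information together for later comparison or a support request, the commands can be captured into one directory. This is a sketch under assumptions: `collect` and `gather_all` are hypothetical helpers, the file layout is not an OpenDSO convention, and `gather_all` must run from the Docker Compose project directory.

```bash
#!/usr/bin/env bash
# Minimal sketch: save the diagnostic commands above into a timestamped
# directory. Helper names and file layout are illustrative assumptions.

# collect DIR NAME CMD [ARGS...]: save stdout+stderr of CMD to DIR/NAME.txt.
# A command that is not installed is recorded as skipped instead of failing.
collect() {
  local dir=$1 name=$2
  shift 2
  mkdir -p "$dir"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$dir/$name.txt" 2>&1
  else
    echo "skipped: $1 not found" >"$dir/$name.txt"
  fi
}

# gather_all [DIR]: run the checks from this section and print the directory.
gather_all() {
  local dir=${1:-diagnostics-$(date +%Y%m%d-%H%M%S)}
  collect "$dir" containers      docker ps -a
  collect "$dir" compose-logs    docker compose logs --tail=100 --no-color
  collect "$dir" docker-stats    docker stats --no-stream
  collect "$dir" disk            df -h
  collect "$dir" memory          free -h
  collect "$dir" networks        docker network ls
  collect "$dir" opendso-network docker network inspect opendso_test
  echo "$dir"
}
```

For example, `dir=$(gather_all) && tar czf "$dir.tar.gz" "$dir"` produces a single archive to attach when contacting support.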
3. Common Diagnostic Commands
Container Diagnostics
```bash
# Inspect container
docker inspect <container-name>
# Execute command in container
docker exec -it <container-name> /bin/sh
# Check container logs
docker logs -f <container-name>
# Restart container
docker restart <container-name>
```
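When many containers are running, it helps to list only the problem ones before reaching for `docker inspect` or `docker logs`. A minimal sketch: `not_running` is a hypothetical helper that filters `docker ps -a` output formatted as name and status separated by a tab, and it treats any status not starting with `Up`, or marked `unhealthy`, as a problem.

```bash
#!/usr/bin/env bash
# Minimal sketch: list containers that are stopped, restarting, or unhealthy.

# not_running: read "NAME<TAB>STATUS" lines on stdin and print "NAME: STATUS"
# for every container that is not Up, or is Up but reported unhealthy.
not_running() {
  awk -F '\t' '$2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1 ": " $2 }'
}

# Usage (requires Docker):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | not_running
```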
Network Diagnostics
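```bash
# Test connectivity between containers (needs ping/curl inside the image;
# -c 3 stops ping after three packets)
docker exec <container-name> ping -c 3 nats
docker exec <container-name> curl http://api:3000/health
# Check listening ports (sudo is needed for -p to show the owning process;
# netstat may not be installed, ss is its modern replacement)
sudo netstat -tulpn | grep LISTEN
sudo ss -tulpn | grep LISTEN
```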
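For the "Port already in use" case, the `ss` output can be checked for a specific port before starting a deployment. This sketch assumes the `ss -tln` layout in which the fourth column is `Local Address:Port` (for example `0.0.0.0:443` or `[::]:443`); `port_in_use` is a hypothetical helper, and the ports in the usage comment are examples.

```bash
#!/usr/bin/env bash
# Minimal sketch: report whether a TCP port is already bound on this host.

# port_in_use PORT: read `ss -tln` output on stdin; exit 0 if PORT is listening.
port_in_use() {
  awk -v p="$1" '
    NR > 1 {
      # The port is whatever follows the last ":" in the local address.
      n = split($4, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit !found }'
}

# Usage (ports are examples; use the ones your services publish):
#   for port in 80 443; do
#     ss -tln | port_in_use "$port" && echo "port $port is already in use"
#   done
```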
System Diagnostics
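```bash
# Check disk space
df -h
# /var/lib/docker is readable only by root, so expand the glob as root
sudo sh -c 'du -sh /var/lib/docker/*'
# Check memory
free -h
cat /proc/meminfo
# Check CPU
top
htop  # if installed
```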
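Disk exhaustion is one of the most common causes of failing containers, so a threshold check is a useful first pass. A minimal sketch: `over_threshold` is a hypothetical helper that parses POSIX `df -P` output (fifth column is Capacity, sixth is the mount point); mount points containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Minimal sketch: flag filesystems whose usage is above a percentage threshold.

# over_threshold PCT: read `df -P` output on stdin and print "MOUNT USE%" for
# every filesystem above PCT percent.
over_threshold() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 > limit + 0) print $6 " " $5
    }'
}

# Usage:
#   df -P | over_threshold 85
#   docker system df   # then see how much space images, containers, and volumes use
```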
4. Review Logs
Logs are your best source of information:
Container Logs:
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f api
# With timestamps
docker compose logs -f --timestamps api
# Last N lines
docker compose logs --tail=50 api
```
System Logs:
```bash
# Docker service
sudo journalctl -u docker -f
# System messages
sudo journalctl -f
# Kernel messages
dmesg -T
```
Application Logs:
```bash
# Output directory (production)
ls -la ~/output/
tail -f ~/output/<service-name>/app.log
```
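Long logs are easier to read once they are narrowed to likely problems. The sketch below is a simple keyword filter; `errors_only` is a hypothetical helper, and the keyword list is an assumption that should be adjusted to the log format your services actually emit.

```bash
#!/usr/bin/env bash
# Minimal sketch: keep only log lines that look like errors.

# errors_only: read log lines on stdin and print those matching common
# error keywords (case-insensitive). The keyword list is an assumption.
errors_only() {
  grep -Ei 'error|fatal|panic|exception|refused|timeout'
}

# Usage (service name and time window are examples):
#   docker compose logs --no-color --timestamps --since 30m api | errors_only
#   errors_only < ~/output/<service-name>/app.log
```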
5. Isolate the Issue
Test components individually:
```bash
# Test specific profile
./run.sh -p nats -c
./run.sh -p api -c
# Test service independently
docker run --rm -it <image-name> /bin/sh
# Test network connectivity (-c 3 stops ping after three packets)
docker run --network opendso_test --rm alpine ping -c 3 nats
```
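The throwaway-container ping can be repeated for several service names to see which ones are unreachable on the network. In this sketch the service list is an example, `probe_services` is a hypothetical helper, and the `NETWORK` and `PROBE` variables are assumptions added so the probe can be swapped out (for instance for a dry run). A failure can mean either that the name does not resolve or that the service does not answer.

```bash
#!/usr/bin/env bash
# Minimal sketch: ping several service names from a temporary container.

NETWORK=${NETWORK:-opendso_test}
# Default probe: one ping (2-second timeout) from an Alpine container attached
# to $NETWORK. Override PROBE to use a different check.
PROBE=${PROBE:-"docker run --network $NETWORK --rm alpine ping -c 1 -W 2"}

# probe_services NAME...: print OK or UNREACHABLE for each service name.
probe_services() {
  local svc
  for svc in "$@"; do
    # $PROBE is intentionally unquoted so it splits into command + arguments.
    if $PROBE "$svc" >/dev/null 2>&1; then
      printf '%-12s %s\n' OK "$svc"
    else
      printf '%-12s %s\n' UNREACHABLE "$svc"
    fi
  done
}

# Usage:
#   probe_services nats api
```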
6. Check Configuration
Verify configuration files:
```bash
# Environment variables
cat config/docker/.env
# Docker Compose syntax
docker compose config
# OpenFMB adapter configs
cat models/circuit_name/recloser/Z492R.yaml
```
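Reading the `.env` file by eye makes it easy to miss a malformed line or a key defined twice. This sketch checks only the shape of the file; it assumes plain `KEY=VALUE` lines (lines using an `export` prefix would be flagged), and it cannot tell whether a value is correct for OpenDSO. `lint_env` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimal sketch: flag malformed lines and duplicate keys in a .env file.

# lint_env FILE: print one message per problem; exit 1 if any were found.
lint_env() {
  awk '
    # Skip comments and blank lines.
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    # Every other line must look like KEY=VALUE.
    !/^[A-Za-z_][A-Za-z0-9_]*=/ {
      printf "line %d: not KEY=VALUE: %s\n", NR, $0
      bad = 1
      next
    }
    {
      key = substr($0, 1, index($0, "=") - 1)
      if (key in seen) {
        printf "line %d: duplicate key %s (first on line %d)\n", NR, key, seen[key]
        bad = 1
      } else {
        seen[key] = NR
      }
    }
    END { exit (bad ? 1 : 0) }' "$1"
}

# Usage:
#   lint_env config/docker/.env && docker compose config --quiet
```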
Getting Help
If you can't resolve the issue:
- Check existing documentation - Review relevant troubleshooting sections
- Gather diagnostics - Collect logs, configuration, and error messages
- Search for similar issues - Check if others have encountered this
- Contact support - Reach out to the OpenDSO software team with:
  - Detailed description of the issue
  - Steps to reproduce
  - Relevant logs and error messages
  - Environment information (local/lab/field)
  - What you've already tried
Common Issues Quick Reference
| Symptom | Possible Cause | Guide |
|---|---|---|
| Containers won't start | Docker resource limits, port conflicts | Docker |
| Can't access web UI | Certificate issues, hosts file, DNS | Local |
| Terraform fails | AWS permissions, network issues | Production |
| Services can't communicate | Docker network, NATS connection | Docker |
| Certificate errors | mkcert not installed, Let's Encrypt issues | Local / Production |
| SSH connection refused | Security groups, VPN, key permissions | Production |
| Out of disk space | Docker images/volumes, logs | Docker |
| Port already in use | Conflicting services, previous deployment | Local |
Prevention Best Practices
Avoid common issues by following these practices:
- Version Control - Track configuration changes in git
- Documentation - Document custom configurations and modifications
- Regular Backups - Back up MongoDB and other critical data regularly
- Monitoring - Monitor resource usage and logs
- Testing - Test in local/lab before deploying to field/production
- State Management - Keep Terraform state synchronized with actual infrastructure
- Clean Up - Regularly clean up unused Docker resources
- Updates - Keep Docker, Docker Compose, and system packages updated
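The backup practice can be automated with `mongodump` run inside the MongoDB container plus simple rotation. This is a sketch under assumptions: the container name, backup directory, and retention count are hypothetical; it assumes the image ships `mongodump`, as the official `mongo` image does; and if MongoDB requires authentication, add `--uri` or `--username`/`--password` to the `mongodump` call.

```bash
#!/usr/bin/env bash
# Minimal sketch: dump MongoDB from its container and keep the newest N dumps.

# backup_mongo CONTAINER DIR: write a gzipped mongodump archive into DIR.
# Output goes to a .partial file first so a failed dump never looks complete.
backup_mongo() {
  local container=$1 dir=$2 out tmp
  mkdir -p "$dir" || return 1
  out="$dir/mongo-$(date +%Y%m%d-%H%M%S).archive.gz"
  tmp="$out.partial"
  if docker exec "$container" mongodump --archive --gzip >"$tmp"; then
    mv -- "$tmp" "$out"
  else
    rm -f -- "$tmp"
    return 1
  fi
}

# prune_backups DIR KEEP: delete all but the KEEP newest mongo-*.archive.gz
# files (safe to parse ls here because backup_mongo controls the file names).
prune_backups() {
  local dir=$1 keep=$2
  ls -1t "$dir"/mongo-*.archive.gz 2>/dev/null |
    tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}

# Usage (e.g. from a daily cron job; names are examples):
#   backup_mongo <mongo-container-name> ~/backups && prune_backups ~/backups 7
```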
Next Steps
Choose the appropriate troubleshooting guide based on your deployment type: