Production Deployment on AWS EC2

This guide provides step-by-step instructions for deploying OpenDSO in a production environment on AWS EC2 instances using Terraform Infrastructure as Code (IaC). It revisits some steps from the Local Deployment page, but goes deeper into common IaC structures and how your team and OES have deployed OpenDSO in your production environment.

Overview

The production deployment uses Terraform to automate the provisioning and configuration of AWS infrastructure. The deployment:

  • Provisions EC2 instances with Amazon Linux
  • Configures security groups and networking
  • Automatically installs Docker and dependencies on the EC2 instances
  • Downloads versioned release archives (zips) of opendso, config, and models from OES's GitHub releases via setup-opendso.sh
  • Sets up TLS certificates using Let's Encrypt with DNS-01 challenge
  • Supports multiple environments (Lab/Test and Field)

Repository Overview

Terraform IaC projects are usually named {client_project}-iac.

Key Files:

  • main.tf - Main Terraform configuration defining AWS resources
  • variables.tf - Input variable definitions
  • outputs.tf - Output definitions (SSH key, IP address)
  • run.sh - Helper script for Terraform operations
  • *.tfvars - Environment-specific variable files
  • assets/ - Provisioning scripts and configuration

Environment Files:

  • {environment_name}.tfvars

Prerequisites

Local Machine Requirements

  • Terraform >= 1.2.0
  • AWS CLI v2 (configured with credentials)
  • GitHub Personal Access Token (for downloading releases)
  • SSH client
  • git

Install Terraform

# Linux (Ubuntu/Debian)
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# macOS (Homebrew)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Verify installation
terraform --version

Configure AWS CLI

# Install AWS CLI v2 (Linux x86_64 installer)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# Configure credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region, Output format

GitHub Personal Access Token

Create a GitHub PAT with repo scope for downloading OpenDSO releases:

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click "Generate new token (classic)"
  3. Select scope: repo (Full control of private repositories)
  4. Generate and save the token securely

AWS Infrastructure Overview

The usual Terraform configuration creates the following AWS resources. Note: The actual deployed infrastructure may differ from the Terraform configuration due to manual modifications.

EC2 Instance (Actual Deployed Configuration)

  • AMI: Amazon Linux
  • Instance Type: t2.large (2 vCPU, 8GB RAM)
  • Root Volume: 8GB or more EBS
  • Connection: SSH via generated RSA key pair (4096-bit RSA)
  • Tags: Name=OpenDSO, CreatedBy=terraform, autoShutdown=Off

Security Group

  • Name: oes-sg
  • Ingress Rules (from actual EC2 console):
    • Port 22 (SSH) from 0.0.0.0/0
    • Port 443 (HTTPS) from 0.0.0.0/0
    • Port 8080 (HTTP Alt) from 0.0.0.0/0
    • Port 3389 (RDP) from 0.0.0.0/0
  • Egress: All traffic to 0.0.0.0/0 (IPv4) and ::/0 (IPv6)

SSH Key Pair

  • Algorithm: RSA 4096-bit
  • Generated by Terraform: Automatic key generation
  • Output: Private key should be saved to ~/.ssh/tf_id_rsa.pem
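
SSH clients refuse a private key that other users can read, so the saved key needs owner-only permissions. A minimal sketch of writing the key safely, assuming the key is exposed as a Terraform output. The output name private_key and the helper save_private_key are hypothetical, so check outputs.tf for the real output name:

```shell
# save_private_key DEST: read a PEM key on stdin and write it to DEST with
# owner-only permissions. Illustrative helper, not part of run.sh.
save_private_key() (
    dest=$1
    umask 077                        # files created below default to 0600
    mkdir -p "$(dirname "$dest")"
    rm -f "$dest"                    # drop any older key so the new mode applies
    cat > "$dest"
    chmod 600 "$dest"
)

# Usage (the output name "private_key" is an assumption):
#   terraform output -raw private_key | save_private_key ~/.ssh/tf_id_rsa.pem
```

The function body runs in a subshell, so the umask change does not leak into the calling shell.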

Network Configuration

  • Uses existing VPC and subnet (specified in tfvars)
  • Private IP address assigned from subnet
  • Requires VPN or jump host for access

Deployment Process

Step 1: Clone the IaC Repository

git clone https://github.com/openenergysolutions/{client_project}-iac.git

Step 2: Review Environment Configuration

Choose the appropriate environment file:

OES Test Example Environment:

cat oes-test.tfvars
aws_region    = "example"
aws_subnet_id = "example"
aws_vpc_id    = "example"
sshkey_name   = "tf_id_rsa"
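
A missing variable or an unterminated string in a tfvars file only surfaces once Terraform parses it. As a rough pre-flight check, the sketch below assumes the four variables shown above are the required ones and flags any line with an odd number of double quotes. check_tfvars is an illustrative helper, not part of the IaC repository:

```shell
# check_tfvars FILE: fail if an expected variable is missing or a line has an
# odd number of double quotes (for example a value missing its closing quote).
check_tfvars() {
    file=$1
    rc=0
    for key in aws_region aws_subnet_id aws_vpc_id sshkey_name; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
            echo "missing variable: $key" >&2
            rc=1
        fi
    done
    # Splitting on '"' yields (quotes + 1) fields, so an even NF means an odd quote count.
    if ! awk -F'"' 'NF > 0 && NF % 2 == 0 { printf "line %d: unbalanced quote\n", NR; bad = 1 } END { exit bad }' "$file" >&2; then
        rc=1
    fi
    return "$rc"
}

# Usage:
#   check_tfvars ./oes-test.tfvars && ./run.sh -i ./oes-test.tfvars -p -s
```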

Step 3: Review Deployment Configuration

The config.json file defines which GitHub repositories setup-opendso.sh will download release archives from, and what local directory each archive should be unzipped into. On a deployed host, both config.json and setup-opendso.sh live in the home directory next to the unzipped opendso/, config/, and models/ folders:

cat assets/config.json
{
  "organization": "openenergysolutions",
  "releases": [
    {
      "repositoryName": "opendso-docker-compose",
      "displayName": "opendso"
    },
    {
      "repositoryName": "{client_project}-docker-compose",
      "displayName": "models"
    },
    {
      "repositoryName": "{client_project}-config-docker-compose",
      "displayName": "config"
    }
  ]
}

How this is used at deploy time:

For each entry, setup-opendso.sh calls https://api.github.com/repos/{organization}/{repositoryName}/releases/latest to find the most recent published release, downloads its first attached asset (the config.zip, models.zip, or opendso.zip produced by each repo's tag-archive workflow), and unzips it into ~/{displayName}/. See Versioned Release Archives for the workflow that publishes those zips.

Important: setup-opendso.sh always pulls the latest release. To roll a deployment forward you push a new git tag on the config or models repo (which triggers the release workflow) and then re-run setup-opendso.sh on the host. See Tagging Versioned Updates below.

Step 4: Deploy Infrastructure

Use the run.sh helper script to deploy:

# Deploy to environment
./run.sh -i ./oes-test.tfvars -p -s

Script Options:

  • -i <file> - Specify tfvars file (default: dev.tfvars)
  • -p - Prompt for GitHub PAT (Personal Access Token)
  • -s - Setup (create and provision infrastructure)
  • -t - Teardown (destroy infrastructure)
  • -h - Display help

What Happens During Deployment:

  1. Terraform Initialization

    • Downloads AWS provider plugins
    • Initializes backend
  2. Infrastructure Creation

    • Creates security group with SSH and HTTPS rules
    • Generates SSH key pair (RSA 4096-bit)
    • Provisions EC2 instance
  3. Automatic Provisioning

    • Copies provisioning scripts to EC2 instance
    • Runs init.sh - Installs Docker, Docker Compose, golang
    • Runs setup-opendso.sh - Downloads the latest tagged release archives (opendso.zip, config.zip, models.zip) from GitHub and unzips them into ~/opendso/, ~/config/, and ~/models/
  4. Output

    • SSH private key saved to ~/.ssh/tf_id_rsa.pem
    • EC2 instance private IP displayed

Step 5: Access the Instance

The deployment outputs the EC2 private IP address. Since the instance is in a private subnet, you need VPN or jump host access:

# View the private IP (the output is marked sensitive, so query it by name)
terraform output -raw apphost_ip

# Or capture it in a shell variable
PRIVATE_IP=$(terraform output -raw apphost_ip)
echo "$PRIVATE_IP"

# SSH to the instance (requires VPN connection to the VPC)
ssh -i ~/.ssh/tf_id_rsa.pem "ec2-user@$PRIVATE_IP"

Note: The apphost_ip output is marked as sensitive, so a bare terraform output redacts it. To view the value, request it by name with the -raw flag (as above) or use terraform output -json.

Post-Deployment Configuration

Once connected to the EC2 instance, complete the setup:

Verify Installation

# Check Docker installation
docker --version
docker compose version

# Check downloaded repositories
ls -la ~/
# Should see: opendso/, models/, config/ directories

Configure DNS

The deployment uses Let's Encrypt with the DNS-01 challenge for TLS certificates. The domain is configured in setup-certs.sh.

Update DNS records to point to your instance:

  1. Create an A record for {client_address_project_name}.oesinc.dev pointing to the instance IP (or jump host IP)
  2. Create a wildcard A record for *.{client_address_project_name}.oesinc.dev pointing to the same IP

Set Up Certificates

The setup-certs.sh script uses certbot with Google Cloud DNS for Let's Encrypt certificates.

Prerequisites:

  1. Google Cloud DNS credentials file at ~/.secrets/
  2. Domain delegation to Google Cloud DNS

Run Certificate Setup:

# Install certbot and Google DNS plugin
sudo yum install -y certbot python3-certbot-dns-google

# Place Google Cloud credentials
mkdir -p ~/.secrets
# Upload your Google Cloud DNS credentials JSON file

# Run certificate generation
chmod +x setup-certs.sh
./setup-certs.sh

What This Does:

  • Generates Let's Encrypt certificates using DNS-01 challenge
  • Creates certificates for both the base domain and its wildcard (*.) domain
  • Copies certificates to ~/certs/ directory

Generated Certificates:

~/certs/
├── fullchain.pem # Full certificate chain
├── server-cert.pem # Server certificate
├── server-key.pem # Private key
├── chain.pem # Intermediate chain
├── rootCA.pem # Root CA
└── {other_certs}.pem # Depending on the client deployment, you may generate additional certs

Configure Database Credentials

Before deploying services, configure database usernames and passwords for the various OpenDSO services. Database credentials are typically stored in environment files within the config repositories.

Common Database Credential Locations:

  1. GMS API MongoDB Credentials - config/gms-api/env/production.env:

    MONGODB_DOMAIN = mongodb:27017
    MONGODB_DBNAME = settings_api
    MONGODB_USERNAME = SettingsAPIUser
    MONGODB_PASSWORD = your_secure_password
  2. Docker Environment Variables - config/docker/.env:

    # PostgreSQL credentials for various services
    CITUS_PGUSER="postgres"
    CITUS_PGPASSWORD="your_secure_password"

    HISTORIAN_PGUSER="historian"
    HISTORIAN_PGPASSWORD="your_secure_password"

    DER_DISPATCH_PGUSER="opendso"
    DER_DISPATCH_PGPASSWORD="your_secure_password"

    # Keycloak admin credentials
    KEYCLOAK_ADMIN="admin"
    KEYCLOAK_PASSWORD="your_secure_password"

Security Best Practices:

  • Change Default Passwords: Always replace default passwords before production deployment
  • Use Strong Passwords: Generate complex passwords with sufficient entropy
  • Restrict File Permissions: Limit access to environment files containing credentials
    chmod 600 ~/config/gms-api/env/production.env
    chmod 600 ~/config/docker/.env
  • Consider Secrets Management: For enhanced security, consider using AWS Secrets Manager or similar services instead of plain-text environment files
  • Avoid Version Control: Ensure .env files with real credentials are in .gitignore

Note: The environment files shown here contain example credentials from project templates. Update these with your actual secure credentials appropriate for your production environment.
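
As one way to produce the strong replacement passwords recommended above, the sketch below reads random bytes from /dev/urandom and keeps only alphanumeric characters, which avoids quoting problems in .env files. gen_password is an illustrative helper and the default length of 32 is an assumption, not an OpenDSO requirement:

```shell
# gen_password [LENGTH]: print a random alphanumeric password (default 32 chars).
gen_password() {
    len=${1:-32}
    # tr drops every byte that is not alphanumeric; head stops after LENGTH bytes.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
    echo
}

# Usage: print a value to paste into config/docker/.env
#   gen_password 40
```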

Deploy OpenDSO Services

Navigate to the OpenDSO orchestration directory and deploy:

cd ~/opendso/opendso-docker-compose

# View available profiles
./run.sh -l

# Deploy all services
./run.sh -p all -c

# Verify deployment
docker ps

Verify Deployment

Access the services through your domain to confirm they respond.

Terraform Operations

Manual Terraform Commands

If you prefer direct Terraform commands over the run.sh script:

# Initialize Terraform
terraform init

# Plan changes
terraform plan -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

# Apply changes
terraform apply -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN" -auto-approve

# Show outputs
terraform output

# Destroy infrastructure (careful!)
terraform destroy -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN" -auto-approve

State Management

Terraform state is stored locally by default.

Updating Deployments

Update OpenDSO Components

Updates roll forward by pushing a new git tag on the config and/or models repo (which triggers the GitHub Actions release workflow described in Tagging Versioned Updates) and then re-running setup-opendso.sh on the host so it picks up the new latest release:

# SSH to instance
ssh -i ~/.ssh/tf_id_rsa.pem ec2-user@<PRIVATE_IP>

# Stop services
cd ~/opendso/opendso-docker-compose
./run.sh -p all -d

# Re-run setup to download the latest tagged release archives
# (config.zip / models.zip / opendso.zip from each repo's GitHub releases)
export GITHUB_TOKEN="your-token"
cd ~
./setup-opendso.sh

# Restart services
cd ~/opendso/opendso-docker-compose
./run.sh -p all -c

Note: setup-opendso.sh always pulls the latest GitHub release for each repo. Make sure the tag you intend to deploy is the most recent published release on the config and models repos before running this; otherwise the host will pick up whichever tag is most recent.

Update Terraform Configuration

# From the IaC git directory
git pull

# Review changes
terraform plan -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

# Apply updates
terraform apply -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

Tagging Versioned Updates

A client deployment is "versioned" by which git tags are the most recent published releases on the config, models, and opendso repositories. The deployment host pulls the latest release of each via setup-opendso.sh. To roll forward (or back) you cut a new tag, let the GitHub Actions workflow build and publish the archive, then re-run setup-opendso.sh on the host.

When to Tag

Tag a new version any time you have a deployable change to one of these repos:

  • {client_repo_name}-config-docker-compose: environment variables, image tags in docker/.env, service config files. Most client-driven changes (image version bumps, credential updates, new service tunables) live here.
  • {client_repo_name}-docker-compose (the models repo): OpenFMB adapter mappings for the client's field devices. Tag whenever adapter coverage or configuration changes.
  • opendso-docker-compose: base orchestration. Usually tagged by the OpenDSO platform team, not the client team.

Tagging Convention

Use SemVer-style tags (e.g. 1.4.0, 1.4.1). The workflow accepts any tag pattern (tags: '*'), but staying consistent makes the GitHub release list and the .version file dropped into each archive readable.

# From a clean checkout of the repo you want to release
git checkout main
git pull

# Create and push the tag
git tag 1.4.0
git push origin 1.4.0
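
Because the workflow accepts any tag pattern, a typo such as 1.4 or v1.4.0 would still publish a release. A small guard, sketched here under the assumption that you want strict MAJOR.MINOR.PATCH tags with no prefix, can reject such tags before they are pushed. Both helper names are illustrative:

```shell
# is_semver_tag TAG: succeed only for MAJOR.MINOR.PATCH tags such as 1.4.0.
is_semver_tag() {
    printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# release_tag TAG: validate the tag, then create and push it.
release_tag() {
    if ! is_semver_tag "$1"; then
        echo "refusing to tag: '$1' is not MAJOR.MINOR.PATCH" >&2
        return 1
    fi
    git tag "$1" && git push origin "$1"
}

# Usage:
#   release_tag 1.4.0
```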

What Happens on Push

The .github/workflows/main.yaml in the config and models repos triggers on the tag push and:

  1. Checks out the tagged commit with full history.
  2. Runs GitVersion to derive a version.
  3. Writes the tag name into a .version file at the repo root so the unpacked archive on the deployment host carries the version stamp (cat ~/config/.version will show the tag a deployed host is running).
  4. Zips the working tree (excluding .git*, .vscode/*, .editorconfig) into config.zip (config repo) or models.zip (models repo).
  5. Creates a GitHub release for the tag and attaches the zip as a release asset.

You can confirm a release published correctly by visiting the repo's Releases page on GitHub and verifying the new tag has a config.zip or models.zip attached.

Deploying the New Tag

Once the workflow has finished and the new tag is the most recent release on its repo, deploy it to the host:

ssh -i ~/.ssh/tf_id_rsa.pem ec2-user@<PRIVATE_IP>

# Stop services so the archive contents can be replaced cleanly
cd ~/opendso/opendso-docker-compose
./run.sh -p all -d

# Re-run setup-opendso.sh — this will fetch the *latest* release of each repo
# listed in config.json and overwrite ~/opendso/, ~/config/, and ~/models/
export GITHUB_TOKEN="your-token"
cd ~
./setup-opendso.sh

# Confirm the deployed version stamp
cat ~/config/.version
cat ~/models/.version

# Bring services back up
cd ~/opendso/opendso-docker-compose
./run.sh -p all -c

Notes and gotchas:

  • setup-opendso.sh always pulls releases/latest. There is no per-tag pinning in config.json. If you need to roll back, the supported path is to either re-publish an older tag as the latest release on GitHub, or to manually download and unzip the older release asset on the host.
  • Pushing a tag that already exists will not re-run the workflow. Delete the tag remotely (git push --delete origin 1.4.0) and the GitHub release before re-tagging if you need to rebuild an archive at the same version.
  • The workflow only runs on tag pushes. Pushing commits to main does not publish a new archive — nothing changes for deployed hosts until a tag is cut.
  • unzip -o overwrites files in place but does not delete files that were removed in the new tag. If a release removes a file, manually delete it from ~/config/ or ~/models/ (or wipe the directory before running setup-opendso.sh).
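
To see which files an in-place unzip -o would leave behind, one option is to unpack the new archive into a scratch directory and compare file lists. The sketch below assumes both trees are already on disk. list_stale_files and the scratch path in the usage line are hypothetical:

```shell
# list_stale_files DEPLOYED NEW: print files that exist under DEPLOYED but not
# under NEW, i.e. files that an in-place `unzip -o` would not remove.
list_stale_files() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    (cd "$1" && find . -type f | LC_ALL=C sort) > "$old_list"
    (cd "$2" && find . -type f | LC_ALL=C sort) > "$new_list"
    LC_ALL=C comm -23 "$old_list" "$new_list"   # lines unique to the deployed tree
    rm -f "$old_list" "$new_list"
}

# Usage (scratch directory name is an assumption):
#   unzip -o config.zip -d /tmp/config-new
#   list_stale_files ~/config /tmp/config-new
```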

For troubleshooting failed workflows, missing release assets, or setup-opendso.sh errors, see Production Deployment Troubleshooting → Release Archive and setup-opendso.sh Issues.

Provisioning Scripts Explained

init.sh

Installs system dependencies and Docker:

  • Updates system packages
  • Installs unzip, golang, nss-tools, docker
  • Enables and starts Docker service
  • Installs Docker Compose
  • Adds ec2-user to docker group

setup-opendso.sh

Downloads the versioned OpenDSO release archives from GitHub and unpacks them into the deployment layout that run.sh expects.

What it does:

  • Reads config.json (in the same directory it is run from) to get the GitHub organization and the list of releases to fetch
  • For each entry, calls GET /repos/{organization}/{repositoryName}/releases/latest and reads .assets[0].id (the first attached asset on the latest release) — this is the config.zip, models.zip, or opendso.zip published by each repo's tag-archive workflow
  • Downloads the asset with Accept: application/octet-stream against /repos/{organization}/{repositoryName}/releases/assets/{ASSET_ID} and writes it to {displayName}.zip
  • Runs unzip -o {displayName}.zip -d {displayName} so the archive contents land in ~/opendso/, ~/config/, and ~/models/
  • If a repo has no releases yet, the API returns a null asset id and the script logs Unable to find release for {OWNER}/{REPO} and skips it (the script does not fail the whole deployment)

Requires:

  • GITHUB_TOKEN environment variable with repo scope on the OES org (the script exits with code 5 if it is unset)
  • jq and curl available on the host (installed by init.sh)
  • config.json present in the working directory

Why this matters for upgrades: because the script only ever asks for releases/latest, the way to change what gets deployed is to change which tagged release is the most recent on the config / models / opendso repo — see Tagging Versioned Updates.
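
The per-repository flow described above can be sketched roughly as follows. This is an illustration of the documented steps, not the actual contents of setup-opendso.sh. The function names are hypothetical, and the Bearer authorization header is an assumption about how the token is sent:

```shell
API=https://api.github.com

# URL of the "latest release" endpoint for a repository.
latest_release_url() {
    printf '%s/repos/%s/%s/releases/latest\n' "$API" "$1" "$2"
}

# URL used to download a single release asset by id.
asset_url() {
    printf '%s/repos/%s/%s/releases/assets/%s\n' "$API" "$1" "$2" "$3"
}

# fetch_release ORG REPO NAME: download the first asset of the latest release
# to NAME.zip and unpack it into NAME/. Needs curl, jq, unzip, and GITHUB_TOKEN.
fetch_release() {
    org=$1 repo=$2 name=$3
    asset_id=$(curl -fsSL -H "Authorization: Bearer $GITHUB_TOKEN" \
        "$(latest_release_url "$org" "$repo")" | jq -r '.assets[0].id')
    if [ -z "$asset_id" ] || [ "$asset_id" = "null" ]; then
        echo "Unable to find release for $org/$repo" >&2
        return 0                     # skip this repo and keep going
    fi
    curl -fsSL -H "Authorization: Bearer $GITHUB_TOKEN" \
        -H "Accept: application/octet-stream" \
        -o "$name.zip" "$(asset_url "$org" "$repo" "$asset_id")"
    unzip -o "$name.zip" -d "$name"
}

# Usage:
#   fetch_release openenergysolutions opendso-docker-compose opendso
```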

setup-certs.sh

Generates TLS certificates using Let's Encrypt:

  • Uses certbot with DNS-01 challenge (Google Cloud DNS)
  • Generates wildcard certificates
  • Copies certificates to ~/certs/ directory
  • Creates OpenADR-specific certificate files
  • Sets appropriate file permissions

Requires: Google Cloud DNS credentials

Monitoring and Maintenance

View Logs

# Docker service logs
sudo journalctl -u docker -f

# Container logs
docker compose logs -f

# System logs
sudo journalctl -f

Monitor Resources

# Docker stats
docker stats

# System resources
htop
df -h
free -h

Database Backup and Restore

OpenDSO uses multiple databases to store different types of data:

  1. MongoDB - Stores GMS API configuration data, user settings, and application state
  2. PostgreSQL (Topology Genesis) - Stores parsed CIM topology data and equipment information
  3. SQLite (OpenADR Service) - Stores OpenADR VEN/VTN registration and event data

Backup MongoDB (GMS API Data)

The run.sh script provides a convenient backup command that uses mongodump to create a binary archive of the MongoDB database:

cd ~/opendso/opendso-docker-compose
./run.sh -b

What this does:

  • Connects to the running MongoDB container
  • Authenticates using credentials from environment variables
  • Creates a binary archive dump of the database
  • Saves the output to db.dump in the current directory

Requirements:

  • MongoDB container must be running
  • Credentials must be properly configured in the environment

Manual Backup (Alternative):

If you need more control over the backup process:

# Backup with custom filename
# Note: Use the database name as authenticationDatabase (typically settings_api)
docker exec mongodb sh -c 'mongodump --authenticationDatabase "${MONGODB_COLLECTION}" \
-u "${MONGODB_USERNAME}" -p "${MONGODB_PASSWORD}" \
--db "${MONGODB_COLLECTION}" --archive' > "gms-backup-$(date +%Y%m%d-%H%M%S).dump"

# Backup to a directory inside the container (not an archive)
docker exec mongodb sh -c 'mongodump --authenticationDatabase "${MONGODB_COLLECTION}" \
-u "${MONGODB_USERNAME}" -p "${MONGODB_PASSWORD}" \
--db "${MONGODB_COLLECTION}" --out /data/backup'

Important: The SettingsAPIUser is created in the settings_api database, not in the admin database. Therefore, use --authenticationDatabase settings_api (or the database name from ${MONGODB_COLLECTION}) instead of --authenticationDatabase admin.

Restore MongoDB Database

To restore from a backup:

cd ~/opendso/opendso-docker-compose
./run.sh -r

Requirements:

  • db.dump file must exist in the current directory
  • MongoDB container must be running

Manual Restore (Alternative):

# Restore from custom backup file
# Note: Use the database name as authenticationDatabase (typically settings_api)
docker exec -i mongodb sh -c 'mongorestore --authenticationDatabase "${MONGODB_COLLECTION}" \
-u "${MONGODB_USERNAME}" -p "${MONGODB_PASSWORD}" \
--db "${MONGODB_COLLECTION}" --archive' < gms-backup-20240101.dump

# Drop existing collections before restoring them from the dump
docker exec -i mongodb sh -c 'mongorestore --authenticationDatabase "${MONGODB_COLLECTION}" \
-u "${MONGODB_USERNAME}" -p "${MONGODB_PASSWORD}" \
--db "${MONGODB_COLLECTION}" --drop --archive' < db.dump

Backup PostgreSQL (Topology Genesis Data)

The Topology Genesis service uses PostgreSQL to store parsed CIM files and topology data:

# Backup PostgreSQL database
docker exec postgres sh -c 'pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB}' > topology-backup-$(date +%Y%m%d-%H%M%S).sql

# Compressed backup
docker exec postgres sh -c 'pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB}' | gzip > topology-backup-$(date +%Y%m%d-%H%M%S).sql.gz

Restore PostgreSQL:

# Restore from SQL dump
docker exec -i postgres sh -c 'psql -U ${POSTGRES_USER} ${POSTGRES_DB}' < topology-backup.sql

# Restore from compressed backup
gunzip < topology-backup.sql.gz | docker exec -i postgres sh -c 'psql -U ${POSTGRES_USER} ${POSTGRES_DB}'

Backup OpenADR Service Data

The OpenADR service uses SQLite for VEN registrations and event data:

# Locate and backup SQLite database
docker exec oadr-service sh -c 'sqlite3 /app/data/oadr.db ".backup /tmp/oadr-backup.db"'
docker cp oadr-service:/tmp/oadr-backup.db ./oadr-backup-$(date +%Y%m%d-%H%M%S).db

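# Or simply copy the database file
# (a raw copy of a database that is being written to can be inconsistent;
# prefer .backup above, or stop the service first)
docker cp oadr-service:/app/data/oadr.db ./oadr-backup-$(date +%Y%m%d-%H%M%S).db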

Restore OpenADR Database:

# Stop the service first
docker stop oadr-service

# Copy backup to container
docker cp oadr-backup.db oadr-service:/app/data/oadr.db

# Restart service
docker start oadr-service

Complete System Backup Example

The following example script backs up the OpenDSO databases and configuration. Run it from ~/opendso/opendso-docker-compose so that ./run.sh resolves:

#!/bin/bash
# Complete backup script
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

BACKUP_DIR="./backups/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "Starting OpenDSO complete backup..."

# Backup MongoDB (GMS API)
echo "Backing up MongoDB..."
./run.sh -b
mv db.dump "$BACKUP_DIR/mongodb-gms.dump"

# Backup PostgreSQL (Topology Genesis)
echo "Backing up PostgreSQL..."
docker exec postgres sh -c 'pg_dump -U ${POSTGRES_USER} ${POSTGRES_DB}' | gzip > "$BACKUP_DIR/postgres-topology.sql.gz"

# Backup OpenADR Service (if running)
if docker ps --format '{{.Names}}' | grep -x oadr-service > /dev/null; then
    echo "Backing up OpenADR Service..."
    docker cp oadr-service:/app/data/oadr.db "$BACKUP_DIR/oadr-service.db"
fi

# Backup configuration files (setup-opendso.sh unpacks them into ~/config)
echo "Backing up configuration..."
cp -r "$HOME/config" "$BACKUP_DIR/config-backup"

# Example copy to S3 (if configured)
# aws s3 cp "$BACKUP_DIR" s3://your-backup-bucket/opendso/backup-$(date +%Y%m%d) --recursive

echo "Backup complete: $BACKUP_DIR"
ls -lh "$BACKUP_DIR"
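
A shell redirect creates its output file before the command runs, so a failed docker exec ... > file can leave a zero-byte dump that looks like a backup at first glance. The sketch below is a basic sanity check on a finished backup directory. verify_backup is an illustrative helper, and it does not validate the dump contents themselves:

```shell
# verify_backup DIR: fail if DIR is missing, holds no files, or holds a zero-byte file.
verify_backup() {
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    [ -n "$(find "$1" -type f | head -n 1)" ] || { echo "backup is empty: $1" >&2; return 1; }
    empty=$(find "$1" -type f -size 0)
    if [ -n "$empty" ]; then
        printf 'zero-byte files:\n%s\n' "$empty" >&2
        return 1
    fi
}

# Usage: verify_backup ./backups/<timestamp>
```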

Certificate Renewal

Let's Encrypt certificates expire after 90 days. Set up automatic renewal:

# Create renewal script
cat > ~/renew-certs.sh <<'EOF'
#!/bin/bash
cd ~
./setup-certs.sh
cd ~/opendso/opendso-docker-compose
./run.sh -p all -d
./run.sh -p all -c
EOF

chmod +x ~/renew-certs.sh

# Add to crontab (run monthly)
crontab -e
# Add: 0 0 1 * * /home/ec2-user/renew-certs.sh
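
Since the renewal script stops and restarts all services, it can help to run it only when the certificate is actually close to expiry. A minimal sketch using openssl's -checkend option, assuming the certificate lives at ~/certs/fullchain.pem and that a 30-day threshold suits your deployment. cert_expires_within is a hypothetical helper:

```shell
# cert_expires_within FILE DAYS: succeed if the certificate in FILE expires
# within DAYS days (or has already expired, or cannot be read).
cert_expires_within() {
    seconds=$(( $2 * 86400 ))
    # openssl exits 0 when the cert is still valid SECONDS from now, non-zero otherwise.
    ! openssl x509 -checkend "$seconds" -noout -in "$1" > /dev/null 2>&1
}

# Usage (for example from a daily cron job):
#   cert_expires_within ~/certs/fullchain.pem 30 && ~/renew-certs.sh
```

An unreadable or missing certificate file is treated as expiring, so the renewal still runs in that case.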

Tearing Down Infrastructure

To destroy the infrastructure:

# Using run.sh script
./run.sh -i ./oes-test.tfvars -p -t

# Or using Terraform directly
terraform destroy -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN" -auto-approve

Warning: This permanently deletes the EC2 instance and all data. Ensure backups are taken first.

Troubleshooting

For production deployment issues, see the Production Deployment Troubleshooting Guide.

Common issues covered include:

  • Terraform deployment failures
  • Provisioner script errors
  • EC2 access and SSH issues
  • DNS and certificate problems
  • Infrastructure drift detection and resolution
  • Multi-environment management
  • Emergency procedures and disaster recovery

For Docker and container-specific issues, see the Docker Troubleshooting Guide.

Infrastructure Drift Management

Understanding Drift

Infrastructure drift occurs when the actual deployed infrastructure differs from the Terraform configuration.

Causes of Drift

  1. Manual AWS Console changes after Terraform deployment
  2. EBS volume resizing performed outside Terraform
  3. Security group rule additions for operational needs
  4. Terraform configuration not applied to existing resources

Detecting Drift

Check for drift using Terraform:

# View current state vs configuration
terraform plan -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

# Refresh state from AWS
# (terraform refresh is deprecated in favor of the -refresh-only mode)
terraform apply -refresh-only -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

# Show detailed state
terraform show
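
For scripted drift checks, terraform plan supports -detailed-exitcode, which exits 0 when nothing would change, 2 when the plan contains changes, and 1 on error. The sketch below wraps that convention. The helper names and status strings are illustrative:

```shell
# describe_plan_exit CODE: translate the exit code of
# `terraform plan -detailed-exitcode` into a short status.
describe_plan_exit() {
    case $1 in
        0) echo "in sync" ;;
        2) echo "drift or pending changes" ;;
        *) echo "plan failed" ;;
    esac
}

# check_drift TFVARS: run a plan quietly and report the result.
check_drift() {
    rc=0
    terraform plan -detailed-exitcode -var-file="$1" \
        -var="github_token=$GITHUB_TOKEN" > /dev/null || rc=$?
    describe_plan_exit "$rc"
    return "$rc"
}

# Usage:
#   check_drift ./oes-test.tfvars
```

Note that exit code 2 does not distinguish manual AWS changes from configuration edits that have not been applied yet, so review the full plan output before acting on it.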

Resolving Drift

Option 1: Update Terraform to match actual infrastructure

Option 2: Import existing resources into Terraform

Option 3: Accept managed drift

For resources with intentional manual changes, use lifecycle rules:

resource "aws_security_group" "ssh_sg" {
  # ... config ...

  lifecycle {
    ignore_changes = [ingress] # Allow manual ingress rule changes
  }
}

Security Best Practices

1. Restrict SSH Access

Update security group to limit SSH to specific IPs:

# In main.tf, modify ingress rule:
ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["YOUR_VPN_CIDR"] # Instead of 0.0.0.0/0
}

2. Use Systems Manager Session Manager

Instead of SSH, use AWS Systems Manager:

# Start a session (requires the Session Manager plugin to be installed locally)
aws ssm start-session --target i-1234567890abcdef0

3. Secrets Management

Store sensitive data securely:

# Use AWS Secrets Manager for GitHub token
aws secretsmanager create-secret \
--name opendso/github-token \
--secret-string "your-token"

4. Enable CloudWatch Logging

Add CloudWatch agent for centralized logging:

sudo yum install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

5. Regular Updates

# System updates
sudo yum update -y

# Docker updates
sudo yum update docker -y
sudo systemctl restart docker

Advanced Configuration

Custom Domain Configuration

To use a different domain, modify setup-certs.sh:

# Edit domain variable
DOMAIN=your-domain.com

Update DNS records accordingly.

Resize Instance

To change instance type (warning: this could destroy existing data, back up everything first):

# In main.tf, modify instance_type:
resource "aws_instance" "app_server" {
  instance_type = "t2.xlarge" # Change from t2.large
  # ...
}

Then apply changes:

terraform apply -var-file="./oes-test.tfvars" -var="github_token=$GITHUB_TOKEN"

Add Additional Storage

# In main.tf, add additional block device:
resource "aws_instance" "app_server" {
  # ... existing config ...

  ebs_block_device {
    device_name = "/dev/sdf"
    volume_size = 100
    volume_type = "gp3"
  }
}

Support

For production deployment support:

  • Review Terraform plan output before applying changes
  • Contact the OpenDSO software team
  • Consult with your OES support team