Self-hosting on AWS

This guide provides AWS-specific information for deploying Membrane in your AWS environment.

S3 Storage Configuration

Create the necessary S3 buckets:

resource "aws_s3_bucket" "tmp" {
  bucket = "${var.environment}-integration-app-tmp"
}

resource "aws_s3_bucket" "connectors" {
  bucket = "${var.environment}-integration-app-connectors"
}

resource "aws_s3_bucket" "static" {
  bucket = "${var.environment}-integration-app-static"
}

# Lifecycle rules for tmp bucket
resource "aws_s3_bucket_lifecycle_configuration" "tmp" {
  bucket = aws_s3_bucket.tmp.id

  rule {
    id     = "cleanup"
    status = "Enabled"

    filter {
      prefix = ""
    }

    expiration {
      days = 7
    }
  }
}

# CORS configuration for static bucket
resource "aws_s3_bucket_cors_configuration" "static" {
  bucket = aws_s3_bucket.static.id

  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET"]
    allowed_origins = ["*"]
    max_age_seconds = 3000
  }
}

Create the CloudFront distribution:

resource "aws_cloudfront_origin_access_control" "static" {
  name                              = "${var.environment}-static-oac"
  description                       = "OAC for static S3 bucket"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "static" {
  enabled             = true
  default_root_object = "index.html"

  aliases = ["static.${var.environment}.${var.hosted_zone_name}"]

  origin {
    domain_name              = aws_s3_bucket.static.bucket_regional_domain_name
    origin_id                = aws_s3_bucket.static.id
    origin_access_control_id = aws_cloudfront_origin_access_control.static.id
  }

  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = aws_s3_bucket.static.id
    viewer_protocol_policy = "redirect-to-https"
    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  price_class = "PriceClass_100"
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.cloudfront.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }

  tags = {
    Service = "core"
  }
}

resource "aws_s3_bucket_policy" "static_cloudfront" {
  bucket = aws_s3_bucket.static.id
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "cloudfront.amazonaws.com"
        },
        Action   = "s3:GetObject",
        Resource = "${aws_s3_bucket.static.arn}/*",
        Condition = {
          StringEquals = {
            "AWS:SourceArn" = aws_cloudfront_distribution.static.arn
          }
        }
      }
    ]
  })
}
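
The distribution above references aws_acm_certificate.cloudfront, which is not defined in the snippets shown here. A minimal sketch follows, assuming DNS validation and a var.hosted_zone_id variable for the Route 53 record (certificate validation records are omitted for brevity); note that CloudFront only accepts ACM certificates issued in us-east-1:

# CloudFront requires its ACM certificate to live in us-east-1,
# regardless of where the rest of the stack is deployed.
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

resource "aws_acm_certificate" "cloudfront" {
  provider          = aws.us_east_1
  domain_name       = "static.${var.environment}.${var.hosted_zone_name}"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# Alias record pointing the static hostname at the distribution.
# var.hosted_zone_id is an assumption; substitute your own zone lookup.
resource "aws_route53_record" "static" {
  zone_id = var.hosted_zone_id
  name    = "static.${var.environment}.${var.hosted_zone_name}"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.static.domain_name
    zone_id                = aws_cloudfront_distribution.static.hosted_zone_id
    evaluate_target_health = false
  }
}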

IAM Role Configuration

Membrane containers support AWS IAM role-based access to S3 and other AWS services. This is the preferred method over providing an explicit access key and secret key.

Container IAM Configuration

To use IAM roles instead of access keys:

  1. Create an IAM role with appropriate S3 permissions
  2. Assign this role to your ECS tasks, EKS pods, or EC2 instances
  3. Omit the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables

When running in a properly configured AWS environment, the containers will automatically use the IAM role credentials.
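
A minimal Terraform sketch for an ECS task role is shown below. The role name, the trust principal, and the exact S3 action list are assumptions, so narrow the permissions to what your deployment actually needs; for EKS pods, use IAM Roles for Service Accounts (IRSA) with an OIDC trust policy instead:

# Trust policy allowing ECS tasks to assume the role (assumption:
# ECS deployment; EKS pods would use IRSA instead).
resource "aws_iam_role" "app" {
  name = "${var.environment}-integration-app"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "ecs-tasks.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })
}

# S3 access for the three buckets created earlier. The action list
# is an assumption; trim it to the minimal set you need.
resource "aws_iam_role_policy" "app_s3" {
  name = "s3-buckets"
  role = aws_iam_role.app.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Action = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      Resource = [
        aws_s3_bucket.tmp.arn, "${aws_s3_bucket.tmp.arn}/*",
        aws_s3_bucket.connectors.arn, "${aws_s3_bucket.connectors.arn}/*",
        aws_s3_bucket.static.arn, "${aws_s3_bucket.static.arn}/*",
      ]
    }]
  })
}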

MongoDB on AWS

While AWS DocumentDB is technically compatible with our application, we recommend using native MongoDB instead, either self-hosted on EC2 or as a managed MongoDB Atlas cluster.

Some customers have encountered edge cases with DocumentDB due to differences in MongoDB API implementation. If you choose to use DocumentDB, be prepared for potential compatibility issues.

Redis on AWS

For Redis, you can use Amazon ElastiCache. Keep in mind that Redis is only used as a cache in our application and can be safely restarted or cleared. There's no persistent data stored in Redis that isn't recoverable from other sources.
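
If you provision infrastructure with Terraform, a minimal ElastiCache sketch might look like the following; the subnet group and security group references are assumptions to adapt to your VPC layout:

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.environment}-integration-app-redis"
  description          = "Redis cache for Membrane"
  engine               = "redis"
  node_type            = "cache.t4g.small"
  num_cache_clusters   = 1
  port                 = 6379

  # Enables in-transit encryption; see the TLS note below.
  transit_encryption_enabled = true

  # Assumed to be defined elsewhere in your configuration.
  subnet_group_name  = aws_elasticache_subnet_group.redis.name
  security_group_ids = [aws_security_group.redis.id]
}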

ElastiCache TLS Configuration

If you're using ElastiCache with in-transit encryption enabled, set the following environment variable:

REDIS_DISABLE_TLS_VERIFICATION=true

This is required because ElastiCache uses self-signed certificates that won't pass standard TLS verification.

EKS Deployment

When deploying Membrane on Amazon EKS, you may encounter startup issues due to DNS resolution delays that are common in Kubernetes environments. This section provides recommended configurations and troubleshooting guidance.

Common Startup Issues

CrashLoopBackOff due to health check failures: During initial deployment or image upgrades, pods may fail health checks because MongoDB or Redis connections time out before DNS resolution completes.

Symptoms:

  • Pods entering CrashLoopBackOff state shortly after startup
  • Health check endpoint returning failures
  • Logs showing connection timeouts to MongoDB or Redis

Recommended EKS Configuration

Add these environment variables to your deployment to handle slower DNS resolution in Kubernetes:

env:
  # Increase MongoDB server selection timeout (default: 30000ms)
  - name: MONGO_SERVER_SELECTION_TIMEOUT_MS
    value: '60000'

  # Increase Redis connection timeout (default: 10000ms for standalone)
  - name: REDIS_CONNECT_TIMEOUT_MS
    value: '60000'

  # Configure health check retry behavior for transient failures
  - name: HEALTH_CHECK_RETRIES
    value: '5' # Number of retries (default: 3)
  - name: HEALTH_CHECK_RETRY_DELAY_MS
    value: '2000' # Initial delay between retries in ms (default: 1000)
  - name: HEALTH_CHECK_MAX_RETRY_DELAY_MS
    value: '30000' # Maximum delay between retries in ms (default: 10000)

  # Required for ElastiCache with in-transit encryption
  - name: REDIS_DISABLE_TLS_VERIFICATION
    value: 'true'

  # Optional: Skip specific health checks if needed
  # - name: SKIP_HEALTH_CHECKS
  #   value: "mongo,redis"  # Or "all" to skip all checks

Health Check Retry Configuration

Health checks include built-in retry logic with exponential backoff to handle transient network issues common in Kubernetes environments.

Variable                          Default   Description
HEALTH_CHECK_RETRIES              3         Number of retry attempts after initial failure
HEALTH_CHECK_RETRY_DELAY_MS       1000      Initial delay between retries (milliseconds)
HEALTH_CHECK_MAX_RETRY_DELAY_MS   10000     Maximum delay between retries (milliseconds)

The retry mechanism uses exponential backoff with jitter:

  • First retry: ~1 second delay
  • Second retry: ~2 seconds delay
  • Third retry: ~4 seconds delay
  • And so on, up to the maximum delay
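
In other words, assuming a standard doubling schedule (the exact jitter formulation is internal to the application), the delay before retry n is approximately:

delay(n) ≈ min(HEALTH_CHECK_MAX_RETRY_DELAY_MS, HEALTH_CHECK_RETRY_DELAY_MS × 2^(n-1)) ± jitter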

For EKS environments with slow DNS resolution, we recommend increasing these values:

env:
  - name: HEALTH_CHECK_RETRIES
    value: '5'
  - name: HEALTH_CHECK_RETRY_DELAY_MS
    value: '2000'
  - name: HEALTH_CHECK_MAX_RETRY_DELAY_MS
    value: '30000'

Health Check Skip Options

If retry logic is insufficient, you can skip specific health checks using the SKIP_HEALTH_CHECKS environment variable:

Value          Description
all            Skip all health checks (MongoDB, Redis, storage, custom code)
mongo          Skip only MongoDB connectivity check
redis          Skip only Redis connectivity check
storage        Skip only cloud storage bucket check
custom_code    Skip only custom code runner check
mongo,redis    Skip multiple checks (comma-separated)

Note: Skipping health checks is recommended only as a last resort. The retry mechanism should handle most transient failures.

Kubernetes Probe Configuration

In addition to the environment variables, consider adjusting your Kubernetes probe settings:

startupProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 60 # Allow up to 5 minutes for startup

livenessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /
    port: 5000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3

Troubleshooting

1. Check pod logs for connection errors:

kubectl logs <pod-name> -n <namespace>

Look for timeout errors related to MongoDB or Redis connections.

2. Verify DNS resolution from within the cluster:

kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup <your-mongodb-host>

3. Test connectivity to external services:

kubectl run -it --rm debug --image=mongo:latest --restart=Never -- mongosh "mongodb+srv://<connection-string>" --eval "db.runCommand({ping:1})"

4. If issues persist after configuration changes:

  • Ensure security groups allow traffic between EKS nodes and your MongoDB/Redis instances (see the Terraform sketch after this list)
  • Verify VPC peering or PrivateLink configurations if using cross-VPC connections
  • Check that IAM roles have appropriate permissions for S3 bucket access
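
For the security group point, a minimal Terraform sketch is below; both security group IDs are assumed variables that you would wire to your actual MongoDB (or Redis) and EKS node groups:

# Allow the EKS node security group to reach MongoDB on its default port.
# var.mongodb_security_group_id and var.eks_node_security_group_id are
# assumptions; substitute references from your own configuration.
resource "aws_security_group_rule" "mongo_from_eks" {
  type                     = "ingress"
  from_port                = 27017
  to_port                  = 27017
  protocol                 = "tcp"
  security_group_id        = var.mongodb_security_group_id
  source_security_group_id = var.eks_node_security_group_id
}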