Skip to content

Multi-Cloud PKI

Why This Matters

For executives: Multi-cloud strategy fails if you're locked into each cloud's native PKI. AWS certificates don't work in Azure, Azure Key Vault doesn't work in GCP. Vendor lock-in means losing negotiating leverage and paying premium prices. Unified multi-cloud PKI enables true cloud portability, avoids vendor lock-in, and provides single security policy across all environments. This is strategic infrastructure that enables business flexibility.

For security leaders: Native cloud PKI services are convenient but create security silos. Different policies in AWS vs Azure vs on-premises = security gaps and inconsistent enforcement. Multi-cloud PKI provides unified security policy, centralized audit trails, and consistent compliance regardless of deployment location. This is how you achieve security at scale across heterogeneous environments.

For engineers: Managing certificates separately in each cloud is operational hell. AWS ACM for AWS, Azure Key Vault for Azure, Let's Encrypt for on-premises - three different APIs, three different renewal processes, three different failure modes. Multi-cloud PKI means one automation platform, one set of procedures, one place to troubleshoot. This is operational sanity.

Common scenario: Your organization runs workloads in AWS and Azure, with on-premises infrastructure. Current state: AWS certificates managed through ACM, Azure through Key Vault, on-premises through manual processes. Result: inconsistent security policies, no unified visibility, three different operational procedures. Multi-cloud PKI unifies this into single platform with consistent management across all environments.


TL;DR

Multi-cloud PKI architectures enable organizations to manage certificates consistently across AWS, Azure, GCP, and on-premises infrastructure while avoiding vendor lock-in and maintaining unified security policies. The fundamental challenge is that each cloud provider offers different native certificate services (AWS ACM, Azure Key Vault Certificates, GCP Certificate Manager) with incompatible APIs, limited portability, and varying feature sets. Successful multi-cloud PKI requires: centralized certificate authority infrastructure independent of any single cloud, unified certificate lifecycle automation using cloud-agnostic tools (Terraform, Kubernetes cert-manager, HashiCorp Vault), consistent secrets management across environments, service mesh integration for microservices (Istio, Linkerd, Consul Connect), and comprehensive visibility through centralized monitoring. Organizations should prioritize interoperability over cloud-native features, establish clear policies for when to use managed services versus self-hosted PKI, and build automation that works identically across all clouds.

Key Insight: The promise of cloud portability fails in practice if your PKI is deeply coupled to provider-specific services. A successful multi-cloud strategy treats certificate management as a horizontal platform service that spans clouds rather than vertical integration within each cloud's ecosystem. This requires accepting some friction (managing your own CA instead of using native services) in exchange for true portability and unified control.


Overview

Multi-cloud PKI addresses the operational reality that most enterprises use multiple cloud providers, maintain on-premises infrastructure, and require consistent certificate management across all environments. This creates challenges around:

Architectural Challenges:

  • Divergent APIs and data models across cloud providers
  • Different certificate validation and renewal workflows
  • Incompatible secrets management systems
  • Varied integration patterns for compute services
  • Cloud-specific networking and security boundaries

Operational Challenges:

  • Maintaining visibility across dispersed certificate inventory
  • Consistent policy enforcement independent of cloud provider
  • Unified renewal and lifecycle management
  • Cross-cloud certificate distribution
  • Audit and compliance across heterogeneous environments

Strategic Considerations:

  • Vendor lock-in risks when using cloud-native PKI services
  • Cost optimization across cloud billing models
  • Disaster recovery and multi-region failover
  • Regulatory requirements for data sovereignty
  • Migration flexibility between clouds

Cloud Provider Certificate Services

AWS Certificate Manager (ACM)

AWS's managed certificate service for AWS resources:

Capabilities:

  • Free certificates for AWS-integrated services
  • Automatic renewal with no customer action
  • Integration with ELB, CloudFront, API Gateway, Elastic Beanstalk
  • Supports public (via Amazon's CA) and private certificates
  • Regional service with certificate-per-region requirement
  • No export of private keys for public certificates

Limitations:

  • Only works with AWS services (cannot export most certificates)
  • No support for client certificates
  • Limited to 398-day validity
  • Regional isolation requires certificate duplication
  • Cannot use with EC2 instances directly (must use load balancer)

Use Cases:

  • Public-facing websites on CloudFront or ALB
  • API Gateway REST APIs
  • Internal services using Private CA
  • Temporary certificates for testing

Terraform Example:

# Request ACM certificate
resource "aws_acm_certificate" "example" {
  domain_name       = "example.com"
  validation_method = "DNS"

  subject_alternative_names = [
    "www.example.com",
    "api.example.com"
  ]

  tags = {
    Environment = "production"
    ManagedBy   = "Terraform"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Create DNS validation records
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.example.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = var.route53_zone_id
}

# Wait for validation
resource "aws_acm_certificate_validation" "example" {
  certificate_arn         = aws_acm_certificate.example.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

# Attach to load balancer
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.example.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = aws_acm_certificate_validation.example.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.example.arn
  }
}

AWS Private CA:

# Create private CA
resource "aws_acmpca_certificate_authority" "example" {
  type = "ROOT"

  certificate_authority_configuration {
    key_algorithm     = "RSA_2048"
    signing_algorithm = "SHA256WITHRSA"

    subject {
      common_name  = "Example Corp Root CA"
      organization = "Example Corp"
      country      = "US"
    }
  }

  permanent_deletion_time_in_days = 7
}

# Issue certificate from private CA
resource "aws_acmpca_certificate" "server" {
  certificate_authority_arn   = aws_acmpca_certificate_authority.example.arn
  certificate_signing_request = tls_cert_request.server.cert_request_pem
  signing_algorithm           = "SHA256WITHRSA"

  validity {
    type  = "DAYS"
    value = 90
  }
}

Azure Key Vault Certificates

Azure's integrated certificate management within Key Vault:

Capabilities:

  • Unified storage for certificates, keys, and secrets
  • Automatic renewal with supported CAs (DigiCert, GlobalSign)
  • Manual import of certificates from any CA
  • Export of certificates with private keys (for entitled users)
  • Integration with Azure App Service, Application Gateway, CDN
  • RBAC integration with Azure AD
  • Soft-delete and purge protection

Limitations:

  • Per-vault limits (5000 certificate versions)
  • Regional service requiring cross-region replication
  • API throttling can impact automation at scale
  • Cost per certificate operation (retrieval, update)
  • Limited to Azure-integrated services

Use Cases:

  • Azure App Service custom domains
  • Application Gateway SSL termination
  • Azure Functions HTTPS
  • VM-based applications with Key Vault integration
  • Client certificate authentication

Terraform Example:

# Create Key Vault
resource "azurerm_key_vault" "example" {
  name                = "example-kv"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  soft_delete_retention_days = 90
  purge_protection_enabled   = true

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"

    ip_rules = ["1.2.3.4"]
  }
}

# Import certificate
resource "azurerm_key_vault_certificate" "imported" {
  name         = "imported-cert"
  key_vault_id = azurerm_key_vault.example.id

  certificate {
    contents = filebase64("certificate.pfx")
    password = var.pfx_password
  }
}

# Create self-signed certificate
resource "azurerm_key_vault_certificate" "selfsigned" {
  name         = "selfsigned-cert"
  key_vault_id = azurerm_key_vault.example.id

  certificate_policy {
    issuer_parameters {
      name = "Self"
    }

    key_properties {
      exportable = true
      key_size   = 2048
      key_type   = "RSA"
      reuse_key  = true
    }

    lifetime_action {
      action {
        action_type = "AutoRenew"
      }

      trigger {
        days_before_expiry = 30
      }
    }

    secret_properties {
      content_type = "application/x-pkcs12"
    }

    x509_certificate_properties {
      extended_key_usage = ["1.3.6.1.5.5.7.3.1"]  # Server auth

      key_usage = [
        "digitalSignature",
        "keyEncipherment",
      ]

      subject            = "CN=example.com"
      validity_in_months = 12

      subject_alternative_names {
        dns_names = ["example.com", "www.example.com"]
      }
    }
  }
}

# Use certificate with App Service
resource "azurerm_app_service_certificate" "example" {
  name                = "example-cert"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  key_vault_secret_id = azurerm_key_vault_certificate.imported.secret_id
}

resource "azurerm_app_service_custom_hostname_binding" "example" {
  hostname            = "www.example.com"
  app_service_name    = azurerm_app_service.example.name
  resource_group_name = azurerm_resource_group.example.name
  ssl_state           = "SniEnabled"
  thumbprint          = azurerm_app_service_certificate.example.thumbprint
}

Google Cloud Certificate Manager

GCP's newer certificate management service:

Capabilities:

  • Global service (not regional like AWS ACM)
  • Automatic certificate provisioning for external HTTPS load balancers
  • DNS authorization via Cloud DNS
  • Certificate maps for routing to multiple certificates
  • Integration with Cloud Load Balancing, Cloud CDN
  • Self-managed certificates for custom CAs

Limitations:

  • Relatively new service (GA in 2021)
  • Limited to GCP load balancers and CDN
  • Cannot use certificates on Compute Engine instances
  • No client certificate support
  • Regional Certificate Manager for internal load balancers

Use Cases:

  • Global HTTPS load balancers
  • Multi-region CDN deployments
  • GKE ingress with managed certificates
  • Cloud Run custom domains

Terraform Example:

# DNS authorization for domain validation
resource "google_certificate_manager_dns_authorization" "default" {
  name        = "dns-auth"
  description = "DNS authorization for example.com"
  domain      = "example.com"
}

# Create DNS record for validation
resource "google_dns_record_set" "cname" {
  name         = google_certificate_manager_dns_authorization.default.dns_resource_record[0].name
  type         = google_certificate_manager_dns_authorization.default.dns_resource_record[0].type
  ttl          = 300
  managed_zone = google_dns_managed_zone.default.name
  rrdatas      = [google_certificate_manager_dns_authorization.default.dns_resource_record[0].data]
}

# Create certificate
resource "google_certificate_manager_certificate" "default" {
  name        = "example-cert"
  description = "Certificate for example.com"
  scope       = "DEFAULT"

  managed {
    domains = [
      "example.com",
      "www.example.com"
    ]

    dns_authorizations = [
      google_certificate_manager_dns_authorization.default.id
    ]
  }
}

# Create certificate map
resource "google_certificate_manager_certificate_map" "default" {
  name        = "cert-map"
  description = "Certificate map for load balancer"
}

resource "google_certificate_manager_certificate_map_entry" "default" {
  name         = "cert-map-entry"
  description  = "Map entry for example.com"
  map          = google_certificate_manager_certificate_map.default.name
  certificates = [google_certificate_manager_certificate.default.id]
  hostname     = "example.com"
}

# Attach to load balancer
resource "google_compute_target_https_proxy" "default" {
  name             = "https-proxy"
  url_map          = google_compute_url_map.default.id
  certificate_map  = "//certificatemanager.googleapis.com/${google_certificate_manager_certificate_map.default.id}"
}


Decision Framework

Use cloud-native PKI (AWS ACM, Azure Key Vault, GCP Certificate Manager) when:

  • Single-cloud deployment (no multi-cloud requirements)
  • Cloud-native services only (ELB, Application Gateway, Cloud Load Balancer)
  • Comfort with vendor lock-in
  • Prioritize operational simplicity over portability
  • No complex certificate requirements (standard TLS only)

Use centralized multi-cloud PKI when:

  • True multi-cloud deployment (workloads in 2+ clouds)
  • On-premises + cloud hybrid architecture
  • Need consistent security policy across all environments
  • Want to avoid vendor lock-in
  • Complex certificate requirements (custom validation, special extensions)
  • Regulatory requirements for PKI control

Architecture pattern selection:

Centralized CA with distributed issuance (most common):

  • Single CA infrastructure (HashiCorp Vault, EJBCA, Smallstep)
  • cert-manager deployed in each cloud/cluster
  • Good: Unified policy, single source of truth
  • Challenge: CA becomes single point of failure (need HA)

Federated CAs with cross-certification:

  • Separate CA per cloud, cross-certified for trust
  • Good: No single point of failure, regional autonomy
  • Challenge: Complex trust relationships, policy consistency

Hybrid approach:

  • Cloud-native for simple use cases (public-facing TLS)
  • Self-hosted CA for complex use cases (service mesh, mutual TLS)
  • Good: Pragmatic, uses best tool for each job
  • Challenge: Managing two systems

Tool selection:

HashiCorp Vault when:

  • Need centralized secrets management + PKI
  • Dynamic secret requirements
  • Already using Vault for other use cases
  • Kubernetes + traditional infrastructure
  • Strong community support valuable

cert-manager when:

  • Kubernetes-native workloads primarily
  • Want ACME protocol support
  • Need Let's Encrypt integration
  • Simpler use cases (no complex secret management)

Venafi when:

  • Enterprise scale (10,000+ certificates)
  • Complex compliance requirements
  • Need commercial support
  • Budget supports commercial tooling

EJBCA / Smallstep when:

  • Need full-featured open-source CA
  • Custom PKI requirements
  • Want self-hosted without cloud dependencies

Red flags indicating multi-cloud PKI problems:

  • Different certificate management in each cloud (no consistency)
  • No unified visibility into certificate inventory
  • Manual certificate operations in any environment
  • "We'll figure out multi-cloud later" (becomes migration nightmare)
  • Using cloud-native exclusively without portability plan
  • No policy for when to use native vs self-hosted PKI
  • Different security policies in each cloud

Common mistakes:

  • Starting with cloud-native PKI, discovering vendor lock-in too late
  • Treating each cloud as separate problem instead of unified architecture
  • Not planning for certificate portability from day one
  • Underestimating operational complexity of self-hosted CA
  • Over-engineering (trying to abstract too much, creating complexity)
  • No clear ownership of multi-cloud PKI architecture

Cross-Cloud Architecture Patterns

Centralized CA with Distributed Issuance

Single certificate authority serving all clouds:

                    ┌──────────────────┐
                    │  Centralized CA  │
                    │   (HashiCorp     │
                    │     Vault or     │
                    │   Custom PKI)    │
                    └────────┬─────────┘
                             │ HTTPS/ACME
           ┌─────────────────┼─────────────────┐
           │                 │                 │
      ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
      │   AWS   │       │  Azure  │      │   GCP   │
      │ Issuer  │       │ Issuer  │      │ Issuer  │
      │ Agent   │       │ Agent   │      │ Agent   │
      └────┬────┘       └────┬────┘      └────┬────┘
           │                 │                 │
      ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
      │ Secrets │       │   Key   │      │ Secret  │
      │ Manager │       │  Vault  │      │ Manager │
      └────┬────┘       └────┬────┘      └────┬────┘
           │                 │                 │
      ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
      │   EC2   │       │   VMs   │      │  GCE    │
      │   ECS   │       │   AKS   │      │  GKE    │
      │   EKS   │       │  App Svc│      │Cloud Run│
      └─────────┘       └─────────┘      └─────────┘

Implementation:

class MultiCloudCertificateIssuer:
    """Centralized certificate issuance for multiple clouds"""

    def __init__(self, vault_url, vault_token):
        import hvac

        self.vault = hvac.Client(url=vault_url, token=vault_token)

        # Cloud-specific clients
        self.aws_sm = boto3.client('secretsmanager')
        self.azure_kv = SecretClient(
            vault_url="https://example-kv.vault.azure.net/",
            credential=DefaultAzureCredential()
        )
        self.gcp_sm = secretmanager.SecretManagerServiceClient()

    def issue_certificate(self, common_name, cloud_provider, secret_path):
        """Issue certificate and distribute to appropriate cloud"""

        # Issue from Vault PKI
        response = self.vault.secrets.pki.generate_certificate(
            name='multi-cloud-role',
            common_name=common_name,
            ttl='90d',
            mount_point='pki-int'
        )

        certificate = response['data']['certificate']
        private_key = response['data']['private_key']
        ca_chain = response['data']['ca_chain']

        # Combine into full chain
        full_chain = certificate + '\n' + '\n'.join(ca_chain)

        # Distribute to cloud-specific secrets manager
        if cloud_provider == 'aws':
            self.store_in_aws(secret_path, full_chain, private_key)
        elif cloud_provider == 'azure':
            self.store_in_azure(secret_path, full_chain, private_key)
        elif cloud_provider == 'gcp':
            self.store_in_gcp(secret_path, full_chain, private_key)

        return {
            'certificate': certificate,
            'secret_path': secret_path,
            'cloud_provider': cloud_provider,
            'expires': response['data']['expiration']
        }

    def store_in_aws(self, secret_name, certificate, private_key):
        """Store certificate in AWS Secrets Manager"""

        secret_value = json.dumps({
            'certificate': certificate,
            'private_key': private_key
        })

        try:
            self.aws_sm.create_secret(
                Name=secret_name,
                SecretString=secret_value,
                Tags=[
                    {'Key': 'ManagedBy', 'Value': 'MultiCloudPKI'},
                    {'Key': 'Type', 'Value': 'TLSCertificate'}
                ]
            )
        except self.aws_sm.exceptions.ResourceExistsException:
            self.aws_sm.put_secret_value(
                SecretId=secret_name,
                SecretString=secret_value
            )

    def store_in_azure(self, secret_name, certificate, private_key):
        """Store certificate in Azure Key Vault"""

        # Combine into PFX format for Azure
        pfx_bytes = self.create_pfx(certificate, private_key)

        self.azure_kv.set_secret(
            name=secret_name,
            value=base64.b64encode(pfx_bytes).decode()
        )

    def store_in_gcp(self, secret_name, certificate, private_key):
        """Store certificate in GCP Secret Manager"""

        project_id = 'your-project-id'
        parent = f"projects/{project_id}"

        secret_value = json.dumps({
            'certificate': certificate,
            'private_key': private_key
        })

        # Create secret if doesn't exist
        try:
            self.gcp_sm.create_secret(
                request={
                    "parent": parent,
                    "secret_id": secret_name,
                    "secret": {
                        "replication": {"automatic": {}}
                    }
                }
            )
        except Exception:
            pass  # Secret already exists

        # Add secret version
        parent_secret = f"{parent}/secrets/{secret_name}"
        self.gcp_sm.add_secret_version(
            request={
                "parent": parent_secret,
                "payload": {"data": secret_value.encode()}
            }
        )

Federated CA Model

Multiple CAs per cloud, cross-signed for trust:

                    ┌──────────────────┐
                    │    Root CA       │
                    │  (On-premises    │
                    │     HSM)         │
                    └────────┬─────────┘
           ┌─────────────────┼─────────────────┐
           │                 │                 │
      ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
      │   AWS   │       │  Azure  │      │   GCP   │
      │ Issuing │       │ Issuing │      │ Issuing │
      │   CA    │       │   CA    │      │   CA    │
      └────┬────┘       └────┬────┘      └────┬────┘
           │                 │                 │
   Local issuance    Local issuance    Local issuance
   in AWS VPC        in Azure VNet     in GCP VPC

Benefits:

  • Reduced latency (local issuance)
  • Cloud isolation for security
  • Compliance with data residency
  • Failure isolation

Challenges:

  • Complex trust chain management
  • Certificate distribution complexity
  • Increased operational overhead
  • CA key management per cloud

Service Mesh Integration

Using service mesh for multi-cloud certificate automation:

# Istio configuration for multi-cloud
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: multi-cloud
spec:
  meshConfig:
    # Centralized CA
    ca:
      address: "vault.example.com:8200"
      tlsSettings:
        mode: SIMPLE

    # Certificate settings
    certificates:
      - secretName: istio-ca-secret
        dnsNames:
          - "*.aws.example.com"
          - "*.azure.example.com"
          - "*.gcp.example.com"

    # Trust domain spanning clouds
    trustDomain: "example.com"

  components:
    pilot:
      k8s:
        env:
          # Enable multi-cluster
          - name: PILOT_ENABLE_CROSS_CLUSTER_WORKLOAD_ENTRY
            value: "true"
          - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
            value: "true"

Kubernetes cert-manager for Multi-Cloud

Cert-manager provides cloud-agnostic certificate automation:

Installation Across Clouds

# Install cert-manager (same across all clouds)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Verify installation
kubectl get pods -n cert-manager

Vault Issuer Configuration

# vault-issuer.yaml - Same configuration across AWS, Azure, GCP clusters
apiVersion: v1
kind: Secret
metadata:
  name: vault-token
  namespace: cert-manager
type: Opaque
data:
  token: <base64-encoded-vault-token>
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.example.com:8200
    path: pki-int/sign/kubernetes
    caBundle: <base64-encoded-ca-bundle>
    auth:
      tokenSecretRef:
        name: vault-token
        key: token

Certificate Request

# certificate.yaml - Works identically across all clouds
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-app
  namespace: production
spec:
  secretName: example-app-tls
  duration: 2160h  # 90 days
  renewBefore: 720h  # 30 days

  subject:
    organizations:
      - Example Corp

  commonName: example-app.example.com

  dnsNames:
    - example-app.example.com
    - example-app.aws.example.com
    - example-app.azure.example.com
    - example-app.gcp.example.com

  issuerRef:
    name: vault-issuer
    kind: ClusterIssuer
    group: cert-manager.io

  privateKey:
    algorithm: RSA
    size: 2048
    rotationPolicy: Always

Ingress Integration

# ingress.yaml - Standard Kubernetes, cloud-agnostic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: vault-issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - example-app.example.com
      secretName: example-app-tls
  rules:
    - host: example-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80

HashiCorp Vault Multi-Cloud Deployment

Architecture

                     ┌─────────────────┐
                     │  Load Balancer  │
                     │   (Multi-Cloud  │
                     │    Endpoint)    │
                     └────────┬────────┘
            ┌─────────────────┼─────────────────┐
            │                 │                 │
       ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
       │  Vault  │       │  Vault  │      │  Vault  │
       │  Node   │◄─────►│  Node   │◄────►│  Node   │
       │  (AWS)  │       │ (Azure) │      │  (GCP)  │
       └────┬────┘       └────┬────┘      └────┬────┘
            │                 │                 │
       ┌────▼────┐       ┌────▼────┐      ┌────▼────┐
       │   DynamoDB│      │  Azure  │      │Firestore│
       │  Storage  │      │ Storage │      │ Storage │
       └───────────┘      └─────────┘      └─────────┘

Terraform Deployment

# modules/vault-cluster/main.tf
variable "cloud_provider" {
  description = "Cloud provider (aws, azure, gcp)"
  type        = string
}

variable "region" {
  description = "Cloud region"
  type        = string
}

# AWS Vault Cluster
module "vault_aws" {
  source = "./modules/vault-cluster"
  count  = var.deploy_aws ? 1 : 0

  cloud_provider = "aws"
  region         = "us-east-1"

  vault_version  = "1.15.0"
  instance_count = 3
  instance_type  = "m5.large"

  storage_backend = "dynamodb"

  kms_key_id = aws_kms_key.vault.id

  tags = {
    Environment = "production"
    ManagedBy   = "Terraform"
  }
}

# Azure Vault Cluster
module "vault_azure" {
  source = "./modules/vault-cluster"
  count  = var.deploy_azure ? 1 : 0

  cloud_provider = "azure"
  region         = "eastus"

  vault_version  = "1.15.0"
  instance_count = 3
  instance_type  = "Standard_D2s_v3"

  storage_backend = "azure"

  key_vault_id = azurerm_key_vault.vault.id
}

# GCP Vault Cluster
module "vault_gcp" {
  source = "./modules/vault-cluster"
  count  = var.deploy_gcp ? 1 : 0

  cloud_provider = "gcp"
  region         = "us-central1"

  vault_version  = "1.15.0"
  instance_count = 3
  instance_type  = "n1-standard-2"

  storage_backend = "firestore"

  kms_key_id = google_kms_crypto_key.vault.id
}

# Global load balancer
resource "cloudflare_load_balancer" "vault" {
  zone_id = var.cloudflare_zone_id
  name    = "vault.example.com"

  default_pool_ids = [
    cloudflare_load_balancer_pool.aws.id,
    cloudflare_load_balancer_pool.azure.id,
    cloudflare_load_balancer_pool.gcp.id
  ]

  fallback_pool_id = cloudflare_load_balancer_pool.aws.id

  session_affinity = "cookie"
}

resource "cloudflare_load_balancer_pool" "aws" {
  name = "vault-aws"

  origins {
    name    = "vault-aws-1"
    address = module.vault_aws[0].endpoint
    enabled = true
  }

  monitor = cloudflare_load_balancer_monitor.vault.id
}

# Health monitor
resource "cloudflare_load_balancer_monitor" "vault" {
  type     = "https"
  path     = "/v1/sys/health"
  interval = 60
  timeout  = 5
  retries  = 2

  expected_codes = "200,429,473,503"  # Various Vault health states
}

PKI Secrets Engine Configuration

# Enable PKI secrets engine
vault secrets enable -path=pki-root pki
vault secrets enable -path=pki-int pki

# Tune max lease TTL
vault secrets tune -max-lease-ttl=87600h pki-root  # 10 years
vault secrets tune -max-lease-ttl=43800h pki-int   # 5 years

# Generate root CA
vault write -field=certificate pki-root/root/generate/internal \
    common_name="Example Corp Root CA" \
    ttl=87600h > root-ca.crt

# Generate intermediate CSR
vault write -field=csr pki-int/intermediate/generate/internal \
    common_name="Example Corp Intermediate CA" \
    > pki-int.csr

# Sign intermediate with root
vault write -field=certificate pki-root/root/sign-intermediate \
    csr=@pki-int.csr \
    format=pem_bundle \
    ttl=43800h > intermediate-ca.crt

# Import signed intermediate
vault write pki-int/intermediate/set-signed \
    certificate=@intermediate-ca.crt

# Configure URLs
vault write pki-int/config/urls \
    issuing_certificates="https://vault.example.com:8200/v1/pki-int/ca" \
    crl_distribution_points="https://vault.example.com:8200/v1/pki-int/crl"

# Create role for multi-cloud certificates
vault write pki-int/roles/multi-cloud \
    allowed_domains="example.com,aws.example.com,azure.example.com,gcp.example.com" \
    allow_subdomains=true \
    max_ttl="2160h" \
    key_type="rsa" \
    key_bits=2048

Secrets Management Integration

Unified Secrets Distribution

class MultiCloudSecretsManager:
    """Distribute certificates across cloud secrets managers"""

    def __init__(self):
        # Initialize cloud clients
        self.aws_sm = boto3.client('secretsmanager', region_name='us-east-1')
        self.azure_kv = SecretClient(
            vault_url="https://example-kv.vault.azure.net/",
            credential=DefaultAzureCredential()
        )
        self.gcp_sm = secretmanager.SecretManagerServiceClient()
        self.gcp_project = 'your-project-id'

    def distribute_certificate(self, cert_pem, key_pem, ca_chain, target_clouds):
        """Distribute certificate to multiple clouds"""

        results = {}

        for cloud_config in target_clouds:
            cloud = cloud_config['provider']
            secret_name = cloud_config['secret_name']

            try:
                if cloud == 'aws':
                    arn = self.store_aws(secret_name, cert_pem, key_pem, ca_chain)
                    results['aws'] = {'success': True, 'arn': arn}

                elif cloud == 'azure':
                    url = self.store_azure(secret_name, cert_pem, key_pem, ca_chain)
                    results['azure'] = {'success': True, 'url': url}

                elif cloud == 'gcp':
                    name = self.store_gcp(secret_name, cert_pem, key_pem, ca_chain)
                    results['gcp'] = {'success': True, 'name': name}

            except Exception as e:
                results[cloud] = {'success': False, 'error': str(e)}

        return results

    def store_aws(self, secret_name, cert_pem, key_pem, ca_chain):
        """Store in AWS Secrets Manager"""

        secret_value = json.dumps({
            'certificate': cert_pem,
            'private_key': key_pem,
            'ca_chain': ca_chain,
            'updated_at': datetime.utcnow().isoformat()
        })

        try:
            response = self.aws_sm.create_secret(
                Name=secret_name,
                SecretString=secret_value,
                Tags=[
                    {'Key': 'Type', 'Value': 'TLSCertificate'},
                    {'Key': 'ManagedBy', 'Value': 'MultiCloudPKI'}
                ]
            )
            return response['ARN']

        except self.aws_sm.exceptions.ResourceExistsException:
            response = self.aws_sm.put_secret_value(
                SecretId=secret_name,
                SecretString=secret_value
            )
            return response['ARN']

    def store_azure(self, secret_name, cert_pem, key_pem, ca_chain):
        """Store in Azure Key Vault as certificate"""

        from cryptography.hazmat.primitives import serialization
        from cryptography.hazmat.primitives.serialization import pkcs12

        # Load certificate and key
        cert = x509.load_pem_x509_certificate(cert_pem.encode())
        key = serialization.load_pem_private_key(key_pem.encode(), password=None)

        # Create PFX/PKCS12
        pfx = pkcs12.serialize_key_and_certificates(
            name=b"certificate",
            key=key,
            cert=cert,
            cas=None,
            encryption_algorithm=serialization.BestAvailableEncryption(b"")
        )

        # Import to Key Vault
        poller = self.azure_kv.import_certificate(
            certificate_name=secret_name,
            certificate_bytes=pfx
        )

        return poller.result().id

    def store_gcp(self, secret_name, cert_pem, key_pem, ca_chain):
        """Store in GCP Secret Manager"""

        parent = f"projects/{self.gcp_project}"

        secret_value = json.dumps({
            'certificate': cert_pem,
            'private_key': key_pem,
            'ca_chain': ca_chain,
            'updated_at': datetime.utcnow().isoformat()
        })

        # Create secret if doesn't exist
        try:
            secret = self.gcp_sm.create_secret(
                request={
                    "parent": parent,
                    "secret_id": secret_name,
                    "secret": {
                        "replication": {
                            "automatic": {}
                        },
                        "labels": {
                            "type": "tls-certificate",
                            "managed-by": "multi-cloud-pki"
                        }
                    }
                }
            )
        except Exception:
            secret = self.gcp_sm.get_secret(
                request={"name": f"{parent}/secrets/{secret_name}"}
            )

        # Add new version
        version = self.gcp_sm.add_secret_version(
            request={
                "parent": secret.name,
                "payload": {"data": secret_value.encode()}
            }
        )

        return version.name

Monitoring and Visibility

Centralized Certificate Inventory

class MultiCloudCertificateInventory:
    """Maintain unified certificate inventory across clouds"""

    def __init__(self, db_connection):
        self.db = db_connection

        # Cloud clients
        self.aws_acm = boto3.client('acm')
        self.aws_sm = boto3.client('secretsmanager')
        self.azure_kv = SecretClient(...)
        self.gcp_cm = CertificateManagerClient()

    def scan_all_clouds(self):
        """Scan certificates across all cloud providers"""

        inventory = {
            'aws': self.scan_aws(),
            'azure': self.scan_azure(),
            'gcp': self.scan_gcp(),
            'timestamp': datetime.utcnow().isoformat()
        }

        # Store in database
        self.store_inventory(inventory)

        # Analyze for issues
        issues = self.analyze_inventory(inventory)

        return {
            'inventory': inventory,
            'issues': issues,
            'summary': self.generate_summary(inventory)
        }

    def scan_aws(self):
        """Scan AWS certificates from ACM and Secrets Manager"""

        certificates = []

        # Scan ACM certificates in all regions
        for region in ['us-east-1', 'us-west-2', 'eu-west-1']:
            acm = boto3.client('acm', region_name=region)

            paginator = acm.get_paginator('list_certificates')
            for page in paginator.paginate():
                for cert_summary in page['CertificateSummaryList']:
                    cert = acm.describe_certificate(
                        CertificateArn=cert_summary['CertificateArn']
                    )['Certificate']

                    certificates.append({
                        'cloud': 'aws',
                        'region': region,
                        'service': 'acm',
                        'id': cert['CertificateArn'],
                        'domain': cert['DomainName'],
                        'sans': cert.get('SubjectAlternativeNames', []),
                        'issuer': cert.get('Issuer'),
                        'not_before': cert['NotBefore'].isoformat(),
                        'not_after': cert['NotAfter'].isoformat(),
                        'status': cert['Status'],
                        'in_use': len(cert.get('InUseBy', [])) > 0
                    })

        # Scan Secrets Manager for certificates
        sm_certs = self.scan_aws_secrets_manager()
        certificates.extend(sm_certs)

        return certificates

    def scan_azure(self):
        """Scan Azure Key Vault certificates"""

        certificates = []

        # List all vaults (would need to iterate subscriptions/resource groups)
        for vault_url in self.get_azure_vaults():
            client = CertificateClient(
                vault_url=vault_url,
                credential=DefaultAzureCredential()
            )

            for cert_properties in client.list_properties_of_certificates():
                cert = client.get_certificate(cert_properties.name)

                certificates.append({
                    'cloud': 'azure',
                    'service': 'key_vault',
                    'vault': vault_url,
                    'id': cert.id,
                    'name': cert.name,
                    'sans': self.extract_sans_from_azure(cert),
                    'not_before': cert.properties.not_before.isoformat(),
                    'not_after': cert.properties.not_after.isoformat(),
                    'enabled': cert.properties.enabled
                })

        return certificates

    def scan_gcp(self):
        """Scan GCP Certificate Manager"""

        certificates = []

        client = CertificateManagerClient()

        # List certificates across all locations
        for location in ['global', 'us-central1', 'europe-west1']:
            parent = f"projects/{self.gcp_project}/locations/{location}"

            for cert in client.list_certificates(parent=parent):
                certificates.append({
                    'cloud': 'gcp',
                    'location': location,
                    'service': 'certificate_manager',
                    'id': cert.name,
                    'domains': cert.managed.domains if cert.managed else [],
                    'expire_time': cert.expire_time.isoformat() if cert.expire_time else None,
                    'scope': cert.scope
                })

        return certificates

    def analyze_inventory(self, inventory):
        """Identify issues in certificate inventory"""

        issues = []
        now = datetime.utcnow()

        for cloud, certificates in inventory.items():
            if cloud == 'timestamp':
                continue

            for cert in certificates:
                # Check expiration
                not_after = datetime.fromisoformat(cert['not_after'].replace('Z', '+00:00'))
                days_until_expiry = (not_after - now).days

                if days_until_expiry < 0:
                    issues.append({
                        'severity': 'critical',
                        'type': 'expired',
                        'cloud': cloud,
                        'certificate': cert['id'],
                        'domain': cert.get('domain', cert.get('name')),
                        'expired_days_ago': abs(days_until_expiry)
                    })
                elif days_until_expiry < 30:
                    issues.append({
                        'severity': 'warning',
                        'type': 'expiring_soon',
                        'cloud': cloud,
                        'certificate': cert['id'],
                        'domain': cert.get('domain', cert.get('name')),
                        'days_until_expiry': days_until_expiry
                    })

                # Check if certificate is unused
                if 'in_use' in cert and not cert['in_use']:
                    issues.append({
                        'severity': 'info',
                        'type': 'unused',
                        'cloud': cloud,
                        'certificate': cert['id']
                    })

        return issues

Metrics and Alerting

from prometheus_client import Gauge, Counter

# Define metrics
certificates_total = Gauge(
    'multicloud_certificates_total',
    'Total certificates',
    ['cloud', 'status']
)

certificates_expiring = Gauge(
    'multicloud_certificates_expiring',
    'Certificates expiring soon',
    ['cloud', 'days_threshold']
)

certificate_renewals = Counter(
    'multicloud_certificate_renewals_total',
    'Certificate renewals',
    ['cloud', 'success']
)

class MultiCloudMetrics:
    """Collect and expose metrics for multi-cloud certificates"""

    def update_metrics(self, inventory):
        """Update Prometheus metrics from inventory"""

        # Reset gauges
        certificates_total._metrics.clear()
        certificates_expiring._metrics.clear()

        for cloud, certificates in inventory.items():
            if cloud == 'timestamp':
                continue

            # Count by status
            status_counts = {}
            for cert in certificates:
                status = cert.get('status', 'unknown')
                status_counts[status] = status_counts.get(status, 0) + 1

            for status, count in status_counts.items():
                certificates_total.labels(cloud=cloud, status=status).set(count)

            # Count expiring certificates
            now = datetime.utcnow()
            expiring_30 = 0
            expiring_60 = 0
            expiring_90 = 0

            for cert in certificates:
                not_after = datetime.fromisoformat(cert['not_after'].replace('Z', '+00:00'))
                days = (not_after - now).days

                if days < 30:
                    expiring_30 += 1
                if days < 60:
                    expiring_60 += 1
                if days < 90:
                    expiring_90 += 1

            certificates_expiring.labels(cloud=cloud, days_threshold='30').set(expiring_30)
            certificates_expiring.labels(cloud=cloud, days_threshold='60').set(expiring_60)
            certificates_expiring.labels(cloud=cloud, days_threshold='90').set(expiring_90)

Common Pitfalls

Cloud-Specific Lock-In

Problem: Deep integration with cloud-native services makes migration impossible
Solution: Use cloud-agnostic tools (Vault, cert-manager), maintain portability as design principle

Inconsistent Policies

Problem: Different certificate policies across clouds create security gaps
Solution: Centralized policy engine, enforce at issuance regardless of cloud

Secret Sprawl

Problem: Certificates stored inconsistently across cloud secret stores
Solution: Unified secrets management strategy, automated distribution

Monitoring Blindness

Problem: Cannot see certificates across all clouds simultaneously
Solution: Centralized inventory system, regular scanning, unified dashboards

Manual Processes

Problem: Cloud-specific renewal processes create operational burden
Solution: Automation using cert-manager or similar tools that work identically everywhere


Security Considerations

Trust Chain Management

  • Maintain consistent root CA across all clouds
  • Protect root CA private key in HSM or air-gapped system
  • Document and test trust chain validation
  • Plan for CA key rotation across all environments

Secrets Security

  • Encrypt certificates at rest in all cloud secret stores
  • Use IAM/RBAC to restrict certificate access
  • Audit all certificate retrievals
  • Implement secrets rotation policies

Network Security

  • Use private endpoints for certificate issuance
  • Encrypt all certificate distribution
  • Implement network segmentation
  • Monitor for unauthorized access patterns

Compliance

  • Maintain audit logs across all clouds
  • Document certificate lifecycle for compliance
  • Implement retention policies consistently
  • Regular compliance assessments

Real-World Examples

Large Financial Institution

Multi-cloud deployment across AWS, Azure, on-premises:

  • Centralized Vault cluster for certificate issuance
  • 50,000+ certificates across 3 clouds and on-prem
  • 90-day certificate lifetimes with automated renewal
  • Kubernetes cert-manager in all clouds
  • Istio service mesh for mTLS across clouds
  • Centralized monitoring via Splunk
  • Compliance reporting for PCI-DSS, SOC 2

Lessons: Centralization essential at scale, cloud-agnostic tools critical, automation non-negotiable, visibility requires dedicated tooling.

SaaS Provider

Global deployment across AWS and GCP:

  • HashiCorp Vault in both clouds
  • Let's Encrypt for external certificates
  • Vault PKI for internal microservices
  • Cert-manager in all Kubernetes clusters
  • Unified certificate inventory system
  • Automated renewal 30 days before expiry
  • Prometheus metrics for monitoring

Lessons: Public and private PKI can coexist, Kubernetes makes multi-cloud simpler, observability prevents outages, automation enables developer self-service.

Enterprise with Hybrid Cloud

Azure primary, AWS secondary, large on-premises:

  • On-premises root CA (air-gapped)
  • Issuing CAs in Azure and AWS
  • SCEP for legacy systems
  • ACME for modern workloads
  • Azure Key Vault and AWS Secrets Manager
  • Manual approval for external certificates
  • Automated internal certificates

Lessons: Hybrid requires multiple protocols, legacy systems need different approaches, governance layers can slow automation, gradual migration necessary.


Lessons from Production

What We Learned at Apex Capital (AWS ACM Vendor Lock-In)

Apex Capital started cloud migration using AWS ACM for all certificates. Simple, worked great... until needed to expand to Azure:

Problem: AWS certificates completely non-portable

ACM certificates:

  • Can't be exported (private keys stay in AWS)
  • Only work with AWS services (ELB, CloudFront, API Gateway)
  • Can't be used in Azure or GCP
  • Can't be used on-premises

When Azure deployment needed certificates, discovered:

  • Had to rebuild entire certificate infrastructure in Azure
  • Different API, different automation
  • Different monitoring
  • Different operational procedures
  • No unified visibility across AWS + Azure

Cost: 6 months additional work, $300K in duplicate infrastructure, ongoing operational complexity

What should have been different:

Deploy multi-cloud PKI from start:

  • HashiCorp Vault or similar centralized CA
  • cert-manager in Kubernetes (cloud-agnostic)
  • Certificates portable across any environment
  • Single automation platform
  • Unified monitoring and visibility

Key insight: Cloud-native PKI seems simpler initially but creates vendor lock-in. Multi-cloud PKI requires upfront complexity but enables true portability.

Warning signs you're heading for same mistake:

  • "We're only using AWS/Azure/GCP" (famous last words)
  • Starting with cloud-native PKI without portability plan
  • No strategy for certificate portability
  • Assuming cloud-native is "simpler" without considering long-term costs
  • Not evaluating vendor lock-in implications

What We Learned at Vortex (Inconsistent Multi-Cloud Policies)

Vortex ran workloads in AWS and Azure. Initially let each cloud team manage their own PKI:

Problem: Policy inconsistencies created security gaps

AWS team and Azure team made different decisions:

  • AWS: 90-day certificate lifespans
  • Azure: 365-day certificate lifespans
  • AWS: Automated renewal
  • Azure: Manual approval required
  • AWS: Strong Key Usage enforcement
  • Azure: Permissive configurations

Result:

  • Security auditor found inconsistent policies
  • Compliance violation (different security levels in different environments)
  • No unified visibility (couldn't answer "how many certificates do we have?")
  • Different incident response procedures per cloud

What we did:

  • Deployed centralized HashiCorp Vault for all clouds
  • Unified certificate policy (same lifespans, same validation, same automation)
  • Single source of truth for certificate inventory
  • Consistent operational procedures across all environments
  • Centralized monitoring and alerting

Cost: $200K to unify + 4 months implementation

Key insight: Multi-cloud requires architectural discipline. Can't let each cloud be separate problem. Need unified policy and centralized control from start.

Warning signs you're heading for same mistake:

  • Each cloud team manages their own PKI
  • No unified policy across clouds
  • Can't answer "how many certificates across all clouds?"
  • Different security standards in different environments
  • "Each cloud is different" used as excuse for inconsistency

What We Learned at Nexus (Multi-Cloud Certificate Monitoring Blind Spots)

Nexus deployed workloads across AWS, Azure, and on-premises. Certificate monitoring in each environment separately:

Problem: No unified visibility = expired certificates discovered too late

Monitoring configuration:

  • AWS: CloudWatch alerts for ACM certificates
  • Azure: Azure Monitor for Key Vault certificates
  • On-premises: Separate monitoring system

Result:

  • Azure certificate expired unnoticed (monitoring not configured correctly)
  • 6-hour outage in Azure environment
  • AWS and on-premises unaffected but couldn't quickly determine blast radius
  • Incident response confused (which environment affected?)

What we did:

  • Deployed centralized certificate inventory system
  • Single monitoring dashboard across all clouds
  • Unified alerting (one place, all certificates)
  • Regular certificate inventory reconciliation
  • Automated expiry notifications with 90/60/30/7 day warnings

Key insight: Multi-cloud monitoring is hard. Native cloud monitoring misses things. Need centralized visibility platform independent of any single cloud.

Warning signs you're heading for same mistake:

  • Separate monitoring in each cloud
  • No unified certificate inventory
  • Can't quickly answer "what certificates expire in next 30 days across all environments?"
  • Relying on cloud-native monitoring exclusively
  • No centralized alerting

Business Impact

Cost of getting this wrong: Apex Capital's AWS ACM lock-in cost $300K + 6 months to fix (should have been avoided with proper architecture). Vortex's inconsistent policies cost $200K + 4 months unification + compliance violation. Nexus's monitoring blind spot caused 6-hour outage + customer SLA penalties + emergency remediation costs.

Value of getting this right: Proper multi-cloud PKI provides:

  • True cloud portability: Workloads can move between clouds without certificate rework
  • Vendor negotiating leverage: Not locked into any single cloud provider
  • Consistent security: Same policies across all environments, no gaps
  • Unified visibility: Single place to see all certificates, all clouds
  • Operational efficiency: One automation platform, not three
  • Reduced compliance risk: Consistent controls auditable across all environments

Strategic capabilities: Multi-cloud PKI enables:

  • Cloud migration without vendor lock-in
  • Multi-cloud disaster recovery (failover between clouds)
  • Best-of-breed cloud selection (use AWS for X, Azure for Y)
  • Negotiating leverage with cloud providers
  • Hybrid cloud strategies (on-premises + multiple clouds)

ROI analysis:

Cloud-native approach (seems cheaper initially):

  • $0 upfront (ACM, Key Vault are "free")
  • $300K-$500K migration cost when expanding to second cloud
  • Ongoing operational complexity (multiple systems)
  • Vendor lock-in (weak negotiating position)

Multi-cloud PKI approach:

  • $50K-$150K upfront (HashiCorp Vault, cert-manager, automation)
  • $0 migration cost when expanding to additional clouds
  • Reduced operational complexity (single system)
  • No vendor lock-in (strong negotiating position)

Break-even: First cloud expansion or migration pays for multi-cloud PKI infrastructure

Executive summary: Multi-cloud PKI is strategic infrastructure investment enabling true cloud portability. Initial complexity cost is recovered first time you need to expand to second cloud or migrate between providers. Vendor lock-in is strategic risk with quantifiable cost.


When to Bring in Expertise

You can probably handle this yourself if:

  • Single-cloud deployment (multi-cloud not required)
  • Simple use cases (standard TLS only)
  • Using managed services (AWS ACM, Azure Key Vault)
  • Small scale (<500 certificates)
  • Have time to learn through iteration

Consider getting help if:

  • Expanding from single-cloud to multi-cloud
  • Need unified policy across clouds
  • Complex certificate requirements (service mesh, mutual TLS)
  • Large scale (5,000+ certificates across clouds)
  • Compliance requirements across environments

Definitely call us if:

  • Multi-cloud architecture from scratch (get it right from start)
  • Migration from cloud-native to multi-cloud PKI (complex)
  • Already have multi-cloud but inconsistent policies
  • Certificate-related outages in multi-cloud environment
  • Need unified monitoring and visibility across clouds

We've implemented multi-cloud PKI at Apex Capital (AWS + Azure + on-premises), Vortex (policy unification across clouds), and Nexus (centralized monitoring across heterogeneous environments). We know which architectures work in theory versus which survive multi-cloud operational reality.

ROI of expertise: Apex Capital could have avoided $300K + 6 months with proper initial architecture (consulting cost: $50K). Vortex could have avoided $200K policy unification with proper multi-cloud design from start (consulting cost: $40K). Nexus could have avoided outage with proper monitoring architecture (consulting cost: $20K). Pattern recognition prevents expensive mistakes.


Further Reading

Standards and Documentation

Tools and Projects


Last Updated: 2025-11-09
Maintenance Notes: Update cloud provider service features regularly (frequent changes), add new multi-cloud tools, expand service mesh patterns, track cloud pricing changes for PKI services