Building a Phoenix-Based Dynamic mTLS Certificate Issuance and Testing System for Multi-Cloud Serverless Functions

Backend Architecture

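We faced a thorny architecture problem: in a hybrid-cloud environment, how do we guarantee mutually authenticated, encrypted communication between a core Elixir/Phoenix service running on traditional servers and multiple serverless functions deployed on Vercel and Google Cloud? These functions are ephemeral, start on demand, and live in different cloud providers' ecosystems.

Traditional API keys or JWTs fall short here. They handle authentication and authorization at the application layer, but they cannot give both peers an encrypted, mutually verified identity at the transport layer. A single network misconfiguration or a man-in-the-middle attack could be disastrous. We need something lower in the stack and grounded in zero-trust principles: mutual TLS (mTLS).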

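Weighing the Options: Why Neither Static Certificates nor Cloud-Native IAM Is the Best Fit

For production, we first ruled out two approaches that look workable at first glance.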

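Option A: Static long-lived certificates

This is the most direct idea: generate a client certificate for each serverless function and inject it into the function runtime as an environment variable or secret.

Pros: Simple to implement and easy to reason about.
Cons: An operational disaster. Certificate rotation is very hard to automate. Once a function's private key leaks, revocation is painful and leaves a long exposure window. In a system with hundreds or thousands of functions, managing certificate lifecycles by hand is unthinkable. It runs against the elasticity and automation that serverless architectures aim for.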

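Option B: Cloud-provider-native IAM

Another approach is to rely on each cloud platform's own identity mechanism. Google Cloud Functions, for example, can be granted a service account and can mint OIDC tokens that prove their identity. Vercel offers a comparable mechanism based on injected environment variables.

Pros: Within a single cloud ecosystem, this is very secure and the recommended practice.
Cons: Our scenario spans clouds. Making the Phoenix service verify a Google Cloud OIDC token while also verifying some other kind of credential from Vercel makes the authentication logic complex and brittle. Every new cloud provider would force a refactor of the authentication code. This approach breaks the uniformity and portability of the architecture, and it pushes the complexity of identity verification into the application layer instead of solving it once at the transport layer.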

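The Final Choice: A Phoenix-Based Issuer of Dynamic, Short-Lived Certificates

The architecture we adopted promotes the Phoenix application itself to a lightweight certificate authority (CA) proxy. It dynamically issues a client certificate with a very short lifetime (a few hours, for example) to every cold-starting serverless function.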
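To make "a lifetime of a few hours" concrete, here is a minimal TypeScript sketch of how an issuer might compute a certificate's validity window. The names `validityWindow` and `isWithinWindow`, and the 60-second backdating used to tolerate clock skew between hosts, are illustrative assumptions rather than part of the system described here.

```typescript
interface ValidityWindow {
  notBefore: Date;
  notAfter: Date;
}

// Computes the notBefore/notAfter pair for a short-lived certificate.
// backdateSeconds (an assumption of this sketch) moves notBefore slightly
// into the past so a verifier with a slow clock does not reject a fresh cert.
function validityWindow(
  now: Date,
  lifetimeSeconds: number,
  backdateSeconds: number = 60
): ValidityWindow {
  if (lifetimeSeconds <= 0) {
    throw new RangeError("lifetimeSeconds must be positive");
  }
  return {
    notBefore: new Date(now.getTime() - backdateSeconds * 1000),
    notAfter: new Date(now.getTime() + lifetimeSeconds * 1000),
  };
}

// Returns true when `at` falls inside the window (both bounds inclusive).
function isWithinWindow(window: ValidityWindow, at: Date): boolean {
  const t = at.getTime();
  return t >= window.notBefore.getTime() && t <= window.notAfter.getTime();
}

// Example: a 4-hour certificate issued at midnight UTC.
const exampleWindow = validityWindow(new Date("2024-01-01T00:00:00Z"), 4 * 60 * 60);
console.log(exampleWindow.notAfter.toISOString());
```

The shorter the window, the less time a leaked key remains usable, at the cost of more frequent issuance requests.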

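The core flow is as follows:

Bootstrap trust: At deployment time, each serverless function is injected with a unique, high-entropy bootstrap token. The token is either single-use or subject to strict access control.
Certificate request: When a function instance cold-starts, it uses the bootstrap token to call a dedicated endpoint on the Phoenix CA service and request a client certificate.
Issuance and distribution: The Phoenix CA service validates the bootstrap token. If it is valid, the service dynamically generates a key pair and a certificate signing request (CSR), signs it with its root CA private key, and returns the resulting client certificate, private key, and CA certificate chain to the function.
Establish the mTLS connection: The function loads these credentials into memory and uses them to initialize its HTTP client. From then on, every call to the Phoenix service, or to any other service that requires mTLS, uses this short-lived certificate for mutual authentication.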

sequenceDiagram
    participant Vercel/GCF as Serverless Function
    participant PhoenixCA as Phoenix CA Service
    participant ProtectedSvc as Protected Phoenix Service

    Vercel/GCF->>+PhoenixCA: POST /issue-certificate (with Bootstrap Token)
    Note right of PhoenixCA: 1. Validate token<br/>2. Generate key pair and CSR<br/>3. Sign with root CA
    PhoenixCA-->>-Vercel/GCF: 200 OK (short-lived certificate, private key, CA chain)

    Vercel/GCF->>Vercel/GCF: Load certificate and private key into memory

    Note over Vercel/GCF, ProtectedSvc: Establish mTLS connection
    Vercel/GCF->>+ProtectedSvc: GET /api/secure-data (with client certificate)
    ProtectedSvc->>ProtectedSvc: Verify the client certificate was issued by a trusted CA
    ProtectedSvc-->>-Vercel/GCF: 200 OK (secure data)
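The bootstrap-trust step depends on a unique, high-entropy token. The sketch below shows one way such a token could be minted and later checked, using only Node's built-in `crypto` module. The function names, and the choice to store only a SHA-256 hash of the token on the CA side, are assumptions made for illustration. The article does not specify how `BootstrapTokenService` stores or compares tokens.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "crypto";

// Hashes a token so the CA side never has to store the raw secret.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Mints a 256-bit random token. The raw token is injected into the function
// at deploy time; only the hash would be persisted by the CA (an assumption).
function generateBootstrapToken(): { token: string; tokenHash: string } {
  const token = randomBytes(32).toString("hex");
  return { token, tokenHash: hashToken(token) };
}

// Compares the presented token against the stored hash in constant time.
function verifyBootstrapToken(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashToken(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}

const minted = generateBootstrapToken();
console.log(verifyBootstrapToken(minted.token, minted.tokenHash)); // true
```

Storing a hash rather than the token means a leaked token database does not directly yield usable tokens. Single-use enforcement would additionally require marking the record as consumed, which is not shown here.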

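This design has clear advantages:

Security: Certificates are very short-lived, which sharply narrows the window in which a leaked key is useful.
Automation: The whole process is automated; certificate lifecycle management needs no human intervention.
Platform independence: mTLS is an open standard, so the pattern applies to any compute environment, whether serverless, containers, or virtual machines.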

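Core Implementation: The Phoenix CA Service

We will use Erlang/OTP's :public_key module and the x509 Hex package to implement the CA's core functionality. First we need the root CA's certificate and private key. In production these must live in a secure system such as an HSM or Vault. For this demonstration, we assume they have been loaded securely into the application configuration.

1. Routes and controller

We need an endpoint that handles certificate issuance requests.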

# lib/my_app_web/router.ex
scope "/internal/pki", MyAppWeb do
  pipe_through [:api]
  post "/issue-certificate", CertificateController, :issue
end

# lib/my_app_web/controllers/certificate_controller.ex
defmodule MyAppWeb.CertificateController do
  use MyAppWeb, :controller

  require Logger

  # Assumes a service module that handles bootstrap token validation
  alias MyApp.Auth.BootstrapTokenService
  # Core certificate issuance logic
  alias MyApp.PKI.CertificateAuthority

  def issue(conn, %{"bootstrap_token" => token}) when is_binary(token) do
    case BootstrapTokenService.validate_and_get_identity(token) do
      {:ok, identity} ->
        # identity can carry the function name, environment, etc.,
        # and is used to build the certificate Subject
        case CertificateAuthority.issue_short_lived_cert(identity) do
          {:ok, cert_data} ->
            json(conn, cert_data)

          {:error, reason} ->
            # Log the details server-side. The reason may be a tuple or an
            # exception, which cannot be JSON-encoded and should not be
            # exposed to the caller anyway.
            Logger.error("certificate issuance failed: #{inspect(reason)}")

            conn
            |> put_status(:internal_server_error)
            |> json(%{error: "certificate_issuance_failed"})
        end

      {:error, _reason} ->
        conn
        |> put_status(:unauthorized)
        |> json(%{error: "invalid_bootstrap_token"})
    end
  end

  def issue(conn, _params) do
    conn
    |> put_status(:bad_request)
    |> json(%{error: "missing_bootstrap_token"})
  end
end
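On the calling side, the JSON contract of this endpoint can be checked before any credential is used. The following TypeScript sketch validates a response body against the three fields the controller returns (`certificate`, `private_key`, `ca_chain`). The helper name `parseCertificatePayload` and the PEM-header checks are illustrative assumptions, not part of the service itself.

```typescript
interface CertificatePayload {
  certificate: string;
  private_key: string;
  ca_chain: string;
}

const CERT_HEADER = "-----BEGIN CERTIFICATE-----";
// Accepts PKCS#8 ("PRIVATE KEY") as well as "EC"/"RSA" traditional headers.
const KEY_HEADER = /^-----BEGIN (EC |RSA )?PRIVATE KEY-----/;

// Validates an untrusted JSON body and narrows it to CertificatePayload.
// Throws with a descriptive message when a field is missing or malformed.
function parseCertificatePayload(body: unknown): CertificatePayload {
  if (typeof body !== "object" || body === null) {
    throw new Error("response body is not a JSON object");
  }
  const { certificate, private_key, ca_chain } = body as Record<string, unknown>;

  if (typeof certificate !== "string" || !certificate.startsWith(CERT_HEADER)) {
    throw new Error("certificate is missing or not PEM-encoded");
  }
  if (typeof private_key !== "string" || !KEY_HEADER.test(private_key)) {
    throw new Error("private_key is missing or not PEM-encoded");
  }
  if (typeof ca_chain !== "string" || !ca_chain.startsWith(CERT_HEADER)) {
    throw new Error("ca_chain is missing or not PEM-encoded");
  }
  return { certificate, private_key, ca_chain };
}
```

An error body such as `{ "error": "invalid_bootstrap_token" }` fails the first field check, so a function fails fast instead of building an HTTPS agent from undefined values.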

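2. Core certificate issuance logic

This is the heart of the whole design. The CertificateAuthority module is responsible for all cryptographic operations.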

# lib/my_app/pki/certificate_authority.ex
defmodule MyApp.PKI.CertificateAuthority do
  @moduledoc """
  Dynamically generates and signs short-lived client certificates.
  Assumes the x509 Hex package is listed as a dependency in mix.exs.
  """

  alias X509.Certificate
  alias X509.Certificate.{Extension, Validity}

  # Certificate lifetime, e.g. 4 hours
  @cert_validity_seconds 4 * 60 * 60

  @doc """
  Issues a client certificate for `identity`, a map such as
  %{cn: "vercel-function-123", ou: "production"}.
  """
  def issue_short_lived_cert(identity) do
    # Read the CA material at runtime so the private key is not compiled
    # into the release. In a real project it should come from an HSM, Vault,
    # or another encrypted configuration source.
    config = Application.fetch_env!(:my_app, __MODULE__)
    ca_cert_pem = Keyword.fetch!(config, :root_ca_cert)
    ca_cert = Certificate.from_pem!(ca_cert_pem)
    ca_key = X509.PrivateKey.from_pem!(Keyword.fetch!(config, :root_ca_key))

    # Step 1: generate a fresh key pair for the client
    client_key = X509.PrivateKey.new_ec(:secp256r1)

    # Step 2: build the certificate Subject.
    # The values come from the token service, not from the request body,
    # and must not contain "/" because of the OpenSSL-style string form.
    subject = "/O=My Company/OU=#{identity.ou}/CN=#{identity.cn}"

    # Step 3: compute the validity window
    now = DateTime.utc_now() |> DateTime.truncate(:second)
    validity = Validity.new(now, DateTime.add(now, @cert_validity_seconds, :second))

    # Step 4: build the certificate and sign it with the root CA key.
    # We start from x509's :server template and narrow it to client auth.
    # The issuer name is taken from the CA certificate.
    cert =
      client_key
      |> X509.PublicKey.derive()
      |> Certificate.new(subject, ca_cert, ca_key,
        template: :server,
        serial: {:random, 16},
        validity: validity,
        extensions: [
          # ECDSA keys sign; keyEncipherment only applies to RSA keys
          key_usage: Extension.key_usage([:digitalSignature]),
          ext_key_usage: Extension.ext_key_usage([:clientAuth])
        ]
      )

    # Step 5: return the certificate and private key as PEM
    {:ok,
     %{
       certificate: Certificate.to_pem(cert),
       private_key: X509.PrivateKey.to_pem(client_key),
       ca_chain: ca_cert_pem
     }}
  rescue
    e -> {:error, {:signing_error, Exception.message(e)}}
  end
end

This version relies on the x509 Hex package, a wrapper over Erlang/OTP's :public_key module that takes care of record construction, DER encoding, and signing. Working with :public_key directly exposes more of the underlying X.509 structure (TBSCertificate, extensions, signature algorithm), but it is considerably more verbose and easier to get wrong.

Serverless Function Implementation

On the function side (Vercel or Google Cloud Functions; Node.js is used here as the example), the core logic is to obtain a certificate on cold start and then use it.

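// common/mtls_client.ts
import https from 'https';
import { X509Certificate } from 'crypto';
import axios from 'axios';

// These variables must be set securely in the function environment
const PHOENIX_CA_URL = process.env.PHOENIX_CA_URL!;
const BOOTSTRAP_TOKEN = process.env.BOOTSTRAP_TOKEN!;
// Fetch a new certificate this long before the cached one expires
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

interface CertificatePayload {
  certificate: string;
  private_key: string;
  ca_chain: string;
}

// Cache the agent so warm starts do not request a certificate again
let httpsAgent: https.Agent | null = null;
// Expiry (ms since epoch) of the certificate held by the cached agent
let agentExpiresAt = 0;

async function getHttpsAgent(): Promise<https.Agent> {
  if (httpsAgent && Date.now() < agentExpiresAt - REFRESH_MARGIN_MS) {
    console.log("Using cached mTLS agent.");
    return httpsAgent;
  }

  try {
    console.log("Fetching new mTLS certificate...");
    // Step 1: request a certificate with the bootstrap token
    const response = await axios.post<CertificatePayload>(
      `${PHOENIX_CA_URL}/internal/pki/issue-certificate`,
      { bootstrap_token: BOOTSTRAP_TOKEN },
      { timeout: 5000 } // Time out so a cold start cannot hang
    );

    const { certificate, private_key, ca_chain } = response.data;

    // Step 2: create and cache the HTTPS agent, replacing any stale one
    httpsAgent?.destroy();
    httpsAgent = new https.Agent({
      cert: certificate,
      key: private_key,
      ca: ca_chain,
      keepAlive: true, // Reuse connections for better performance
    });
    // A warm instance can outlive a short-lived certificate, so remember
    // when this one expires and refresh before then
    agentExpiresAt = Date.parse(new X509Certificate(certificate).validTo);

    console.log("Successfully created and cached new mTLS agent.");
    return httpsAgent;

  } catch (error) {
    console.error("Failed to fetch or create mTLS agent:", error);
    // Critical error: if the certificate cannot be obtained, fail fast
    throw new Error("mTLS certificate acquisition failed.");
  }
}

// Export a factory for a preconfigured axios instance
export async function createMtlsClient() {
  const agent = await getHttpsAgent();
  return axios.create({ httpsAgent: agent });
}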

// Vercel Function example
// api/secure-proxy.ts
import type { VercelRequest, VercelResponse } from '@vercel/node';
import { createMtlsClient } from '../common/mtls_client';

export default async (req: VercelRequest, res: VercelResponse) => {
  try {
    const apiClient = await createMtlsClient();
    // Call the protected Phoenix service through the mTLS client
    const phoenixResponse = await apiClient.get('https://api.my-phoenix-app.com/v1/secure-data');

    res.status(200).json(phoenixResponse.data);
  } catch (error) {
    console.error("Error during mTLS request to Phoenix:", error);
    res.status(502).json({ error: 'bad_gateway' });
  }
};

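This TypeScript code shows the function's startup logic: fetch a certificate, create an https.Agent, and cache it for subsequent warm invocations. The cache matters for performance because it avoids a certificate request, and a fresh TLS handshake, on every invocation.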
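One detail the caching idea leaves open is what happens when several invocations hit the same instance while no credentials are cached yet: each could trigger its own issuance request. The sketch below shows a "single-flight" cache that shares one in-flight request among concurrent callers and refetches shortly before expiry. `createCredentialCache`, the `Credentials` shape, and the injected `fetcher` and `now` parameters are hypothetical names introduced for this illustration.

```typescript
interface Credentials {
  certificate: string;
  privateKey: string;
  caChain: string;
  expiresAtMs: number; // certificate notAfter, in ms since epoch
}

// Returns a getter that caches credentials until `marginMs` before expiry
// and ensures at most one fetch is in flight at any time.
function createCredentialCache(
  fetcher: () => Promise<Credentials>,
  now: () => number = Date.now,
  marginMs: number = 5 * 60 * 1000
): () => Promise<Credentials> {
  let cached: Credentials | null = null;
  let inFlight: Promise<Credentials> | null = null;

  return function get(): Promise<Credentials> {
    // Fresh enough: serve from memory.
    if (cached && now() < cached.expiresAtMs - marginMs) {
      return Promise.resolve(cached);
    }
    // A fetch is already running: share its promise instead of starting another.
    if (inFlight) {
      return inFlight;
    }
    inFlight = fetcher().then(
      (creds) => {
        cached = creds;
        inFlight = null;
        return creds;
      },
      (err) => {
        // Clear the slot so the next call can retry after a failure.
        inFlight = null;
        throw err;
      }
    );
    return inFlight;
  };
}
```

In practice, `fetcher` would wrap the POST to the issuance endpoint and read `expiresAtMs` from the returned certificate. Injecting `now` keeps the refresh logic testable without waiting for real time to pass.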

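Building the Test Setup

Testing this distributed security system is the biggest challenge. In the test environment we cannot depend on the real CA service or on the serverless platforms. We need a fully self-contained, controllable test environment inside the Phoenix application.

Goal: In integration tests, verify that an mTLS-protected Phoenix endpoint rejects requests with invalid certificates and accepts requests that present a valid certificate issued by a (test) CA we trust.

Strategy: We create a TestCA module that is compiled only in the :test environment. It issues client certificates from a separate root CA certificate that exists purely for testing. Test cases call this module directly to obtain a certificate, then use it to configure an HTTP client and send requests.

1. Configuring Cowboy (the Phoenix web server) to require mTLS

First, we configure Cowboy, the web server underneath Phoenix, to require client certificates. Because these options apply to the whole HTTPS listener, the certificate issuance endpoint, which functions call before they hold a certificate, has to be served from a listener that does not demand one.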

# config/runtime.exs
if config_env() == :prod do
  config :my_app, MyAppWeb.Endpoint,
    https: [
      # ... other https options
      port: 443,
      certfile: System.get_env("SERVER_CERT_PATH"),
      keyfile: System.get_env("SERVER_KEY_PATH"),
      # Require a client certificate and verify it against our root CA
      cacertfile: System.get_env("ROOT_CA_CERT_PATH"),
      verify: :verify_peer,
      fail_if_no_peer_cert: true
    ]
end

2. Building a test-only CA module

# test/support/test_ca.ex
defmodule MyApp.Test.TestCA do
  @moduledoc """
  A certificate authority for the test environment only.
  It uses a separate, insecure test root CA.
  Assumes the x509 Hex package is available as a dependency.
  """

  alias X509.Certificate
  alias X509.Certificate.Extension

  # These files exist only under test/fixtures/pki
  @test_root_ca_cert_pem File.read!("test/fixtures/pki/test_ca.crt")
  @test_root_ca_key_pem File.read!("test/fixtures/pki/test_ca.key")

  # A simplified stand-in for issue_short_lived_cert, for tests only.
  # In a real project, CertificateAuthority could be refactored to accept
  # the CA credentials as arguments so that tests reuse the same code path.
  def issue_client_cert_for_test(cn \\ "test-client") do
    ca_cert = Certificate.from_pem!(@test_root_ca_cert_pem)
    ca_key = X509.PrivateKey.from_pem!(@test_root_ca_key_pem)

    # Fresh client key pair for every call
    client_key = X509.PrivateKey.new_ec(:secp256r1)

    cert =
      client_key
      |> X509.PublicKey.derive()
      |> Certificate.new("/CN=#{cn}", ca_cert, ca_key,
        template: :server,
        # One day is plenty for a test run
        validity: 1,
        extensions: [ext_key_usage: Extension.ext_key_usage([:clientAuth])]
      )

    # All values are PEM strings
    %{
      cert: Certificate.to_pem(cert),
      key: X509.PrivateKey.to_pem(client_key),
      cacert: @test_root_ca_cert_pem
    }
  end
end

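3. Writing the integration test

Now we can write an integration test that exercises the protected endpoint. We use Tesla as the HTTP client because it makes the mTLS options easy to configure.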

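# test/my_app_web/controllers/secure_data_controller_test.exs
defmodule MyAppWeb.SecureDataControllerTest do
  # The endpoint binds a fixed HTTPS port, so this module must not run async
  use MyAppWeb.ConnCase, async: false
  alias MyApp.Test.TestCA

  # This test needs https configured in endpoint_case.ex
  @endpoint_opts [
    https: [
      port: 4001,
      # Test server certificate; assumed to be issued by the test CA
      # with "localhost" as a subject alternative name
      certfile: "test/fixtures/pki/test_server.crt",
      keyfile: "test/fixtures/pki/test_server.key",
      # Verify clients against the test CA
      cacertfile: "test/fixtures/pki/test_ca.crt",
      verify: :verify_peer,
      fail_if_no_peer_cert: true
    ]
  ]
  use MyAppWeb.EndpointCase, opts: @endpoint_opts

  @url "https://localhost:4001/api/secure-data"

  # :ssl expects DER binaries, so unwrap a single-entry PEM string
  defp pem_to_der(pem) do
    [{type, der, :not_encrypted}] = :public_key.pem_decode(pem)
    {type, der}
  end

  defp build_client(ssl_options) do
    Tesla.client(
      [Tesla.Middleware.JSON],
      {Tesla.Adapter.Hackney, [ssl_options: ssl_options]}
    )
  end

  describe "GET /api/secure-data" do
    test "rejects a request without a client certificate" do
      # Trust the test CA but present no client certificate, so the failure
      # comes from the server's mTLS check and not from server verification
      {_type, ca_der} = pem_to_der(File.read!("test/fixtures/pki/test_ca.crt"))
      client = build_client(cacerts: [ca_der], verify: :verify_peer)

      # Tesla.get/2 returns an error tuple; only Tesla.get!/2 raises
      assert {:error, _reason} = Tesla.get(client, @url)
    end

    test "succeeds with a valid client certificate" do
      # Step 1: generate a client certificate during the test
      creds = TestCA.issue_client_cert_for_test()
      {_type, cert_der} = pem_to_der(creds.cert)
      {_type, ca_der} = pem_to_der(creds.cacert)

      # Step 2: configure the Tesla client with this certificate.
      # pem_to_der/1 yields {:ECPrivateKey, der}, the form :ssl expects.
      client =
        build_client(
          cert: cert_der,
          key: pem_to_der(creds.key),
          cacerts: [ca_der],
          verify: :verify_peer
        )

      # Step 3: send the request and assert success
      {:ok, response} = Tesla.get(client, @url)
      assert response.status == 200
      assert response.body["data"] == "this is secret"
    end
  end
end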

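This test exercises the whole flow: it configures a server endpoint that enforces mTLS, generates a client certificate while the test runs, and uses that certificate to pass the server's verification. It gives this fairly complex security setup a solid safety net.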

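Limitations and Future Directions

This Phoenix-based dynamic mTLS architecture solves the core pain point of secure cross-cloud serverless communication, but it is not free of trade-offs.

The first limitation is that the Phoenix CA service itself becomes a high-value target and a potential single point of failure. It needs high-availability measures such as multiple instances and multi-region deployment. The security of its bootstrap token database and of the root CA private key is the foundation of the whole system and calls for the highest level of protection.

Second, cold-start latency rises slightly because of the extra network call to request a certificate. In-memory caching removes that cost from warm starts, but latency-sensitive workloads should measure the overhead.

Looking ahead, the architecture can evolve further. One option is to adopt the standard SPIFFE/SPIRE framework to formalize workload identity and certificate issuance and make the design more general. Another is to implement a certificate revocation list (CRL) or the Online Certificate Status Protocol (OCSP) to handle cases where a bootstrap token or a function identity is revoked, leading to a more complete and robust zero-trust network.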
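As a rough illustration of the revocation idea, and explicitly not an implementation of CRL or OCSP, the sketch below keeps an in-memory denylist keyed by certificate serial number. Because certificates are short-lived, an entry only needs to be kept until the certificate's own expiry. The class name `ShortLivedRevocationList` and its methods are hypothetical.

```typescript
// A simplified stand-in for CRL/OCSP: a denylist of revoked serial numbers.
// Each entry remembers the certificate's notAfter so it can be pruned once
// the certificate would be rejected for expiry anyway.
class ShortLivedRevocationList {
  private readonly entries = new Map<string, number>();

  // Marks a certificate (by hex serial) as revoked until it expires.
  revoke(serialHex: string, notAfterMs: number): void {
    this.entries.set(serialHex.toLowerCase(), notAfterMs);
  }

  // True when the serial is on the list and the certificate has not expired.
  isRevoked(serialHex: string, nowMs: number): boolean {
    const notAfterMs = this.entries.get(serialHex.toLowerCase());
    return notAfterMs !== undefined && nowMs <= notAfterMs;
  }

  // Drops entries for certificates that have already expired.
  // Returns the number of entries removed.
  prune(nowMs: number): number {
    let removed = 0;
    this.entries.forEach((notAfterMs, serial) => {
      if (notAfterMs < nowMs) {
        this.entries.delete(serial);
        removed++;
      }
    });
    return removed;
  }
}

const revocations = new ShortLivedRevocationList();
revocations.revoke("0A1B", 5000);
console.log(revocations.isRevoked("0a1b", 1000)); // true
```

With a four-hour lifetime the list stays small, which is one reason short-lived certificates reduce the need for heavyweight revocation infrastructure. A multi-instance deployment would still need shared storage for the list, which this sketch does not cover.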

