Cloud Spectra Gateway -- Architecture v1.0.0

How the gateway works under the hood, in your own AWS account, with no vendor control plane.

This page describes the internal architecture of Cloud Spectra Gateway at the level an enterprise architecture review expects: where the software runs, how packets and requests flow through it, how it is configured and operated, and how it isolates failure domains and scales. Cloud Spectra Gateway is delivered through the AWS Marketplace and deploys entirely inside your own AWS account.

Read alongside: the Quick Start walks through a first deployment, the User Guide covers per-feature configuration, and the FAQ answers common questions. This page focuses on the why and how behind the design.

1. Design principles

Cloud Spectra Gateway is built on a small number of principles that shape every other design decision. The product replaces metered AWS networking and LLM-API spend with a fixed EC2 cost -- Your Cloud, Off the Meter -- without introducing a vendor-operated control plane or moving your data out of your account.

In your own AWS account

The gateway is an EC2 appliance that runs in the customer's account, in the customer's VPC, under an IAM instance role that the customer can inspect. There is nothing to "connect to" outside your account boundary for the data plane to function.

No vendor control plane

There is no Cloud Spectra-operated SaaS backend that your traffic or configuration passes through. The management dashboard, the configuration API, and the data plane all run on the appliance itself; configuration is persisted to AWS SSM Parameter Store inside your account. If the vendor disappeared tomorrow, the gateway you already deployed keeps routing traffic.

graph LR
    subgraph TYPICAL["Typical SaaS network/AI gateway"]
        direction TB
        TC["Your VPC
workloads"] -->|traffic + config| VCP["Vendor control plane
(outside your account)"] VCP --> TINT["Internet / LLM APIs"] end subgraph CS["Cloud Spectra Gateway"] direction TB CC["Your VPC
workloads"] --> CGW["Cloud Spectra appliance
(EC2 in YOUR account)"] CGW --> CINT["Internet / LLM APIs"] CGW -.config.-> SSM["SSM Parameter Store
(YOUR account)"] end style VCP fill:#fecaca,stroke:#ef4444,color:#991b1b style CGW fill:#d1fae5,stroke:#10b981,color:#065f46 style SSM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a

Data sovereignty

Because the appliance lives in your account, packets, proxied HTTP, firewall logs, and -- for the AI Gateway -- prompts, completions, and the cache all stay within your account's boundary. Outbound calls go directly from your appliance to their destination (the internet, or an LLM provider you configure). With local inference and an empty remote fallback, AI traffic can be kept entirely in-account.

Fixed cost vs metered

AWS managed networking services bill per-hour and per-GB. By running the equivalent functions on EC2 you pay for compute you control: a fixed instance cost (optionally on Spot) plus the Marketplace software fee, with no per-GB data-processing meter on the appliance itself.

PrincipleWhat it means architecturally
In your accountEC2 appliance in your VPC, your IAM role, your subnets
No vendor control planeDashboard + API + data plane all on the appliance; config in your SSM
Data sovereigntyTraffic, logs, prompts, and caches stay in your account
Fixed vs meteredInstance cost replaces per-GB data-processing meters

2. Deployment topology

A deployment consists of public-facing networking (an Elastic IP for a stable endpoint, a Gateway Load Balancer for horizontal scale) in front of a per-AZ fleet of appliance instances managed by EC2 Auto Scaling. Each Availability Zone runs its own Auto Scaling Group and egresses through its own elastic network interface (ENI), so steady-state traffic never crosses an AZ boundary and never incurs cross-AZ data charges.

New-VPC vs existing-VPC models

The product ships two base CloudFormation templates, plus a standalone-AMI path:

ModelWhat it provisionsWhen to use
New VPCA fresh VPC with public and private subnets, route tables, and an internet gateway, then the appliance fleetGreenfield deployments and evaluations
Existing / BYO VPCThe appliance fleet into subnets you already own; your route tables point at the per-AZ gateway ENIsProduction VPCs with established CIDR plans
Standalone AMIA single instance launched directly from the AMI; boots with NAT and dashboard, no CloudFormationQuick trials and minimal footprints

The full topology below shows a two-AZ existing-VPC deployment. Private workloads route through the gateway ENI in their own AZ; the appliances themselves egress to the internet via the public subnet.

graph TD
    EIP["Elastic IP
(stable endpoint)"] IGW["Internet Gateway"] GWLB["Gateway Load Balancer
(GENEVE)"] subgraph VPC["Customer VPC"] direction TB subgraph AZA["Availability Zone A"] direction TB PUBA["Public subnet A"] PRIA["Private subnet A"] ENIA["Gateway ENI A"] ASGA["Auto Scaling Group A
(1..N appliance instances)"] WLA["Private workloads A"] PRIA --> WLA WLA -->|"default route 0.0.0.0/0"| ENIA ENIA --- ASGA ASGA --- PUBA end subgraph AZB["Availability Zone B"] direction TB PUBB["Public subnet B"] PRIB["Private subnet B"] ENIB["Gateway ENI B"] ASGB["Auto Scaling Group B
(1..N appliance instances)"] WLB["Private workloads B"] PRIB --> WLB WLB -->|"default route 0.0.0.0/0"| ENIB ENIB --- ASGB ASGB --- PUBB end SSM["SSM Parameter Store
(configuration)"] end PUBA --> IGW PUBB --> IGW EIP --- IGW GWLB -. horizontal scale .- ASGA GWLB -. horizontal scale .- ASGB ASGA -.reads/writes config.-> SSM ASGB -.reads/writes config.-> SSM IGW --> INET["Internet / upstream APIs"] style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6 style ASGA fill:#d1fae5,stroke:#10b981,color:#065f46 style ASGB fill:#d1fae5,stroke:#10b981,color:#065f46 style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e
Per-AZ isolation by design: there is one Auto Scaling Group and one gateway ENI per Availability Zone. Workloads in AZ A send their default route to ENI A, workloads in AZ B to ENI B. Keeping each AZ's egress in-zone is what eliminates cross-AZ data-transfer charges in steady state.

3. Data plane

The data plane is where customer packets are forwarded, source-NATed, inspected, filtered, and load-balanced. It runs in the Linux kernel and in user-space services on each appliance instance. A private instance's outbound packet takes the following path.

flowchart LR
    SRC["Private instance
(in AZ A)"] -->|"default route"| ENI["Per-AZ gateway ENI A"] ENI --> NFT["nftables
(stateless allow/deny)"] NFT -->|"queued for inspection"| SUR["Suricata IDS/IPS
(inline, NFQUEUE)"] SUR -->|"verdict: accept"| PROXY{"Forward HTTP
proxy? (Squid)"} PROXY -->|"proxied + cached/filtered"| SNAT["Source NAT (sNAT)"] PROXY -->|"not proxied"| SNAT SNAT --> OUT["Public subnet -> IGW -> Internet"] SUR -.->|"verdict: drop"| DROP["Dropped + logged"] NFT -.->|"deny rule match"| DROP style ENI fill:#d1fae5,stroke:#10b981,color:#065f46 style NFT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style SUR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6 style PROXY fill:#fef3c7,stroke:#f59e0b,color:#92400e style DROP fill:#fecaca,stroke:#ef4444,color:#991b1b

Outbound path stage by stage

  1. Default route to the per-AZ ENI. The private subnet's route table sends 0.0.0.0/0 to the gateway ENI in the same AZ.
  2. nftables. Stateless allow/deny rules (source/destination CIDR, port, protocol) are enforced in the Linux kernel before any further processing. Security
  3. Suricata. Accepted packets are handed inline to Suricata via NFQUEUE for intrusion detection/prevention; Suricata returns an accept or drop verdict at the data plane. Security
  4. Forward proxy (optional). If a workload is configured to use the Squid forward proxy, HTTP/HTTPS flows pass through it for authentication, response caching, domain filtering, and bandwidth limiting.
  5. Source NAT. The packet is source-NATed to the appliance's address and leaves via the public subnet and internet gateway, presenting the stable Elastic IP to the internet.

Return path

Reply traffic returns to the appliance's connection-tracking state, is reverse-NATed back to the originating private instance, and is delivered over the same per-AZ ENI. The kernel conntrack table keeps the flow pinned to the instance that established it, so a long-lived connection is handled coherently for its lifetime.

sequenceDiagram
    autonumber
    participant W as Private workload (AZ A)
    participant G as Appliance (AZ A)
    participant I as Internet endpoint
    W->>G: SYN to 0.0.0.0/0 via ENI A
    Note over G: nftables -> Suricata -> (proxy) -> sNAT
    G->>I: SYN (source = appliance / EIP)
    I-->>G: SYN-ACK
    Note over G: conntrack maps reply -> original flow
    G-->>W: SYN-ACK (reverse NAT to workload)
    W->>G: data ...
    G->>I: data ... (same flow, same instance)

Inbound and load-balanced paths

The data plane also supports inbound and in-appliance load balancing:

  • Destination NAT / port forwarding (dNAT) forwards inbound TCP to private targets.
  • In-appliance L4 load balancing uses Linux IPVS, kept in sync with an AWS Network Load Balancer target set.
  • TLS termination is handled by HAProxy on port 443 using an AWS Certificate Manager (ACM) certificate, with a redirect on port 80.
PortServiceRole in the data/control plane
443HTTPS dashboard (HAProxy/ACM)Management UI and TLS termination
8080Config APIREST configuration endpoint
8090AI GatewayOpenAI-compatible LLM endpoint AI
configurableSquid forward proxyOutbound HTTP proxy + caching
80RedirectHTTP-to-HTTPS redirect

4. Control plane

The control plane is how operators configure the gateway and how the fleet keeps itself consistent. It is entirely in-account: an Angular dashboard served over HTTPS, a configuration REST API, configuration persisted to SSM Parameter Store, and the per-AZ Auto Scaling Group lifecycle.

graph TD
    OP["Operator / Terraform"] -->|"HTTPS 443 (ACM TLS)"| DASH["Angular dashboard + Config API
(on the appliance)"] DASH -->|"reads/writes"| SSM["SSM Parameter Store
(desired configuration)"] SSM -->|"poll for changes"| INST["Appliance instances
(all AZs)"] INST -->|"apply"| DP["Data-plane services
(nftables, Suricata, Squid,
IPVS, HAProxy, sNAT/dNAT)"] ASG["Per-AZ Auto Scaling Groups"] -->|"launch / replace / scale"| INST INST -->|"new instance reads config on boot"| SSM style DASH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e style INST fill:#d1fae5,stroke:#10b981,color:#065f46 style ASG fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6

Dashboard and configuration API

The management dashboard is an Angular single-page application served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate. Every dashboard action is backed by the configuration REST API (port 8080), so the same operations are scriptable. The Terraform provider (cloudspectra/cloudspectra, installed via a one-time ~/.terraformrc network-mirror block) drives that API as well, so the gateway can be managed as code.

Configuration in SSM Parameter Store

Desired configuration is the source of truth and is stored in SSM Parameter Store in your account. The dashboard and API write configuration there; appliance instances read it. This decoupling is what makes the fleet stateless: any instance can be replaced, and a freshly launched instance reads current configuration on boot and converges to it.

Per-AZ Auto Scaling Group lifecycle

Each AZ's Auto Scaling Group launches, health-checks, and replaces instances independently. Because desired state lives in SSM rather than on any single box, the lifecycle is simple: a new instance boots from the AMI, reads configuration, programs its data plane, and (behind the Gateway Load Balancer) begins taking traffic. A terminated instance is replaced without operator action.

Stateless fleet, durable config: the appliances hold no unique state that cannot be rebuilt from SSM. That is the property that lets the fleet scale out, scale in, and self-heal without a vendor control plane coordinating it.

5. AI Gateway architecture AI Gateway tier

The AI Gateway is an OpenAI-compatible reverse proxy for LLM traffic, exposed on port 8090. Clients point their OpenAI base URL at the gateway; it applies caching, meters tokens, writes an audit log, and routes each request to Amazon Bedrock, OpenAI, or Anthropic, or to an in-account local model served by vLLM. The interface follows the OpenAI API reference, so existing SDKs work by changing only the base URL.

Caching layers above routing

Two cache layers sit in front of routing, so the most expensive operation -- calling a model -- is skipped whenever possible:

  • Exact-match response cache: identical requests return a stored response with no upstream call.
  • Semantic cache: an embedding-based similar-prompt cache that raises hit rates beyond exact match by matching prompts that are equivalent in meaning, not just byte-identical.
flowchart TD
    REQ["Client request
(OpenAI-compatible, :8090)"] --> AUTH["Auth + token metering
+ audit log"] AUTH --> EXACT{"Exact-match
cache hit?"} EXACT -->|"yes"| HIT["Return cached response"] EXACT -->|"no"| SEM{"Semantic
cache hit?"} SEM -->|"yes"| HIT SEM -->|"no"| ROUTE["Routing layer"] ROUTE --> LOCAL{"model = local/*?"} LOCAL -->|"yes"| VLLM["Local vLLM
(in-account GPU)"] LOCAL -->|"no"| REMOTE["Remote provider
Bedrock / OpenAI / Anthropic"] VLLM -->|"pre-first-token failure"| OVF{"Overflow policy
queue / spill / reject"} OVF -->|"spill"| REMOTE OVF -->|"queue"| VLLM OVF -->|"reject"| ERR["Return error
(data stays in account)"] VLLM --> STORE["Store in caches"] REMOTE --> STORE STORE --> RESP["Response to client"] HIT --> RESP style EXACT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style SEM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style VLLM fill:#d1fae5,stroke:#10b981,color:#065f46 style REMOTE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6 style OVF fill:#fef3c7,stroke:#f59e0b,color:#92400e style ERR fill:#fecaca,stroke:#ef4444,color:#991b1b

Routing and local inference

Remote models route to Bedrock, OpenAI, or Anthropic based on the requested model name. A model addressed as local/<model> is served in-account by vLLM on GPU instances, OpenAI-compatible like the rest. On a pre-first-token failure of the local model (for example, capacity pressure), an overflow policy governs what happens next:

Overflow policyBehavior on local pre-first-token failure
queueHold the request and wait for local capacity
spillFall back to a configured remote model
rejectReturn an error; nothing leaves the account
Data-sovereignty control: with an empty remote fallback (or the reject policy), AI traffic never leaves your account -- local inference handles it or the request fails closed. This is the architectural lever for teams that must keep prompts and completions in-account.

Cached vs uncached request, side by side

sequenceDiagram
    autonumber
    participant C as Client (OpenAI SDK)
    participant G as AI Gateway (:8090)
    participant K as Cache (exact + semantic)
    participant M as Model (Bedrock / OpenAI / Anthropic / local vLLM)
    Note over C,M: Uncached request
    C->>G: POST /v1/chat/completions
    G->>K: lookup (exact, then semantic)
    K-->>G: miss
    G->>M: forward request
    M-->>G: completion
    G->>K: store response
    G-->>C: completion (+ token usage)
    Note over C,M: Subsequent equivalent request
    C->>G: POST /v1/chat/completions
    G->>K: lookup
    K-->>G: hit
    G-->>C: cached completion (no model call)

6. High availability & scaling

Availability and scale come from three independent mechanisms: per-AZ isolation, horizontal scale behind the Gateway Load Balancer, and live vertical resize -- all fronted by a stable Elastic IP endpoint.

Per-AZ isolation

Each Availability Zone is its own failure domain: a dedicated subnet, gateway ENI, and Auto Scaling Group. An incident confined to one AZ does not take down egress for workloads in other AZs, and keeping each AZ's traffic in-zone avoids cross-AZ data charges.

Horizontal scale (GWLB)

Within an AZ, the Gateway Load Balancer distributes flows across the instances in that AZ's Auto Scaling Group using the GENEVE protocol. Adding instances increases aggregate throughput; the GWLB spreads flows so no single instance is a throughput ceiling for the AZ.

Vertical resize and the stable endpoint

Instance size can be changed live to give each instance more CPU and network bandwidth. Throughout scale-out, scale-in, and resize, the Elastic IP provides a stable public endpoint, so external dependencies (allow-lists, DNS, partner integrations) see one unchanging address.

graph TD
    EIP["Stable Elastic IP
(unchanging endpoint)"] --> GWLB["Gateway Load Balancer (GENEVE)"] GWLB --> I1["Instance 1"] GWLB --> I2["Instance 2"] GWLB --> I3["Instance N (scale out)"] subgraph SCALE["Scaling dimensions"] H["Horizontal: add/remove instances
(GWLB spreads flows)"] V["Vertical: resize instance type
(more CPU / bandwidth)"] Z["Per-AZ: independent ASG + ENI
per Availability Zone"] end GWLB -.- H I1 -.- V GWLB -.- Z style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6 style I1 fill:#d1fae5,stroke:#10b981,color:#065f46 style I2 fill:#d1fae5,stroke:#10b981,color:#065f46 style I3 fill:#d1fae5,stroke:#10b981,color:#065f46
MechanismFailure / scale property
Per-AZ ASG + ENIAZ-level fault isolation; in-zone egress avoids cross-AZ charges
Gateway Load BalancerHorizontal scale-out within an AZ; flows spread across instances
Auto Scaling GroupUnhealthy instances replaced automatically from the AMI + SSM config
Vertical resizeMore CPU/bandwidth per instance, changed live
Elastic IPOne stable endpoint across all scaling and replacement events

7. Security & IAM model

The security model follows from the design principles: least-privilege execution inside your account, data that stays in your account, and encryption in transit.

Least-privilege instance role

The appliance runs under an IAM instance role scoped to the AWS actions it actually needs -- for example, reading and writing its configuration parameters in SSM Parameter Store, managing the network interfaces and routes it operates, and, where enabled, accessing ACM for the TLS certificate and Bedrock for AI routing. The role is created in your account by the CloudFormation template and is fully visible to your security team for review.

graph LR
    ROLE["IAM instance role
(least privilege, in your account)"] --> SSMP["SSM Parameter Store
(read/write own config)"] ROLE --> NET["EC2 networking
(ENIs, routes it operates)"] ROLE --> ACMR["ACM
(TLS certificate, when enabled)"] ROLE --> BR["Amazon Bedrock
(AI routing, when enabled)"] style ROLE fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style SSMP fill:#fef3c7,stroke:#f59e0b,color:#92400e style NET fill:#d1fae5,stroke:#10b981,color:#065f46 style ACMR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6 style BR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6

Cross-account / home-account role (base vs operational)

The least-privilege instance role above is intentionally minimal. From it the gateway boots, elects a leader, associates its Elastic IP, completes its Auto Scaling launch hook, and runs outbound source NAT on its primary interface. Most other capabilities are deliberately moved out of the base role into a separately deployed operational IAM role -- the cross-account / home-account role. The backend's single AWS access path assumes this operational role and, by design, refuses to fall back to the base instance role -- so a feature whose operational role is not deployed is simply inert until it is.

The operational stack creates one inline-policy role per feature, named <name>-cross-account-roleN (the gateway rebuilds the name at runtime to assume it). Every such role trusts only two principals in your home account -- the gateway instance role and the CloudFormation handler role -- gated by an external ID and, optionally, your AWS Organizations ID. The same template is deployed once in your home account and once in each member account you manage; the home-account stack additionally grants the gateway instance role permission to assume those roles. Permissions are updated over time through CloudFormation Change Sets the gateway stages and you execute -- never by the gateway editing IAM at runtime.

graph LR
    INST["Gateway instance role
(base, minimal)"] -->|"boot, NAT,
master EIP"| BASEOK["Boot-to-ready"] INST -->|"sts:AssumeRole
+ external ID"| OPS["Operational roles
<name>-cross-account-roleN"] CFH["CF handler role"] -->|"sts:AssumeRole"| OPS OPS --> FEAT["Most features
(NAT data plane, GWLB,
DNS, scaling, AI Gateway)"] OPS -.->|"member accounts"| SPOKE["Spoke account roles
(same template)"] style INST fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style CFH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a style BASEOK fill:#d1fae5,stroke:#10b981,color:#065f46 style OPS fill:#fef3c7,stroke:#f59e0b,color:#92400e style FEAT fill:#d1fae5,stroke:#10b981,color:#065f46 style SPOKE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
Setting it up. See Cross-account / home-account IAM role in the User Guide for the deploy and update procedures.

In-account execution and where data lives

All processing happens on the appliance in your account. Configuration lives in your SSM Parameter Store; firewall and proxy logs, NAT state, and AI prompts/completions/caches reside on resources you own. There is no off-account control plane that your traffic or configuration transits.

Encryption in transit

The management dashboard and configuration API are served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate; port 80 only redirects to HTTPS. For workload traffic, the gateway terminates TLS for the services you configure it to front, again using ACM-managed certificates.

ConcernHow Cloud Spectra addresses it
Execution boundaryEC2 appliance in your VPC under your IAM role; no vendor control plane
Least privilegeInstance role scoped to the actions the appliance needs, visible in your account
Data residencyConfig, logs, NAT state, and AI prompts/caches stay in your account
Encryption in transitHTTPS dashboard/API and workload TLS via ACM (HAProxy on 443); 80 redirects
Next steps: see the Quick Start to deploy, the User Guide for per-feature configuration, and the FAQ for common questions about cost, data residency, and operations.