This page describes the internal architecture of Cloud Spectra Gateway at the level an enterprise architecture review expects: where the software runs, how packets and requests flow through it, how it is configured and operated, and how it isolates failure domains and scales. Cloud Spectra Gateway is delivered through the AWS Marketplace and deploys entirely inside your own AWS account.
1. Design principles
Cloud Spectra Gateway is built on a small number of principles that shape every other design decision. The product replaces metered AWS networking and LLM-API spend with a fixed EC2 cost -- Your Cloud, Off the Meter -- without introducing a vendor-operated control plane or moving your data out of your account.
In your own AWS account
The gateway is an EC2 appliance that runs in the customer's account, in the customer's VPC, under an IAM instance role that the customer can inspect. There is nothing to "connect to" outside your account boundary for the data plane to function.
No vendor control plane
There is no Cloud Spectra-operated SaaS backend that your traffic or configuration passes through. The management dashboard, the configuration API, and the data plane all run on the appliance itself; configuration is persisted to AWS SSM Parameter Store inside your account. If the vendor disappeared tomorrow, the gateway you already deployed keeps routing traffic.
graph LR
subgraph TYPICAL["Typical SaaS network/AI gateway"]
direction TB
TC["Your VPC
workloads"] -->|traffic + config| VCP["Vendor control plane
(outside your account)"]
VCP --> TINT["Internet / LLM APIs"]
end
subgraph CS["Cloud Spectra Gateway"]
direction TB
CC["Your VPC
workloads"] --> CGW["Cloud Spectra appliance
(EC2 in YOUR account)"]
CGW --> CINT["Internet / LLM APIs"]
CGW -.config.-> SSM["SSM Parameter Store
(YOUR account)"]
end
style VCP fill:#fecaca,stroke:#ef4444,color:#991b1b
style CGW fill:#d1fae5,stroke:#10b981,color:#065f46
style SSM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
Data sovereignty
Because the appliance lives in your account, packets, proxied HTTP, firewall logs, and -- for the AI Gateway -- prompts, completions, and the cache all stay within your account's boundary. Outbound calls go directly from your appliance to their destination (the internet, or an LLM provider you configure). With local inference and an empty remote fallback, AI traffic can be kept entirely in-account.
Fixed cost vs metered
AWS managed networking services bill per-hour and per-GB. By running the equivalent functions on EC2 you pay for compute you control: a fixed instance cost (optionally on Spot) plus the Marketplace software fee, with no per-GB data-processing meter on the appliance itself.
| Principle | What it means architecturally |
|---|---|
| In your account | EC2 appliance in your VPC, your IAM role, your subnets |
| No vendor control plane | Dashboard + API + data plane all on the appliance; config in your SSM |
| Data sovereignty | Traffic, logs, prompts, and caches stay in your account |
| Fixed vs metered | Instance cost replaces per-GB data-processing meters |
2. Deployment topology
A deployment consists of public-facing networking (an Elastic IP for a stable endpoint, a Gateway Load Balancer for horizontal scale) in front of a per-AZ fleet of appliance instances managed by EC2 Auto Scaling. Each Availability Zone runs its own Auto Scaling Group and egresses through its own elastic network interface (ENI), so steady-state traffic never crosses an AZ boundary and never incurs cross-AZ data charges.
New-VPC vs existing-VPC models
The product ships two base CloudFormation templates, plus a standalone-AMI path:
| Model | What it provisions | When to use |
|---|---|---|
| New VPC | A fresh VPC with public and private subnets, route tables, and an internet gateway, then the appliance fleet | Greenfield deployments and evaluations |
| Existing / BYO VPC | The appliance fleet into subnets you already own; your route tables point at the per-AZ gateway ENIs | Production VPCs with established CIDR plans |
| Standalone AMI | A single instance launched directly from the AMI; boots with NAT and dashboard, no CloudFormation | Quick trials and minimal footprints |
The full topology below shows a two-AZ existing-VPC deployment. Private workloads route through the gateway ENI in their own AZ; the appliances themselves egress to the internet via the public subnet.
graph TD
EIP["Elastic IP
(stable endpoint)"]
IGW["Internet Gateway"]
GWLB["Gateway Load Balancer
(GENEVE)"]
subgraph VPC["Customer VPC"]
direction TB
subgraph AZA["Availability Zone A"]
direction TB
PUBA["Public subnet A"]
PRIA["Private subnet A"]
ENIA["Gateway ENI A"]
ASGA["Auto Scaling Group A
(1..N appliance instances)"]
WLA["Private workloads A"]
PRIA --> WLA
WLA -->|"default route 0.0.0.0/0"| ENIA
ENIA --- ASGA
ASGA --- PUBA
end
subgraph AZB["Availability Zone B"]
direction TB
PUBB["Public subnet B"]
PRIB["Private subnet B"]
ENIB["Gateway ENI B"]
ASGB["Auto Scaling Group B
(1..N appliance instances)"]
WLB["Private workloads B"]
PRIB --> WLB
WLB -->|"default route 0.0.0.0/0"| ENIB
ENIB --- ASGB
ASGB --- PUBB
end
SSM["SSM Parameter Store
(configuration)"]
end
PUBA --> IGW
PUBB --> IGW
EIP --- IGW
GWLB -. horizontal scale .- ASGA
GWLB -. horizontal scale .- ASGB
ASGA -.reads/writes config.-> SSM
ASGB -.reads/writes config.-> SSM
IGW --> INET["Internet / upstream APIs"]
style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
style ASGA fill:#d1fae5,stroke:#10b981,color:#065f46
style ASGB fill:#d1fae5,stroke:#10b981,color:#065f46
style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e
3. Data plane
The data plane is where customer packets are forwarded, source-NATed, inspected, filtered, and load-balanced. It runs in the Linux kernel and in user-space services on each appliance instance. A private instance's outbound packet takes the following path.
flowchart LR
SRC["Private instance
(in AZ A)"] -->|"default route"| ENI["Per-AZ gateway ENI A"]
ENI --> NFT["nftables
(stateless allow/deny)"]
NFT -->|"queued for inspection"| SUR["Suricata IDS/IPS
(inline, NFQUEUE)"]
SUR -->|"verdict: accept"| PROXY{"Forward HTTP
proxy? (Squid)"}
PROXY -->|"proxied + cached/filtered"| SNAT["Source NAT (sNAT)"]
PROXY -->|"not proxied"| SNAT
SNAT --> OUT["Public subnet -> IGW -> Internet"]
SUR -.->|"verdict: drop"| DROP["Dropped + logged"]
NFT -.->|"deny rule match"| DROP
style ENI fill:#d1fae5,stroke:#10b981,color:#065f46
style NFT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style SUR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
style PROXY fill:#fef3c7,stroke:#f59e0b,color:#92400e
style DROP fill:#fecaca,stroke:#ef4444,color:#991b1b
Outbound path stage by stage
- Default route to the per-AZ ENI. The private subnet's route table sends
0.0.0.0/0to the gateway ENI in the same AZ. - nftables. Stateless allow/deny rules (source/destination CIDR, port, protocol) are enforced in the Linux kernel before any further processing. Security
- Suricata. Accepted packets are handed inline to Suricata via NFQUEUE for intrusion detection/prevention; Suricata returns an accept or drop verdict at the data plane. Security
- Forward proxy (optional). If a workload is configured to use the Squid forward proxy, HTTP/HTTPS flows pass through it for authentication, response caching, domain filtering, and bandwidth limiting.
- Source NAT. The packet is source-NATed to the appliance's address and leaves via the public subnet and internet gateway, presenting the stable Elastic IP to the internet.
Return path
Reply traffic returns to the appliance's connection-tracking state, is reverse-NATed back to the originating private instance, and is delivered over the same per-AZ ENI. The kernel conntrack table keeps the flow pinned to the instance that established it, so a long-lived connection is handled coherently for its lifetime.
sequenceDiagram
autonumber
participant W as Private workload (AZ A)
participant G as Appliance (AZ A)
participant I as Internet endpoint
W->>G: SYN to 0.0.0.0/0 via ENI A
Note over G: nftables -> Suricata -> (proxy) -> sNAT
G->>I: SYN (source = appliance / EIP)
I-->>G: SYN-ACK
Note over G: conntrack maps reply -> original flow
G-->>W: SYN-ACK (reverse NAT to workload)
W->>G: data ...
G->>I: data ... (same flow, same instance)
Inbound and load-balanced paths
The data plane also supports inbound and in-appliance load balancing:
- Destination NAT / port forwarding (dNAT) forwards inbound TCP to private targets.
- In-appliance L4 load balancing uses Linux IPVS, kept in sync with an AWS Network Load Balancer target set.
- TLS termination is handled by HAProxy on port 443 using an AWS Certificate Manager (ACM) certificate, with a redirect on port 80.
| Port | Service | Role in the data/control plane |
|---|---|---|
443 | HTTPS dashboard (HAProxy/ACM) | Management UI and TLS termination |
8080 | Config API | REST configuration endpoint |
8090 | AI Gateway | OpenAI-compatible LLM endpoint AI |
| configurable | Squid forward proxy | Outbound HTTP proxy + caching |
80 | Redirect | HTTP-to-HTTPS redirect |
4. Control plane
The control plane is how operators configure the gateway and how the fleet keeps itself consistent. It is entirely in-account: an Angular dashboard served over HTTPS, a configuration REST API, configuration persisted to SSM Parameter Store, and the per-AZ Auto Scaling Group lifecycle.
graph TD
OP["Operator / Terraform"] -->|"HTTPS 443 (ACM TLS)"| DASH["Angular dashboard + Config API
(on the appliance)"]
DASH -->|"reads/writes"| SSM["SSM Parameter Store
(desired configuration)"]
SSM -->|"poll for changes"| INST["Appliance instances
(all AZs)"]
INST -->|"apply"| DP["Data-plane services
(nftables, Suricata, Squid,
IPVS, HAProxy, sNAT/dNAT)"]
ASG["Per-AZ Auto Scaling Groups"] -->|"launch / replace / scale"| INST
INST -->|"new instance reads config on boot"| SSM
style DASH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e
style INST fill:#d1fae5,stroke:#10b981,color:#065f46
style ASG fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
Dashboard and configuration API
The management dashboard is an Angular single-page application served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate. Every dashboard action is backed by the configuration REST API (port 8080), so the same operations are scriptable. The Terraform provider (cloudspectra/cloudspectra, installed via a one-time ~/.terraformrc network-mirror block) drives that API as well, so the gateway can be managed as code.
Configuration in SSM Parameter Store
Desired configuration is the source of truth and is stored in SSM Parameter Store in your account. The dashboard and API write configuration there; appliance instances read it. This decoupling is what makes the fleet stateless: any instance can be replaced, and a freshly launched instance reads current configuration on boot and converges to it.
Per-AZ Auto Scaling Group lifecycle
Each AZ's Auto Scaling Group launches, health-checks, and replaces instances independently. Because desired state lives in SSM rather than on any single box, the lifecycle is simple: a new instance boots from the AMI, reads configuration, programs its data plane, and (behind the Gateway Load Balancer) begins taking traffic. A terminated instance is replaced without operator action.
5. AI Gateway architecture AI Gateway tier
The AI Gateway is an OpenAI-compatible reverse proxy for LLM traffic, exposed on port 8090. Clients point their OpenAI base URL at the gateway; it applies caching, meters tokens, writes an audit log, and routes each request to Amazon Bedrock, OpenAI, or Anthropic, or to an in-account local model served by vLLM. The interface follows the OpenAI API reference, so existing SDKs work by changing only the base URL.
Caching layers above routing
Two cache layers sit in front of routing, so the most expensive operation -- calling a model -- is skipped whenever possible:
- Exact-match response cache: identical requests return a stored response with no upstream call.
- Semantic cache: an embedding-based similar-prompt cache that raises hit rates beyond exact match by matching prompts that are equivalent in meaning, not just byte-identical.
flowchart TD
REQ["Client request
(OpenAI-compatible, :8090)"] --> AUTH["Auth + token metering
+ audit log"]
AUTH --> EXACT{"Exact-match
cache hit?"}
EXACT -->|"yes"| HIT["Return cached response"]
EXACT -->|"no"| SEM{"Semantic
cache hit?"}
SEM -->|"yes"| HIT
SEM -->|"no"| ROUTE["Routing layer"]
ROUTE --> LOCAL{"model = local/*?"}
LOCAL -->|"yes"| VLLM["Local vLLM
(in-account GPU)"]
LOCAL -->|"no"| REMOTE["Remote provider
Bedrock / OpenAI / Anthropic"]
VLLM -->|"pre-first-token failure"| OVF{"Overflow policy
queue / spill / reject"}
OVF -->|"spill"| REMOTE
OVF -->|"queue"| VLLM
OVF -->|"reject"| ERR["Return error
(data stays in account)"]
VLLM --> STORE["Store in caches"]
REMOTE --> STORE
STORE --> RESP["Response to client"]
HIT --> RESP
style EXACT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style SEM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style VLLM fill:#d1fae5,stroke:#10b981,color:#065f46
style REMOTE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
style OVF fill:#fef3c7,stroke:#f59e0b,color:#92400e
style ERR fill:#fecaca,stroke:#ef4444,color:#991b1b
Routing and local inference
Remote models route to Bedrock, OpenAI, or Anthropic based on the requested model name. A model addressed as local/<model> is served in-account by vLLM on GPU instances, OpenAI-compatible like the rest. On a pre-first-token failure of the local model (for example, capacity pressure), an overflow policy governs what happens next:
| Overflow policy | Behavior on local pre-first-token failure |
|---|---|
queue | Hold the request and wait for local capacity |
spill | Fall back to a configured remote model |
reject | Return an error; nothing leaves the account |
reject policy), AI traffic never leaves your account -- local inference handles it or the request fails closed. This is the architectural lever for teams that must keep prompts and completions in-account.
Cached vs uncached request, side by side
sequenceDiagram
autonumber
participant C as Client (OpenAI SDK)
participant G as AI Gateway (:8090)
participant K as Cache (exact + semantic)
participant M as Model (Bedrock / OpenAI / Anthropic / local vLLM)
Note over C,M: Uncached request
C->>G: POST /v1/chat/completions
G->>K: lookup (exact, then semantic)
K-->>G: miss
G->>M: forward request
M-->>G: completion
G->>K: store response
G-->>C: completion (+ token usage)
Note over C,M: Subsequent equivalent request
C->>G: POST /v1/chat/completions
G->>K: lookup
K-->>G: hit
G-->>C: cached completion (no model call)
6. High availability & scaling
Availability and scale come from three independent mechanisms: per-AZ isolation, horizontal scale behind the Gateway Load Balancer, and live vertical resize -- all fronted by a stable Elastic IP endpoint.
Per-AZ isolation
Each Availability Zone is its own failure domain: a dedicated subnet, gateway ENI, and Auto Scaling Group. An incident confined to one AZ does not take down egress for workloads in other AZs, and keeping each AZ's traffic in-zone avoids cross-AZ data charges.
Horizontal scale (GWLB)
Within an AZ, the Gateway Load Balancer distributes flows across the instances in that AZ's Auto Scaling Group using the GENEVE protocol. Adding instances increases aggregate throughput; the GWLB spreads flows so no single instance is a throughput ceiling for the AZ.
Vertical resize and the stable endpoint
Instance size can be changed live to give each instance more CPU and network bandwidth. Throughout scale-out, scale-in, and resize, the Elastic IP provides a stable public endpoint, so external dependencies (allow-lists, DNS, partner integrations) see one unchanging address.
graph TD
EIP["Stable Elastic IP
(unchanging endpoint)"] --> GWLB["Gateway Load Balancer (GENEVE)"]
GWLB --> I1["Instance 1"]
GWLB --> I2["Instance 2"]
GWLB --> I3["Instance N (scale out)"]
subgraph SCALE["Scaling dimensions"]
H["Horizontal: add/remove instances
(GWLB spreads flows)"]
V["Vertical: resize instance type
(more CPU / bandwidth)"]
Z["Per-AZ: independent ASG + ENI
per Availability Zone"]
end
GWLB -.- H
I1 -.- V
GWLB -.- Z
style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
style I1 fill:#d1fae5,stroke:#10b981,color:#065f46
style I2 fill:#d1fae5,stroke:#10b981,color:#065f46
style I3 fill:#d1fae5,stroke:#10b981,color:#065f46
| Mechanism | Failure / scale property |
|---|---|
| Per-AZ ASG + ENI | AZ-level fault isolation; in-zone egress avoids cross-AZ charges |
| Gateway Load Balancer | Horizontal scale-out within an AZ; flows spread across instances |
| Auto Scaling Group | Unhealthy instances replaced automatically from the AMI + SSM config |
| Vertical resize | More CPU/bandwidth per instance, changed live |
| Elastic IP | One stable endpoint across all scaling and replacement events |
7. Security & IAM model
The security model follows from the design principles: least-privilege execution inside your account, data that stays in your account, and encryption in transit.
Least-privilege instance role
The appliance runs under an IAM instance role scoped to the AWS actions it actually needs -- for example, reading and writing its configuration parameters in SSM Parameter Store, managing the network interfaces and routes it operates, and, where enabled, accessing ACM for the TLS certificate and Bedrock for AI routing. The role is created in your account by the CloudFormation template and is fully visible to your security team for review.
graph LR
ROLE["IAM instance role
(least privilege, in your account)"] --> SSMP["SSM Parameter Store
(read/write own config)"]
ROLE --> NET["EC2 networking
(ENIs, routes it operates)"]
ROLE --> ACMR["ACM
(TLS certificate, when enabled)"]
ROLE --> BR["Amazon Bedrock
(AI routing, when enabled)"]
style ROLE fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style SSMP fill:#fef3c7,stroke:#f59e0b,color:#92400e
style NET fill:#d1fae5,stroke:#10b981,color:#065f46
style ACMR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
style BR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
Cross-account / home-account role (base vs operational)
The least-privilege instance role above is intentionally minimal. From it the gateway boots, elects a leader, associates its Elastic IP, completes its Auto Scaling launch hook, and runs outbound source NAT on its primary interface. Most other capabilities are deliberately moved out of the base role into a separately deployed operational IAM role -- the cross-account / home-account role. The backend's single AWS access path assumes this operational role and, by design, refuses to fall back to the base instance role -- so a feature whose operational role is not deployed is simply inert until it is.
The operational stack creates one inline-policy role per feature, named <name>-cross-account-roleN (the gateway rebuilds the name at runtime to assume it). Every such role trusts only two principals in your home account -- the gateway instance role and the CloudFormation handler role -- gated by an external ID and, optionally, your AWS Organizations ID. The same template is deployed once in your home account and once in each member account you manage; the home-account stack additionally grants the gateway instance role permission to assume those roles. Permissions are updated over time through CloudFormation Change Sets the gateway stages and you execute -- never by the gateway editing IAM at runtime.
graph LR
INST["Gateway instance role
(base, minimal)"] -->|"boot, NAT,
master EIP"| BASEOK["Boot-to-ready"]
INST -->|"sts:AssumeRole
+ external ID"| OPS["Operational roles
<name>-cross-account-roleN"]
CFH["CF handler role"] -->|"sts:AssumeRole"| OPS
OPS --> FEAT["Most features
(NAT data plane, GWLB,
DNS, scaling, AI Gateway)"]
OPS -.->|"member accounts"| SPOKE["Spoke account roles
(same template)"]
style INST fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style CFH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
style BASEOK fill:#d1fae5,stroke:#10b981,color:#065f46
style OPS fill:#fef3c7,stroke:#f59e0b,color:#92400e
style FEAT fill:#d1fae5,stroke:#10b981,color:#065f46
style SPOKE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
In-account execution and where data lives
All processing happens on the appliance in your account. Configuration lives in your SSM Parameter Store; firewall and proxy logs, NAT state, and AI prompts/completions/caches reside on resources you own. There is no off-account control plane that your traffic or configuration transits.
Encryption in transit
The management dashboard and configuration API are served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate; port 80 only redirects to HTTPS. For workload traffic, the gateway terminates TLS for the services you configure it to front, again using ACM-managed certificates.
| Concern | How Cloud Spectra addresses it |
|---|---|
| Execution boundary | EC2 appliance in your VPC under your IAM role; no vendor control plane |
| Least privilege | Instance role scoped to the actions the appliance needs, visible in your account |
| Data residency | Config, logs, NAT state, and AI prompts/caches stay in your account |
| Encryption in transit | HTTPS dashboard/API and workload TLS via ACM (HAProxy on 443); 80 redirects |