Cloud Spectra Gateway - Architecture

This page describes the internal architecture of Cloud Spectra Gateway at the level an enterprise architecture review expects: where the software runs, how packets and requests flow through it, how it is configured and operated, and how it isolates failure domains and scales. Cloud Spectra Gateway is delivered through the AWS Marketplace and deploys entirely inside your own AWS account.

Read alongside: the Quick Start walks through a first deployment, the User Guide covers per-feature configuration, and the FAQ answers common questions. This page focuses on the why and how behind the design.

1. Design principles

Cloud Spectra Gateway is built on a small number of principles that shape every other design decision. The product replaces metered AWS networking and LLM-API spend with a fixed EC2 cost -- Your Cloud, Off the Meter -- without introducing a vendor-operated control plane or moving your data out of your account.

In your own AWS account

The gateway is an EC2 appliance that runs in the customer's account, in the customer's VPC, under an IAM instance role that the customer can inspect. There is nothing to "connect to" outside your account boundary for the data plane to function.

No vendor control plane

There is no Cloud Spectra-operated SaaS backend that your traffic or configuration passes through. The management dashboard, the configuration API, and the data plane all run on the appliance itself; configuration is persisted to AWS SSM Parameter Store inside your account. If the vendor disappeared tomorrow, the gateway you already deployed keeps routing traffic.

graph LR
    subgraph TYPICAL["Typical SaaS network/AI gateway"]
        direction TB
        TC["Your VPC
workloads"] -->|traffic + config| VCP["Vendor control plane
(outside your account)"]
        VCP --> TINT["Internet / LLM APIs"]
    end
    subgraph CS["Cloud Spectra Gateway"]
        direction TB
        CC["Your VPC
workloads"] --> CGW["Cloud Spectra appliance
(EC2 in YOUR account)"]
        CGW --> CINT["Internet / LLM APIs"]
        CGW -.config.-> SSM["SSM Parameter Store
(YOUR account)"]
    end
    style VCP fill:#fecaca,stroke:#ef4444,color:#991b1b
    style CGW fill:#d1fae5,stroke:#10b981,color:#065f46
    style SSM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a

Data sovereignty

Because the appliance lives in your account, packets, proxied HTTP, firewall logs, and -- for the AI Gateway -- prompts, completions, and the cache all stay within your account's boundary. Outbound calls go directly from your appliance to their destination (the internet, or an LLM provider you configure). With local inference and an empty remote fallback, AI traffic can be kept entirely in-account.

Fixed cost vs metered

AWS managed networking services bill per-hour and per-GB. By running the equivalent functions on EC2 you pay for compute you control: a fixed instance cost (optionally on Spot) plus the Marketplace software fee, with no per-GB data-processing meter on the appliance itself.

Principle	What it means architecturally
In your account	EC2 appliance in your VPC, your IAM role, your subnets
No vendor control plane	Dashboard + API + data plane all on the appliance; config in your SSM
Data sovereignty	Traffic, logs, prompts, and caches stay in your account
Fixed vs metered	Instance cost replaces per-GB data-processing meters

2. Deployment topology

A deployment consists of public-facing networking (an Elastic IP for a stable endpoint, a Gateway Load Balancer for horizontal scale) in front of a per-AZ fleet of appliance instances managed by EC2 Auto Scaling. Each Availability Zone runs its own Auto Scaling Group and egresses through its own elastic network interface (ENI), so steady-state traffic never crosses an AZ boundary and never incurs cross-AZ data charges.

New-VPC vs existing-VPC models

The product ships two base CloudFormation templates, plus a standalone-AMI path:

Model	What it provisions	When to use
New VPC	A fresh VPC with public and private subnets, route tables, and an internet gateway, then the appliance fleet	Greenfield deployments and evaluations
Existing / BYO VPC	The appliance fleet into subnets you already own; your route tables point at the per-AZ gateway ENIs	Production VPCs with established CIDR plans
Standalone AMI	A single instance launched directly from the AMI; boots with NAT and dashboard, no CloudFormation	Quick trials and minimal footprints

The full topology below shows a two-AZ existing-VPC deployment. Private workloads route through the gateway ENI in their own AZ; the appliances themselves egress to the internet via the public subnet.

graph TD
    EIP["Elastic IP
(stable endpoint)"]
    IGW["Internet Gateway"]
    GWLB["Gateway Load Balancer
(GENEVE)"]

    subgraph VPC["Customer VPC"]
        direction TB
        subgraph AZA["Availability Zone A"]
            direction TB
            PUBA["Public subnet A"]
            PRIA["Private subnet A"]
            ENIA["Gateway ENI A"]
            ASGA["Auto Scaling Group A
(1..N appliance instances)"]
            WLA["Private workloads A"]
            PRIA --> WLA
            WLA -->|"default route 0.0.0.0/0"| ENIA
            ENIA --- ASGA
            ASGA --- PUBA
        end
        subgraph AZB["Availability Zone B"]
            direction TB
            PUBB["Public subnet B"]
            PRIB["Private subnet B"]
            ENIB["Gateway ENI B"]
            ASGB["Auto Scaling Group B
(1..N appliance instances)"]
            WLB["Private workloads B"]
            PRIB --> WLB
            WLB -->|"default route 0.0.0.0/0"| ENIB
            ENIB --- ASGB
            ASGB --- PUBB
        end
        SSM["SSM Parameter Store
(configuration)"]
    end

    PUBA --> IGW
    PUBB --> IGW
    EIP --- IGW
    GWLB -. horizontal scale .- ASGA
    GWLB -. horizontal scale .- ASGB
    ASGA -.reads/writes config.-> SSM
    ASGB -.reads/writes config.-> SSM
    IGW --> INET["Internet / upstream APIs"]

    style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
    style ASGA fill:#d1fae5,stroke:#10b981,color:#065f46
    style ASGB fill:#d1fae5,stroke:#10b981,color:#065f46
    style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e

Per-AZ isolation by design: there is one Auto Scaling Group and one gateway ENI per Availability Zone. Workloads in AZ A send their default route to ENI A, workloads in AZ B to ENI B. Keeping each AZ's egress in-zone is what eliminates cross-AZ data-transfer charges in steady state.

3. Data plane

The data plane is where customer packets are forwarded, source-NATed, inspected, filtered, and load-balanced. It runs in the Linux kernel and in user-space services on each appliance instance. A private instance's outbound packet takes the following path.

flowchart LR
    SRC["Private instance
(in AZ A)"] -->|"default route"| ENI["Per-AZ gateway ENI A"]
    ENI --> NFT["nftables
(stateless allow/deny)"]
    NFT -->|"queued for inspection"| SUR["Suricata IDS/IPS
(inline, NFQUEUE)"]
    SUR -->|"verdict: accept"| PROXY{"Forward HTTP
proxy? (Squid)"}
    PROXY -->|"proxied + cached/filtered"| SNAT["Source NAT (sNAT)"]
    PROXY -->|"not proxied"| SNAT
    SNAT --> OUT["Public subnet -> IGW -> Internet"]
    SUR -.->|"verdict: drop"| DROP["Dropped + logged"]
    NFT -.->|"deny rule match"| DROP

    style ENI fill:#d1fae5,stroke:#10b981,color:#065f46
    style NFT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style SUR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
    style PROXY fill:#fef3c7,stroke:#f59e0b,color:#92400e
    style DROP fill:#fecaca,stroke:#ef4444,color:#991b1b

Outbound path stage by stage

Default route to the per-AZ ENI. The private subnet's route table sends 0.0.0.0/0 to the gateway ENI in the same AZ.
nftables. Stateless allow/deny rules (source/destination CIDR, port, protocol) are enforced in the Linux kernel before any further processing. Security
Suricata. Accepted packets are handed inline to Suricata via NFQUEUE for intrusion detection/prevention; Suricata returns an accept or drop verdict at the data plane. Security
Forward proxy (optional). If a workload is configured to use the Squid forward proxy, HTTP/HTTPS flows pass through it for authentication, response caching, domain filtering, and bandwidth limiting.
Source NAT. The packet is source-NATed to the appliance's address and leaves via the public subnet and internet gateway, presenting the stable Elastic IP to the internet.

Return path

Reply traffic returns to the appliance's connection-tracking state, is reverse-NATed back to the originating private instance, and is delivered over the same per-AZ ENI. The kernel conntrack table keeps the flow pinned to the instance that established it, so a long-lived connection is handled coherently for its lifetime.

sequenceDiagram
    autonumber
    participant W as Private workload (AZ A)
    participant G as Appliance (AZ A)
    participant I as Internet endpoint
    W->>G: SYN to 0.0.0.0/0 via ENI A
    Note over G: nftables -> Suricata -> (proxy) -> sNAT
    G->>I: SYN (source = appliance / EIP)
    I-->>G: SYN-ACK
    Note over G: conntrack maps reply -> original flow
    G-->>W: SYN-ACK (reverse NAT to workload)
    W->>G: data ...
    G->>I: data ... (same flow, same instance)

Inbound and load-balanced paths

The data plane also supports inbound and in-appliance load balancing:

Destination NAT / port forwarding (dNAT) forwards inbound TCP to private targets.
In-appliance L4 load balancing uses Linux IPVS, kept in sync with an AWS Network Load Balancer target set.
TLS termination is handled by HAProxy on port 443 using an AWS Certificate Manager (ACM) certificate, with a redirect on port 80.

Port	Service	Role in the data/control plane
`443`	HTTPS dashboard (HAProxy/ACM)	Management UI and TLS termination
`8080`	Config API	REST configuration endpoint
`8090`	AI Gateway	OpenAI-compatible LLM endpoint AI
configurable	Squid forward proxy	Outbound HTTP proxy + caching
`80`	Redirect	HTTP-to-HTTPS redirect

4. Control plane

The control plane is how operators configure the gateway and how the fleet keeps itself consistent. It is entirely in-account: an Angular dashboard served over HTTPS, a configuration REST API, configuration persisted to SSM Parameter Store, and the per-AZ Auto Scaling Group lifecycle.

graph TD
    OP["Operator / Terraform"] -->|"HTTPS 443 (ACM TLS)"| DASH["Angular dashboard + Config API
(on the appliance)"]
    DASH -->|"reads/writes"| SSM["SSM Parameter Store
(desired configuration)"]
    SSM -->|"poll for changes"| INST["Appliance instances
(all AZs)"]
    INST -->|"apply"| DP["Data-plane services
(nftables, Suricata, Squid,
IPVS, HAProxy, sNAT/dNAT)"]
    ASG["Per-AZ Auto Scaling Groups"] -->|"launch / replace / scale"| INST
    INST -->|"new instance reads config on boot"| SSM

    style DASH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style SSM fill:#fef3c7,stroke:#f59e0b,color:#92400e
    style INST fill:#d1fae5,stroke:#10b981,color:#065f46
    style ASG fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6

Dashboard and configuration API

The management dashboard is an Angular single-page application served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate. Every dashboard action is backed by the configuration REST API (port 8080), so the same operations are scriptable. The Terraform provider (cloudspectra/cloudspectra, installed via a one-time ~/.terraformrc network-mirror block) drives that API as well, so the gateway can be managed as code.

Configuration in SSM Parameter Store

Desired configuration is the source of truth and is stored in SSM Parameter Store in your account. The dashboard and API write configuration there; appliance instances read it. This decoupling is what makes the fleet stateless: any instance can be replaced, and a freshly launched instance reads current configuration on boot and converges to it.

Per-AZ Auto Scaling Group lifecycle

Each AZ's Auto Scaling Group launches, health-checks, and replaces instances independently. Because desired state lives in SSM rather than on any single box, the lifecycle is simple: a new instance boots from the AMI, reads configuration, programs its data plane, and (behind the Gateway Load Balancer) begins taking traffic. A terminated instance is replaced without operator action.

Stateless fleet, durable config: the appliances hold no unique state that cannot be rebuilt from SSM. That is the property that lets the fleet scale out, scale in, and self-heal without a vendor control plane coordinating it.

5. AI Gateway architecture AI Gateway tier

The AI Gateway is an OpenAI-compatible reverse proxy for LLM traffic, exposed on port 8090. Clients point their OpenAI base URL at the gateway; it applies caching, meters tokens, writes an audit log, and routes each request to Amazon Bedrock, OpenAI, or Anthropic, or to an in-account local model served by vLLM. The interface follows the OpenAI API reference, so existing SDKs work by changing only the base URL.

Caching layers above routing

Two cache layers sit in front of routing, so the most expensive operation -- calling a model -- is skipped whenever possible:

Exact-match response cache: identical requests return a stored response with no upstream call.
Semantic cache: an embedding-based similar-prompt cache that raises hit rates beyond exact match by matching prompts that are equivalent in meaning, not just byte-identical.

flowchart TD
    REQ["Client request
(OpenAI-compatible, :8090)"] --> AUTH["Auth + token metering
+ audit log"]
    AUTH --> EXACT{"Exact-match
cache hit?"}
    EXACT -->|"yes"| HIT["Return cached response"]
    EXACT -->|"no"| SEM{"Semantic
cache hit?"}
    SEM -->|"yes"| HIT
    SEM -->|"no"| ROUTE["Routing layer"]
    ROUTE --> LOCAL{"model = local/*?"}
    LOCAL -->|"yes"| VLLM["Local vLLM
(in-account GPU)"]
    LOCAL -->|"no"| REMOTE["Remote provider
Bedrock / OpenAI / Anthropic"]
    VLLM -->|"pre-first-token failure"| OVF{"Overflow policy
queue / spill / reject"}
    OVF -->|"spill"| REMOTE
    OVF -->|"queue"| VLLM
    OVF -->|"reject"| ERR["Return error
(data stays in account)"]
    VLLM --> STORE["Store in caches"]
    REMOTE --> STORE
    STORE --> RESP["Response to client"]
    HIT --> RESP

    style EXACT fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style SEM fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style VLLM fill:#d1fae5,stroke:#10b981,color:#065f46
    style REMOTE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
    style OVF fill:#fef3c7,stroke:#f59e0b,color:#92400e
    style ERR fill:#fecaca,stroke:#ef4444,color:#991b1b

Routing and local inference

Remote models route to Bedrock, OpenAI, or Anthropic based on the requested model name. A model addressed as local/<model> is served in-account by vLLM on GPU instances, OpenAI-compatible like the rest. On a pre-first-token failure of the local model (for example, capacity pressure), an overflow policy governs what happens next:

Overflow policy	Behavior on local pre-first-token failure
`queue`	Hold the request and wait for local capacity
`spill`	Fall back to a configured remote model
`reject`	Return an error; nothing leaves the account

Data-sovereignty control: with an empty remote fallback (or the reject policy), AI traffic never leaves your account -- local inference handles it or the request fails closed. This is the architectural lever for teams that must keep prompts and completions in-account.

Cached vs uncached request, side by side

sequenceDiagram
    autonumber
    participant C as Client (OpenAI SDK)
    participant G as AI Gateway (:8090)
    participant K as Cache (exact + semantic)
    participant M as Model (Bedrock / OpenAI / Anthropic / local vLLM)
    Note over C,M: Uncached request
    C->>G: POST /v1/chat/completions
    G->>K: lookup (exact, then semantic)
    K-->>G: miss
    G->>M: forward request
    M-->>G: completion
    G->>K: store response
    G-->>C: completion (+ token usage)
    Note over C,M: Subsequent equivalent request
    C->>G: POST /v1/chat/completions
    G->>K: lookup
    K-->>G: hit
    G-->>C: cached completion (no model call)

6. High availability & scaling

Availability and scale come from three independent mechanisms: per-AZ isolation, horizontal scale behind the Gateway Load Balancer, and live vertical resize -- all fronted by a stable Elastic IP endpoint.

Per-AZ isolation

Each Availability Zone is its own failure domain: a dedicated subnet, gateway ENI, and Auto Scaling Group. An incident confined to one AZ does not take down egress for workloads in other AZs, and keeping each AZ's traffic in-zone avoids cross-AZ data charges.

Horizontal scale (GWLB)

Within an AZ, the Gateway Load Balancer distributes flows across the instances in that AZ's Auto Scaling Group using the GENEVE protocol. Adding instances increases aggregate throughput; the GWLB spreads flows so no single instance is a throughput ceiling for the AZ.

Vertical resize and the stable endpoint

Instance size can be changed live to give each instance more CPU and network bandwidth. Throughout scale-out, scale-in, and resize, the Elastic IP provides a stable public endpoint, so external dependencies (allow-lists, DNS, partner integrations) see one unchanging address.

graph TD
    EIP["Stable Elastic IP
(unchanging endpoint)"] --> GWLB["Gateway Load Balancer (GENEVE)"]
    GWLB --> I1["Instance 1"]
    GWLB --> I2["Instance 2"]
    GWLB --> I3["Instance N (scale out)"]
    subgraph SCALE["Scaling dimensions"]
        H["Horizontal: add/remove instances
(GWLB spreads flows)"]
        V["Vertical: resize instance type
(more CPU / bandwidth)"]
        Z["Per-AZ: independent ASG + ENI
per Availability Zone"]
    end
    GWLB -.- H
    I1 -.- V
    GWLB -.- Z
    style EIP fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style GWLB fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
    style I1 fill:#d1fae5,stroke:#10b981,color:#065f46
    style I2 fill:#d1fae5,stroke:#10b981,color:#065f46
    style I3 fill:#d1fae5,stroke:#10b981,color:#065f46

Mechanism	Failure / scale property
Per-AZ ASG + ENI	AZ-level fault isolation; in-zone egress avoids cross-AZ charges
Gateway Load Balancer	Horizontal scale-out within an AZ; flows spread across instances
Auto Scaling Group	Unhealthy instances replaced automatically from the AMI + SSM config
Vertical resize	More CPU/bandwidth per instance, changed live
Elastic IP	One stable endpoint across all scaling and replacement events

7. Security & IAM model

The security model follows from the design principles: least-privilege execution inside your account, data that stays in your account, and encryption in transit.

Least-privilege instance role

The appliance runs under an IAM instance role scoped to the AWS actions it actually needs -- for example, reading and writing its configuration parameters in SSM Parameter Store, managing the network interfaces and routes it operates, and, where enabled, accessing ACM for the TLS certificate and Bedrock for AI routing. The role is created in your account by the CloudFormation template and is fully visible to your security team for review.

graph LR
    ROLE["IAM instance role
(least privilege, in your account)"] --> SSMP["SSM Parameter Store
(read/write own config)"]
    ROLE --> NET["EC2 networking
(ENIs, routes it operates)"]
    ROLE --> ACMR["ACM
(TLS certificate, when enabled)"]
    ROLE --> BR["Amazon Bedrock
(AI routing, when enabled)"]
    style ROLE fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style SSMP fill:#fef3c7,stroke:#f59e0b,color:#92400e
    style NET fill:#d1fae5,stroke:#10b981,color:#065f46
    style ACMR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6
    style BR fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6

Cross-account / home-account role (base vs operational)

The least-privilege instance role above is intentionally minimal. From it the gateway boots, elects a leader, associates its Elastic IP, completes its Auto Scaling launch hook, and runs outbound source NAT on its primary interface. Most other capabilities are deliberately moved out of the base role into a separately deployed operational IAM role -- the cross-account / home-account role. The backend's single AWS access path assumes this operational role and, by design, refuses to fall back to the base instance role -- so a feature whose operational role is not deployed is simply inert until it is.

The operational stack creates one inline-policy role per feature, named <name>-cross-account-roleN (the gateway rebuilds the name at runtime to assume it). Every such role trusts only two principals in your home account -- the gateway instance role and the CloudFormation handler role -- gated by an external ID and, optionally, your AWS Organizations ID. The same template is deployed once in your home account and once in each member account you manage; the home-account stack additionally grants the gateway instance role permission to assume those roles. Permissions are updated over time through CloudFormation Change Sets the gateway stages and you execute -- never by the gateway editing IAM at runtime.

graph LR
    INST["Gateway instance role
(base, minimal)"] -->|"boot, NAT,
master EIP"| BASEOK["Boot-to-ready"]
    INST -->|"sts:AssumeRole
+ external ID"| OPS["Operational roles
<name>-cross-account-roleN"]
    CFH["CF handler role"] -->|"sts:AssumeRole"| OPS
    OPS --> FEAT["Most features
(NAT data plane, GWLB,
DNS, scaling, AI Gateway)"]
    OPS -.->|"member accounts"| SPOKE["Spoke account roles
(same template)"]
    style INST fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style CFH fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    style BASEOK fill:#d1fae5,stroke:#10b981,color:#065f46
    style OPS fill:#fef3c7,stroke:#f59e0b,color:#92400e
    style FEAT fill:#d1fae5,stroke:#10b981,color:#065f46
    style SPOKE fill:#ede9fe,stroke:#8b5cf6,color:#5b21b6

Setting it up. See Cross-account / home-account IAM role in the User Guide for the deploy and update procedures.

In-account execution and where data lives

All processing happens on the appliance in your account. Configuration lives in your SSM Parameter Store; firewall and proxy logs, NAT state, and AI prompts/completions/caches reside on resources you own. There is no off-account control plane that your traffic or configuration transits.

Encryption in transit

The management dashboard and configuration API are served over HTTPS, with TLS terminated by HAProxy on port 443 using an ACM certificate; port 80 only redirects to HTTPS. For workload traffic, the gateway terminates TLS for the services you configure it to front, again using ACM-managed certificates.

Concern	How Cloud Spectra addresses it
Execution boundary	EC2 appliance in your VPC under your IAM role; no vendor control plane
Least privilege	Instance role scoped to the actions the appliance needs, visible in your account
Data residency	Config, logs, NAT state, and AI prompts/caches stay in your account
Encryption in transit	HTTPS dashboard/API and workload TLS via ACM (HAProxy on 443); 80 redirects

Next steps: see the Quick Start to deploy, the User Guide for per-feature configuration, and the FAQ for common questions about cost, data residency, and operations.

Cloud Spectra Gateway -- Architecture v1.0.0

1. Design principles

In your own AWS account

No vendor control plane

Data sovereignty

Fixed cost vs metered

2. Deployment topology

New-VPC vs existing-VPC models

3. Data plane

Outbound path stage by stage

Return path

Inbound and load-balanced paths

4. Control plane

Dashboard and configuration API

Configuration in SSM Parameter Store

Per-AZ Auto Scaling Group lifecycle

5. AI Gateway architecture AI Gateway tier

Caching layers above routing

Routing and local inference

Cached vs uncached request, side by side

6. High availability & scaling

Per-AZ isolation

Horizontal scale (GWLB)

Vertical resize and the stable endpoint

7. Security & IAM model

Least-privilege instance role

Cross-account / home-account role (base vs operational)

In-account execution and where data lives

Encryption in transit