Stop Putting API Keys in Your Client. (And the Hidden Trap in Gemini 3.5 Flash)

Playback speed

Share post at current time

Share from 0:00

0:00

Transcript

Stop Putting API Keys in Your Client. (And the Hidden Trap in Gemini 3.5 Flash)

Architectural depth, slate HUD design systems, and why a naive migration to Google’s latest API will silently cripple your agentic loops.

Suman Suhag

Jun 13, 2026

Hey friends,

Let’s be completely honest: the barrier to entry for building AI apps has dropped to zero, and it’s creating some terrifying engineering habits.

With the rise of “vibe coding,” I am seeing production-level code repos with process.env.GEMINI_API_KEY sitting straight in the React client source. If your frontend can read the key, an attacker can extract it from the network tab in less than five seconds.

I wanted to build a localized developer workspace that subverts the generic, lazy SaaS templates we see everywhere, while implementing actual, defensive backend engineering. Enter the Gemini Lab Workspace a high-contrast, slate-terminal HUD sandbox designed for rapid prototyping, secure agentic execution, and zero client-side leakage.

Here is the full architectural breakdown, along with a massive warning about migrating to the new gemini-3.5-flash model.

1. The Architecture: Server-Side Isolation

To keep the application secure, I standardized a strict Node-Express API gateway to act as an orchestrated proxy.

The client never talks to Google. Instead, the workspace UI streams requests to our proxy, which utilizes the official @google/genai SDK entirely on the server. The GEMINI_API_KEY is completely isolated behind our firewall. The server processes the request, wraps the model parameters, and channels the stream back down to our workspace’s custom inline markdown engine.

2. The API Controversy: The Hidden `thinking_level` Trap

We built this workspace to leverage the newly released Gemini 3.5 Flash model. It’s incredibly fast for sub-agent deployment and raw code generation, but there is a massive catch that almost nobody is talking about right now.

If you are migrating your app from the older gemini-3-flash-preview endpoint, your code is silently breaking.

Google replaced the old integer-based thinking_budget with a new string enum called thinking_level (minimal, low, medium, high). Here is the kicker: The default level has been silently dropped from high to medium.

If you just swap the model ID in your code without explicitly configuring the SDK to high, your application is silently reasoning less than your old preview code did, while you get billed more per token. In my workspace, I explicitly exposed this parameter to a system UI slider so I can audit quality vs. token latency in real-time.

Prompt ──> Slate HUD Client ──> Node Proxy [Exposing thinking_level=”high”] ──> @google/genai SDK

3. Engineering the Slate HUD Sandbox

A powerful engine deserves an immersive interface. I opted out of flat, boring white-label UI blocks for a heavy-duty, tactical High-Contrast Slate HUD:

The Frame: Engineered using eye-safe dark slate, textured coal panels, and extensive negative space. A system telemetry HUD tracks live UTC runtime and real-time model configs.
The Draftboard: A spacious document canvas featuring automatic save states and a custom “Insert AI” tool that lets me pipe streamed suggestions straight into my active schema layouts.
State Persistence: The workspace includes an interactive sprint checklist that completely persists across hard browser refreshes using localized client storage—ensuring zero context switching during deep dev sessions.

The entire stack is strictly typed, validated against rigorous compiler checks, and passes tsc --noEmit cleanly for production.

The “vibe coding” era is fun until your API billing dashboard shows a $10,000 compromise. How are you isolating your model configurations in your sandboxes? Let’s fight it out in the comments.