The Layer 3 Switch That Ate My Firewall

It started, as these things always do, with "I can't get to Karakeep."

Not a server crash. Not a disk failure. Not even a misconfigured reverse proxy. Just a bookmark manager that worked yesterday and didn't work today. The kind of problem that should take five minutes and ends up rearranging your entire network.

The Asymmetric Routing Problem

I'd enabled Layer 3 switching on my UniFi switch. The idea is simple and appealing: instead of routing inter-VLAN traffic through the gateway (up the cable, through the firewall, back down the cable), the switch routes it directly. Faster. Lower latency. What's not to love?

What's not to love is asymmetric routing. When some VLANs route through the L3 switch and others still route through the gateway, the stateful firewall on the gateway only sees half the conversation. Outbound traffic takes one path; return traffic takes another. The firewall sees a response without a request and drops it. Everything half-works. Pings succeed (small packets, fast). SSH connects (small initial handshake). Web pages hang forever (large responses, dropped).
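A toy model of that failure, in Python. This is a deliberately simplified connection tracker, not how the UniFi gateway actually implements state, and the addresses and ports are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int


@dataclass
class StatefulFirewall:
    """Toy connection tracker: a reply is allowed only if the request was seen."""
    seen: set = field(default_factory=set)

    def outbound(self, pkt: Packet) -> None:
        # Remember the flow so the matching reply can be recognised later.
        self.seen.add((pkt.src, pkt.dst, pkt.sport, pkt.dport))

    def allows_reply(self, pkt: Packet) -> bool:
        # A reply is a recorded flow with source and destination swapped.
        return (pkt.dst, pkt.src, pkt.dport, pkt.sport) in self.seen


# Hypothetical hosts on two VLANs.
request = Packet("10.0.10.5", "10.0.20.8", 51000, 443)
reply = Packet("10.0.20.8", "10.0.10.5", 443, 51000)

# Symmetric path: the request crossed the gateway, so the reply is expected.
symmetric = StatefulFirewall()
symmetric.outbound(request)
print(symmetric.allows_reply(reply))   # True

# Asymmetric path: the request went via the switch, the gateway never saw it.
asymmetric = StatefulFirewall()
print(asymmetric.allows_reply(reply))  # False
```

The only difference between the two cases is whether the firewall ever saw the request. The reply packet is identical.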

The fix seemed obvious: move everything to the same routing plane. The Management VLAN couldn't support L3 switching -- UniFi silently ignores the setting on the default LAN. So the plan became: migrate every service from the Management VLAN to the Infrastructure VLAN, enable L3 switching everywhere, and enjoy the speed.

The Big Bang

Forty-plus hosts. Three Proxmox nodes. A backup server. Thirty-three LXC containers. Four VMs. Two Caddy reverse proxies. Thirty DNS records. Firewall rules. Address groups. NFS mounts. Monitoring targets. Ansible inventories. Crontabs. Config files nested inside config files.

The migration cost about thirty minutes of downtime. OOB management interfaces on a separate VLAN kept SSH access alive while the main network was being rewired. Every container got a new IP via a scripted batch of pct set commands. Every config file got a sed pass. Every service got verified.
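A sketch of how a batch like that could be generated, not the actual script. Everything concrete here is an assumption for illustration: the container IDs, both subnets, the gateway address, the vmbr0 bridge and the VLAN tag. It only prints commands rather than running them. Note that net0 is passed as one whole string, so assume any option you leave out (a fixed MAC address, for instance) is not carried over.

```python
import ipaddress


def remap_ip(old_ip: str, old_net: str, new_net: str) -> str:
    """Move an address to a new network, keeping its host offset."""
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    addr = ipaddress.ip_address(old_ip)
    if old.prefixlen != new.prefixlen:
        raise ValueError("networks must be the same size")
    if addr not in old:
        raise ValueError(f"{old_ip} is not in {old_net}")
    offset = int(addr) - int(old.network_address)
    return str(new.network_address + offset)


def pct_set_command(vmid: int, ip: str, prefixlen: int, gateway: str,
                    bridge: str, vlan_tag: int) -> str:
    """Build (but do not run) a pct set command for a container's first NIC."""
    net0 = (f"name=eth0,bridge={bridge},ip={ip}/{prefixlen},"
            f"gw={gateway},tag={vlan_tag}")
    return f"pct set {vmid} --net0 {net0}"


# Hypothetical inventory: container ID -> current address.
containers = {101: "10.0.1.42", 102: "10.0.1.43"}

for vmid, old_ip in containers.items():
    new_ip = remap_ip(old_ip, "10.0.1.0/24", "10.0.20.0/24")
    print(pct_set_command(vmid, new_ip, 24, "10.0.20.1", "vmbr0", 20))
```

Keeping the host offset means .42 stays .42 on the new subnet, which makes the follow-up sed pass over config files a simple prefix swap.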

It worked. Everything came up on the new VLAN. L3 switching was enabled. Inter-VLAN routing was fast. Victory.

For about ten minutes.

The 1500-Byte Ceiling

The first sign was web pages loading slowly between VLANs. Then large API responses timing out. A targeted ping with the Don't Fragment bit set confirmed it: anything over 1500 bytes between VLANs was being silently dropped.
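For anyone repeating the test: the size you pass to ping is the ICMP payload, not the packet size, so the IPv4 and ICMP headers have to be subtracted from the MTU first. A small sketch of the arithmetic, assuming IPv4 without options and the Linux iputils ping (where -M do sets Don't Fragment). The target address is a placeholder.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host: str, mtu: int) -> str:
    """Build a Linux iputils ping command that probes exactly `mtu` bytes."""
    return f"ping -M do -c 3 -s {max_ping_payload(mtu)} {host}"


# Hypothetical host on another VLAN.
print(df_ping_command("10.0.20.8", 1500))  # ping -M do -c 3 -s 1472 10.0.20.8
print(df_ping_command("10.0.20.8", 9000))  # ping -M do -c 3 -s 8972 10.0.20.8
```

If the 1472-byte probe gets replies and anything larger does not, the path MTU is 1500 no matter what the interfaces claim.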

The L3 routing ASIC in the switch is hardcoded to a 1500-byte MTU for routed traffic. Not configurable. Not fixable in software. You can set the Linux control plane interfaces to 9216 all day long -- the hardware data plane ignores you.

Jumbo frames work perfectly within the same VLAN (Layer 2 switching). Cross-VLAN traffic through the L3 engine? 1500 bytes, take it or leave it.

Fine. NFS traffic stays within the same VLAN anyway. Clients accessing services cross-VLAN are doing web requests at 1500 MTU, which is what the internet runs on. Annoying but liveable.

The Firewall That Wasn't

Then someone tried to access the blog from their phone.

Nothing. Plex from outside? Nothing. Every externally-facing service -- reverse proxy, media server, file sharing -- unreachable from the internet. Internal access worked fine. External was dead.

The port forward was correct. The Caddy DMZ reverse proxy was running. The backends were healthy. The DNS resolved. But the traffic just... vanished.

It took an embarrassingly long time to check the firewall zones.

UniFi's zone-based firewall groups networks into zones: Internal, External, DMZ, and so on. Firewall rules reference these zones. "Allow External to DMZ on port 443." "Block DMZ to Internal." "Allow Reverse Proxy to Internal on specific ports." Standard stuff.

Here's what nobody tells you: when you enable L3 switching on a network, that network gets removed from the firewall zone system entirely.

The network disappears from the zone assignment dropdown. It can't be added to Internal, DMZ, or any other zone. Every firewall rule that references that zone stops matching traffic for that network. The gateway doesn't even know the network exists anymore -- the switch owns it now.

Every single network had been moved to L3 switching. Every single network had silently left its firewall zone. The DMZ -- the one network that absolutely, critically needs firewall rules -- was floating in a security no-man's-land where no rule, allow or block, matched its traffic.

The "Allow External to DMZ on port 443" rule? Nothing left in the DMZ zone to match. The "Block DMZ to Internal" rule? Nothing left in Internal either. The "Allow Reverse Proxy to Internal" rule that lets Caddy reach the backends? Both zones empty.
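The mechanism is easy to model. The sketch below is a toy zone matcher, not UniFi's implementation. The zone names come from the rules above, and the network names ("wan", "dmz", "infra") are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str
    src_zone: str
    dst_zone: str


def zone_of(network: str, zones: dict) -> Optional[str]:
    """Return the zone a network belongs to, or None if it is in no zone."""
    for zone, members in zones.items():
        if network in members:
            return zone
    return None


def first_match(rules: list, zones: dict, src_net: str, dst_net: str) -> Optional[Rule]:
    """Return the first rule whose zones match both ends of the traffic."""
    src, dst = zone_of(src_net, zones), zone_of(dst_net, zones)
    for rule in rules:
        if rule.src_zone == src and rule.dst_zone == dst:
            return rule
    return None


rules = [
    Rule("allow", "External", "DMZ"),
    Rule("block", "DMZ", "Internal"),
]

# Before: every network sits in a zone.
before = {"External": {"wan"}, "DMZ": {"dmz"}, "Internal": {"infra"}}
# After enabling L3 switching: the zones still exist, but they are empty.
after = {"External": {"wan"}, "DMZ": set(), "Internal": set()}

print(first_match(rules, before, "wan", "dmz"))   # the allow rule
print(first_match(rules, after, "wan", "dmz"))    # None
print(first_match(rules, after, "dmz", "infra"))  # None
```

The rules never changed. They just stopped having anything to apply to.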

The Fix

Disable L3 switching. Put everything back on the gateway. Re-add every network to its firewall zone.

That's it. That's the fix.

There is a hybrid approach: keep your DMZ on the gateway for firewall control, let internal VLANs use L3 switching for speed. Traffic between DMZ and internal VLANs still passes through the gateway (it's the routing boundary between the two), so firewall rules apply. Internal-to-internal traffic stays on the switch.

But for my setup, the math doesn't work. The L3 switch caps routed traffic at 1500 MTU. The heavy internal traffic (NFS) already stays within a single VLAN at Layer 2 with jumbo frames. The gateway is a 10-gigabit device. The latency difference for web requests between "routed on the switch" and "routed through the gateway" is a couple of milliseconds that no human will ever perceive.

So: everything routes through the gateway. Every network is in its zone. Every firewall rule works. External access works. The blog loads.

What I Learned

L3 switching on UniFi is not what it sounds like. It's not "the same routing, but faster." It's a fundamentally different traffic path that bypasses the gateway's firewall, removes networks from the zone-based security model, and caps your MTU at 1500 regardless of your jumbo frame configuration.

The switch ACLs that replace the firewall are stateless, binary (block all or allow all between networks), and have no concept of ports, protocols, or connection tracking. They are not a firewall. They are a bouncer who can only say "everyone in" or "nobody in" with no guest list.

If you're considering UniFi L3 switching, here's the decision tree:

  • Do you need firewall rules between VLANs? Don't use L3 switching on those VLANs.
  • Do you have a DMZ? Keep it on the gateway. Always.
  • Do you use jumbo frames? L3 switching caps cross-VLAN MTU at 1500 regardless.
  • Is the gateway your bottleneck? Probably not. Modern UniFi gateways route at line rate.
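The same checklist as a function, for the programmatically inclined. This is just the four questions above encoded in order, applied per VLAN. The parameter names are mine, and the answers are only as good as what you feed in.

```python
def should_use_l3_switching(is_dmz: bool,
                            needs_firewall_rules: bool,
                            needs_jumbo_cross_vlan: bool,
                            gateway_is_bottleneck: bool) -> tuple[bool, str]:
    """Walk the checklist for one VLAN and return (decision, reason)."""
    if is_dmz:
        return False, "DMZ stays on the gateway"
    if needs_firewall_rules:
        return False, "firewall rules need the gateway in the path"
    if needs_jumbo_cross_vlan:
        return False, "routed traffic on the switch is capped at 1500 MTU"
    if not gateway_is_bottleneck:
        return False, "no bottleneck, nothing to gain"
    return True, "no firewall needs, no jumbo routing, gateway is the bottleneck"


# A DMZ never qualifies, whatever the other answers are.
print(should_use_l3_switching(True, False, False, True))
# An internal VLAN with no rules and a saturated gateway is the one real case.
print(should_use_l3_switching(False, False, False, True))
```

For my network, every VLAN fell out at one of the first three checks or had no bottleneck to fix.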

The feature works. It does exactly what it says: routes traffic on the switch instead of the gateway. The problem is everything it doesn't say about what you lose when it does.