Engineering teams spend weeks fine-tuning Horizontal Pod Autoscalers, tweaking resource requests and limits, configuring Cluster Autoscaler profiles, or adopting third-party tools to push node utilization toward 80-90%. All worthwhile. But when was the last time anyone checked what hardware those nodes are actually running on? Or measured the gains of moving from an older CPU architecture to a newer one?
If you've never explicitly set your GKE node machine type, your cluster is probably running on the default: e2-medium, a shared-core burstable VM with 2 vCPUs and 4 GB of RAM. Teams that did pick something else often landed on N1 or N2 because that's what the cluster was built on originally, and nobody revisited it. A VM upgrade can unlock some substantial GKE optimizations. Let's explore three.
1. Node pool server consolidation
Call it the silicon-under-the-scheduler problem. You can have the best bin-packing and autoscaling setup around, but if each node gets 30-70% less throughput per core than it would on latest-generation hardware, you're optimizing on top of a slow foundation.
This is a cloud version of an old problem. On-prem, server refresh cycles showed that a handful of current-gen servers could replace a full rack of older ones. Gains in IPC (instructions per clock), memory bandwidth, and power efficiency meant fewer boxes doing more work.
The same math applies here. Moving GKE node pools from N1 or E2 to N4 means faster cores and fewer nodes for the same workload. Fewer nodes mean lower TCO, unlocking budget for shiny new projects.
Benchmarking infrastructure modernization impact on GKE node pools
To prove your case for moving your GKE node pools to the latest machine series, run a thorough benchmarking and validation exercise.
- Check what machine types your node pools actually use. Many teams will be surprised.
- Spin up a test node pool with N4 instances alongside your existing one. Keep the same pod resource requests and limits.
- Compare at the cluster level: pods-per-node and total cluster cost for the same workload, not just per-VM throughput.
- Once it checks out, update your Terraform modules or GKE node pool configs to default to N4.
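The cluster-level comparison in the third step can be sketched as a small calculation. This is a minimal sketch, assuming you have already measured, for the same workload, how many pods each pool ends up running and how many fit per node. The `PoolProfile` type, the pod counts, and the hourly prices are hypothetical placeholders, not published N1 or N4 figures.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolProfile:
    """Measurements for one node pool. All sample values below are placeholders."""
    machine_type: str
    pods_needed: int          # replicas observed serving the same workload
    pods_per_node: int        # pods that fit on one node at the same requests/limits
    hourly_price_usd: float   # hypothetical per-node price, not a published rate


def nodes_needed(pool: PoolProfile) -> int:
    """Whole nodes required to schedule the pool's pods (rounded up)."""
    return math.ceil(pool.pods_needed / pool.pods_per_node)


def cluster_hourly_cost(pool: PoolProfile) -> float:
    """Total hourly cost for the pool: node count times per-node price."""
    return nodes_needed(pool) * pool.hourly_price_usd


# Hypothetical results from a side-by-side test run.
current = PoolProfile("n1-standard-8", pods_needed=300, pods_per_node=20, hourly_price_usd=0.40)
candidate = PoolProfile("n4-standard-8", pods_needed=210, pods_per_node=20, hourly_price_usd=0.40)

if __name__ == "__main__":
    for pool in (current, candidate):
        print(pool.machine_type, nodes_needed(pool), cluster_hourly_cost(pool))
```

Comparing `cluster_hourly_cost` across the two pools keeps the focus on total cost for the same workload rather than per-VM throughput.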
For more detailed guidance on Google Cloud VM modernization benchmarking, read this article I recently co-authored with the team at ProsperOps.
Changing the node pool machine type is the easiest win. It doesn't touch application code and leverages Kubernetes-native ops patterns.
2. Disk performance bottlenecks
When SSDs hit the mainstream market, a slogan made the rounds: "+SSD −GHz". The idea was simple: upgrading storage often mattered more than chasing faster clock speeds. The same logic applies in the cloud. For I/O-bound workloads like databases, your disk can be the bottleneck, not your CPU.
The VM cap nobody talks about
Most teams don't realize this: your disk performance is capped by the VM, not just by the disk. You can provision a 1TB PD-SSD rated for 30,000 IOPS, but attach it to an n2-standard-4 and the VM caps you at 15,000. Half of your provisioned IOPS is gone, but still on the bill.
If teams aren’t careful, the waste can be even greater with Hyperdisk. You can create a Hyperdisk Balanced volume provisioned for 50,000 IOPS, attach it to an n4-standard-4, and the VM will cap you at 30,000. The other 20,000 IOPS show up on your bill but not in your performance metrics.
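The capping rule behind both examples is simply the minimum of what the disk is provisioned for and what the VM allows. A minimal sketch, using the figures quoted above (the function names are illustrative, not part of any Google Cloud API):

```python
def effective_iops(provisioned_iops: int, vm_iops_cap: int) -> int:
    """IOPS the workload can actually reach: the lower of disk and VM limits."""
    return min(provisioned_iops, vm_iops_cap)


def stranded_iops(provisioned_iops: int, vm_iops_cap: int) -> int:
    """Provisioned IOPS that the VM cap makes unreachable (still billed)."""
    return provisioned_iops - effective_iops(provisioned_iops, vm_iops_cap)


if __name__ == "__main__":
    # PD-SSD example from the text: 30,000 provisioned, n2-standard-4 caps at 15,000.
    print(stranded_iops(30_000, 15_000))
    # Hyperdisk Balanced example from the text: 50,000 provisioned, VM caps at 30,000.
    print(stranded_iops(50_000, 30_000))
```

Running the same check across every attached disk gives a quick list of volumes provisioned above what their VM can use.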
The per-VM caps scale with vCPU count, and they differ by machine series. For Hyperdisk Balanced at the 4-vCPU shape, N4 allows up to 2x the IOPS of N2. So modernizing to N4 also raises your storage ceiling, not just your CPU performance.
If you're considering making the switch, migrating from Persistent Disk to Hyperdisk is snapshot-based: you snapshot the existing PD, create a new Hyperdisk from that snapshot, then it’s ready to use with N4. Google's migration guide walks through the full process.
PD-Balanced vs PD-SSD: size for the ceiling with small VMs
At small VM shapes, PD-Balanced and PD-SSD often hit the same VM cap. If your n2-standard-4 caps at 15,000 IOPS regardless of disk type, the SSD premium buys you nothing: the instance cannot use the extra performance it is paying for. Teams on small VMs can switch from PD-SSD to PD-Balanced and save money with no performance change and no architectural changes, because PD-Balanced already delivers enough performance to reach the VM's ceiling at these sizes.
The takeaway: check your VM caps before choosing your disk tier. Aligning your storage choice with the IOPS and throughput limits of your machine type ensures you aren't over-provisioning and wasting budget on performance that is effectively throttled at the hypervisor level.
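One way to make that check mechanical is to pick the cheapest disk option that still reaches the best IOPS the VM can deliver. The sketch below assumes you look up the real per-disk IOPS and prices yourself; the `DiskOption` type and every number in `OPTIONS` are hypothetical placeholders.

```python
from typing import NamedTuple, Sequence


class DiskOption(NamedTuple):
    name: str
    disk_iops: int             # IOPS the disk could deliver at your chosen size
    monthly_price_usd: float   # hypothetical price for that size


def cheapest_at_ceiling(options: Sequence[DiskOption], vm_iops_cap: int) -> DiskOption:
    """Return the cheapest option whose capped IOPS matches the best achievable."""
    best = max(min(o.disk_iops, vm_iops_cap) for o in options)
    reaching_best = [o for o in options if min(o.disk_iops, vm_iops_cap) == best]
    return min(reaching_best, key=lambda o: o.monthly_price_usd)


# Placeholder figures for illustration only.
OPTIONS = [
    DiskOption("pd-balanced", disk_iops=16_000, monthly_price_usd=100.0),
    DiskOption("pd-ssd", disk_iops=30_000, monthly_price_usd=170.0),
]

if __name__ == "__main__":
    # With a 15,000 IOPS VM cap, both options deliver 15,000, so price decides.
    print(cheapest_at_ceiling(OPTIONS, 15_000).name)
```

With a higher VM cap the same function would favor the faster tier, which is the point: the answer depends on the VM, not only on the disk.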
The modernization double-win
Since moving from N2 to N4 raises your CPU performance and your disk I/O ceiling at the same time, there's a right-sizing opportunity here that's easy to miss. Say a team picked n2-standard-32 partly because they needed 80,000 disk IOPS (N2 at 32 vCPUs supports 120,000+ IOPS). On N4, an n4-standard-16 already gets you 80,000 IOPS, and n4-standard-32 reaches 100,000. If the workload doesn't need all 32 cores for CPU, the team was oversizing compute just to get enough disk headroom. With N4, they can shrink the VM and still get the IOPS they need.
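The right-sizing logic can be sketched as choosing the smallest shape that satisfies both the CPU requirement and the disk IOPS requirement. This sketch uses only the two N4 IOPS figures quoted above; the `Shape` type, the helper name, and the 12-vCPU requirement are assumptions for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Shape(NamedTuple):
    name: str
    vcpus: int
    iops_cap: int  # per-VM disk IOPS ceiling


# N4 figures as quoted in the text above.
N4_SHAPES = [
    Shape("n4-standard-16", 16, 80_000),
    Shape("n4-standard-32", 32, 100_000),
]


def smallest_fitting_shape(
    required_vcpus: int, required_iops: int, shapes: Sequence[Shape]
) -> Optional[Shape]:
    """Smallest shape (by vCPU count) meeting both CPU and IOPS needs, or None."""
    fits = [s for s in shapes if s.vcpus >= required_vcpus and s.iops_cap >= required_iops]
    return min(fits, key=lambda s: s.vcpus) if fits else None


if __name__ == "__main__":
    # Hypothetical workload: needs 12 vCPUs of compute and 80,000 disk IOPS.
    print(smallest_fitting_shape(12, 80_000, N4_SHAPES))
```

Separating the two requirements makes it visible when the IOPS need, not the CPU need, is what forced the larger VM.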
3. Storage efficiency
Hyperdisk Storage Pools
The latest N4 series supports Hyperdisk Storage Pools, letting you pool capacity across multiple disks with deduplication and compression enabled at the pool level through the advanced capacity tier. For compressible data (logs, database snapshots, batch outputs), this lowers the effective cost per GB.
Thin provisioning means capacity in the pool is only consumed when data is actually written, not when disks are provisioned. With Hyperdisk Storage Pools, you can overprovision disks up to 5x their pool allocation, so each volume can be sized for its peak load without buying 5x the storage. Google says this pushes utilization to around 80%, compared to the ~38% that's typical when every volume is independently provisioned. The advanced capacity tier also applies dedup and compression, adding roughly 22% data reduction. Google's own example: a 1 PiB database workload that would normally require 2.6 PiB of provisioned capacity (due to low utilization) drops to roughly 1 PiB with Storage Pools, cutting capacity costs by 56%. Overall, Google claims Storage Pools can reduce block storage TCO by 30-50%.
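The capacity side of that example can be reproduced with a few lines of arithmetic. This is a rough sketch under one stated assumption: that the roughly 22% data reduction shrinks the stored bytes before utilization is applied. It models capacity only. It does not attempt to reproduce the 56% cost figure, which depends on pricing that is not modeled here.

```python
def provisioned_capacity_pib(
    data_pib: float, utilization: float, data_reduction: float = 0.0
) -> float:
    """Capacity to provision so that data_pib of data fits at a given utilization.

    data_reduction is the fraction of bytes removed by dedup and compression
    (assumed to apply before utilization; this is an interpretation, not a spec).
    """
    return data_pib * (1.0 - data_reduction) / utilization


if __name__ == "__main__":
    # Independently provisioned volumes at ~38% utilization.
    standalone = provisioned_capacity_pib(1.0, utilization=0.38)
    # Storage Pool at ~80% utilization with ~22% data reduction.
    pooled = provisioned_capacity_pib(1.0, utilization=0.80, data_reduction=0.22)
    print(f"standalone: {standalone:.2f} PiB, pooled: {pooled:.2f} PiB")
```

Under these assumptions the two results land close to the 2.6 PiB and roughly 1 PiB quoted in Google's example.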
For GKE, Storage Pools are supported starting with GKE 1.29.2. You create the pool, then provision Hyperdisk Balanced or Throughput volumes into it through your normal StorageClass workflow. The disks share the pool's capacity and performance, so you don't have to right-size each PVC individually. Pools auto-grow at 80% utilization and hold up to 1,000 disks, with a 10 TB minimum.
The point is: don't benchmark compute in isolation. Your VM shape determines how much storage performance you can actually use, and Storage Pools determine how efficiently you pay for it.
Where to start
Node pool machine types and storage provisioning have something in common: most teams configured them once and never revisited either. Both quietly drain the budget as the platform evolves underneath them.
The audit is straightforward. Run through your node pools and check the machine types. Pull up your provisioned disk IOPS and compare them against the VM caps for your instance size. If you're on PD-SSD with a small VM that caps well below what SSD offers, switch to PD-Balanced and pocket the difference. If your nodes are still on N1 or N2, the per-core throughput gap alone could justify fewer, faster nodes on N4. Check out this Google Cloud VM pricing comparison tool for an easy way to compare instance prices across geographies, machine families, and generations.
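The first step of that audit can be sketched as a short script. It assumes you have saved the output of `gcloud container node-pools list --cluster=CLUSTER --format=json` (plus your location flags) and that each entry carries `name` and `config.machineType`, as in the GKE NodePool resource. The sample data and the choice of "older" prefixes are made up for illustration.

```python
import json
from collections import Counter

# Prefixes treated as "older generation" here, following the article's
# mentions of N1, N2, and E2. Adjust to your own policy.
OLDER_PREFIXES = ("n1-", "n2-", "e2-")


def machine_type_counts(node_pools_json: str) -> Counter:
    """Count node pools per machine type from a JSON list of node pools."""
    pools = json.loads(node_pools_json)
    return Counter(p.get("config", {}).get("machineType", "unknown") for p in pools)


def pools_to_review(node_pools_json: str) -> list[str]:
    """Names of pools whose machine type starts with an older-series prefix."""
    pools = json.loads(node_pools_json)
    return [
        p["name"]
        for p in pools
        if p.get("config", {}).get("machineType", "").startswith(OLDER_PREFIXES)
    ]


# Made-up sample standing in for real gcloud output.
SAMPLE = """[
  {"name": "default-pool", "config": {"machineType": "e2-medium"}},
  {"name": "batch-pool", "config": {"machineType": "n1-standard-8"}},
  {"name": "api-pool", "config": {"machineType": "n4-standard-8"}}
]"""

if __name__ == "__main__":
    print(machine_type_counts(SAMPLE))
    print(pools_to_review(SAMPLE))
```

The resulting list is a starting point for the benchmarking exercise described earlier, not a verdict on which pools must move.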