Feature Flags Done Right
How to ship big changes safely without breaking trust
I noticed something interesting on the SBI online banking website recently.
There appears to be a refreshed UI rolling out, and what stood out to me is that users can switch between the old UI and the new UI. Whether SBI is using feature flags internally or not, that experience reflects a common pattern in mature engineering teams: controlled rollout.
When the product is critical and used by millions, you cannot afford a risky, all-at-once launch. You need a way to introduce change gradually, measure impact, and recover instantly if something goes wrong.
That is exactly what feature flags enable.
This post explains what feature flags really are, why they matter for reliability, and how to use them in a clean, disciplined way.
1) What feature flags are, in plain terms
A feature flag is a runtime switch that changes product behavior without requiring a redeploy.
You ship the code, but it remains “off” for most users until you explicitly enable it.
This one concept gives you a powerful capability:
You can separate deploy from release.
Deploy becomes a technical event.
Release becomes a controlled product decision.
That separation is the heart of safe engineering.
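To make the deploy/release split concrete, here is a minimal Python sketch. `FlagStore` and the flag name `new_dashboard` are hypothetical. A real system would read flag values from a config service or a flag management product, not from an in-memory dict.

```python
# Minimal sketch of "deploy vs release" (hypothetical FlagStore and flag name).
# The new code path is deployed either way; a runtime flag value decides
# whether users actually get it.


class FlagStore:
    """In-memory flag store; a real system would fetch values from a service."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so freshly deployed code stays dark.
        return self._flags.get(name, False)


flags = FlagStore()


def render_dashboard() -> str:
    if flags.is_enabled("new_dashboard"):
        return "new dashboard"
    return "old dashboard"


print(render_dashboard())         # old dashboard (deployed, not released)
flags.set("new_dashboard", True)  # release: a runtime change, no redeploy
print(render_dashboard())         # new dashboard
```

The important design choice is the default: a flag nobody has configured yet evaluates to off, which is what lets you deploy unfinished work safely.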
2) Why feature flags are a reliability tool, not just a release trick
Most teams start with feature flags for one reason: shipping faster.
Over time, they learn the deeper value: risk control.
Feature flags help you:
Limit blast radius
If a change causes errors, it affects only the small group you enabled it for.
Roll back instantly
Turning off a flag is usually much faster than rolling back a deployment: no new build, no pipeline run, no restart. In many teams, that speed is the difference between a small incident and a major outage.
Test safely in production
Staging environments rarely match real-world traffic and edge cases. Flags let you validate under real conditions without full exposure.
Reduce release anxiety
When the “off switch” exists, teams ship with more confidence and fewer late-night deploy fears.
3) The most common mistake: using flags like a simple switch
A lot of teams use feature flags like this:
Ship feature
Turn on
Move on
The problem is that this treats flags like a one-time toggle, not a rollout system.
A good rollout is a process, not a button.
4) A rollout approach that works almost everywhere
If you want a simple pattern that is easy to follow and hard to mess up, use this.
Step 1: Dark launch
Ship the feature behind a flag. It is in production, but “off” for everyone.
Why this matters: you can deploy early and often without user impact.
Step 2: Enable for internal users
Turn it on for the team first, then for support and QA.
This catches obvious issues and builds confidence quickly.
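Enabling a flag "for the team first" is usually a targeting rule on user attributes. A minimal sketch, assuming each user carries a set of group names; the `User` shape and the group names are hypothetical, and flag services typically configure these rules outside the code.

```python
# Minimal sketch of group-based targeting for internal users first.
# The User shape and group names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    groups: frozenset[str] = frozenset()


# Current rollout stage: which groups see the feature.
enabled_groups: set[str] = {"engineering"}


def is_enabled_for(user: User) -> bool:
    # On only if the user belongs to at least one enabled group.
    return bool(user.groups & enabled_groups)


dev = User("u1", frozenset({"engineering"}))
agent = User("u2", frozenset({"support"}))
customer = User("u3")

print(is_enabled_for(dev), is_enabled_for(agent))        # True False
enabled_groups.update({"support", "qa"})                 # widen to support and QA
print(is_enabled_for(agent), is_enabled_for(customer))   # True False
```

Widening the audience is a data change (adding groups), not a code change, which is exactly the point of the pattern.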
Step 3: Release gradually
Start small, then ramp:
1% → 5% → 20% → 50% → 100%
This is the part many teams skip, but it is the part that makes rollouts safe.
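A percentage ramp only works if the same user stays in (or out of) the rollout on every request. One common way to get that is stable hashing. A minimal sketch, assuming users are identified by a string id; the flag name `new_checkout` is hypothetical.

```python
# Minimal sketch of a sticky percentage rollout using stable hashing.
# Each user maps to a bucket 0-99 derived from the flag name and user id.
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    # First 8 bytes as an unsigned integer, reduced to the range 0-99.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    # Included when the bucket is below the rollout percentage, so raising
    # the percentage only adds users; nobody who had the feature loses it.
    return bucket(flag_name, user_id) < percent


users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 5, 20, 50, 100):
    enabled = sum(in_rollout("new_checkout", u, percent) for u in users)
    print(f"{percent:>3}% target -> {enabled} of {len(users)} users")
```

Hashing the flag name together with the user id gives each flag an independent sample, so the same 1% of users are not the first to see every new feature.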
Step 4: Watch real signals
Do not roll out by gut feel. Roll out by signals.
The minimum set I always watch:
Error rate
Latency
Drop-off in the affected flow (conversion, completion rate)
Support ticket spikes
Logs for unexpected edge cases
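The numeric signals can be turned into an explicit ramp decision by comparing the flagged cohort against everyone else. A minimal sketch; the thresholds (1.5x errors, 1.2x p95 latency, 5% completion drop, 1,000 requests minimum) are illustrative assumptions, not recommendations. Support ticket spikes and log review stay a human check here.

```python
# Minimal sketch of a rollout guardrail: ramp, hold, or roll back.
# All thresholds below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class CohortMetrics:
    requests: int
    errors: int
    p95_latency_ms: float
    completions: int  # e.g. finished checkouts

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    @property
    def completion_rate(self) -> float:
        return self.completions / self.requests if self.requests else 0.0


def decide(control: CohortMetrics, treatment: CohortMetrics,
           min_requests: int = 1_000) -> str:
    if treatment.requests < min_requests:
        return "hold"  # not enough traffic to judge yet
    # The small additive term keeps a near-zero control error rate from
    # turning a single error into a rollback.
    if treatment.error_rate > control.error_rate * 1.5 + 0.001:
        return "rollback"
    if treatment.p95_latency_ms > control.p95_latency_ms * 1.2:
        return "rollback"
    if treatment.completion_rate < control.completion_rate * 0.95:
        return "hold"  # possible drop-off: investigate before ramping
    return "ramp"


control = CohortMetrics(requests=100_000, errors=200, p95_latency_ms=300.0, completions=60_000)
treatment = CohortMetrics(requests=5_000, errors=10, p95_latency_ms=310.0, completions=3_000)
print(decide(control, treatment))  # ramp
```

Errors and latency trigger a rollback because they are direct harm; a completion dip triggers a hold because it may need investigation before anyone can say the change caused it.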
Step 5: Keep a kill switch for risky changes
Anything critical should have an “off” switch that is tested, not assumed.
Examples of “kill switch worthy” areas:
Payments and billing
Authentication and login
Checkout and order creation
Background jobs that write data
Notifications and messaging
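A kill switch is most useful when it overrides every other rule and fails safe. A minimal sketch, assuming a runtime-editable flag store and that the old path is still deployed and working; `charge_v1`, `charge_v2`, and the flag name `payments_v2_enabled` are hypothetical.

```python
# Minimal sketch of a kill switch around a risky payment path.
# Function and flag names are hypothetical.

flags: dict[str, bool] = {"payments_v2_enabled": True}


def charge_v1(amount_cents: int) -> str:
    return f"v1 charged {amount_cents}"  # old, proven path


def charge_v2(amount_cents: int) -> str:
    return f"v2 charged {amount_cents}"  # new, risky path


def charge(amount_cents: int, user_in_rollout: bool) -> str:
    # The kill switch is checked first and overrides any rollout rule.
    # A missing flag (for example, the store failed to load) counts as off,
    # so failures fall back to the proven path, not the risky one.
    if not flags.get("payments_v2_enabled", False):
        return charge_v1(amount_cents)
    if user_in_rollout:
        return charge_v2(amount_cents)
    return charge_v1(amount_cents)


print(charge(500, user_in_rollout=True))  # v2 charged 500
flags["payments_v2_enabled"] = False      # flip the kill switch
print(charge(500, user_in_rollout=True))  # v1 charged 500
```

"Tested, not assumed" means actually flipping this switch in a drill and confirming traffic lands on the old path, before the day you need it.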
Step 6: Remove the flag after rollout
This is the discipline part.
If you do not remove flags, they become permanent complexity.
5) The quiet danger: flag debt
Flag debt is what happens when flags stay in your codebase forever.
Months later:
Nobody remembers what a flag controls
Code paths multiply
Testing becomes harder
Debugging becomes slower
You get “we cannot remove this because we are not sure”
That is how teams end up with dozens of flags that nobody trusts.
A simple rule prevents most of this:
Every flag must have an owner and an expiry date.
If the feature is fully rolled out and stable, remove the flag and delete the old branch. Keep your codebase clean.
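The owner-and-expiry rule is easy to enforce mechanically if flags are declared in one registry. A minimal sketch, assuming such a registry exists in code; the flag names, teams, and dates are hypothetical. A check like this could run in CI and report (or fail on) expired flags.

```python
# Minimal sketch of a flag registry with owners and expiry dates.
# Flag names, owners, and dates are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagDefinition:
    name: str
    owner: str
    expires: date  # last day the flag is allowed to exist


REGISTRY = [
    FlagDefinition("new_checkout", "payments-team", date(2025, 3, 31)),
    FlagDefinition("new_dashboard", "web-team", date(2025, 9, 30)),
]


def expired_flags(registry: list[FlagDefinition], today: date) -> list[FlagDefinition]:
    # A flag is expired once today is past its expiry date.
    return [f for f in registry if f.expires < today]


for flag in expired_flags(REGISTRY, today=date(2025, 6, 1)):
    print(f"{flag.name} expired on {flag.expires}; ask {flag.owner} to remove it")
# prints: new_checkout expired on 2025-03-31; ask payments-team to remove it
```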
6) The mistakes that cause real incidents
Here are a few mistakes I see repeatedly.
Mistake 1: Flags used as permanent configuration
Feature flags are meant for change management, not for long-term configuration. If it is a permanent behavior, it belongs in a real configuration system or product settings.
Mistake 2: No metric tied to a rollout
If you do not know what “healthy” means, you will not notice when a rollout is harming users.
Mistake 3: Turning on a flag without testing the off path
The off path is also production behavior. If it is broken, you lose your safety net.
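One way to keep the off path honest is to make the flag state an input the tests can control, then run every test in both states. A minimal sketch using Python's `unittest`; `format_amount` is a hypothetical flagged function and assumes non-negative amounts.

```python
# Minimal sketch: test the same behavior with the flag on AND off.
# format_amount is hypothetical; it assumes non-negative amounts.
import unittest


def format_amount(amount_cents: int, new_format: bool) -> str:
    if new_format:
        return f"{amount_cents / 100:,.2f}"                 # 123456 -> "1,234.56"
    return f"{amount_cents // 100}.{amount_cents % 100:02d}"  # 123456 -> "1234.56"


class FormatAmountTest(unittest.TestCase):
    def test_both_flag_states(self) -> None:
        cases = {True: "1,234.56", False: "1234.56"}
        for enabled, expected in cases.items():
            with self.subTest(flag=enabled):
                self.assertEqual(format_amount(123456, enabled), expected)
```

Run it with `python -m unittest`. Passing the flag in, instead of reading a global inside the function, is what makes both paths cheap to test.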
Mistake 4: Flags mixed with unsafe data changes
Flags can hide UI, but they cannot undo data that has already been written incorrectly.
If you are doing database migrations, you need backward compatibility and a proper migration plan, not just a flag.
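A common way to combine flags with schema changes is expand/contract (also called parallel change): write both the old and new shapes for the whole migration, and let the flag switch only the read path. A minimal sketch with an in-memory row; the `CustomerRow` fields and the flag name `read_split_name` are hypothetical, and existing rows would still need a backfill before the flag is turned on, which this sketch does not show.

```python
# Minimal sketch of expand/contract with a flag on the read path only.
# Row fields and flag name are hypothetical.
from dataclasses import dataclass


@dataclass
class CustomerRow:
    full_name: str = ""   # old column
    first_name: str = ""  # new column
    last_name: str = ""   # new column


flags = {"read_split_name": False}


def save_name(row: CustomerRow, first: str, last: str) -> None:
    # Dual-write: keep BOTH representations current during the migration,
    # so either read path returns correct data whatever the flag says.
    row.first_name = first
    row.last_name = last
    row.full_name = f"{first} {last}"


def display_name(row: CustomerRow) -> str:
    # The flag only decides which column is read, never what is written.
    if flags.get("read_split_name", False):
        return f"{row.first_name} {row.last_name}"
    return row.full_name


row = CustomerRow()
save_name(row, "Asha", "Rao")
print(display_name(row))           # Asha Rao (old column)
flags["read_split_name"] = True
print(display_name(row))           # Asha Rao (new columns)
```

Because writes never depend on the flag, turning it off cannot strand data in a shape the old code cannot read. The old column is dropped only after the flag is removed.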
7) A simple checklist you can copy for your team
Before starting a feature flag rollout, confirm:
Flag name is clear
Owner is assigned
Expiry date is set
Metrics are defined (errors, latency, product outcome)
Kill switch is tested
Rollout plan is written (1% to 100%)
Removal plan is scheduled
This checklist feels basic, but it prevents most messy rollouts.
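If your team keeps rollout plans as structured data, most of the checklist can be checked automatically. A minimal sketch; the `RolloutPlan` fields are a hypothetical shape, and whether a flag name is actually clear remains a human judgment (the sketch only checks that one exists).

```python
# Minimal sketch of a pre-rollout checklist validator.
# The RolloutPlan shape is hypothetical.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RolloutPlan:
    flag_name: str
    owner: str = ""
    expires: Optional[date] = None
    metrics: List[str] = field(default_factory=list)  # e.g. error rate, latency
    kill_switch_tested: bool = False
    stages: List[int] = field(default_factory=list)   # e.g. [1, 5, 20, 50, 100]
    removal_scheduled: bool = False


def checklist_gaps(plan: RolloutPlan, today: date) -> List[str]:
    """Return the checklist items that are not yet satisfied."""
    gaps = []
    if not plan.flag_name.strip():
        gaps.append("flag name is set")
    if not plan.owner:
        gaps.append("owner is assigned")
    if plan.expires is None or plan.expires <= today:
        gaps.append("expiry date is set in the future")
    if not plan.metrics:
        gaps.append("metrics are defined")
    if not plan.kill_switch_tested:
        gaps.append("kill switch is tested")
    # Stages must strictly increase and end at 100%.
    increasing = all(a < b for a, b in zip(plan.stages, plan.stages[1:]))
    if not plan.stages or plan.stages[-1] != 100 or not increasing:
        gaps.append("rollout plan ramps up to 100%")
    if not plan.removal_scheduled:
        gaps.append("removal plan is scheduled")
    return gaps


draft = RolloutPlan("new_checkout", owner="payments-team")
print(checklist_gaps(draft, today=date(2025, 6, 1)))
```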
8) The takeaway
The goal is not to add more flags.
The goal is safer change.
Feature flags, when used with discipline, give you control over risk. They let you ship continuously without fearing every release. They let you test with real users while protecting the majority of your traffic.
That is why mature products rarely “flip everything at once.”
They evolve through controlled rollouts.
Closing question
What is the best or worst feature flag situation you have seen in a team?
And do you enforce expiry dates, or do flags live forever?
_____________________________________________________________________________________
Written by Sharath Chandra Odepalli