Feature Flags Done Right



How to ship big changes safely without breaking trust

I noticed something interesting on the SBI online banking website recently.

There appears to be a refreshed UI rolling out, and what stood out to me is that users can switch between the old UI and the new UI. Whether SBI is using feature flags internally or not, that experience reflects a common pattern in mature engineering teams: controlled rollout.

When the product is critical and used by millions, you cannot afford a risky, all-at-once launch. You need a way to introduce change gradually, measure impact, and recover instantly if something goes wrong.

That is exactly what feature flags enable.

This post explains what feature flags really are, why they matter for reliability, and how to use them in a clean, disciplined way.


1) What feature flags are, in plain terms

A feature flag is a runtime switch that changes product behavior without requiring a redeploy.

You ship the code, but it remains “off” for most users until you explicitly enable it.

This one concept gives you a powerful capability:

You can separate deploy from release.

Deploy becomes a technical event.
Release becomes a controlled product decision.

That separation is the heart of safe engineering.
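
To make the deploy versus release split concrete, here is a minimal sketch. The in-memory FLAGS dictionary, the is_enabled helper, and the new_dashboard flag name are all assumptions for illustration; a real setup would normally read flag state from a config service or database so that it can change while the application is running.

```python
# Hypothetical in-memory flag store. Real systems usually back this with a
# config service or database so changes apply without a redeploy.
FLAGS = {"new_dashboard": False}  # deployed, but not yet released


def is_enabled(flag_name: str) -> bool:
    """Return the current state of a flag; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def render_dashboard() -> str:
    """Choose a code path at runtime based on the flag."""
    if is_enabled("new_dashboard"):
        return "new dashboard"
    return "old dashboard"


before = render_dashboard()      # code is deployed, feature is still off
FLAGS["new_dashboard"] = True    # the release: a flag change, not a deploy
after = render_dashboard()
```

Both code paths ship in the same deployment. The only thing that changes at release time is the flag value.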


2) Why feature flags are a reliability tool, not just a release trick

Most teams start with feature flags for one reason: shipping faster.

Over time, they learn the deeper value: risk control.

Feature flags help you:

Limit blast radius

If a change causes errors, it affects only the small group you enabled it for.

Roll back instantly

Turning off a flag is usually much faster than rolling back a deployment. In many teams it is the difference between a small incident and a major outage.

Test safely in production

Staging environments rarely match real-world traffic and edge cases. Flags let you validate under real conditions without full exposure.

Reduce release anxiety

When the “off switch” exists, teams ship with more confidence and fewer late-night deploy fears.


3) The most common mistake: using flags like a simple switch

A lot of teams use feature flags like this:

Ship feature
Turn on
Move on

The problem is that this treats flags like a one-time toggle, not a rollout system.

A good rollout is a process, not a button.


4) A rollout approach that works almost everywhere

If you want a simple pattern that is easy to follow and hard to mess up, use this.

Step 1: Dark launch

Ship the feature behind a flag. It is in production, but “off” for everyone.

Why this matters: you can deploy early and often without user impact.

Step 2: Enable for internal users

Turn it on for the team first, then for support and QA.

This catches obvious issues and builds confidence quickly.
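
One way to picture internal-first targeting is a flag that is on only for certain user groups. This is a rough sketch, not how any particular flag service works; the Flag class, the group names, and new_checkout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag that is on only for allowlisted user groups."""
    name: str
    allowed_groups: set[str] = field(default_factory=set)

    def is_enabled_for(self, user_groups: set[str]) -> bool:
        # Enabled if the user belongs to at least one allowed group.
        return bool(self.allowed_groups & user_groups)


flag = Flag("new_checkout", allowed_groups={"engineering"})
engineer_sees_it = flag.is_enabled_for({"engineering"})
customer_sees_it = flag.is_enabled_for({"customer"})

# Widen exposure to support and QA once the team is comfortable.
flag.allowed_groups |= {"support", "qa"}
support_sees_it = flag.is_enabled_for({"support"})
```

Customers stay on the old path throughout, because none of their groups is on the allowlist.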

Step 3: Release gradually

Start small, then ramp:

1% → 5% → 20% → 50% → 100%

This is the part many teams skip, but it is the part that makes rollouts safe.
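
A common way to implement a percentage ramp is to hash each user into a stable bucket and compare the bucket to the current percentage. The sketch below assumes string user IDs and uses SHA-256 from the standard library; the bucket and in_rollout helpers and the new_checkout flag name are made up for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 10_000


def in_rollout(flag_name: str, user_id: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` of buckets."""
    return bucket(flag_name, user_id) < percent * 100


users = [f"user-{i}" for i in range(10_000)]
at_5 = {u for u in users if in_rollout("new_checkout", u, 5)}
at_20 = {u for u in users if in_rollout("new_checkout", u, 20)}
```

Because the bucket depends only on the flag name and the user ID, a user who gets the feature at 5% still has it at 20%. Including the flag name in the hash means different flags select different slices of users, so the same people are not always first in line.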

Step 4: Watch real signals

Do not roll out by gut feel. Roll out by signals.

The minimum set I always watch:

  • Error rate

  • Latency

  • Drop-offs on the flow (conversion, completion rate)

  • Support ticket spikes

  • Logs for unexpected edge cases
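
Rolling out by signals can be expressed as a small gate: advance the ramp only when the numbers look healthy, and pull back when they do not. Everything here is an assumption for illustration, including the Signals fields, the threshold values, and the choice to drop straight to 0% on a bad reading. Your own thresholds should come from your baseline metrics.

```python
from dataclasses import dataclass

RAMP = [1, 5, 20, 50, 100]  # the ramp percentages from Step 3


@dataclass
class Signals:
    """Hypothetical snapshot of rollout health for the exposed users."""
    error_rate: float        # fraction of failed requests
    p95_latency_ms: float
    completion_rate: float   # fraction of users finishing the flow


# Illustrative thresholds only; real values depend on your baseline.
MAX_ERROR_RATE = 0.01
MAX_P95_LATENCY_MS = 800.0
MIN_COMPLETION_RATE = 0.60


def is_healthy(s: Signals) -> bool:
    """All three signals must be within their thresholds."""
    return (
        s.error_rate <= MAX_ERROR_RATE
        and s.p95_latency_ms <= MAX_P95_LATENCY_MS
        and s.completion_rate >= MIN_COMPLETION_RATE
    )


def next_percent(current: int, s: Signals) -> int:
    """Advance one ramp step when healthy; drop to 0 when not."""
    if not is_healthy(s):
        return 0
    later = [p for p in RAMP if p > current]
    return later[0] if later else current


good = Signals(error_rate=0.002, p95_latency_ms=420.0, completion_rate=0.71)
bad = Signals(error_rate=0.05, p95_latency_ms=420.0, completion_rate=0.71)
step_after_5 = next_percent(5, good)    # healthy: move to the next step
step_after_bad = next_percent(20, bad)  # unhealthy: back to 0
```

Support tickets and log anomalies are harder to reduce to a number, so in practice a person usually still reviews those before each step.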

Step 5: Keep a kill switch for risky changes

Anything critical should have an “off” switch that is tested, not assumed.

Examples of “kill switch worthy” areas:

  • Payments and billing

  • Authentication and login

  • Checkout and order creation

  • Background jobs that write data

  • Notifications and messaging
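
For a payments-style change, a kill switch might look like the sketch below. The KILL_SWITCHES dictionary and the charge functions are hypothetical stand-ins. One design choice worth noting is an assumption on my part: if the switch state is missing, the code falls back to the proven path rather than the new one.

```python
KILL_SWITCHES = {"new_payment_flow": False}  # True means "killed"


def charge_legacy(amount_cents: int) -> str:
    """Stand-in for the existing, proven payment path."""
    return f"legacy:{amount_cents}"


def charge_new(amount_cents: int) -> str:
    """Stand-in for the new payment path being rolled out."""
    return f"new:{amount_cents}"


def charge(amount_cents: int) -> str:
    """Use the new path unless the kill switch is engaged or unknown."""
    if KILL_SWITCHES.get("new_payment_flow", True):
        # Engaged or missing switch: fail safe to the proven path.
        return charge_legacy(amount_cents)
    return charge_new(amount_cents)


normal = charge(500)
KILL_SWITCHES["new_payment_flow"] = True   # engage the kill switch
killed = charge(500)
```

Exercising the switch like this before the rollout, not during an incident, is what makes it tested rather than assumed.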

Step 6: Remove the flag after rollout

This is the discipline part.

If you do not remove flags, they become permanent complexity.


5) The quiet danger: flag debt

Flag debt is what happens when flags stay in your codebase forever.

Months later:

  • Nobody remembers what a flag controls

  • Code paths multiply

  • Testing becomes harder

  • Debugging becomes slower

  • You get “we cannot remove this because we are not sure”

That is how teams end up with dozens of flags that nobody trusts.

A simple rule prevents most of this:

Every flag must have an owner and an expiry date.

If the feature is fully rolled out and stable, remove the flag and delete the old code path. Keep your codebase clean.
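
The owner and expiry rule is easy to enforce if flags live in a small registry. Here is a minimal sketch; the FlagRecord shape, the team names, and the dates are invented for the example. A check like this could run in CI or as a scheduled job.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical registry entry: every flag has an owner and an expiry."""
    name: str
    owner: str
    expires_on: date


def expired_flags(registry: list[FlagRecord], today: date) -> list[FlagRecord]:
    """Return flags whose expiry date has passed, oldest first."""
    overdue = [f for f in registry if f.expires_on < today]
    return sorted(overdue, key=lambda f: f.expires_on)


registry = [
    FlagRecord("new_checkout", owner="payments-team", expires_on=date(2024, 3, 1)),
    FlagRecord("dark_mode", owner="web-team", expires_on=date(2024, 9, 1)),
]
overdue = expired_flags(registry, today=date(2024, 6, 1))
```

Since each record carries an owner, the output of this check tells you who to ask, which is exactly the information that is usually missing months later.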


6) The mistakes that cause real incidents

Here are a few mistakes I see repeatedly.

Mistake 1: Flags used as permanent configuration

Feature flags are meant for change management, not for long-term configuration. If it is a permanent behavior, it belongs in a real configuration system or product settings.

Mistake 2: No metric tied to a rollout

If you do not know what “healthy” means, you will not notice when a rollout is harming users.

Mistake 3: Turning on a flag without testing the off path

The off path is also production behavior. If it is broken, you lose your safety net.
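
A simple habit that helps: run the same scenario with the flag on and with the flag off. The sketch below uses a made-up order_summary function and flag name, and a plain loop instead of a test framework to keep it self-contained.

```python
def order_summary(total_cents: int, flags: dict[str, bool]) -> str:
    """Hypothetical function with a flagged and an unflagged path."""
    amount = f"{total_cents / 100:.2f}"
    if flags.get("new_summary_format", False):
        return f"Total due: {amount}"
    return f"Total: {amount}"


def check_both_paths() -> list[str]:
    """Run the same scenario with the flag on and off; collect failures."""
    failures = []
    for state in (True, False):
        summary = order_summary(1999, {"new_summary_format": state})
        if "19.99" not in summary:
            failures.append(f"flag={state}: amount missing from {summary!r}")
    return failures


failures = check_both_paths()
```

In a real test suite this would typically be a parametrized test, so that a broken off path fails the build just like a broken on path.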

Mistake 4: Flags mixed with unsafe data changes

Flags can hide UI, but they cannot undo data that has already been written incorrectly.

If you are doing database migrations, you need backward compatibility and a proper migration plan, not just a flag.
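
As one illustration of backward compatibility, the sketch below writes both the old and the new shape of a record during the transition, so that turning the flag off never leaves readers without data they understand. This is only one possible approach, and the field names, the save_user and display_name helpers, and the use of plain dictionaries in place of a database are all assumptions.

```python
def save_user(record: dict, full_name: str) -> None:
    """Write both the old and new shapes during the transition."""
    first, _, last = full_name.partition(" ")
    record["full_name"] = full_name   # old shape, still written
    record["first_name"] = first      # new shape
    record["last_name"] = last


def display_name(record: dict, use_split_names: bool) -> str:
    """Read the new fields only when the flag is on and the data exists."""
    if use_split_names and "first_name" in record:
        return f'{record["first_name"]} {record["last_name"]}'.strip()
    return record["full_name"]


new_record: dict = {}
save_user(new_record, "Ada Lovelace")
old_record = {"full_name": "Grace Hopper"}  # written before the change

flag_on = display_name(new_record, use_split_names=True)
flag_off = display_name(new_record, use_split_names=False)
old_data_flag_on = display_name(old_record, use_split_names=True)
```

The flag only controls which fields are read. Records written before the change still work with the flag on, and records written after it still work with the flag off.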


7) A simple checklist you can copy for your team

Before enabling a feature flag rollout, confirm:

  • Flag name is clear

  • Owner is assigned

  • Expiry date is set

  • Metrics are defined (errors, latency, product outcome)

  • Kill switch is tested

  • Rollout plan is written (1% to 100%)

  • Removal plan is scheduled

This checklist feels basic, but it prevents most messy rollouts.
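
If you want the checklist to be more than a document, it can be encoded as a small pre-rollout check. This is a sketch under assumptions: the RolloutPlan fields are my own naming, and "flag name is clear" is reduced to "flag name is not empty" because clarity itself needs a human reviewer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RolloutPlan:
    """Hypothetical record of what the checklist asks for."""
    flag_name: str
    owner: Optional[str] = None
    expiry: Optional[date] = None
    metrics: list[str] = field(default_factory=list)
    kill_switch_tested: bool = False
    ramp: list[int] = field(default_factory=list)
    removal_date: Optional[date] = None


def missing_items(plan: RolloutPlan) -> list[str]:
    """Return the checklist items the plan has not satisfied yet."""
    checks = {
        "flag name": bool(plan.flag_name.strip()),
        "owner": bool(plan.owner),
        "expiry date": plan.expiry is not None,
        "metrics": len(plan.metrics) > 0,
        "kill switch tested": plan.kill_switch_tested,
        # The ramp must exist and end at full exposure.
        "rollout plan": bool(plan.ramp) and plan.ramp[-1] == 100,
        "removal date": plan.removal_date is not None,
    }
    return [item for item, ok in checks.items() if not ok]


draft = RolloutPlan("new_checkout", owner="payments-team")
gaps = missing_items(draft)
```

An empty list from missing_items means the plan covers every item; anything else is a list of what to fix before the flag is enabled.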


8) The takeaway

The goal is not to add more flags.

The goal is safer change.

Feature flags, when used with discipline, give you control over risk. They let you ship continuously without fearing every release. They let you test with real users while protecting the majority of your traffic.

That is why mature products rarely “flip everything at once.”
They evolve through controlled rollouts.


Closing question

What is the best or worst feature flag situation you have seen in a team?
And do you enforce expiry dates, or do flags live forever?

_____________________________________________________________________________________

Written by Sharath Chandra Odepalli
