Implementing Traffic Shadowing and Dark Launch in Modern API Gateways
You’ve probably come across the term “dark launch” when reading about how companies such as Facebook, Google and Amazon release software features. In fact, Facebook coined the term when describing the launch of a new chat feature. They called it a dark launch because they deployed the code responsible for the new chat service to a small segment of their audience at a time. This let their engineers monitor the new service under real production traffic without affecting the user experience of the wider Facebook population.
What is Dark Launch?
As in the Facebook example, a dark launch is an approach for incrementally releasing production-ready software features to groups of users, so that you can assess real-time feedback and make changes before launching widely. You can also use the same process to deploy code changes to production without exposing them to users: the new code runs, but no user-facing response depends on it. Think of it as beta testing with no humans involved.
Traffic Shadowing & Dark Launch using an API Gateway
One key mechanism for enabling a dark launch strategy is traffic shadowing. This technique duplicates a portion of incoming requests from your users and sends the copies to the new version of a service, while the live version continues to serve every request. Responses from the shadow are discarded, so users never see them.
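To make the mechanism concrete, here is a minimal Python sketch of what a gateway does when it shadows traffic. It assumes requests and responses can be modelled as plain dicts, and that `live` and `shadow` are hypothetical handler functions standing in for the current and dark-launched versions of a service. A real gateway such as Edge Stack does this at the proxy layer through configuration rather than in application code, so treat this only as an illustration of the idea.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Requests and responses are modelled as plain dicts to keep the sketch self-contained.
Handler = Callable[[Dict], Dict]


class ShadowingRouter:
    """Serve every request from the live backend and mirror a sample to a shadow backend."""

    def __init__(self, live: Handler, shadow: Handler,
                 sample_rate: float = 0.1, rng: Optional[random.Random] = None):
        self.live = live
        self.shadow = shadow
        self.sample_rate = sample_rate  # fraction of requests to mirror (0.0 to 1.0)
        self.rng = rng or random.Random()
        self.pool = ThreadPoolExecutor(max_workers=4)

    def _send_to_shadow(self, request: Dict) -> None:
        try:
            self.shadow(request)  # the shadow's response is deliberately discarded
        except Exception:
            pass  # shadow failures must never reach the user

    def handle(self, request: Dict) -> Dict:
        if self.rng.random() < self.sample_rate:
            # Mirror a shallow copy asynchronously so the live path never waits on the shadow.
            self.pool.submit(self._send_to_shadow, dict(request))
        return self.live(request)

    def close(self) -> None:
        # Wait for any in-flight shadow calls to finish.
        self.pool.shutdown(wait=True)
```

The user always receives the live response; the shadow only observes a copy of the traffic, which is what lets you watch the new version under production load without risking the user experience.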
One effective way to shadow traffic is to use a Kubernetes-native API gateway such as Edge Stack. Pairing it with Prometheus lets you monitor key metrics for the shadowed service, such as request latency and the rate of 5XX errors, and compare them with the same metrics from the live service. These metrics can also be aggregated with your other service metrics, and Prometheus client libraries are available for most mainstream programming languages.
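Once both versions receive the same traffic, the useful question is whether the shadow behaves at least as well as the live service. The sketch below compares 5XX error rates and p99 latency for the two. It assumes you have already collected per-request samples (in practice these would come from Prometheus queries); the `Sample` type, the nearest-rank percentile and the thresholds `max_error_delta` and `max_p99_ratio` are illustrative choices, not values defined by Edge Stack or Prometheus.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    status: int        # HTTP status code returned for one request
    latency_ms: float  # time taken to serve that request


def error_rate_5xx(samples: List[Sample]) -> float:
    """Fraction of requests that ended in a 5XX status."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if 500 <= s.status <= 599) / len(samples)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile (p in 1..100) of a non-empty list."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compare_shadow(live: List[Sample], shadow: List[Sample],
                   max_error_delta: float = 0.01,
                   max_p99_ratio: float = 1.2) -> Dict[str, object]:
    """Mark the shadow healthy only if it is not much worse than live on both metrics."""
    live_err, shadow_err = error_rate_5xx(live), error_rate_5xx(shadow)
    live_p99 = percentile([s.latency_ms for s in live], 99)
    shadow_p99 = percentile([s.latency_ms for s in shadow], 99)
    healthy = (shadow_err - live_err <= max_error_delta
               and shadow_p99 <= live_p99 * max_p99_ratio)
    return {"live_5xx": live_err, "shadow_5xx": shadow_err,
            "live_p99_ms": live_p99, "shadow_p99_ms": shadow_p99,
            "healthy": healthy}
```

Comparing relative to the live service, rather than against fixed absolute limits, keeps the check meaningful even when overall production load shifts during the test.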
Managing State and Third-Party Interactions
Not all services are easy to dark launch. Stateless services hold no state that a request could change, so sending shadow traffic to them is comparatively safe and can be repeated consistently during testing. Services that mutate state or interact with third parties as part of their normal operation, however, require a different approach. To illustrate the challenges with a very common kind of stateful service, consider the following thought experiment.
Say you want to dark launch a new checkout service as part of an e-commerce application. This service not only calls a third-party payment service such as Stripe in an “unsafe” way, meaning each call has side effects (it charges the customer), but it also persists the state of the purchased shopping basket and broadcasts a “purchase completed” event to any other services listening. It is easy to see that every user whose checkout request is also shadowed to the dark-launched service could be charged twice on their credit card and potentially receive two sets of goods.
So how do you avoid these unintended consequences? When you dark launch this type of service, there are a couple of options:
- Run the dark-launched service under a separate configuration profile that disables interactions with third parties and data stores.
- Alternatively, replace the third-party services and data stores with virtualized representations. For example, the lightweight service virtualization tool Hoverfly can simulate third-party interactions, and embedded or in-memory substitutes can stand in for data stores and middleware, such as:
- HyperSQL (in-memory MySQL substitute)
- Apache Qpid (in-memory AMQP)
- LocalStack (AWS service and datastore simulator)
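Both options rely on the same design choice: the checkout service receives its side-effecting dependencies from outside instead of creating them itself, so a “dark” profile can swap in harmless stand-ins. The Python sketch below illustrates this under stated assumptions: `PaymentGateway`, `EventBus`, the recording stubs and `build_checkout_service` are hypothetical names, and the recording stubs play the role that Hoverfly or an in-memory substitute would play in a real deployment. It does not call the real Stripe API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Tuple


class PaymentGateway(Protocol):
    def charge(self, customer_id: str, amount_cents: int) -> str: ...


class EventBus(Protocol):
    def publish(self, topic: str, payload: dict) -> None: ...


@dataclass
class RecordingPaymentGateway:
    """Virtualized payment provider: records charges instead of moving money."""
    charges: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"fake-charge-{len(self.charges)}"


@dataclass
class RecordingEventBus:
    """Keeps events local so no downstream service reacts to a dark purchase."""
    events: List[Tuple[str, dict]] = field(default_factory=list)

    def publish(self, topic: str, payload: dict) -> None:
        self.events.append((topic, payload))


class CheckoutService:
    def __init__(self, payments: PaymentGateway, events: EventBus):
        self.payments = payments
        self.events = events

    def checkout(self, customer_id: str, basket: Dict[str, Tuple[int, int]]) -> str:
        # basket maps SKU -> (unit price in cents, quantity)
        total = sum(price * qty for price, qty in basket.values())
        charge_id = self.payments.charge(customer_id, total)
        self.events.publish("purchase.completed",
                            {"customer": customer_id, "charge": charge_id, "total": total})
        return charge_id


def build_checkout_service(profile: str,
                           live_payments: Optional[PaymentGateway] = None,
                           live_events: Optional[EventBus] = None) -> CheckoutService:
    """Wire dependencies by profile; the dark profile never touches real side effects."""
    if profile == "dark":
        return CheckoutService(RecordingPaymentGateway(), RecordingEventBus())
    if live_payments is None or live_events is None:
        raise ValueError("the live profile needs real payment and event clients")
    return CheckoutService(live_payments, live_events)
```

Persisting the basket follows the same pattern: inject the data store, and bind an in-memory substitute when the dark profile is active.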
Dark Launch as an Option
Dark launching a feature is just one option in your deployment toolbox. Ideally, you would combine it with other approaches, such as canary releasing. For example, after verifying the operational profile of a new feature by dark launching it with traffic shadowing, you could then assess the user experience by canary releasing the feature to a gradually increasing share of your users.
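A canary release needs each user to land consistently on the same version while the share of users on the new version grows. One common way to do that, sketched below in Python, is to hash a stable user identifier into a bucket and compare it with the current rollout percentage. The function names `canary_bucket` and `in_canary` and the `RAMP` schedule are hypothetical; API gateways and feature-flag tools typically provide their own traffic-splitting mechanisms.

```python
import hashlib


def canary_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in 0..9999."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def in_canary(user_id: str, percent: float) -> bool:
    """True if this user should see the canary at the given rollout percentage (0 to 100)."""
    return canary_bucket(user_id) < percent * 100


# Hypothetical ramp schedule: percentage of users on the new version at each stage.
RAMP = [1, 5, 25, 50, 100]
```

Because the bucket depends only on the user ID, raising the percentage from, say, 5 to 25 keeps every user who was already in the canary and only adds new ones, so nobody flips back and forth between versions during the rollout.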