3 Reasons AWS Lambda Is Not Ready for Prime Time
The user can give you unparseable input.
The user can give you input that’s parseable but illegal.
Something goes wrong at runtime.
The runtime raises an exception.
If you’re not familiar with Lambda, it’s a AWS feature that’s meant to give you a way to quickly write a service and let Amazon worry about all the boilerplate junk that normally goes with standing your service up in a way that people can actually talk to it. You don’t configure subnets or instances or load balancers with Lambda: you just write some code and then tell Amazon to hook you up. It’s a pretty compelling promise.
When we at Ambassador Labs tried to actually use Lambda for a real-world HTTP-based microservices we found some uncool things that make Lambda not yet ready for the world we live in:
- Lambda is a building block, not a tool
- Lambda is not well documented
- Lambda is terrible at error handling
Lung skips these uncool things, which makes sense because they’d make the tutorial collapse under its own weight, but you can’t skip them if you want to work in the real world. (Note that if you’re using Lambda for event handling within the AWS world, your life will be easier. But the really interesting case in the microservice world is Lambda and HTTP.)
It’s not just Lambda, even: AWS’ model is that they provide building blocks, and they expect others to wrap real tools around them. If you try to interact directly with AWS, it’s absurdly manual.
To wit, Lung’s tutorial shows us that manually setting up a Python Lambda is a twenty step process – and that’s a service with exactly one endpoint that uses GET and takes just one argument on the query string. Mind you, about half those steps (8-10, depending) are things you’ll have to repeat for every endpoint you create. If you have even five services, you’re looking not at 20 steps, but 50-60. Imagine 100 services. Imagine how often you’ll have to do this if you’re using versioned endpoints. Does 8-10 manual configuration per endpoint, every time you roll out a new version, sound like fun?
The root of the problem here is that we want a tool (our microservice) but AWS gives us building blocks, and leaves connecting them up to us. The blocks here are Lambda and the API Gateway, and it’s telling that Lung starts his tutorial not by creating a Lambda but rather by messing with the API Gateway – the gateway is a little annoying. Those 8-10 steps I mention above are all API Gateway stuff, and what’s worse, they’re the minimum for HTTP-only support. If you want HTTPS (and you should, it being 2016 and all), you need to add that into the mix as well.
This is, of course, the sort of thing that cries out for automation, and various folks are scrambling to fill the void. We already use Terraform for wrangling EC2, and it won’t take much for them to cope with Lambda and the API Gateway. Serverless announced Python support in their recent 0.2.0 release; Zappa’s initial release happened just two days ago. More on those as we experiment – at a first glance, though, none of these tools looks like I can trust it in production yet, so we’re still left with the manual world. (Of course, if you’re working on any of these tools, I’d welcome hearing why I’m wrong here!)
“But wait,” I hear you (and all the AWS folks) shout, “you lie! There’re all kinds of docs about Lambda and AWS online!”
Well, let’s imagine that you’re a developer at Alice’s House of Grues. You’ve been tasked with creating the Grue Locator service, which takes the name of a grue and responds with its location. Let’s further imagine that you’re a good developer:
- You’re a team player, so you and the rest of your team are going to agree on the API before you write code.
- You’re conscientious, so you’re going to test your code locally before deploying it (and you’re going to automate the tests).
- You’re a Python developer, so you’re going to write in Python.
We’ll make the API easy:
GET /v1/grue/grue_name
Given the API, you can sit down to write your tests and code locally. First questions: how does the
grue_name
Go ahead. Fire up Google. I’ll wait.
Back again? OK. You found a lot of hits, right? Then you had to spend some time differentiating info about AWS Lambda from the Python programming term
lambda
Here’s what the Python programming model actually says about the inputs and outputs:
event
dict
list, str, int, float,
NoneType
The implication here is that it’s expecting JSON over the wire, which will be deserialized into the
event
application/json
This may seem unnecessarily pedantic. Maybe it seems obvious that the handler is going to accept a JSON dictionary and deserialize it, so they shouldn’t need to spell it out. But no, this really is important: if they don’t spell out what’s valid input, they’re not telling you what to expect when – not if – the user hands in something unexpected.
Suppose the user hands in a simple string instead of a JSON-encoded dict? Suppose they hand in a JSON-encoded array? Suppose they hand in gibberish? We’ll talk about this more shortly when we get to error handling, but the problem is that you really have no idea. Instead you have to guess and test.
Likewise, here’s what they say about outputs:
Optionally, the handler can return a value. What happens to the returned value depends on the invocation type you use when invoking the Lambda function:
- If you use the invocation type (synchronous execution), AWS Lambda returns the result of the Python function call to the client invoking the Lambda function (in the HTTP response to the invocation request, serialized into JSON). For example, AWS Lambda console uses the
RequestResponse
invocation type, so when you test invoke the function using the console, the console will display the returned value.RequestResponse
- If the handler does not return anything, AWS Lambda returns null.
- If you use the invocation type (asynchronous execution), the value is discarded.
Event
This is, well, better. It actually says that it’s going to take the output and serialize it into JSON. Great. The main issue here is that we have to go somewhere else to find out how to control the invocation type (the tutorial doesn’t even mention it); at least it explicitly says that the test console uses the
RequestResponse
None
Note that AWS Lambda is hardly alone here: poor docs are endemic in the industry, enough so that I’ll be writing further about it later. But it’s definitely an issue with trying to get started with Lambda.
We mentioned a few specific cases above but let’s get more into this, because it’s a dealbreaker. How, exactly, are you meant to manage errors in Lambda?
Bear in mind that there are a lot of potential points of error in Lambda. The user can call you with bad data, your own service’s processing can fail, the network can fail, the list goes on and on… and, again, Amazon never documents how exactly you’re meant to deal with this.
Walking down some of the simple cases:
The user can give you unparseable input.
This is actually sort of simple: AWS will simply not run your Lambda if it can’t understand the input at all. Of course, the logging AWS provides (AWS CloudTrail) often didn’t actually capture anything about this case, when I was trying it, so it was a little tough to be sure exactly what was happening.
The user can give you input that’s parseable but illegal.
This is the case where the user gives you valid JSON, but it doesn’t make any sense (maybe you have a required dictionary element that they didn’t provide). Lambda doesn’t seem to have any specific way to handle this, so we have to treat it as a generic error at runtime, which leads us to…
Something goes wrong at runtime.
Maybe another service that we need is down. Maybe the user made a nonsensical request. We’d like to log this for debugging, and return an error to the caller so that they know it didn’t work. Logging should be easy: per AWS, any logging we do from Python should appear in CloudTrail, as should writes to
stdout
Worse, there doesn’t seem to be a way to get Lambda in Python to return anything but HTTP 200. If you’re writing in JavaScript, maybe… so why not Python? (I’d love to be wrong about this, by the way.)
The combination means that the most effective tool here is to always return a dictionary with an element indicating whether the request succeeded or failed, since you don’t have HTTP status codes to work with.
Finally:
The runtime raises an exception.
AWS claims that if a Lambda raises an exception, a specific JSON dictionary is returned. This pretty much seemed to work, when I did it, but of course it means that you get something totally unlike the output you see in the normal case. Combined with the lack of control over HTTP status codes, wrapping your entire service in a big
try/catch
When I first sat down to write my microservice using Lambda, I really wanted it to be the greatest thing since sliced bread. It had so very much promise, and I just loved the idea that I’d be able to whip up 50 lines of Python and let Amazon worry about deploying.
Sadly, it was too good to be true. We recently rolled out that microservice using Terraform and EC2 instances because Lambda just isn’t quite ready for the real world. Maybe things will be different soon, though. Maybe Zappa, Serverless, and Terraform will take over the world. Who knows, maybe Amazon will decide that Lambda should be the first thing to move beyond the building-block stage.