AWS re:Invent 2024: My Serverless Takeaways

Now that AWS re:Invent 2024 is officially over, let's go over the exciting new services and features launched from a Serverless developer perspective.

Best Practices for Serverless Developers
Pre-Re:invent Announcements
AWS re:Invent Announcements

Best Practices for Serverless Developers

This is a shameless self-plug :)

If you missed my breakout session with Julian Wood, the recording is up and ready.

We discuss tips and insights on building production-ready Serverless services and scaling them securely across organizations. There are plenty of action items for you to take, and all the links and slides can be found here.

Pre-Re:invent Announcements

Let's review the most exciting announcements and improvements released before AWS re:Invent. This year we have seen plenty. Honestly, this is one of the best years we've had in a long time.

AWS Lambda Logs and Metrics Insights

https://aws.amazon.com/blogs/compute/simplifying-lambda-function-development-using-cloudwatch-logs-live-tail-and-metrics-insights/

AWS introduced two new features for AWS Lambda developers: CloudWatch Logs Live Tail and CloudWatch Metrics Insights.

The Lambda console now natively supports CloudWatch Logs Live Tail, an interactive log streaming and analytics capability which enables developers and operators to view and analyze their Lambda function logs in real time.

So, you can now deploy your function, invoke it, and view logs in real time all from the comfort of the Lambda console instead of refreshing the CloudWatch log group stream. It's an excellent addition, but I wouldn't recommend debugging like that; it's time-consuming and inefficient. You should use your IDE, not the AWS console. Check out my Lambda Serverless guide blog post and my AWS re:Invent 2023 breakout session, where I discuss testing practices in detail.

The second feature is CloudWatch Metrics Insights.

The dashboard shows the top 10 Lambda functions in your AWS account with highest number of invocations, errors, and longest invocation duration

You can navigate to the Dashboard page in the Lambda console. This is a nice feature, but I'd rather monitor my function from one place—my production CloudWatch service dashboard. If you want to learn how to build such a dashboard, check out my post here.

AWS Lambda Console is VS-code-ish

https://aws.amazon.com/blogs/compute/introducing-an-enhanced-in-console-editing-experience-for-aws-lambda/

This brings the familiar Visual Studio Code interface and many of the features directly into the Lambda console, allowing developers to use their preferred coding environment and tools in the cloud.

There's not a lot to say about it. It looks sleeker and more familiar. Should you develop your function in the console? No. Is it still nice, and does it give a better experience when looking at the code? 100% yes.

The bottom line is that it doesn't replace an IDE and has limitations, but it provides a better viewing experience.

AWS FIS Supports Lambda Functions

https://aws.amazon.com/about-aws/whats-new/2024/10/aws-lambda-fault-injection-service-actions/

This is such a welcome addition!

With the new FIS actions for Lambda, developers can test the resilience of their applications by temporarily adding invocation latency, preventing function execution, modifying function outputs, and injecting integration errors.

Chaos engineering is a critical but quite advanced aspect of Serverless testing. Until now, there has been no official support from AWS, although you could still do it yourself (check Koby's post here and here). It's now much easier to simulate these errors in your AWS account. Expect Koby's insights on this feature very soon!

For now, you can follow AWS' detailed blog post.

AWS AppSync Events a.k.a Serverless Websocket APIs

https://aws.amazon.com/blogs/mobile/announcing-aws-appsync-events-serverless-websocket-apis/

I wrote a deep-dive article and shared my insights - it's really good!

You can use Serverless WebSockets without managing their state or infrastructure. Developers create their API and publish events that are broadcast to millions (oh my!) of clients subscribed over a WebSocket connection. There's also an integration with EventBridge over the HTTP endpoint, which extends this feature further.
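To make the HTTP publishing side concrete, here is a minimal Python sketch that only builds the publish request and does not send it. The endpoint host, channel name, and API key are placeholders. The payload shape (a `channel` plus a list of JSON-stringified `events`, posted to an `/event` path with an `x-api-key` header) is my understanding of the AppSync Events HTTP API, so treat it as an assumption and verify it against the current documentation.

```python
import json
import urllib.request


def build_publish_request(
    http_domain: str, api_key: str, channel: str, events: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) an HTTP publish request for an AppSync Events API."""
    body = {
        "channel": channel,
        # Assumption: each event is passed as a JSON-encoded string.
        "events": [json.dumps(event) for event in events],
    }
    return urllib.request.Request(
        url=f"https://{http_domain}/event",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Placeholder values only; none of these point at a real API.
request = build_publish_request(
    http_domain="example1234.appsync-api.us-east-1.amazonaws.com",
    api_key="da2-placeholder",
    channel="/default/orders",
    events=[{"order_id": "123", "status": "created"}],
)
```

Passing `request` to `urllib.request.urlopen` would send it. Subscribed clients would then receive the event over their WebSocket connection.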

DynamoDB pre-Warm Tables & It's Cheaper!

https://aws.amazon.com/blogs/database/pre-warming-amazon-dynamodb-tables-with-warm-throughput/

Warm throughput provides insight into the read and write operations your table or index can immediately support, with these values growing as usage increases. It’s the minimum throughput that your table is prepared to handle instantaneously.

The article provides common use cases that benefit from pre-warming the table and tips for defining the values to configure. This can be helpful when you have an "on-demand" table and want to be ready for spikes or high traffic right from the get-go (to reduce the chance of throttled APIs). In the past, to cope with expected spikes, you had to momentarily change the table to provisioned mode with high settings and change back to on-demand once the traffic had calmed down.
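As a rough illustration of how you might pick the values, the sketch below converts an expected peak into read and write units and passes them to `update_table` through boto3's `WarmThroughput` parameter. The traffic numbers and helper names are hypothetical. The sizing rule assumes strongly consistent reads (one unit per 4 KB) and standard writes (one unit per 1 KB).

```python
import math


def estimate_warm_throughput(
    peak_reads_per_sec: int, peak_writes_per_sec: int, avg_item_kb: float
) -> dict:
    """Estimate warm throughput values from an expected traffic peak.

    Assumes strongly consistent reads (one read unit per 4 KB) and
    standard writes (one write unit per 1 KB).
    """
    read_units = peak_reads_per_sec * math.ceil(avg_item_kb / 4)
    write_units = peak_writes_per_sec * math.ceil(avg_item_kb / 1)
    return {"ReadUnitsPerSecond": read_units, "WriteUnitsPerSecond": write_units}


def prewarm_table(table_name: str, warm_throughput: dict) -> None:
    """Apply the values to an existing table (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the estimate above runs without the SDK

    boto3.client("dynamodb").update_table(
        TableName=table_name, WarmThroughput=warm_throughput
    )


# Hypothetical launch-day peak: 20,000 reads/s and 5,000 writes/s of ~2 KB items.
warm = estimate_warm_throughput(20_000, 5_000, avg_item_kb=2)
print(warm)  # {'ReadUnitsPerSecond': 20000, 'WriteUnitsPerSecond': 10000}
```

Calling `prewarm_table("my-table", warm)` before the event replaces the old trick of switching to provisioned mode and back.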

As a final note, you pay extra for this feature.

And a big one: we pay less for DynamoDB. The global tables reduction is amazing!

https://aws.amazon.com/blogs/database/new-amazon-dynamodb-lowers-pricing-for-on-demand-throughput-and-global-tables/

On-demand throughput pricing has been reduced by 50%
Global tables pricing has been reduced by up to 67%
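To get a feel for the numbers, here is a back-of-the-envelope calculation. The monthly amounts are made up, and the 67% figure is the upper bound from the announcement, so your actual savings may be smaller.

```python
def discounted(cost: float, reduction: float) -> float:
    """Return the cost after applying a fractional price reduction."""
    return round(cost * (1 - reduction), 2)


# Hypothetical monthly spend before the price cut (throughput charges only).
old_on_demand = 1_000.00
old_global_tables = 600.00

new_on_demand = discounted(old_on_demand, 0.50)          # 50% reduction
new_global_tables = discounted(old_global_tables, 0.67)  # best case: up to 67%

print(new_on_demand, new_global_tables)  # 500.0 198.0
```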

Lambda Runtime - Python 3.13 Support

https://aws.amazon.com/pt/blogs/compute/python-3-13-runtime-now-available-in-aws-lambda/

There's not a lot to say, although some experimental features will be disabled. I'm still waiting to see performance and cold start comparisons. Powertools for AWS Lambda supports it, and so do my open-source projects.

SnapStart for Lambda Functions (Python & .NET)

https://aws.amazon.com/blogs/aws/aws-lambda-snapstart-for-python-and-net-functions-is-now-generally-available/

Let's talk about SnapStart for Python and .NET. It's a very smart mechanism. It's not free like its Java variation. It's not provisioned concurrency, either. It's a mechanism that gets you somewhere in between provisioned concurrency and regular cold starts; your mileage may vary, but don't expect a two-digit cold start.

In the end, it's a workaround and doesn't solve the real issue - cold starts.

Should you use it?

Yes, I'd place it on my critical customer-facing APIs, BUT I'd also open a JIRA ticket as tech debt to optimize my cold start and remove SnapStart.
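If you do decide to enable it, the sketch below shows one way to do so with boto3. The function name is a placeholder and the helper names are mine. SnapStart applies to published versions, so the sketch publishes a new version after the configuration update.

```python
def snapstart_update_kwargs(function_name: str) -> dict:
    """Arguments for Lambda's UpdateFunctionConfiguration call that turn SnapStart on."""
    return {
        "FunctionName": function_name,
        # Snapshots are taken when a version is published, so $LATEST is not covered.
        "SnapStart": {"ApplyOn": "PublishedVersions"},
    }


def enable_snapstart(function_name: str) -> str:
    """Enable SnapStart and publish a version (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above runs without the SDK

    client = boto3.client("lambda")
    client.update_function_configuration(**snapstart_update_kwargs(function_name))
    # Wait for the configuration update to finish before publishing.
    client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    return client.publish_version(FunctionName=function_name)["Version"]


print(snapstart_update_kwargs("orders-api"))
```

Removing SnapStart later, once the cold start itself is optimized, is the same call with `"ApplyOn": "None"`.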

Want to learn about cold start optimizations? Check out my article "Is AWS Lambda Cold Start Still an Issue?".

AWS Lambda supports Amazon S3 as a failed-event destination for asynchronous and stream event sources

https://aws.amazon.com/about-aws/whats-new/2024/11/aws-lambda-s3-failed-event-destination-stream-event-sources/

Some users might find a good use case for it. However, I'm a big fan of sending failed messages to DLQs and doing a redrive. I published an article a while back and provided complete code examples with CDK; it's easy!
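For comparison, both options use the same on-failure configuration shape in the Lambda API, and only the destination ARN differs. The sketch below is illustrative: the bucket, queue, account ID, and helper names are placeholders.

```python
def on_failure_destination(arn: str) -> dict:
    """Build the DestinationConfig structure that Lambda expects for failed events."""
    return {"OnFailure": {"Destination": arn}}


# Placeholder ARNs: an S3 bucket (the new option) and an SQS queue used as a DLQ.
s3_config = on_failure_destination("arn:aws:s3:::my-failed-events-bucket")
sqs_config = on_failure_destination("arn:aws:sqs:us-east-1:123456789012:my-dlq")


def apply_to_async_invocations(function_name: str, destination_config: dict) -> None:
    """Attach the destination to asynchronous invocations (needs boto3 and credentials)."""
    import boto3  # imported lazily so the config helpers run without the SDK

    boto3.client("lambda").put_function_event_invoke_config(
        FunctionName=function_name, DestinationConfig=destination_config
    )


def apply_to_event_source_mapping(esm_uuid: str, destination_config: dict) -> None:
    """Attach the destination to a stream event source mapping."""
    import boto3

    boto3.client("lambda").update_event_source_mapping(
        UUID=esm_uuid, DestinationConfig=destination_config
    )
```

With SQS you keep the redrive flow; with S3 you get cheap storage for the failed events but need your own replay logic.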

Amazon Aurora Serverless v2 supports scaling to zero capacity

https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-aurora-serverless-v2-scaling-zero-capacity/

Wow! Scale to zero is a must if you call yourself #Serverless. This is a great step in the right direction, now get rid of the VPC settings please 😀

Track performance of serverless applications built using AWS Lambda with Application Signals

https://aws.amazon.com/blogs/aws/track-performance-of-serverless-applications-built-using-aws-lambda-with-application-signals/

CloudWatch Application Signals was introduced last year for EC2, EKS and ECS and now it comes to Lambda (Python and Node.js runtimes only). We have metrics and traces options, people!

When you enable this, Application Signals automatically instruments your Lambda functions using enhanced AWS Distro for OpenTelemetry (ADOT) libraries, provided via a Lambda layer. This Lambda layer packages and deploys the libraries that are required for auto-instrumentation for Application Signals. Services are automatically instrumented with the OpenTelemetry SDK to collect application metrics such as availability, latency, errors, and faults for 100% of traffic. Traces are collected at a default sampling ratio of 5%.

Using the pre-built, standardized dashboards of Application Signals, you can identify the root cause of performance anomalies in just a few clicks by drilling down into performance metrics for critical business operations and APIs.

I like the "service" view, a collection of Lambda functions, and the possibility to define SLOs and see the actual SLI. If you want to know what these terms mean, check my post here.
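As a quick refresher, an SLI is the measured value and an SLO is the target you hold it against. The sketch below computes an availability SLI and the remaining error budget for made-up request counts. It is plain arithmetic, not the Application Signals API.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Measured availability as a percentage of successful requests."""
    return 100.0 * (total_requests - failed_requests) / total_requests


def error_budget_remaining(
    slo_percent: float, total_requests: int, failed_requests: int
) -> float:
    """How many more failures the SLO tolerates for the same request volume."""
    allowed_failures = total_requests * (1 - slo_percent / 100)
    return allowed_failures - failed_requests


# Hypothetical month: 1,000,000 requests, 400 failures, and a 99.9% SLO.
sli = availability_sli(1_000_000, 400)
budget = error_budget_remaining(99.9, 1_000_000, 400)
print(f"SLI={sli:.2f}% remaining budget={budget:.0f} requests")
```

Here the SLI (99.96%) is above the 99.9% SLO, with roughly 600 failures left in the budget.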

It's nice to see Lambda doing OpenTelemetry integration, BUT it still requires us to use Lambda layers and maintain that configuration (per region). I'd like to see this become easier without the layer requirement. If you don't know what Lambda Layers are or what my best practices for using them are, read my post here.

Node.js 22 runtime in AWS Lambda

https://aws.amazon.com/blogs/compute/node-js-22-runtime-now-available-in-aws-lambda/

I'm not a Node.js person so I highly suggest you review the article. It's also available in GovCloud which is a nice touch!

Event Source Mapping (ESM) metrics for AWS Lambda

https://aws.amazon.com/blogs/compute/introducing-new-event-source-mapping-esm-metrics-for-aws-lambda/

With these new CloudWatch metrics, you can gain visibility into the processing state of your events that are polled by Lambda Event Source Mapping (ESM) for queue-based or stream-based applications. The article explains the new metrics PolledEventCount, InvokedEventCount, FilteredOutEventCount, FailedInvokeEventCount, DeletedEventCount, DroppedEventCount, and OnFailureDestinationDeliveredEventCount, and how to use them to troubleshoot event processing issues for Lambda functions.
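To show how these metrics could feed a quick health check, here is a small sketch that takes already-fetched metric sums and returns findings. The thresholds and my reading of each metric are assumptions drawn from the metric names, and fetching the values from CloudWatch is left out.

```python
def diagnose_esm(metrics: dict[str, float]) -> list[str]:
    """Turn summed ESM metric values into human-readable findings."""
    findings = []
    polled = metrics.get("PolledEventCount", 0)
    if polled == 0:
        findings.append("No events polled: check the source and the mapping state.")
    if metrics.get("FailedInvokeEventCount", 0) > 0:
        findings.append("Some invocations failed: check function errors and throttles.")
    if metrics.get("DroppedEventCount", 0) > 0:
        findings.append("Events were dropped: consider an on-failure destination.")
    # Arbitrary threshold: flag mappings that filter out almost everything.
    if polled and metrics.get("FilteredOutEventCount", 0) / polled > 0.9:
        findings.append("Over 90% of polled events are filtered out: review the filter.")
    return findings


# Hypothetical sums for the last hour.
sample = {
    "PolledEventCount": 1000,
    "InvokedEventCount": 950,
    "FilteredOutEventCount": 40,
    "FailedInvokeEventCount": 10,
    "DroppedEventCount": 0,
}
for finding in diagnose_esm(sample):
    print(finding)
```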

TL;DR: new metrics, which you pay the regular fee for, offer more visibility into the performance of your Lambda event source mappings. However, sometimes I wonder if there's such a thing as too much information.

StepFunction Better DevEx with JSONata

https://aws.amazon.com/blogs/compute/simplifying-developer-experience-with-variables-and-jsonata-in-aws-step-functions/

With variables and JSONata, AWS Step Functions now improves the developer’s experience to write elegant workflows with simpler code in Amazon States Language (ASL) that matches with the normal programming paradigm. I'd argue that it's still not easy to define Step Functions state machines, but it's an improvement in the right direction. I recommend not overusing it and striving to build the simplest and shortest state machine possible. Sometimes, it's much easier to add a Lambda function step that runs your custom code instead of multiple JSONata-defined steps. Why? Because it's easier to extend when you want to make changes, it's easier to cover a Lambda function with unit and integration tests, and the code is more readable than JSONata.
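To illustrate the testability point, here is a hypothetical Lambda task that does the kind of transformation you might otherwise spread across several JSONata expressions, followed by a plain unit test. The event shape and the review threshold are invented for the example.

```python
def handler(event: dict, context: object) -> dict:
    """Hypothetical Step Functions task: total an order and flag large ones."""
    total = sum(item["price"] * item["quantity"] for item in event["items"])
    return {
        "order_id": event["order_id"],
        "total": round(total, 2),
        "needs_review": total > 1000,  # invented business rule
    }


def test_handler_flags_large_orders() -> None:
    event = {
        "order_id": "o-1",
        "items": [
            {"price": 600.0, "quantity": 2},
            {"price": 10.0, "quantity": 1},
        ],
    }
    result = handler(event, None)
    assert result == {"order_id": "o-1", "total": 1210.0, "needs_review": True}


test_handler_flags_large_orders()
```

The test runs locally in milliseconds with no state machine deployed, which is hard to match with logic embedded in ASL.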

Provisioned Mode for Kafka event source mappings

https://aws.amazon.com/about-aws/whats-new/2024/11/aws-lambda-provisioned-mode-kafka-esms/

Provisioned Mode for Kafka ESM allows you to fine-tune the throughput of the ESM by provisioning and auto-scaling between a minimum and maximum number of polling resources called event pollers, and is ideal for real-time applications with strict performance requirements.
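In API terms, this comes down to a minimum and maximum poller count on the event source mapping. The sketch below builds the arguments for boto3's `update_event_source_mapping`. The UUID and the poller counts are placeholders, and the validation is my own sanity check rather than the service's documented limits.

```python
def provisioned_poller_kwargs(esm_uuid: str, minimum: int, maximum: int) -> dict:
    """Build arguments that switch a Kafka ESM to provisioned mode."""
    if minimum < 1 or maximum < minimum:
        raise ValueError("expected 1 <= minimum <= maximum")
    return {
        "UUID": esm_uuid,
        "ProvisionedPollerConfig": {
            "MinimumPollers": minimum,
            "MaximumPollers": maximum,
        },
    }


def apply_provisioned_mode(esm_uuid: str, minimum: int, maximum: int) -> None:
    """Apply the configuration (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above runs without the SDK

    boto3.client("lambda").update_event_source_mapping(
        **provisioned_poller_kwargs(esm_uuid, minimum, maximum)
    )


# Placeholder UUID and poller counts.
print(provisioned_poller_kwargs("00000000-0000-0000-0000-000000000000", 2, 10))
```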

I think it's great that we get the option to optimize our ESMs and I assume it will reach other ESMs in the future.

Optimize compute resources on Amazon ECS with Predictive Scaling

https://aws.amazon.com/blogs/containers/optimize-compute-resources-on-amazon-ecs-with-predictive-scaling/

This is a new policy within Amazon ECS Service Auto Scaling, designed to anticipate demand surges by using advanced machine learning (ML) algorithms. Predictive Scaling proactively increases the desired task count, making sure of improved availability and responsiveness for your applications, while also enabling cost savings by needing less over-provisioning.

To my eyes, this optimization closes the gap from Fargate even further. Here, we scale prior to an upcoming traffic surge, because AWS has learnt the traffic history and anticipates the spike in advance. Once it passes, the scale will be reduced again to save cost. Predictive Scaling is ideal for applications that have rapidly changing demand and follow a consistent pattern.

Implementing custom domain names for private endpoints with Amazon API Gateway

https://aws.amazon.com/blogs/compute/implementing-custom-domain-names-for-private-endpoints-with-amazon-api-gateway/

This is one of the most requested features that customers of private API GWs want.

I'll need to try it and share my insights in the coming weeks.

Custom domain names are simpler and more intuitive URLs that you can use with your applications and were previously only supported with public REST API endpoints. Now you can use custom domain names to map to private REST APIs and share those custom domain names across accounts using AWS Resource Access Manager (AWS RAM).

AWS re:Invent Announcements

Plenty of announcements this year with a couple of them being truly innovative!

Let's start with the big ones.

Amazon Aurora DSQL (Preview)

https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-aurora-dsql-preview/

We all love DynamoDB the NoSQL DB - it's simple to use, fast and fully Serverless (no VPC, scales down to zero, pay for usage).

We finally get the SQL variation of this: Aurora DSQL is the SQL version of what DynamoDB represents. It's a preview, supported only in three regions, but oh boy, this is a good one. I love that we finally get a proper Serverless SQL database: no VPC, no proxies or bastion server required, and proper IAM authorization. Get an endpoint URL and start working!

Amazon DynamoDB global tables previews multi-Region strong consistency

https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-dynamodb-global-tables-previews-multi-region-strong-consistency/

When you create a global table, you can configure its consistency mode. Global tables offer the following multi-Region consistency modes: Eventual consistency and Strong consistency (preview).

Amazon DynamoDB global tables now support multi-Region strong consistency. Global tables are a mechanism that replicates tables across multiple regions. The time to replicate items (all CRUD operations - delete, update, add) was up to 2.5 seconds. In case of a regional outage, you could lose up to 2.5 seconds of writes made in that region (RPO). Now, this number becomes zero (at an extra cost)! It's only available in three regions and not recommended for production use, but it's a start. Read more here.
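To put the RPO into perspective, this tiny calculation estimates how many writes could be lost in the worst case. The write rate is hypothetical, and the 2.5 second lag is the figure quoted above.

```python
def writes_at_risk(writes_per_second: float, replication_lag_seconds: float) -> float:
    """Worst-case number of un-replicated writes if a region fails right now."""
    return writes_per_second * replication_lag_seconds


# Hypothetical table taking 1,000 writes per second.
print(writes_at_risk(1_000, 2.5))  # eventual consistency: 2500.0
print(writes_at_risk(1_000, 0.0))  # strong consistency (RPO of zero): 0.0
```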

Various S3 Improvements

https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-s3-tables-apache-iceberg-tables-analytics-workloads/

https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/

S3 Tables introduce table buckets, a new bucket type that is purpose-built to store tabular data. With table buckets, you can quickly create tables and set up table-level permissions to manage access to your data lake. You can then load and query data in your tables with standard SQL, and take advantage of Apache Iceberg’s advanced analytics capabilities such as row-level transactions, queryable snapshots, schema evolution, and more.

https://aws.amazon.com/blogs/aws/introducing-queryable-object-metadata-for-amazon-s3-buckets-preview/

This is a companion feature that helps manage buckets with millions of objects by finding objects by tags, properties, and other metadata.

You can enable capture of rich metadata for any of your S3 buckets by specifying the location (an S3 table bucket and a table name) where you want the metadata to be stored. Capture of updates (object creations, object deletions, and changes to object metadata) begins right away and will be stored in the table within minutes. Each update generates a new row in the table, with a record type (CREATE, UPDATE_METADATA, or DELETE) and a sequence number.
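Because every change is appended as a new row, the current state of a bucket is whatever you get by replaying the rows in order. The sketch below does that in plain Python. The row fields (`key`, `record_type`, `sequence_number`, `tags`) and the integer sequence numbers are simplifications I chose for illustration, not the exact table schema.

```python
def live_objects(rows: list[dict]) -> dict[str, dict]:
    """Replay metadata rows in sequence order and return the surviving objects."""
    state: dict[str, dict] = {}
    for row in sorted(rows, key=lambda r: r["sequence_number"]):
        if row["record_type"] == "DELETE":
            state.pop(row["key"], None)
        else:  # CREATE or UPDATE_METADATA: keep the latest metadata
            state[row["key"]] = row.get("tags", {})
    return state


# Hypothetical rows, deliberately out of order.
rows = [
    {"sequence_number": 3, "record_type": "UPDATE_METADATA", "key": "a.txt", "tags": {"team": "b"}},
    {"sequence_number": 1, "record_type": "CREATE", "key": "a.txt", "tags": {"team": "a"}},
    {"sequence_number": 4, "record_type": "DELETE", "key": "b.txt"},
    {"sequence_number": 2, "record_type": "CREATE", "key": "b.txt", "tags": {}},
]
print(live_objects(rows))  # {'a.txt': {'team': 'b'}}
```

In practice you would express the same idea as a SQL query over the metadata table instead of pulling rows into Python.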

Amazon EventBridge and AWS Step Functions announce integration with private APIs

https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-eventbridge-step-functions-integration-private-apis/

https://aws.amazon.com/blogs/aws/securely-share-aws-resources-across-vpc-and-account-boundaries-with-privatelink-vpc-lattice-eventbridge-and-step-functions/

Today, some customers use AWS Lambda functions or Amazon Simple Queue Service (Amazon SQS) queues to transfer data into VPCs. This undifferentiated heavy lifting can be replaced with a simpler and more efficient AWS RAM solution. Once the resource configuration is shared, you can use it with EventBridge and Step Functions (check out the guide for more information).

Amazon EventBridge and AWS Step Functions now support integration with private APIs powered by AWS PrivateLink and Amazon VPC Lattice.

Customers can securely integrate their legacy systems with cloud-native applications using event-driven architectures and workflow orchestration with fully managed connectivity to private HTTPS-based APIs.