Serverless Computing: An Overview of Amazon’s and Microsoft’s Services

In the serverless model, application components such as databases or data processing services are provisioned and operated by the cloud service provider, automatically and on demand. The cloud user is responsible for configuring these resources, e.g. with their own code or application-specific parameters, and for combining them.

The costs incurred depend on the capacities used, and scaling takes place automatically based on the load. The cloud service provider is responsible for the provision, scaling, maintenance, high availability and management of the resources.

Serverless computing is particularly well suited to workloads that are difficult to predict or short-lived, to automation tasks, and to prototypes. It is less suitable for resource-intensive, long-running, and predictable workloads, because in such cases the costs can be significantly higher than with self-managed execution environments.

Building blocks

As part of a “serverless computing” Advent calendar, we compared the cloud services of AWS and Azure. The calendar doors open under the hashtag #ZEISSDigitalInnovationGoesServerless.

Category    | Service                   | AWS                          | Azure
------------|---------------------------|------------------------------|------------------------------------
COMPUTE     | Serverless Function       | AWS Lambda                   | Azure Functions
COMPUTE     | Serverless Containers     | AWS Fargate (Amazon ECS/EKS) | Azure Container Instances / AKS
INTEGRATION | API Management            | Amazon API Gateway           | Azure API Management
INTEGRATION | Pub/Sub Messaging         | Amazon SNS                   | Azure Event Grid
INTEGRATION | Message Queues            | Amazon SQS                   | Azure Service Bus
INTEGRATION | Workflow Engine           | AWS Step Functions           | Azure Logic Apps
INTEGRATION | GraphQL API               | AWS AppSync                  | Azure Functions with Apollo Server
STORAGE     | Object Storage            | Amazon S3                    | Azure Storage Account
DATA        | NoSQL Database            | Amazon DynamoDB              | Azure Table Storage
DATA        | Relational Database       | Amazon Aurora Serverless     | Azure SQL Database Serverless
SECURITY    | Identity Provider         | Amazon Cognito               | Azure Active Directory B2C
SECURITY    | Key Management            | AWS KMS                      | Azure Key Vault
SECURITY    | Web Application Firewall  | AWS WAF                      | Azure Web Application Firewall
NETWORK     | Content Delivery Network  | Amazon CloudFront            | Azure CDN
NETWORK     | Load Balancer             | Application Load Balancer    | Azure Application Gateway
NETWORK     | Domain Name Service       | Amazon Route 53              | Azure DNS
ANALYTICS   | Data Stream               | Amazon Kinesis               | Azure Stream Analytics
ANALYTICS   | ETL Service               | AWS Glue                     | Azure Data Factory
ANALYTICS   | Storage Query Service     | Amazon Athena                | Azure Data Lake Analytics

We have compiled an overview of the services mentioned above and their characteristics, together with some exemplary reference architectures, on a poster (an English version will follow). This overview offers an easy introduction to the topic of serverless architecture.

Figure 1: Preview poster “Serverless Computing”

We will gladly send you the poster in its original size (1000 x 700 mm). Simply send us an e-mail with your address to info.digitalinnovation@zeiss.com. Please note our privacy policy.

Best practices for serverless functions

Each function should be responsible for a single task (single responsibility principle): this improves maintainability and reusability, and memory, access rights, and timeout settings can be configured more specifically for each function.

As the memory allocated to a Lambda function is increased, the available CPU and network capacity increase as well. The optimal ratio between execution time and cost should be determined by benchmarking.
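
A minimal sketch of such a benchmark, assuming a hypothetical function named my-benchmark-function: the script reconfigures the memory size with boto3, invokes the function, and prints the REPORT line (billed duration, max memory used) from the returned log tail. The open-source AWS Lambda Power Tuning project automates this kind of measurement.

```python
import base64
import json

import boto3

lam = boto3.client("lambda")
FUNCTION = "my-benchmark-function"  # hypothetical function name

for memory_mb in (128, 256, 512, 1024):
    # Change the memory size and wait until the update has been applied.
    lam.update_function_configuration(FunctionName=FUNCTION, MemorySize=memory_mb)
    lam.get_waiter("function_updated").wait(FunctionName=FUNCTION)

    # Invoke synchronously and request the last log lines of the execution.
    response = lam.invoke(
        FunctionName=FUNCTION,
        LogType="Tail",
        Payload=json.dumps({"benchmark": True}),
    )
    log_tail = base64.b64decode(response["LogResult"]).decode()

    # The REPORT line contains "Duration", "Billed Duration" and "Max Memory Used".
    report = [line for line in log_tail.splitlines() if line.startswith("REPORT")]
    print(memory_mb, "MB:", report)
```

In a real benchmark, each invocation would be repeated several times and the billed duration compared against the per-millisecond price of the respective memory size.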

A function should not call another function synchronously. Waiting for the response causes unnecessary costs and increases coupling. Instead, use asynchronous processing, e.g. with message queues.
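
A minimal sketch of this decoupling in Python, assuming a hypothetical SQS queue whose URL is passed in via the environment variable QUEUE_URL: the first function only publishes a message and returns immediately, while the second function is triggered by the queue and processes the messages independently.

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical queue, e.g. "order-events"


def producer_handler(event, context):
    """Hands the work over to a queue instead of calling the next function synchronously."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"orderId": event["orderId"]}),
    )
    return {"status": "accepted"}


def consumer_handler(event, context):
    """Triggered by the SQS queue; processes each message on its own."""
    for record in event["Records"]:
        order = json.loads(record["body"])
        print(f"Processing order {order['orderId']}")  # ... actual processing here ...
```

The producer is billed only for the time it takes to enqueue the message, and a failure in the consumer no longer affects the caller.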

The deployment package of each function should be as small as possible, and large external libraries should be avoided; this improves the cold start time. Initialization of dependencies should be performed outside of the handler function so that it runs only once, during the cold start, rather than on every invocation. It is also advisable to define operational parameters via a function’s environment variables, which improves reusability.
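
As an illustration, a sketch of such a handler in Python, with a hypothetical table name passed in as the environment variable TABLE_NAME: the SDK client and the table resource are created once at module load time, i.e. during the cold start, and reused by every warm invocation.

```python
import os

import boto3

# Runs once per cold start, not on every invocation.
TABLE_NAME = os.environ["TABLE_NAME"]  # operational parameter, e.g. "orders"
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def handler(event, context):
    # Warm invocations reuse the already initialized table resource.
    table.put_item(Item={"id": event["id"], "payload": event["payload"]})
    return {"statusCode": 200}
```

Because the table name comes from the environment, the same code can be deployed unchanged across development, test, and production stages.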

Access rights to other cloud resources should be defined individually for each function and as restrictively as possible. Stateful database connections should be avoided; use service APIs instead.
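
As a sketch of such a narrowly scoped permission, here is a hypothetical least-privilege policy for the function above, attached via boto3 to an equally hypothetical execution role order-function-role. In practice the policy would normally be defined in the deployment tooling (e.g. CloudFormation, SAM, or Terraform) rather than attached at runtime; the point is that it allows exactly one action on exactly one table.

```python
import json

import boto3

iam = boto3.client("iam")

# Least-privilege policy: a single action on a single resource.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:eu-central-1:123456789012:table/orders",
        }
    ],
}

iam.put_role_policy(
    RoleName="order-function-role",   # hypothetical execution role of one single function
    PolicyName="orders-put-item-only",
    PolicyDocument=json.dumps(policy),
)
```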

Amazon Athena: SQL – Without the Database

Many companies face the problem that data which may become important for new applications years later have, by the time they are needed, long since been deleted, or their structure has been changed several times in the meantime. Furthermore, data are often selected, aggregated or transformed before they are first saved, i.e. they are no longer complete when they are to be used later.

For data-intensive projects in the field of data science or AI in particular, suitable data must therefore first be collected again, causing significant delays in the planned projects.

How can data lakes help?

A data lake is an architectural pattern that aims to make data from various applications available in a centralized ecosystem over the long term. Where possible, data from every segment and department of a company are stored in one central location. Unlike in traditional data warehouses, however, the raw data are always stored as well, often in an object storage system such as S3.

The advantage of this approach is that the information is available in its entirety, without being reduced or transformed when it is first stored, as it would be in a traditional data warehouse. Consequently, the central data pool does not have a structure tailored to specific user requirements; the consumers have to derive the meaning of the data themselves.

To exploit the advantages of a data lake efficiently, it should be provided at a cross-departmental level. This way, the data can be retrieved wherever they are needed.

It is possible to store the data in different zones, allowing access at different levels of abstraction. Data scientists, for example, use low-level tools such as Athena to gain in-depth, detailed insight into the data pool, whereas more specialized data marts are preferable for business departments.

What does Amazon Athena offer?

Amazon Athena allows SQL queries to be executed directly on (semi-)structured data in S3 buckets, without the need for a database with a predefined schema. Nor are preparatory ETL (Extract, Transform, Load) processes, as known from traditional data warehouses, required to work with the raw data.
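
As a hedged sketch of how this looks from code, assuming a hypothetical database clickstream_raw with a table events registered in the Glue Data Catalog over raw JSON objects in S3: the query is submitted, Athena scans the objects directly, and the results are written to an S3 result location and can then be fetched.

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical database/table defined over raw objects in an S3 bucket.
query = """
    SELECT user_id, COUNT(*) AS events
    FROM clickstream_raw.events
    WHERE year = '2021' AND month = '01'
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
"""

execution = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query has finished.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    result = athena.get_query_results(QueryExecutionId=query_id)
    for row in result["ResultSet"]["Rows"]:
        print([column.get("VarCharValue") for column in row["Data"]])
```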

As Amazon Athena is a serverless service, no infrastructure has to be provisioned; this happens automatically in the background and is transparent to the user. On the one hand, this reduces the effort and specialist knowledge required; on the other hand, the service only incurs costs for the amount of data scanned in S3.

Lecture at online campus event (German only)

The following video of our first online campus event gives a more detailed insight into the technical background and the possibilities for application and optimization. It also includes a discussion of practical experiences and a brief live demonstration in the AWS console.