CloudFront and Lambda@Edge as Serverless computing technology

Amazon Web Services (AWS) is one of the most widely used public cloud platforms in the world, and among its very large set of services there are two that fit very well with the kind of integrations that can be done with Apility.io: the CloudFront CDN and Lambda@Edge, a serverless computing technology that runs on the ‘edge’ nodes of CloudFront.
If you’re a reader of our blog, this will sound familiar, because it is what we did a few months ago with Workers, the serverless technology from Cloudflare. This article shows how to do with the AWS CloudFront and Lambda@Edge stack what we have already done with Cloudflare Workers. The goal is not to compare the two technologies but to apply a similar approach on another tech stack.

What is the point of Lambda@Edge?

AWS Lambda is the serverless computing technology created by Amazon Web Services that allows you to execute code without provisioning or managing servers. Following the general AWS scheme, users only pay for the computing time they consume, and nothing is charged when the code is not running. It is strongly oriented to backend services and supports the most popular languages, so the user only has to upload the code. Lambda takes care of everything necessary to execute and scale the code with high availability. It also integrates very well with other AWS services, which can be configured to trigger the code automatically. AWS Lambda code can be deployed in any of the available regions, from where the service is provided.

Lambda@Edge, however, is a particular type of serverless service offered by AWS. It is particular because it does not run in the regular AWS regions, but directly in the Content Delivery Network (CDN) Points of Presence (PoP) that make up the CloudFront service globally. This means that the code executed in these PoPs is only a few milliseconds away from the client who made the request. AWS CloudFront currently has 132 Points of Presence distributed globally, so clients benefit from low latency virtually wherever they are located.

Because this kind of serverless-at-the-edge solution is still new, most of the available examples deal with inspecting and transforming HTTP requests between the client and the final backend services: redirecting traffic under certain circumstances, performing security checks, validating authentication and more.

Unlike regular Lambda, the code that runs in the PoP must be written in Node.js. In addition, there are other limitations, such as restricted access to general AWS resources in the regions. The details are covered in the AWS documentation.

So what can Apility API and CloudFront and Lambda@Edge do together?

We have written before about why it can be a good idea to block malicious users who try to register through anonymizing services such as TOR, anonymous VPNs and anonymous proxies. We have good examples of how to use NGINX and OpenResty to block access from blacklisted IP addresses, even with API gateway solutions. So it makes sense to implement something quick to show how a Lambda@Edge function can connect to the Apility API services to validate the remote client IP address.

The next example will show how to:

  1. Intercept all requests.
  2. Extract the remote client IP address.
  3. Inject the remote client IP address as a new request header.
  4. Perform an HTTP request to the Apility API badip endpoint.
  5. If the request returns a list of blacklists, create a new header parameter Apilityio-Badip containing them and pass it to the origin server.
  6. If the response is empty, create the header parameter Apilityio-Badip with an empty value.
  7. If the Apility API server returns an error, pass ‘empty’ to the origin server.
  8. Finally, pass as a header parameter Apilityio-Elapsed-Time with the milliseconds it took to perform the request to Apility API server.

Hence, when a request is made to the configured CloudFront endpoint, the Lambda@Edge function will figure out whether the IP is malicious and will pass the new header parameters to the origin server (never back to the remote client!).

Let’s get started!

Create a fake backend service and put CloudFront in front of it

In our example, we want to simulate a backend service that receives a new header with the list of blacklists where the client IP address has been found, if that IP address is malicious. To simulate this backend service we will use a popular debugging service, https://httpbin.org/. This service lets you inspect the information sent to it for debugging purposes. In our example, we just want to visualize the new headers added to the request.

So, from the AWS Management Console go to Services > CloudFront and, on the CloudFront Distributions page, click on Create Distribution. In the next step, select ‘Web’ as the delivery method for your content. In step 2 a new form will be displayed. Enter the following information:

  • Origin domain name: httpbin.org
  • Origin ID: enter a human-readable name for the distribution.
  • Origin Protocol Policy: HTTPS only
  • Viewer Protocol Policy: HTTPS only
  • Cache Based on Selected Request Headers: Whitelist. In the lists below, enter the custom header ‘client-ip’ and click on ‘Add Custom’.

We could tweak several more parameters, but these should be enough for the example. Now click on Create Distribution at the bottom.

It will take a few minutes to create the distribution and propagate the DNS changes. Meanwhile, take note of the CloudFront domain name created for the distribution.

In our example, if you enter the domain https://d2nkfm3k2zvw7l.cloudfront.net in your browser, you should see the front page of the site. Since we are working with headers, we add the URI /headers to examine them. The resulting page lists the request headers that reached httpbin.org.

These are the headers that a server receives when hit by an HTTP request. Now we are going to add the Lambda@Edge functions to enrich these headers with our own.

Creating the Lambda@Edge Viewer and Origin request functions

You can use Lambda functions to change CloudFront requests and responses at the following points:

  • After CloudFront receives a request from a viewer (viewer request)
  • Before CloudFront forwards the request to the origin (origin request)
  • After CloudFront receives the response from the origin (origin response)
  • Before CloudFront forwards the response to the viewer (viewer response)
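
Whichever of these four points triggers it, the function receives an event that describes the trigger and the request. As a minimal sketch (the helper name describeTrigger is ours, for illustration only), this is how a function could read which trigger point invoked it and which URI was requested:

```javascript
'use strict';

// Returns which CloudFront trigger point invoked the function
// ('viewer-request', 'origin-request', 'origin-response' or
// 'viewer-response') together with the requested URI.
function describeTrigger(event) {
  const cf = event.Records[0].cf;
  return { eventType: cf.config.eventType, uri: cf.request.uri };
}
```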

We will use one function for the Viewer request and another for the Origin request. The Viewer request function extracts the client IP address and saves it as a new header named client-ip. CloudFront only runs the Origin request function when it has to go to the origin, that is, on a cache miss, so we cache the requests by client-ip. If we didn’t, a response cached for one client would be served to everyone else, CloudFront would not execute the Origin request function, and the origin server would not be hit. We have created a Gist on GitHub with a generic AWS Lambda@Edge function to obtain the client-ip address.
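
For reference, a Viewer request function of this kind can be very small. The following is a minimal sketch written for this article's scenario, not a copy of the Gist; it assumes the standard CloudFront event layout, where the viewer address is available as request.clientIp:

```javascript
'use strict';

// Viewer request handler: copies the viewer's IP address into a
// 'client-ip' request header so that later stages can read it.
function handler(event, context, callback) {
  const request = event.Records[0].cf.request;
  // CloudFront header maps are keyed by the lowercase header name;
  // each entry is a list of { key, value } pairs.
  request.headers['client-ip'] = [
    { key: 'Client-Ip', value: request.clientIp },
  ];
  // Returning the request tells CloudFront to continue processing it.
  callback(null, request);
}

exports.handler = handler;
```

Because the header is added at the viewer stage, it is already present when CloudFront computes the cache key, which is what makes caching by client-ip possible.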

The Origin request function is slightly more complex. It performs an HTTP request to the Apility.io API to figure out whether the client-ip address is malicious, as described above. Again, we have created a Gist with an AWS Lambda@Edge function that adds an Apility.io header. You will have to modify the code to insert your Apility.io API key (Lambda@Edge does not support environment variables).

To upload a new Lambda@Edge function, go to the Management Console and then to Services > Lambda. Click on Create Function, select Author from Scratch and enter the following information for the first function (client-ip):

  • Name: clientip-addheader
  • Runtime: Node.js 8.10
  • Role: Create new role from template
  • Role name: lambda-addheaders-role
  • Policy templates: Basic Lambda@Edge permissions (for CloudFront trigger)

Now click on Create Function and start editing the function. On this new page, paste the content of the aws-lambdaedge-clientip-addheader.js function into the file ‘index.js’ in the ‘Function code’ section. Click on Save. Now go to Actions > Publish New Version. Enter a version code, for example, 0.0.1.

Go to the Designer view below and click on CloudFront in the list of triggers. Scroll down to the Configure Triggers section and modify the parameters as follows:

  • Distribution: Select the CloudFront Distribution created before.
  • CloudFront Event: Viewer Request
  • Enable Trigger and Replicate: Checked

And finally, don’t forget to click on Save.

Now we have to repeat the process with the other function:

Go back to the Management Console and go to Services > Lambda. Click on Create Function, select Author from Scratch and enter the following information for the second function (apilityio-add-header):

  • Name: apilityio-add-header
  • Runtime: Node.js 8.10
  • Role: Choose an existing role
  • Existing role: service-role/lambda-addheaders-role

Now click on Create Function and start editing the function. On this new page, paste the content of the aws-lambdaedge-apilityio-add-header.js function into the file ‘index.js’ in the ‘Function code’ section. Don’t forget to enter your API key in the X-AUTH-TOKEN parameter. Click on Save. Now go to Actions > Publish New Version. Enter a version code, for example, 0.0.1.

Go to the Designer view below and click on CloudFront in the list of triggers. Scroll down to the Configure Triggers section and modify the parameters as follows:

  • Distribution: Select the CloudFront Distribution created before.
  • CloudFront Event: Origin Request
  • Enable Trigger and Replicate: Checked

And finally, don’t forget to click on Save.

There is an extra step to perform in the CloudFront management console: it’s necessary to invalidate the caches. So go to your CloudFront Distribution > Invalidations and click on Create Invalidation. Enter /* and click on ‘Invalidate’.
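
If you prefer to script this step, the same invalidation can be requested through the AWS SDK. The sketch below assumes the aws-sdk package (AWS SDK for JavaScript v2) is installed and credentials are configured. The helper names and the distribution ID placeholder are ours:

```javascript
'use strict';

// Builds the request parameters for invalidating every cached path ('/*').
// CallerReference must be unique per invalidation request, so a timestamp
// is appended to it.
function buildInvalidationParams(distributionId, now = Date.now()) {
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      CallerReference: 'invalidate-all-' + now,
      Paths: { Quantity: 1, Items: ['/*'] },
    },
  };
}

// Assumption: 'aws-sdk' (v2) is available. It is required lazily so the
// parameter builder above can be used and checked without it.
function invalidateAll(distributionId) {
  const AWS = require('aws-sdk');
  const cloudfront = new AWS.CloudFront();
  return cloudfront
    .createInvalidation(buildInvalidationParams(distributionId))
    .promise();
}

// Usage (placeholder ID):
// invalidateAll('YOUR_DISTRIBUTION_ID').then(console.log, console.error);
```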

Now if we open https://d2nkfm3k2zvw7l.cloudfront.net/headers in the browser, as we did at the beginning, we will see three new headers: Apilityio-Badip, Apilityio-Elapsed-Time and Client-Ip:

{
  "headers": {
    "Accept-Encoding": "gzip", 
    "Apilityio-Badip": "", 
    "Apilityio-Elapsed-Time": "47", 
    "Cache-Control": "max-age=0", 
    "Client-Ip": "79.156.253.222", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Amazon CloudFront", 
    "X-Amz-Cf-Id": "C4s3FEe_j5qGMZzmH-r-1YNyic5LEkf9esDoEZZlOcEMGEb5CzDjwg=="
  }
}

79.156.253.222 is clean in our search engine. Hence, the Apilityio-Badip header is empty, and it took 47 ms to process the query to our API.

Now I’m going to try from a TOR browser, and the result should be different:

{
  "headers": {
    "Accept-Encoding": "gzip", 
    "Apilityio-Badip": "TOR,STOPFORUMSPAM-180,STOPFORUMSPAM-365,STOPFORUMSPAM-1,UDGER-TOR,BOTSCOUT-1D,STOPFORUMSPAM-30,STOPFORUMSPAM-7,TOR-BLUTMAGIE-FULL,BOTSCOUT-7D,BOTSCOUT-30D,STOPFORUMSPAM-90", 
    "Apilityio-Elapsed-Time": "100", 
    "Client-Ip": "192.42.116.20", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Amazon CloudFront", 
    "X-Amz-Cf-Id": "dleGedj1acAv6Td-caK5tsWT3h8_y3ctcd5JvrO8a8qFh7UNs3P89A=="
  }
}

192.42.116.20 has been identified not only as a TOR exit node, but also as a node used by spammers and bots… a terrible reputation. The new header Apilityio-Badip contains a long list of the blacklists where the IP address has been found.
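
On the origin side, the backend only has to read the new header. As a minimal sketch, this is how the comma-separated value could be turned into a list. The isListedAsTor check is a hypothetical policy that assumes TOR-related blacklists carry ‘TOR’ as one of their dash-separated name parts, as in the output above:

```javascript
'use strict';

// Splits the comma-separated Apilityio-Badip header value into a list of
// blacklist names; an empty or missing header yields an empty list.
function parseBadip(headerValue) {
  if (!headerValue) return [];
  return headerValue
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Hypothetical policy: a blacklist counts as TOR-related when 'TOR' is one
// of the dash-separated parts of its name (e.g. TOR, UDGER-TOR).
function isListedAsTor(blacklists) {
  return blacklists.some((name) => name.split('-').includes('TOR'));
}
```

A backend could, for example, ask for a captcha when parseBadip returns a non-empty list and refuse registrations when isListedAsTor is true.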

Why is this solution better than an integration in our code?

This is only an example of how to integrate AWS CloudFront, Lambda@Edge and Apility.io, and I think there is a lot of room for improvement: for example, redirecting to captcha pages before continuing, blocking access right at the edge, or lowering access levels for suspicious users. Do you have more ideas? Let us know in the comments section!