12 posts tagged with "Cloud"

Cloud

FeedMyFurBabies – Using Custom Resources in AWS CDK to create AWS IoT Core Keys and Certificates

February 11, 2024 · 9 min read

Chiwai Chan

Tinkerer

In a previous blog I talked about switching from CloudFormation templates to AWS CDK as my preferred infrastructure as code tool for provisioning my AWS IoT Core resources; I mentioned at the time that provisioning resources with AWS CDK would improve my productivity and let me focus on iterating and building. Although I switched to CDK for the reasons I described in that blog, there are some CloudFormation limitations that cannot be addressed just by switching to CDK alone.

In this blog I will talk about CloudFormation Custom Resources:

What are CloudFormation Custom Resources?
What is the problem I am trying to solve?
How will I solve it?
How am I using Custom Resources with AWS CDK?

CloudFormation Custom Resources allow you to write custom logic using AWS Lambda functions to provision resources, whether these resources live in AWS (you might ask why not just use CloudFormation or CDK: keep reading), on-premises or in other public clouds. These Custom Resource Lambda functions are configured within a CloudFormation template and are hooked into a CloudFormation Stack's lifecycle during the create, update and delete phases; for each of these lifecycle stages to happen, the corresponding logic must be implemented in the Lambda function's code.
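To make the lifecycle hook more concrete, here is a minimal sketch of how a Custom Resource handler might dispatch on the `RequestType` field (`Create`, `Update` or `Delete`) that CloudFormation sends in the event, and build the JSON body CloudFormation waits for at the pre-signed `ResponseURL`. The `provision`/`deprovision` callables, the `dispatch`/`build_response` names and the `"create-failed"` placeholder ID are hypothetical; the HTTP PUT back to CloudFormation is left out so the sketch runs offline.

```python
import json
from typing import Callable, Dict, Optional


def build_response(event: Dict, status: str, physical_id: str,
                   data: Optional[Dict] = None, reason: str = "") -> str:
    """Build the JSON body that CloudFormation expects at event["ResponseURL"]."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    })


def dispatch(event: Dict,
             provision: Callable[[Dict], str],
             deprovision: Callable[[str, Dict], None]) -> str:
    """Route a Custom Resource event to the logic for its lifecycle stage."""
    props = event.get("ResourceProperties", {})
    request_type = event["RequestType"]
    try:
        if request_type == "Create":
            # Create: provision the resource and return its identifier
            physical_id = provision(props)
        elif request_type == "Update":
            # Update: nothing to change here; returning a different ID would make
            # CloudFormation treat the update as a replacement
            physical_id = event["PhysicalResourceId"]
        elif request_type == "Delete":
            # Delete: clean up whatever the Create phase provisioned
            physical_id = event["PhysicalResourceId"]
            deprovision(physical_id, props)
        else:
            raise ValueError(f"Unknown RequestType: {request_type}")
        return build_response(event, "SUCCESS", physical_id)
    except Exception as exc:
        # Always answer, so the stack fails fast instead of waiting for a timeout
        fallback_id = event.get("PhysicalResourceId", "create-failed")
        return build_response(event, "FAILED", fallback_id, reason=str(exc))
```

In a real Lambda, the string returned by `dispatch` would be sent with an HTTP PUT to `event["ResponseURL"]`, which is what the `send_cfn_response` function in the code further down does.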

What is the problem I am trying to solve?

My AWS IoT Core reference architecture relies on two sets of certificates and private keys, which are used to authenticate the Thing devices connecting to AWS IoT Core; this ensures that only trusted devices can establish a connection.

In the CloudFormation template version of my reference architecture, the deployment instructions required you to manually create 2 Certificates for the IoT Core service in the AWS Console, because CloudFormation doesn't directly support creating certificates for AWS IoT Core, as shown in the screenshot below.

CloudFormation Stacks

There is nothing wrong with creating the certificates manually within the AWS Console when you are trying out my example for the purpose of learning, but it would be best to deploy an entire set of resources using infrastructure as code, so we can achieve consistent, repeatable deployments with as little effort as possible. If you are completely new to AWS, coding and IoT, my deployment instructions would be very overwhelming, and your chances of successfully deploying a fully functional example would be slim.

How will I solve it?

If you got this far and actually read what was written up to this point, you have probably guessed that the solution is Custom Resources, so let's talk about how the problem described above was solved.

So we know Custom Resources are part of the solution, but one important thing to understand is that, even though CloudFormation can't create the certificates directly, the AWS SDK for Python (Boto3) does support creating them: create_keys_and_certificate.

create_keys_and_certificate

So essentially, we are able to create the AWS IoT Core certificates using CloudFormation (indirectly), but it requires the help of a Custom Resource (a Lambda function) and the AWS Boto3 Python SDK.

The Python code below is what I have in the Custom Resource Lambda function; it demonstrates the use of the Boto3 SDK to create the AWS IoT Core Certificates. As a bonus, I am leveraging the Lambda function to save the Certificates into AWS Systems Manager Parameter Store, which makes things much simpler by centralising the Certificates in a single location, without the engineer deploying this reference architecture having to manually copy, paste and manage the Certificates, as I forced readers to do in the original version of this reference architecture. The code below also manages the lifecycle of the Certificates when the CloudFormation Stack is deleted, by deleting the Certificates it created during the create phase of the lifecycle.

The overall flow to create the certificates is: Create a CloudFormation Stack --> Invoke the Custom Resource --> invoke the Boto3 IoT "create_keys_and_certificate" API --> save the certificates in Systems Manager Parameter Store

import sys
import json
import logging
import requests
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Module-level logger; the Lambda runtime already attaches a handler to the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def get_aws_client(name):
    return boto3.client(
        name,
        config=Config(retries={"max_attempts": 10, "mode": "standard"}),
    )


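def create_resources(thing_name: str, stack_name: str, encryption_algo: str):
    # Note: encryption_algo is currently unused; create_keys_and_certificate takes no
    # key-algorithm parameter and generates its own (2048-bit RSA) key pair
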
    c_iot = get_aws_client("iot")
    c_ssm = get_aws_client("ssm")

    result = {}

    # Download the Amazon Root CA file so it can be saved to Systems Manager Parameter Store below
    url = "https://www.amazontrust.com/repository/AmazonRootCA1.pem"
    response = requests.get(url, timeout=10)

    if response.status_code == 200:
        amazon_root_ca = response.text
    else:
        # Fail loudly: without the Root CA, amazon_root_ca would be undefined further down
        raise RuntimeError(
            f"Failed to download Amazon Root CA file. Status code: {response.status_code}"
        )


    try:
        # Create the keys and certificate for a thing; each is saved as a Systems Manager Parameter Store value below
        response = c_iot.create_keys_and_certificate(setAsActive=True)
        certificate_pem = response["certificatePem"]
        private_key = response["keyPair"]["PrivateKey"]
        result["CertificateArn"] = response["certificateArn"]
    except ClientError as e:
        logger.error(f"Error creating certificate, {e}")
        # Re-raise so the handler can report the failure to CloudFormation
        raise

    # store certificate and private key in SSM param store
    try:
        parameter_private_key = f"/{stack_name}/{thing_name}/private_key"
        parameter_certificate_pem = f"/{stack_name}/{thing_name}/certificate_pem"
        parameter_amazon_root_ca = f"/{stack_name}/{thing_name}/amazon_root_ca"

        # Saving the private key in Systems Manager Parameter Store
        response = c_ssm.put_parameter(
            Name=parameter_private_key,
            Description=f"Certificate private key for IoT thing {thing_name}",
            Value=private_key,
            Type="SecureString",
            Tier="Advanced",
            Overwrite=True
        )
        result["PrivateKeySecretParameter"] = parameter_private_key

        # Saving the certificate pem in Systems Manager Parameter Store
        response = c_ssm.put_parameter(
            Name=parameter_certificate_pem,
            Description=f"Certificate PEM for IoT thing {thing_name}",
            Value=certificate_pem,
            Type="String",
            Tier="Advanced",
            Overwrite=True
        )
        result["CertificatePemParameter"] = parameter_certificate_pem

        # Saving the Amazon Root CA in Systems Manager Parameter Store, 
        # Although this file is publically available to download, it is intended to provide a complete set of files to try out this working example with as much ease as possible
        response = c_ssm.put_parameter(
            Name=parameter_amazon_root_ca,
            Description=f"Amazon Root CA for IoT thing {thing_name}",
            Value=amazon_root_ca,
            Type="String",
            Tier="Advanced",
            Overwrite=True
        )
        result["AmazonRootCAParameter"] = parameter_amazon_root_ca
    except ClientError as e:
        logger.error(f"Error creating secure string parameters, {e}")
        # Re-raise so the handler can report the failure to CloudFormation
        raise

    try:
        response = c_iot.describe_endpoint(endpointType="iot:Data-ATS")
        result["DataAtsEndpointAddress"] = response["endpointAddress"]
    except ClientError as e:
        logger.error(f"Could not obtain iot:Data-ATS endpoint, {e}")
        result["DataAtsEndpointAddress"] = "stack_error: see log files"

    return result

# Delete the resources created for a thing when the CloudFormation Stack is deleted
def delete_resources(thing_name: str, certificate_arn: str, stack_name: str):
    c_iot = get_aws_client("iot")
    c_ssm = get_aws_client("ssm")

    try:
        # Delete all the Systems Manager Parameter Store values created to store a thing's certificate files
        parameter_private_key = f"/{stack_name}/{thing_name}/private_key"
        parameter_certificate_pem = f"/{stack_name}/{thing_name}/certificate_pem"
        parameter_amazon_root_ca = f"/{stack_name}/{thing_name}/amazon_root_ca"
        c_ssm.delete_parameters(Names=[parameter_private_key, parameter_certificate_pem, parameter_amazon_root_ca])
    except ClientError as e:
        logger.error(f"Unable to delete parameter store values, {e}")

    try:
        # Clean up the certificate by firstly revoking it then followed by deleting it
        c_iot.update_certificate(certificateId=certificate_arn.split("/")[-1], newStatus="REVOKED")
        c_iot.delete_certificate(certificateId=certificate_arn.split("/")[-1])
    except ClientError as e:
        logger.error(f"Unable to delete certificate {certificate_arn}, {e}")


def handler(event, context):
    props = event["ResourceProperties"]
    # Defaults so every branch (including unexpected request types) has something to report;
    # Create events carry no PhysicalResourceId yet, so fall back to the log stream name
    physical_resource_id = event.get("PhysicalResourceId", context.log_stream_name)
    response_data = {}

    try:
        # Check if this is a Create and we're failing Creates
        if event["RequestType"] == "Create" and props.get("FailCreate", False):
            raise RuntimeError("Create failure requested, logging")
        elif event["RequestType"] == "Create":
            logger.info("Request CREATE")

            resp_lambda = create_resources(
                thing_name=props["CatFeederThingLambdaCertName"],
                stack_name=props["StackName"],
                encryption_algo=props["EncryptionAlgorithm"]
            )

            resp_controller = create_resources(
                thing_name=props["CatFeederThingControllerCertName"],
                stack_name=props["StackName"],
                encryption_algo=props["EncryptionAlgorithm"]
            )

            # The values in the response_data could be used in the CDK code, for example used as Outputs for the CloudFormation Stack deployed
            response_data = {
                "CertificateArnLambda": resp_lambda["CertificateArn"],
                "PrivateKeySecretParameterLambda": resp_lambda["PrivateKeySecretParameter"],
                "CertificatePemParameterLambda": resp_lambda["CertificatePemParameter"],
                "AmazonRootCAParameterLambda": resp_lambda["AmazonRootCAParameter"],
                "CertificateArnController": resp_controller["CertificateArn"],
                "PrivateKeySecretParameterController": resp_controller["PrivateKeySecretParameter"],
                "CertificatePemParameterController": resp_controller["CertificatePemParameter"],
                "AmazonRootCAParameterController": resp_controller["AmazonRootCAParameter"],
                "DataAtsEndpointAddress": resp_lambda[
                    "DataAtsEndpointAddress"
                ],
            }

            # Using the ARNs of the pair of certificates created as the PhysicalResourceId used by the Custom Resource
            physical_resource_id = response_data["CertificateArnLambda"] + "," + response_data["CertificateArnController"]
        elif event["RequestType"] == "Update":
            logger.info("Request UPDATE")
            # Nothing to change: keep the existing certificates and PhysicalResourceId
        elif event["RequestType"] == "Delete":
            logger.info("Request DELETE")

            certificate_arns_array = event["PhysicalResourceId"].split(",")

            # A failed Create reports a placeholder ID, so only clean up when both ARNs are present
            if len(certificate_arns_array) == 2:
                delete_resources(
                    thing_name=props["CatFeederThingLambdaCertName"],
                    certificate_arn=certificate_arns_array[0],
                    stack_name=props["StackName"],
                )

                delete_resources(
                    thing_name=props["CatFeederThingControllerCertName"],
                    certificate_arn=certificate_arns_array[1],
                    stack_name=props["StackName"],
                )
        else:
            logger.info("Should not get here in normal cases - could be REPLACE")

        send_cfn_response(event, context, "SUCCESS", response_data, physical_resource_id)
    except Exception as e:
        logger.exception(e)
        # Tell CloudFormation the request failed; exiting without a response would leave
        # the stack waiting until the Custom Resource times out
        send_cfn_response(event, context, "FAILED", {}, physical_resource_id)


def send_cfn_response(event, context, response_status, response_data, physical_resource_id):
    response_body = json.dumps({
        "Status": response_status,
        "Reason": "See the details in CloudWatch Log Stream: " + context.log_stream_name,
        "PhysicalResourceId": physical_resource_id,
        "StackId": event['StackId'],
        "RequestId": event['RequestId'],
        "LogicalResourceId": event['LogicalResourceId'],
        "Data": response_data
    })

    headers = {
        'content-type': '',
        'content-length': str(len(response_body))
    }

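    # ResponseURL is a pre-signed S3 URL; a timeout stops a hung request from using up the Lambda's runtime
    requests.put(event['ResponseURL'], data=response_body, headers=headers, timeout=10)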
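
The handler packs both certificate ARNs into the PhysicalResourceId as a comma-separated string, and `delete_resources` recovers the certificate ID from the last path segment of each ARN. Here is a small, self-contained sketch of that round trip, using hypothetical helper names (`encode_physical_id`, `decode_physical_id`, `certificate_id_from_arn`) and made-up ARNs; pulling the logic out like this can make the Delete path easier to unit test without calling AWS.

```python
from typing import List


def encode_physical_id(certificate_arns: List[str]) -> str:
    """Join certificate ARNs into a single PhysicalResourceId (comma-separated)."""
    if any("," in arn for arn in certificate_arns):
        raise ValueError("ARNs must not contain commas")
    return ",".join(certificate_arns)


def decode_physical_id(physical_id: str) -> List[str]:
    """Split a PhysicalResourceId back into its certificate ARNs, ignoring empty parts."""
    return [arn for arn in physical_id.split(",") if arn]


def certificate_id_from_arn(certificate_arn: str) -> str:
    """IoT certificate ARNs end in 'cert/<certificateId>'; keep the last segment."""
    return certificate_arn.split("/")[-1]


lambda_arn = "arn:aws:iot:us-east-1:123456789012:cert/aaaa1111"
controller_arn = "arn:aws:iot:us-east-1:123456789012:cert/bbbb2222"
physical_id = encode_physical_id([lambda_arn, controller_arn])
arns = decode_physical_id(physical_id)
print([certificate_id_from_arn(arn) for arn in arns])  # ['aaaa1111', 'bbbb2222']
```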

How I am using Custom Resources with AWS CDK?

What I am about to describe in this section can also be applied to a regular CloudFormation template; in fact, CDK generates a CloudFormation template behind the scenes during its synth phase. The latest version of my IoT Core reference architecture is implemented using AWS CDK: https://chiwaichan.co.nz/blog/2024/02/02/feedmyfurbabies-i-am-switching-to-aws-cdk/

If you want to get straight into deploying the CDK version of the reference architecture, go here: https://github.com/chiwaichan/feedmyfurbabies-cdk-iot

In my CDK code, I provision the Custom Resource Lambda function and the associated IAM Roles and Policies using the Python code below. The line of code "code=lambda_.Code.from_asset("lambdas/custom-resources/iot")" loads the Custom Resource Lambda function code shown earlier.

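        # Fragment from the Stack's __init__; assumes these imports at the top of the module:
        # from aws_cdk import Duration, CustomResource, aws_iam as iam, aws_lambda as lambda_

        # IAM Role for Lambda Function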
        custom_resource_lambda_role = iam.Role(
            self, "CustomResourceExecutionRole",
            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com")
        )

        # IAM Policies
        iot_policy = iam.PolicyStatement(
            actions=[
                "iot:CreateCertificateFromCsr",
                "iot:CreateKeysAndCertificate",
                "iot:DescribeEndpoint",
                "iot:AttachPolicy",
                "iot:DetachPolicy",
                "iot:UpdateCertificate",
                "iot:DeleteCertificate"
            ],
            resources=["*"]  # Consider restricting this to specific IoT resources where the action supports it
        )

        # IAM Policies
        ssm_policy = iam.PolicyStatement(
            actions=[
                "ssm:PutParameter",
                "ssm:DeleteParameters"
            ],
            resources=[f"arn:aws:ssm:{self.region}:{self.account}:parameter/*"]  # Modify this to restrict to specific parameters
        )

        logging_policy = iam.PolicyStatement(
            actions=[
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            resources=["arn:aws:logs:*:*:*"]
        )
       
        custom_resource_lambda_role.add_to_policy(iot_policy)
        custom_resource_lambda_role.add_to_policy(ssm_policy)
        custom_resource_lambda_role.add_to_policy(logging_policy)

        # Define the Lambda function
        custom_lambda = lambda_.Function(
            self, 'CustomResourceLambdaIoT',
            runtime=lambda_.Runtime.PYTHON_3_8,
            handler="app.handler",
            code=lambda_.Code.from_asset("lambdas/custom-resources/iot"),
            timeout=Duration.seconds(60),
            role=custom_resource_lambda_role
        )


        # Properties to pass to the custom resource
        custom_resource_props = {
            "EncryptionAlgorithm": "ECC",
            "CatFeederThingLambdaCertName": f"{cat_feeder_thing_lambda_name.value_as_string}",
            "CatFeederThingControllerCertName": f"{cat_feeder_thing_controller_name.value_as_string}",
            "StackName": f"{construct_id}",
        }

        # Create the Custom Resource
        custom_resource = CustomResource(
            self, 'CustomResourceIoT',
            service_token=custom_lambda.function_arn,
            properties=custom_resource_props
        )
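
The comment on the SSM policy statement suggests narrowing its wildcard. One way to do that, sketched below under the assumption that every parameter lives under the `/<stack_name>/...` hierarchy used by the Custom Resource code, is to build the resource ARN from the stack name. For a hierarchical parameter such as `/stack/thing/private_key`, the ARN is `arn:aws:ssm:<region>:<account>:parameter/stack/thing/private_key`, i.e. the name's leading slash is not doubled after `parameter`. The helper name is hypothetical; in the CDK code its result could be passed as `resources=[...]` to `iam.PolicyStatement`, with `self.region`, `self.account` and `construct_id` as the arguments.

```python
def ssm_parameter_arn_pattern(region: str, account: str, stack_name: str) -> str:
    """ARN pattern covering every parameter under /<stack_name>/ (hypothetical helper)."""
    # Parameter names look like "/<stack_name>/<thing_name>/<file>"; "parameter/" already
    # supplies the separator, so strip any slashes around the stack name
    prefix = stack_name.strip("/")
    return f"arn:aws:ssm:{region}:{account}:parameter/{prefix}/*"


print(ssm_parameter_arn_pattern("us-east-1", "123456789012", "feedmyfurbabies-cdk-iot-deployed-service"))
# arn:aws:ssm:us-east-1:123456789012:parameter/feedmyfurbabies-cdk-iot-deployed-service/*
```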

When you run "cdk deploy" using the CLI on the CDK reference architecture, CDK synthesizes a CloudFormation template from the Python CDK code and then creates a CloudFormation Stack from that synthesized template for you.

For more details on the CDK AWS IoT reference architecture and deployment instructions, please visit my blog: https://chiwaichan.co.nz/blog/2024/02/02/feedmyfurbabies-i-am-switching-to-aws-cdk/

FeedMyFurBabies – I am switching to AWS CDK

February 2, 2024 · 7 min read

Chiwai Chan

Tinkerer

I have been a bit slack on this Cat Feeder IoT project for the last 12 months or so; there have been many challenges I've faced during that time that prevented me from materialising the ideas I had - many of them sounded a little crazy if you've had a conversation with me in passing, but they are not crazy to me in my crazy mind as I know what I ramble about is technically doable.

Examples of the technical challenges I had were:

CloudFormation: the initial version of this project was implemented using CloudFormation for the IaC; here is the repository containing both the code and deployment instructions. If you read the deployment instructions, you will notice there are a lot of manual steps required, e.g. creating 2 sets of certificates in AWS IoT Core in the AWS Console, and copying and pasting values to and from the CloudFormation Parameters and Outputs, even though at the time I made my best efforts to minimise the manual effort required. It was not an easy example to get up and running, especially if you are new to AWS, Arduino or IoT; I myself struggled at times to deploy my own example.
Terraform: I ported the CloudFormation IaC code to Terraform some time last year; you can find it here. Nothing is wrong with Terraform itself; I just kept forgetting to save, or misplacing, my Terraform state files every time I resumed this project. In reality I might leverage both Terraform and CDK for the projects/micro-services I create in the future, but it really depends on what I am trying to achieve.

Deploying the AWS CDK version of this Cat Feeder IoT project

So, the commands below are the deployment instructions taken from the AWS CDK version of this project, which you can find here: https://github.com/chiwaichan/feedmyfurbabies-cdk-iot

git clone git@github.com:chiwaichan/feedmyfurbabies-cdk-iot.git
cd feedmyfurbabies-cdk-iot
cdk deploy

git remote rm origin
git remote add origin https://git-codecommit.us-east-1.amazonaws.com/v1/repos/feedmyfurbabies-cdk-iot-FeedMyFurBabiesCodeCommitRepo
git push --set-upstream origin main

The commands above are all you need to execute to deploy the Cat Feeder project with CDK, assuming you have the AWS CDK and your AWS credentials configured on the machine you are running these commands on. The first group of commands checks out the CDK code and deploys an AWS CodeCommit repository and a CodePipeline pipeline, creating the 1st CloudFormation Stack from one CloudFormation template. The second group of commands pushes the CDK code into the CodeCommit repository created by the first group, which in turn triggers an execution in CodePipeline; the pipeline then deploys the resources for this Cat Feeder IoT project, creating the 2nd CloudFormation Stack from a different CloudFormation template.

The two groups of commands create the 2 CloudFormation Stacks shown in the screenshot below: the stack "feedmyfurbabies-cdk-iot" provisions the CodeCommit repository and CodePipeline using the 1st CloudFormation template, and the stack "Deploy-feedmyfurbabies-cdk-iot-deployed-service" provisions the resources for this Cat Feeder IoT project using the 2nd CloudFormation template.

CloudFormation Stacks

FYI, I did not come up with the pattern described above that deploys two CloudFormation Stacks: one for the pipeline and the other for the AWS resources of this Cat Feeder IoT project. I came across it in one of the AWS online workshops I used to learn CDK, found it useful, and decided to adopt it for my projects going forward.

Test out the deployed solution

The resources that are relevant to the architecture of this AWS IoT solution are shown in the diagram below.

Deployed resources

There are 2 sets of certificates and 2 sets of AWS IoT Things and policies deployed by the "Deploy-feedmyfurbabies-cdk-iot-deployed-service" stack:

IoT Certificates

The 1st set of certificates and IoT Thing is hooked up to the AWS Lambda function (Lambda Thing) shown in the diagram. This Lambda function acts as an AWS IoT Thing (it uses the certificates saved in Systems Manager Parameter Store prefixed with "/feedmyfurbabies-cdk-iot-deployed-service/CatFeederThingLambda") and is fully configured with all the necessary certificates and permissions to send an MQTT message to the "cat-feeder/action" topic in AWS IoT Core. This is a very convenient way to see in action how one could send MQTT messages to AWS IoT Core using Python, as well as a good way to confirm the deployment was successful by testing it out!
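As a rough sketch of how a Lambda (or any Python client) could pull its certificate material out of Parameter Store, assuming the `/<stack>/<thing>/private_key`, `certificate_pem` and `amazon_root_ca` naming used by the Custom Resource code, here is a helper built on the Systems Manager `get_parameter` API with decryption enabled for the SecureString private key. The function name is hypothetical and this is not the Lambda Thing's actual code; the exact path under the prefix quoted above may need adjusting to the parameter names in your account. The client is passed in so the helper can be exercised with a stub.

```python
from typing import Any, Dict

# Suffixes used by the Custom Resource when it saved each file
PARAMETER_SUFFIXES = ("private_key", "certificate_pem", "amazon_root_ca")


def load_thing_credentials(ssm_client: Any, prefix: str) -> Dict[str, str]:
    """Read a Thing's certificate files from Parameter Store under `prefix`."""
    credentials = {}
    for suffix in PARAMETER_SUFFIXES:
        response = ssm_client.get_parameter(
            Name=f"{prefix.rstrip('/')}/{suffix}",
            WithDecryption=True,  # decrypts the SecureString private key; ignored for String
        )
        credentials[suffix] = response["Parameter"]["Value"]
    return credentials


# Usage with a real client (needs AWS credentials and ssm:GetParameter permission):
# import boto3
# creds = load_thing_credentials(
#     boto3.client("ssm"),
#     "/feedmyfurbabies-cdk-iot-deployed-service/CatFeederThingLambda",
# )
```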

Before we invoke the Lambda Thing/function, we need to subscribe to the "cat-feeder/action" topic so that we can see the incoming messages sent by the Lambda function.

Subscribe to IoT Topic

Then we invoke the Lambda function in the AWS Console:

Lambda Result

Make sure you get a green box confirming the MQTT message was sent.

The code in the Lambda is written in Python, and it sends a JSON payload (the dictionary variable shown in the code below) to the IoT Topic "cat-feeder/action".

Lambda Code

Now let's go back to AWS IoT Core to confirm we have received the message:

AWS IoT Core MQTT received

We can see that the message received in IoT Core is the dictionary object sent by the Lambda code.

Conclusion

Using CDK does not eliminate all the issues you might encounter when using CloudFormation (I have a future blog on creating and using CloudFormation Custom Resources lined up), because at the end of the day CDK just generates a CloudFormation template and handles the deployment of the CloudFormation Stack for you, without you having to manage the Stacks or templates yourself. The intent of this blog is to demonstrate how little effort is required to deploy an AWS IoT solution using CDK, compared with the same architecture I shared in my GitHub repo 2 years ago, whose CloudFormation template deployment instructions were long and full of tedious manual steps.

The ultimate aim of changing my IaC is to just focus on building and iterating!

I do often talk too much in my blogs, but in this instance the instructions to deploy this solution and try it out for yourself are minimal, with the majority of the content focused on the resources deployed, what each resource is for and how they interact with each other.

Extra

You may have noticed that there are 2 sets of certificates deployed in IoT Core and 2 IoT Things in this reference architecture; this is because the 2nd set of certificates (prefixed with "/feedmyfurbabies-cdk-iot-deployed-service/CatFeederThingESP32") and the 2nd Thing are provisioned purely for you to send MQTT messages to AWS IoT Core from your own IoT hardware devices / micro-controllers.

Your own Thing

If you want to try it out, you will need to use the IoT Core Endpoint specific to your AWS Account and Region; you can either find it in the AWS IoT Core Console, or copy it from the CloudFormation Stack's Output:

IoT Core Endpoint

The Lambda Thing we tested above can be used to send MQTT messages to your own IoT device/micro-controller, as the 2nd set of certificates is configured with the necessary IoT Core Policies to receive the MQTT messages sent to the Topic "cat-feeder/action"; the certificates are also configured with the policies to send MQTT messages to a second IoT Topic called "cat-feeder/states".

Your own Thing Architecture
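For illustration only, an AWS IoT Core policy granting a device those permissions could look like the document built below: `iot:Connect` for its client ID, `iot:Subscribe` on the topic filter and `iot:Receive` on the topic for "cat-feeder/action", and `iot:Publish` on "cat-feeder/states". The client ID, region and account are placeholders, the function name is hypothetical, and the policies actually deployed by the repository may be scoped differently.

```python
import json
from typing import Dict


def cat_feeder_device_policy(region: str, account: str, client_id: str) -> Dict:
    """Build a hypothetical IoT policy document for a device Thing."""
    arn = f"arn:aws:iot:{region}:{account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "iot:Connect",
             "Resource": f"{arn}:client/{client_id}"},
            # Subscribe is authorised against a topicfilter, Receive against a topic
            {"Effect": "Allow", "Action": "iot:Subscribe",
             "Resource": f"{arn}:topicfilter/cat-feeder/action"},
            {"Effect": "Allow", "Action": "iot:Receive",
             "Resource": f"{arn}:topic/cat-feeder/action"},
            # The device reports back on a separate topic
            {"Effect": "Allow", "Action": "iot:Publish",
             "Resource": f"{arn}:topic/cat-feeder/states"},
        ],
    }


print(json.dumps(cat_feeder_device_policy("us-east-1", "123456789012", "my-esp32"), indent=2))
```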

I have a future blog that will demonstrate how to do this using MicroPython and a Seeed Studio XIAO ESP32C3 - so watch this space.

FeedMyFurBabies – Event-Sourcing using Amazon EventBridge

March 18, 2023 · 8 min read

Chiwai Chan

Tinkerer

In my previous AWS IoT Cat Feeder project I used a Lambda function as the event handler each time the Seeed Studio AWS IoT 1-Click button was pressed; the Lambda function in turn published an MQTT message to AWS IoT Core, which was received by the Cat Feeder (via a Seeed Studio XIAO ESP32C3 micro-controller) to dispense food into either one of the cat bowls or both (depending on the type of press performed on the IoT button). The long term goal is to integrate the AWS IoT Cat Feeder with the Feed My Fur Babies project.

In this Part 2 of the Feed My Fur Babies blog series, I will be introducing the Event-Sourcing pattern to the https://www.feedmyfurbabies.com architecture, describing the benefits of designing an architecture around Event-Sourcing, and sharing an example implemented using Terraform. I recently learnt Terraform and I now prefer it over CloudFormation, the native AWS IaC.

Current state architecture

Here is the current state of the Cat Feeder architecture and the IoT related resources previously deployed in AWS using CloudFormation:

Current State Architecture

The responsibilities of each of the resources deployed in the diagram prior to the introduction of the Event-Sourcing pattern into the architecture are:

AWS IoT 1-Click Button: This is an IoT button I physically press to emit an event to dispense food into one or both of the cat bowls; this button can be used anywhere there is a Wi-Fi connection
AWS IoT Core Certificates: Certificates are associated with resources and devices that interact with the AWS IoT Core Service, either publishing an MQTT message to an AWS IoT Topic or receiving an MQTT message from a Topic
AWS Lambda - IoT 1-Click Event Handler & sends an MQTT message to an IoT topic: This Lambda function is responsible for handling incoming events created by the AWS IoT 1-Click Button, as well as translating each event into an MQTT message before sending it to an AWS IoT Core Topic. This is the component in the architecture that is the main focus of this blog post; we will describe how this component will be re-architected and decomposed to work in conjunction with the introduction of the Event-Sourcing pattern.
AWS IoT Core: This is the IoT service that manages the IoT Topics and Subscriptions to said Topics
Seeed Studio XIAO ESP32C3: a micro-controller subscribed to the IoT Topic (the one the Lambda sends MQTT messages to) that will dispense food into 1 or 2 cat bowls when it receives an MQTT message from the Topic

For further details on what role this architecture plays in the Smart IoT Cat Feeder, visit Part 2 of the Smart Cat Feeder Blog Series.

What is Event-Sourcing?

The idea of Event-Sourcing is to capture all events that occur within a system during its lifetime; these events are stored in an immutable ledger in the sequence in which they occurred.

One of the biggest benefits of capturing all the events of a system is that we are able to replay every single event that has ever occurred within the system (partially or as a whole) at a later time, let's say 5 years later. We can selectively replay those 5 years' worth of events to one or more specific downstream event bus targets: an event bus target could be a new application deployed into your production environment 5 years after the first event was created, which means we could hydrate this new application's datastore with 5 years' worth of data as if it had existed from the very first event. Also, imagine being able to re-create entire datastores with the full history for 100s of applications (where each application has its own datastore) within your system landscape. These datastores could be hydrated with the full history of events stored in the immutable Event-Sourcing ledger, or by replaying the events from the very first event up to a specific event at a given point in time (e.g. half of the entire ledger), effectively giving you the ability to create any datastore, in any datastore engine, with its data in the state it was in at any given point in time.
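To make the replay idea concrete, here is a toy, in-memory sketch in plain Python (not EventBridge) of an append-only ledger: the class only exposes operations to append events in order and to replay them, and a new application's datastore is hydrated by replaying the ledger in full or up to a chosen point in time. The event type, payload fields and the `feeds_per_bowl` projection are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class Event:
    occurred_at: datetime
    event_type: str
    payload: Tuple[Tuple[str, str], ...]  # immutable key/value pairs


class Ledger:
    """Append-only log: the public API can add and read events, never change or remove them."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        if self._events and event.occurred_at < self._events[-1].occurred_at:
            raise ValueError("events must be appended in the order they occurred")
        self._events.append(event)

    def replay(self, until: Optional[datetime] = None) -> Iterator[Event]:
        """Yield events in order, optionally stopping at a point in time."""
        for event in self._events:
            if until is not None and event.occurred_at > until:
                break
            yield event


def hydrate(ledger: Ledger, apply: Callable[[Dict, Event], None],
            until: Optional[datetime] = None) -> Dict:
    """Build a fresh datastore (a dict here) by replaying the ledger."""
    store: Dict = {}
    for event in ledger.replay(until):
        apply(store, event)
    return store


def feeds_per_bowl(store: Dict, event: Event) -> None:
    """Hypothetical projection: count how many times each bowl was fed."""
    if event.event_type == "FeedRequested":
        bowl = dict(event.payload)["bowl"]
        store[bowl] = store.get(bowl, 0) + 1


ledger = Ledger()
ledger.append(Event(datetime(2023, 1, 1), "FeedRequested", (("bowl", "left"),)))
ledger.append(Event(datetime(2023, 6, 1), "FeedRequested", (("bowl", "right"),)))
ledger.append(Event(datetime(2028, 1, 1), "FeedRequested", (("bowl", "left"),)))

print(hydrate(ledger, feeds_per_bowl))                                # {'left': 2, 'right': 1}
print(hydrate(ledger, feeds_per_bowl, until=datetime(2023, 12, 31)))  # {'left': 1, 'right': 1}
```

A new application only needs its own `apply` function; the ledger itself never changes, which is what allows the same history to hydrate any number of datastores.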

How do we introduce Event-Sourcing into the architecture?

Step 1

We start off with the AWS Lambda function shown in the current state architecture, whose responsibilities are to handle the events received from the AWS IoT 1-Click Button each time it is pressed, and to send an MQTT message to an AWS IoT Core Topic in response to each incoming event; essentially it has 2 distinct responsibilities.

Step 2

Next, we decompose the single Lambda function into 2 separate Lambda functions based on its 2 responsibilities, then chain the 2 Lambda functions together to preserve its functionality. What we have effectively achieved by doing this is decoupling the 2 responsibilities into 2 separate units of work, resulting in 2 separate compute resources.

The benefits of a decoupled architecture are:

Each of the Lambda functions can be implemented in a different language, e.g. one in Python and the other in Java
Independent release cycles for each of the Lambda functions
Changes to either one of the 2 responsibilities can be made independently of each other
Each Lambda function can be scaled independently of another
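A rough sketch of that decomposition: one function interprets the IoT 1-Click event as a feed command, the other turns the command into an MQTT message, and in this step the first simply calls the second. The `clickType` field follows the shape of AWS IoT 1-Click button events (`deviceEvent.buttonClicked.clickType`, with values such as `SINGLE`, `DOUBLE` and `LONG`), but the mapping from click type to bowls and the command format are assumptions for illustration, and `publish` is a stand-in for whatever actually sends the MQTT message.

```python
import json
from typing import Callable, Dict, List

# Hypothetical mapping from button press to the bowls that should be filled
BOWLS_BY_CLICK_TYPE: Dict[str, List[str]] = {
    "SINGLE": ["left"],
    "DOUBLE": ["right"],
    "LONG": ["left", "right"],
}


def translate_click_event(event: Dict) -> Dict:
    """Responsibility 1: interpret an IoT 1-Click event as a feed command."""
    click_type = event["deviceEvent"]["buttonClicked"]["clickType"]
    return {"action": "feed", "bowls": BOWLS_BY_CLICK_TYPE.get(click_type, [])}


def send_feed_command(command: Dict, publish: Callable[[str, str], None],
                      topic: str = "cat-feeder/action") -> None:
    """Responsibility 2: serialise the command and send it as an MQTT message."""
    publish(topic, json.dumps(command))


def handle_click(event: Dict, publish: Callable[[str, str], None]) -> None:
    """Step 2: the two responsibilities chained together directly."""
    send_feed_command(translate_click_event(event), publish)
```

Because each responsibility is its own function here (its own Lambda function in the architecture), either side can change, be rewritten in another language or be scaled without touching the other; Step 3 replaces the direct call in `handle_click` with an event bus.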

Step 3

In this step we use Amazon EventBridge as the Event-Sourcing store, known as the immutable ledger we described earlier. We will also leverage EventBridge as a serverless event bus to help us receive, filter, transform, route and deliver events to downstream services (event bus targets). In this instance we will slip EventBridge in between the 2 Lambda functions, and we will store every single IoT event sent by the IoT Button in the immutable ledger.

Benefits of adding EventBridge to the architecture:

The IoT 1-Click Lambda handler no longer directly calls the downstream Lambda function - so it is unaware of the downstream targets
The IoT events are stored in an immutable ledger in the sequence in which they occurred in
Prepare the system landscape with the ability to more easily develop micro-services in an Event-Driven architecture using the orchestration pattern
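With EventBridge in the middle, the IoT 1-Click handler no longer calls the downstream function; it publishes an event to the custom event bus instead. Below is a minimal sketch, assuming the bus name "feedmyfurbabies" and the source "com.feedmyfurbabies" that appear later in this post, a made-up `DetailType` of "FeedRequested", and that the raw 1-Click event is forwarded as the event detail. `events_client` would be `boto3.client("events")` (injected so it can be stubbed); its `put_events` call takes a list of entries with `Source`, `DetailType`, `Detail` (a JSON string) and `EventBusName`, and reports rejected entries through `FailedEntryCount`.

```python
import json
from typing import Any, Dict

EVENT_BUS_NAME = "feedmyfurbabies"    # custom bus name used later in this post
EVENT_SOURCE = "com.feedmyfurbabies"  # the rule matches on this source
DETAIL_TYPE = "FeedRequested"         # hypothetical detail-type


def build_entry(click_event: Dict) -> Dict:
    """Wrap the IoT 1-Click event as an EventBridge entry; the handler knows nothing of targets."""
    return {
        "Source": EVENT_SOURCE,
        "DetailType": DETAIL_TYPE,
        "Detail": json.dumps(click_event),  # Detail must be a JSON string
        "EventBusName": EVENT_BUS_NAME,
    }


def publish_click_event(events_client: Any, click_event: Dict) -> None:
    """Send the event to the custom bus and surface any rejected entries."""
    response = events_client.put_events(Entries=[build_entry(click_event)])
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"EventBridge rejected the event: {response['Entries']}")


# Usage with a real client:
# import boto3
# publish_click_event(boto3.client("events"), event)
```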

Target State Architecture

This is the end result of introducing Event-Sourcing to the architecture. It may not look like much benefit has been gained from adding Amazon EventBridge; in fact, one might think that we've added more components and in effect created more moving parts and complexity. But I have deliberately introduced this very early into the architecture as an investment, so that I am in a position to rapidly build out my micro-service architecture, reaping the rewards from the get-go.

Try it out for yourself

I have created a GitHub Repository to deploy a complete working example of the resources shown in the Target State Architecture using Terraform.

I suggest you deploy this to have a play for yourself:

Clone the repository: "git clone git@github.com:chiwaichan/feedmyfurbabies-202303-eventsourcing-using-eventbridge.git"
Set up your Terraform environment
Run: "terraform init && terraform apply"

Also, check out each individual resource deployed by this Terraform code.

Create a test IoT 1-Click event to pass the event end-to-end through all the deployed resources

This is the IoT 1-Click Lambda function handler shown in the AWS Console

Create a test event so we can invoke the Lambda function to simulate an event as if a physical IoT Button is pressed

Here we can view the logs for this Lambda function Test invocation

The IoT 1-Click Lambda function handler sends an Event to the Custom EventBridge Event Bus named "feedmyfurbabies"

EventBridge Event Bus
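To make the hand-off concrete, here is a minimal sketch of how an IoT 1-Click handler could publish a button press to the "feedmyfurbabies" bus using the `put_events` call of boto3's EventBridge client. The bus name and the "com.feedmyfurbabies" source come from this walkthrough; the "button-pressed" detail type, the detail fields and the function names are hypothetical, and the handler in the repository may build its event differently. The client is injectable so the sketch can be exercised without AWS credentials.

```python
import json


def build_feed_event(click_type, device_id):
    """Build a PutEvents entry for the custom "feedmyfurbabies" bus.

    The DetailType value and the detail fields are hypothetical; only the
    bus name and the "com.feedmyfurbabies" source come from the walkthrough.
    """
    return {
        "Source": "com.feedmyfurbabies",
        "DetailType": "button-pressed",  # hypothetical detail type
        "Detail": json.dumps({"clickType": click_type, "deviceId": device_id}),
        "EventBusName": "feedmyfurbabies",
    }


def iot_button_handler(event, context, events_client=None):
    """IoT 1-Click handler sketch: forward the button press to EventBridge.

    The handler only knows about the bus, not about any downstream targets.
    """
    if events_client is None:
        import boto3  # imported lazily so the sketch can be tested without AWS
        events_client = boto3.client("events")

    # IoT 1-Click button events carry the click type under deviceEvent.buttonClicked
    click_type = event["deviceEvent"]["buttonClicked"]["clickType"]
    device_id = event["deviceInfo"]["deviceId"]

    response = events_client.put_events(Entries=[build_feed_event(click_type, device_id)])
    # PutEvents reports per-entry failures via FailedEntryCount instead of raising
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"EventBridge rejected the event: {response['Entries']}")
    return response
```

Because `put_events` reports failures per entry rather than raising an exception, the sketch checks `FailedEntryCount` explicitly.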

The event sent to the Custom Event Bus has a "source" attribute with a value of "com.feedmyfurbabies", which matches the Custom Event Bus Rule named "feeds-rule".

EventBridge Event Bus Rule
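A rule like "feeds-rule" is driven by an event pattern. The sketch below assumes the simplest pattern that fits the description, `{"source": ["com.feedmyfurbabies"]}`, and uses a deliberately simplified local matcher (top-level fields, exact values only) to show why the event is routed to the rule's target. The deployed rule may match on more fields, and real EventBridge patterns support many more operators than this.

```python
import json

# Assumed pattern for "feeds-rule"; the deployed rule may match on more fields.
FEEDS_RULE_PATTERN = {"source": ["com.feedmyfurbabies"]}


def matches(pattern, event):
    """Simplified EventBridge-style matching for top-level exact values.

    Every key in the pattern must be present in the event, and the event's
    value must equal one of the values listed for that key. Nested objects,
    prefix, numeric and other operators are deliberately not handled.
    """
    return all(event.get(key) in allowed for key, allowed in pattern.items())


event = {
    "source": "com.feedmyfurbabies",
    "detail-type": "button-pressed",  # hypothetical detail type
    "detail": {"clickType": "SINGLE"},
}
print(matches(FEEDS_RULE_PATTERN, event))  # True
# The JSON string you would pass as EventPattern to the EventBridge put_rule call
print(json.dumps(FEEDS_RULE_PATTERN))  # {"source": ["com.feedmyfurbabies"]}
```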

This Lambda function is the downstream target of the Custom Event Bus Rule that matched the event. It is responsible for interpreting the event message and translating it into an MQTT message, which it then sends to the AWS IoT Core Topic "cat-feeder/action"; you can subscribe to this Topic using a micro-controller, e.g. a Seeed Studio XIAO ESP32C3.

Send MQTT Message Lambda

Send MQTT Message Lambda - Monitoring

Here we can see the logs of the event received by the EventBridge Custom Bus Rule

Send MQTT Message Lambda - Logs
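The MQTT side of the target Lambda could look roughly like the sketch below, which uses the `publish` call of boto3's `iot-data` client to send to the "cat-feeder/action" Topic. The JSON payload shape and the function names are hypothetical; the only real requirement is that the micro-controller subscribed to the Topic understands whatever shape the Lambda sends. Depending on your setup, you may need to create the client with your account's `iot:Data-ATS` endpoint passed as `endpoint_url`.

```python
import json

# Topic the cat feeder subscribes to (from the walkthrough above).
TOPIC = "cat-feeder/action"


def to_mqtt_payload(eventbridge_event):
    """Translate an EventBridge event into a hypothetical feeder command."""
    detail = eventbridge_event.get("detail", {})
    return json.dumps({"action": "feed", "clickType": detail.get("clickType", "SINGLE")})


def send_mqtt_handler(event, context, iot_data_client=None):
    """Target Lambda sketch: publish the translated message to AWS IoT Core."""
    if iot_data_client is None:
        import boto3  # imported lazily so the sketch can be tested without AWS
        iot_data_client = boto3.client("iot-data")

    payload = to_mqtt_payload(event)
    # qos=1 requests at-least-once delivery
    iot_data_client.publish(topic=TOPIC, qos=1, payload=payload)
    return payload
```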

In the AWS Console for the AWS IoT Core service, we can subscribe to Topics to receive the MQTT message at the very end of the downstream services; this is useful if you don't use a micro-controller.

IoT Core - MQTT Client Subscribe Topic

Future State Architecture

We end up with an architecture that will enable us to easily add targets to consume events managed by the EventBridge Custom Event Bus, doing so in a way where the IoT 1-Click Lambda function has no knowledge of any newly created subscribers of the Custom Event Bus.

In a future blog I will demonstrate this.

4×4 fun with a bit of IoT, vlogging and Machine Learning – Part 1

January 27, 2023 · 7 min read

Chiwai Chan

Tinkerer

Jimny

Months prior to the very first lockdown I had gotten myself on the waitlist for a 4x4 Jimny, so I could take it to the beach without worrying about getting beached like I likely would in a regular front-wheel-drive hatchback, or take it to the bush to climb some hills and see how far I would get without flipping it (badly). Knowing I wouldn't be able to drive it for an indefinitely long amount of time, I decided to cancel the order back then; in some ways I was sad then, but in many ways I am happy now that I have had a fair amount of time to have a good think about what else I could do with the Jimny whilst taking it on these adventures.

The time spent mulling has led to another new blog series. It will take a similar build approach to the one I took while building my IoT Cat Feeder, but on a larger scale in terms of the number of moving parts and components; also, I get to enjoy myself this time instead of the cats. For those unfamiliar with the approach I took in my prior build, I will start the blog series by proposing an idea I have in mind, with a certainty of about 70% of achieving a functional prototype; this is mainly due to not having the background or experience in most of the skills required to build out this idea.

Generally, I will create a new Part for the Blog Series as I achieve a milestone during the build, where I talk about what was achieved in the milestone and provide details on how I got there; where possible I will include a public GitHub repository for any code written for the build.

So enough of my rambling.

What is it that I am wanting to build?

As you may have already guessed from the image above, yes, this build will involve a 4x4 (I have a Jimny on the way) and some cloud buzzwords like IoT and Machine Learning.

The goals of this build are to:

Develop a solution to capture video recordings of the entire journey of my 4x4 adventures, with 5+ viewpoints around the vehicle in 4K resolution; realistically I might only be able to capture full HD videos, as explained further down this blog.
Capture and store the vehicle's telemetry at regular intervals as the vehicle is driven, using the CAN Bus protocol, e.g. speed, engine RPM and any other states the car is in.
Capture other useful data not monitored by the vehicle's CAN Bus, such as GPS co-ordinates and the environment the vehicle is in at the time, e.g. temperature, humidity, luminosity and many more, using hand-picked sensors.
Ingest all the videos, CAN Bus and sensor data captured into an AWS Datalake in real time.

4x4

If I were able to achieve all the goals in the list above, then I would like to also achieve these goals:

Create a Digital Twin of the Jimny using AWS IoT TwinMaker and associate all the sensors and devices with it
Train Machine Learning models using the data ingested into the AWS Datalake
Do something with the AWS DeepLens that has been sitting in my drawer for the past year, using the ML models created above; perhaps it could warn me, by making predictions with an ML model, when I am about to do something that will cause the Jimny to land on its roof like last time.
Have some sort of cloud solution that produces a video for each of my trips that I can upload to YouTube, with the video displaying some of the telemetry and sensor data captured.

AWS

At the end of the blog series I will conclude whether I was able to build something functional, and whether or not I achieved all the goals stated in the 2 lists above.

Where I am in terms of progress for this build?

As some may already know, it has been a bit of a challenge to source certain types of electronic components at the moment, so I've only managed to source the majority of the components required at this point in time.

So far I have sourced the following components:

Starlink RV version

Starlink RV

I had been wanting one of these for a long time, so when I saw it on special I jumped on it straight away. This is the RV version, which means it can be taken anywhere with me, so I will mount it on a roof rack; that is one more reason I do not want the Jimny to end up on its roof, as it would not be fun to be stuck somewhere with no internet for a long period of time.

Randy

The ideal location to place the Starlink is a spot with no obstructions, as far away from everything as possible; however, when I tested it out in my tiny back yard, with it sitting in the center surrounded by 2 houses (both 2 stories) and a high fence, I got the following results:

Starlink RV SpeedTest

Although the speed is comparable to one of the slowest fibre plans available in New Zealand, the upload speed is the ultimate factor that determines how many live feeds we can ingest into the Datalake. A 4K video feed needs around 20Mbps, which does not leave much bandwidth for all of the other data types; results may be better depending on where I am at the time. Unless Starlink offers symmetrical upload speeds, we are limited to full HD feeds (FYI, download speeds can be as high as 500Mbps in some parts of the world). One option is to store the data on a NAS drive via the Home Assistant installed on the LinkStar, a device similar to a Raspberry Pi, then upload the videos into the Datalake after I get home; I would like to avoid this as it is too much admin.
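The bandwidth trade-off is simple arithmetic. In this sketch the 20Mbps per 4K feed comes from the paragraph above, while the 25Mbps upload speed, the ~5Mbps per full HD feed and the 2Mbps reserved for telemetry and sensor data are hypothetical placeholders to replace with your own speed test results and encoder settings.

```python
FOUR_K_MBPS = 20     # per 4K feed, the figure quoted above
FULL_HD_MBPS = 5     # hypothetical per full HD feed; depends on codec and frame rate
TELEMETRY_MBPS = 2   # hypothetical headroom for CAN Bus, GPS and sensor data
UPLOAD_MBPS = 25     # hypothetical upload speed; use your own speed test result


def max_live_feeds(upload_mbps, per_feed_mbps, reserved_mbps=0):
    """Number of camera feeds that fit in the uplink after reserving
    bandwidth for non-video data. All values are in megabits per second."""
    usable = upload_mbps - reserved_mbps
    if usable <= 0:
        return 0
    return int(usable // per_feed_mbps)


print(max_live_feeds(UPLOAD_MBPS, FOUR_K_MBPS, TELEMETRY_MBPS))   # 1
print(max_live_feeds(UPLOAD_MBPS, FULL_HD_MBPS, TELEMETRY_MBPS))  # 4
```

With these placeholder figures, five full HD viewpoints would need at least 5 × 5 + 2 = 27Mbps of upload.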

Router / Wifi

I have a few lying around the house doing nothing.

Cameras

I also have some spare cameras to use; their feeds can be served using the RTSP protocol. I also recently purchased a few ESP32-CAMs, so this build will use a combination of the 2 camera types. Most webcams can be used for this.

Seeed Studio XIAO ESP32C3

I have a bunch of these as they are my go-tos when I build projects using micro-controllers; they are about $5 USD. The Seeed Studio XIAO ESP32C3 is one of the smallest, if not the smallest, ESP32 boards I've come across, and it is more reliable than others I've used previously.

Seeed Studio XIAO ESP32C3

I also have various sensors that measure:

Distance from objects
Temperature
Sound
Humidity
Luminosity
CO2

Seeed Studio LinkStar with Home Assistant

Seeed Studio LinkStar

I'll be using this to pull the feeds from the cameras, as well as saving the videos onto a NAS if we go down that route.

What is left to source?

Seeed Studio WIO ESP32 CAN - this is a kit I'll be using to interface with the CAN Bus to retrieve telemetries from the Jimny.
Jimny

Next blog

In the next blog in this series I will take all the components I currently have, link them all up, and detail what I did and how I got there.

Breaking Down Monolithic Subnets

May 28, 2022 · 5 min read

Chiwai Chan

Tinkerer

As my knowledge and experience of Cloud networking grew, from designing network architectures over time and, more lately, from reviewing client network architectures, I've come to realise and appreciate the need to design a proper network architecture that includes long-term considerations as early as possible, especially before a project begins and definitely before any resources are deployed into any VPCs.

In the past, when I first started in IT, I didn't have much interest in how a network was configured in an office, or how routing to a publicly accessible on-premise hosted application was set up; this was mainly due to understanding very little about networking, and also just because the networking was looked after by somebody else. It was only when I started using Cloud services that I was able to learn networking much more easily, perhaps because I could design, build and play around with my own dedicated isolated network in minutes without worrying about breaking things.

In this blog we will illustrate what a Monolithic Subnet looks like and the problems that come along with one, and illustrate one way to break down a Monolithic Subnet into multiple smaller Subnets: how solutions can benefit from designing workloads to leverage dedicated individual Subnets, where each Subnet holds a set of common resource types or a grouping. VPCs are also susceptible to becoming monoliths, so I will write a separate blog about that in the future. Workloads should always be deployed across multiple AZs for high availability, but to make this blog more digestible we will talk about a single-AZ architecture.

We often see systems evolve over time whether they are applications or databases, that get to a point where they are too big to run, maintain or work with: these systems are commonly known as Monolithic Applications or Monolithic Databases.

Networks and the constructs of Networking can also be susceptible to becoming a monolith. Early signs and symptoms could be:

CIDR block based rules in Security Groups or NACLs that encompass IP addresses of resources that should not be granted access: a cause of this may be the number of different groups of resources within a Subnet whose IP addresses are non-deterministic, which makes it difficult to design a minimal set of CIDR block values for rules that satisfies the least privilege principle
Conversely, too many granular CIDR block rules in Security Groups or NACLs may also be a sign of a Monolithic Subnet: the common symptom is rule quotas being reached too often

Let’s take the example of 4 groups of compute resources, each group has a different network traffic usage behaviour than the other groups – group #1 communicates with resources in VPC X and VPC Y, while group #3 communicates with resources in VPC Y and VPC Z.

This is an example of how the groups of resources could be represented in a Subnet ordered by their Private IP address:

Often CIDR block rules that are too broad are used, which opens access to resources that should not be included. The following rules also allow resources in Groups #2 and #4 to communicate with resources in VPC X, Y and Z, when they are not expected to interact with resources in any of those VPCs.

Conversely, implementing granular rules to follow the best practice of least privilege may cause Security Group and NACL rule quotas to be reached; in any case, least privilege should be followed.
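To make the over-reach concrete, here is a small sketch using Python's `ipaddress` module. The four groups and their private IPs are hypothetical, interleaved within one Subnet as they might be when IPs are assigned non-deterministically. It finds the smallest single CIDR block that covers Group #1 and lists which other groups that block also lets in; the alternative, one /32 rule per address, is what pushes rule counts towards the quotas.

```python
import ipaddress

# Hypothetical private IPs of the four groups within one monolithic Subnet.
GROUPS = {
    "group1": ["10.0.0.10", "10.0.0.40"],
    "group2": ["10.0.0.20"],
    "group3": ["10.0.0.30", "10.0.0.50"],
    "group4": ["10.0.0.45"],
}


def smallest_covering_cidr(ips):
    """Return the smallest single CIDR block containing every address."""
    addrs = [ipaddress.ip_address(ip) for ip in ips]
    network = ipaddress.ip_network(f"{addrs[0]}/32")
    # Widen the prefix one bit at a time until every address fits.
    while not all(addr in network for addr in addrs):
        network = network.supernet()
    return network


def unintended_members(cidr, intended_group):
    """List (group, ip) pairs the CIDR admits that are outside the intended group."""
    return [
        (group, ip)
        for group, ips in GROUPS.items()
        if group != intended_group
        for ip in ips
        if ipaddress.ip_address(ip) in cidr
    ]


cidr = smallest_covering_cidr(GROUPS["group1"])
print(cidr)  # 10.0.0.0/26
print(unintended_members(cidr, "group1"))  # every address of Groups #2, #3 and #4
```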

Solution

The solution is to break the groups of resources down into one Subnet per Group. There is no hard rule that states a VPC must contain X number of tiers of Subnets; Subnets are used to group similar resources with similar network traffic patterns, so if there are many groupings of resources then it is perfectly fine to create as many tiers of Subnets as needed, one Subnet for each Grouping.

As a result, rules are more specific and targeted, which makes it straightforward to implement the least privilege principle.
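As a rough sketch of the break-down, again assuming a hypothetical 10.0.0.0/24 address range and a single AZ, `ipaddress`'s `subnets()` can carve one dedicated /26 Subnet per group; a rule that should only admit Group #1 then needs exactly one CIDR. Keep in mind that AWS reserves 5 IP addresses in every Subnet, so each /26 provides 59 usable addresses.

```python
import ipaddress

# Hypothetical address range and resource groups (single AZ, as in this blog).
VPC_CIDR = ipaddress.ip_network("10.0.0.0/24")
GROUP_NAMES = ["group1", "group2", "group3", "group4"]


def carve_subnets(vpc_cidr, names, new_prefix):
    """Assign one dedicated Subnet CIDR to each resource group, in order."""
    candidates = vpc_cidr.subnets(new_prefix=new_prefix)
    plan = {}
    for name in names:
        try:
            plan[name] = next(candidates)
        except StopIteration:
            raise ValueError(f"{vpc_cidr} has no room left for a /{new_prefix} for {name}") from None
    return plan


plan = carve_subnets(VPC_CIDR, GROUP_NAMES, new_prefix=26)
for name, subnet in plan.items():
    print(name, subnet)
# group1 10.0.0.0/26
# group2 10.0.0.64/26
# group3 10.0.0.128/26
# group4 10.0.0.192/26

# A rule in VPC X now needs a single CIDR that admits Group #1 only.
print(ipaddress.ip_address("10.0.0.70") in plan["group1"])  # False (a Group #2 address)
```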

When groups of resources are broken down from a monolith Subnet into multiple Subnets, there are other benefits created as a by-product:

With each resource group deployed in a separate dedicated Subnet early on, future re-work to combat increased architecture complexity will likely be reduced or eliminated (a good solution is to not have a problem to begin with); such re-work often requires re-deploying resources into new Subnets, which to me is unnecessary effort if we can avoid it, especially for resources that require a lot of manual effort to deploy
NACL rules are broken down and grouped by their respective resources and Subnet, which leads to fewer rules in each NACL, reducing the possibility of reaching the quotas
When all resources are deployed within one Subnet only Security Groups could be leveraged to implement firewall rules, but when resources are broken down into multiple Subnets then NACLs can be leveraged as well
Security Posture is improved because certain traffic does not enter the Subnet from adjacent Subnets if NACL rules are implemented appropriately
Depending on how granular you break down your monolithic Subnets, if it is a very fine break down then you are setting up your network architecture to be in a position to implement tighter controls gearing towards a micro-segmentation network architecture

This solution complements the networking solutions in other blogs I have written:

Leveraging AWS Prefix Lists

May 28, 2022 · 7 min read

Chiwai Chan

Tinkerer

AWS VPC Prefix List is a feature of AWS Networking that has been around for a short while; however, I have yet to see it leveraged to its full potential, and more often than not I have not seen it used at all.

There are 2 types of Prefix Lists:

AWS-managed Prefix Lists: as the name indicates these lists are managed by AWS, and they are used to maintain a set of IP address ranges for AWS services, e.g. S3, DynamoDB and CloudFront.
Customer-managed Prefix Lists: these are created and maintained by anyone who has access to the AWS Console, AWS APIs or AWS SDKs. This is what we will be focusing on.

In this blog we will go into:

What Customer-managed Prefix Lists are
How they can be leveraged by AWS Security Groups
How they can be leveraged by AWS Subnet Route Tables
How they can be leveraged by AWS Transit Gateway Route Tables
Considerations

AWS VPC Customer-managed Prefix List is a great tool to have available as it provides the ability to track and maintain a list of CIDR block values, which can then be referenced by other AWS Networking components in their rules or route tables. Each Prefix List supports either IPv4 or IPv6 based addresses, and a number of expected Max Entries for the list must be defined; the number of entries in the list cannot exceed the Max Entries.

You can use a Prefix List to maintain a list of CIDR blocks of Subnets or VPCs, or to track a list of similar IP addresses based on a grouping of your choice, e.g. EC2 instances with a certain function; you can even track CIDR values of Subnets, VPCs and EC2 instances within the same list.

I have a blog on how to automatically maintain a list of EC2 instances Private IP addresses based on a Tag set against an EC2 instance: Maintain a Prefix List of EC2 Private IP Addresses using EventBridge

Let's create a Prefix List in the AWS Console
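The same can be done programmatically. Below is a minimal sketch around the `create_managed_prefix_list` call of boto3's EC2 client; the function name, list name and CIDRs are hypothetical. It defaults Max Entries to the number of CIDRs because, as noted under Considerations, it is Max Entries rather than the actual entry count that referencing resources count against their quotas.

```python
def create_prefix_list(ec2_client, name, cidrs, description="managed by script", max_entries=None):
    """Create a customer-managed IPv4 Prefix List and return its ID.

    Max Entries defaults to the number of CIDRs (minimum 1) so the list is
    sized economically for the resources that will reference it.
    """
    if max_entries is None:
        max_entries = max(1, len(cidrs))
    if len(cidrs) > max_entries:
        raise ValueError("a Prefix List cannot hold more entries than its Max Entries")

    response = ec2_client.create_managed_prefix_list(
        PrefixListName=name,
        AddressFamily="IPv4",
        MaxEntries=max_entries,
        Entries=[{"Cidr": cidr, "Description": description} for cidr in cidrs],
    )
    return response["PrefixList"]["PrefixListId"]
```

For example, `create_prefix_list(boto3.client("ec2"), "admin-machines", ["203.0.113.10/32"])` would create a one-entry IPv4 list and return its ID.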

Prefix List – Security Group Reference

A Customer-managed Prefix List is a great option for centrally managing and tracking a list of CIDR blocks allowed to ingress an ENI, by referencing the Prefix List in Security Groups; a single Prefix List instance can be referenced by one or many Security Groups within the same account or cross-account.

Let's take a look at an example

This is especially useful in scenarios where you have a fleet of EC2 instances and would like to allow the same network traffic sources to ingress on Port 22 to perform administration tasks; the instances in the fleet could be scattered across multiple VPCs, and even across multiple AWS accounts.

Often, we add a new Source CIDR to all Security Groups when we allow a new machine to perform administration tasks on the same fleet of EC2 instances, or remove (or forget to remove) a Source CIDR when a machine is retired. In the past we would have modified each and every one of these Security Groups.

Here is how we can leverage Customer-managed Prefix Lists with Security Groups:

Here, with the same Security Group rule outcome, we externalise the CIDR values into a Prefix List and reference the list in all 3 Security Groups; where the Security Groups span multiple AWS accounts, the Prefix List can be shared with the other AWS accounts using Resource Access Manager (RAM). Now we can allow a new machine to perform administration tasks across the entire fleet of EC2 instances by adding a new Source CIDR in a single location; conversely, we can remove a machine by deleting its Source CIDR. There is also an added benefit: to remove access across the entire fleet, there is no need to identify which Security Groups have a rule for an IP address, because it is maintained in a single location.
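A minimal boto3 sketch of this pattern, with hypothetical function names, Security Group IDs and share name: one inbound SSH rule per Security Group that references the Prefix List by ID through `PrefixListIds` in `authorize_security_group_ingress`, plus an AWS RAM `create_resource_share` call to share the list with other accounts. Accounts outside your AWS Organization (or in an Organization without RAM sharing enabled) have to accept the share invitation before they can reference the list.

```python
def allow_ssh_from_prefix_list(ec2_client, security_group_ids, prefix_list_id):
    """Add one inbound SSH rule per Security Group that references the Prefix List."""
    for group_id in security_group_ids:
        ec2_client.authorize_security_group_ingress(
            GroupId=group_id,
            IpPermissions=[{
                "IpProtocol": "tcp",
                "FromPort": 22,
                "ToPort": 22,
                # The rule points at the list, not at individual CIDRs
                "PrefixListIds": [{"PrefixListId": prefix_list_id,
                                   "Description": "admin machines"}],
            }],
        )


def share_prefix_list(ram_client, prefix_list_arn, account_ids):
    """Share the Prefix List with other AWS accounts through AWS RAM."""
    response = ram_client.create_resource_share(
        name="admin-prefix-list",  # hypothetical share name
        resourceArns=[prefix_list_arn],
        principals=account_ids,
    )
    return response["resourceShare"]["resourceShareArn"]
```

Adding or removing an admin machine later is then a single change to the Prefix List entries; none of the Security Groups need to be touched.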

Prefix List – Subnet Route Table Reference

Another way to use Prefix Lists is to centrally manage and track a list of CIDR block destinations that a Subnet's Route Table routes to the same Target; a Prefix List can be referenced by one or many Subnet Route Tables within the same account, or cross-account using RAM.

Let's take a look at an example

Below, we have a scenario with 3 different Route Tables across two VPCs, each with the same Transit Gateway Target for the same set of Destinations, and each also routing a common set of Destinations to the Egress-Only Internet Gateway (EIGW) of its own VPC.

Here is how we can leverage Customer-managed Prefix Lists with Subnet Route Tables:

We have externalised the Destination CIDR values of the 3 Route Tables into 2 separate Prefix Lists: the 1st Prefix List contains the CIDR block values of Destinations routed to the EIGW in each VPC; the 2nd Prefix List contains the CIDR block values of Destinations routed to the Transit Gateway that all the VPCs are attached to.
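Expressed with the `create_route` call of boto3's EC2 client, which accepts a `DestinationPrefixListId` in place of a destination CIDR, the scenario above might be set up as in this sketch. The function name and the dictionary mapping each Route Table to its VPC's EIGW are hypothetical. Since an Egress-Only Internet Gateway only carries IPv6 traffic, the EIGW list would need to be an IPv6 Prefix List.

```python
def add_prefix_list_routes(ec2_client, route_tables, transit_gateway_id,
                           tgw_prefix_list_id, eigw_prefix_list_id):
    """Route the two shared Prefix Lists from every Route Table.

    route_tables maps each Route Table ID to the Egress-Only Internet
    Gateway of the VPC that the Route Table belongs to.
    """
    for route_table_id, eigw_id in route_tables.items():
        # Same Transit Gateway target for every Route Table
        ec2_client.create_route(
            RouteTableId=route_table_id,
            DestinationPrefixListId=tgw_prefix_list_id,
            TransitGatewayId=transit_gateway_id,
        )
        # Each Route Table sends the second list to its own VPC's EIGW
        ec2_client.create_route(
            RouteTableId=route_table_id,
            DestinationPrefixListId=eigw_prefix_list_id,
            EgressOnlyInternetGatewayId=eigw_id,
        )
```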

Prefix List – Transit Gateway Route Table Reference

Lastly, in a Transit Gateway Route Table you have the option to either define static routes or have routes dynamically propagated from a Transit Gateway attachment. You also have the option to use a Prefix List for routing.

Here is how we can leverage Customer-managed Prefix Lists with Transit Gateway Route Tables:

To reference a Prefix List in a Transit Gateway Route Table, you have to reference it under the "Prefix list references" section:
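Programmatically, this corresponds to the `create_transit_gateway_prefix_list_reference` call of boto3's EC2 client. In this sketch (function name hypothetical), the reference either routes the list's CIDRs to a Transit Gateway attachment or, when no attachment is given, is created as a blackhole so that traffic to those CIDRs is dropped.

```python
def reference_prefix_list_in_tgw(ec2_client, tgw_route_table_id, prefix_list_id,
                                 attachment_id=None):
    """Add a Prefix List reference to a Transit Gateway Route Table.

    With an attachment ID, traffic to the list's CIDRs is routed to that
    attachment; without one, the reference is created as a blackhole.
    """
    kwargs = {
        "TransitGatewayRouteTableId": tgw_route_table_id,
        "PrefixListId": prefix_list_id,
    }
    if attachment_id:
        kwargs["TransitGatewayAttachmentId"] = attachment_id
    else:
        kwargs["Blackhole"] = True
    response = ec2_client.create_transit_gateway_prefix_list_reference(**kwargs)
    return response["TransitGatewayPrefixListReference"]
```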

Considerations

The aggregated total of the Max Entries of all Prefix Lists referenced by a resource (e.g. a Security Group) counts towards the resource's quota, not the aggregated total of actual entries. Be conscious of the Prefix Lists you reference in a resource: does the resource referencing the Prefix List require all the CIDR values offered in the list? If not, you are not using Prefix Lists economically.
If the same Prefix List instance is referenced by multiple AWS resources then consistency is enforced, and operational effort is reduced because a value no longer has to be changed in multiple locations.
Before you add or remove a CIDR value from a Prefix List, consider the flow on impact it may have to the downstream resources that reference this list, as you may inadvertently terminate some traffic flow, or worse, open up traffic to sources you don't intend to.
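The quota arithmetic from the first consideration can be sketched as follows. The 60 inbound rules per Security Group is the default quota (adjustable through Service Quotas); the rule counts and Max Entries values in the example are hypothetical.

```python
# Default quota for inbound rules per Security Group (adjustable in Service Quotas).
INBOUND_RULES_PER_SG = 60


def rules_consumed(cidr_rules, referenced_prefix_lists):
    """Quota weight of a Security Group's inbound rules.

    A plain CIDR rule counts once; a referenced Prefix List counts as its
    MaxEntries, regardless of how many entries it actually holds.
    """
    return cidr_rules + sum(pl["MaxEntries"] for pl in referenced_prefix_lists)


# Hypothetical Security Group: 5 CIDR rules plus references to two Prefix Lists.
oversized = [{"MaxEntries": 20, "Entries": 3}, {"MaxEntries": 30, "Entries": 30}]
right_sized = [{"MaxEntries": 3, "Entries": 3}, {"MaxEntries": 30, "Entries": 30}]

print(rules_consumed(5, oversized), "of", INBOUND_RULES_PER_SG)    # 55 of 60
print(rules_consumed(5, right_sized), "of", INBOUND_RULES_PER_SG)  # 38 of 60
```

Sizing the first list to its 3 actual entries frees 17 rules of headroom without changing what the Security Group allows.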

Conclusion

One of the things I have noticed during my short time in consulting so far is that organising Cloud resources (in particular Networking) and structuring them correctly and consistently across multiple environments sets up a solid foundation for organisations in the long term; however, it is often an area that is overlooked, and only paid attention to when the rate of innovation slows down due to complexities and inconsistencies. Prefix Lists are a great option to improve consistency and operational efficiency.

Here I have only detailed the basic use of Customer-managed Prefix Lists, but in my other blog I have a more advanced use case leveraging Prefix Lists: Work-around for cross-account Transit Gateway Security Group Reference

This solution complements the networking solutions in other blogs I have written:

Maintain a Prefix List of EC2 Private IP Addresses using EventBridge

May 28, 2022 · 7 min read

Chiwai Chan

Tinkerer

An AWS VPC customer-managed prefix list is a great feature to have in your tool box, as it provides the ability to track and maintain a list of CIDR block values that can be referenced by other AWS Networking components in their rules and tables. Each Prefix List supports either IPv4 or IPv6 based addresses, and a number of expected Max Entries for the list must be defined; the number of entries in the list cannot exceed the Max Entries. Check out my blog on AWS Prefix List to learn how it could be referenced and leveraged by other AWS Networking components.

In this blog we will:

Walk-through the proposed solution
Deploy the solution from a SAM project hosted in my GitHub repository
Stop the running EC2 instance provisioned by the SAM project's CloudFormation stack - this will de-register the Private IP address of the EC2 instance from the Prefix List (also provisioned by the CloudFormation stack)
Start the same EC2 instance - this will register the Private IP address of the provisioned EC2 instance back into the Prefix List
Manually create an EC2 instance with a Tag value of "prefix-list=eventbridge-managed-prefix-list"

In this solution we propose an architecture to maintain a list of EC2 Private IPs in a Prefix List by leveraging EventBridge to listen for EC2 Instance State Change Events.

Depending on the EC2 Instance State Change value, we perform a different action against the Prefix List using a Lambda Function: if the Instance State is "running", we register the Private IP address into the Prefix List; if the Instance State is "stopping", we deregister the Private IP address from the Prefix List.

When the event is received by the Lambda function, it performs a lookup on the Tags of the EC2 instance for a Tag (e.g. prefix-list=eventbridge-managed-prefix-list) that indicates which Prefix List (or Lists) the Lambda function will register/de-register the Private IP against. The Prefix List should be maintained economically, because its Max Entries count against the quotas of the resources that reference it, as described in the AWS documentation: Prefix lists concepts and rules. So the Lambda function should ideally set the Prefix List's Max Entries to the number of entries expected in the list before an entry is registered, or afterwards when an entry is de-registered.
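A minimal sketch of the lookup step, assuming boto3's EC2 client: it takes the instance ID and state from the "EC2 Instance State-change Notification" event, reads the instance's private IP and "prefix-list" Tag with `describe_instances`, then resolves the named Prefix List(s) with `describe_managed_prefix_lists` and a `prefix-list-name` filter. The function name, and treating a comma-separated Tag value as multiple lists, are assumptions for illustration; the Lambda in the repository may do this differently. The event pattern shown is one way to restrict the rule to the two states the solution acts on.

```python
# One way to restrict the EventBridge rule to the two states the solution acts on.
EC2_STATE_CHANGE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "stopping"]},
}

TAG_KEY = "prefix-list"


def lookup_target(ec2_client, event):
    """Return (state, private IP, Prefix List IDs) for an EC2 state change event."""
    instance_id = event["detail"]["instance-id"]
    state = event["detail"]["state"]

    reservations = ec2_client.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    private_ip = instance.get("PrivateIpAddress")

    # Assumption: a comma-separated Tag value names several Prefix Lists.
    tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
    names = [name.strip() for name in tags.get(TAG_KEY, "").split(",") if name.strip()]
    if not names:
        return state, private_ip, []  # untagged instance: nothing to manage

    prefix_lists = ec2_client.describe_managed_prefix_lists(
        Filters=[{"Name": "prefix-list-name", "Values": names}]
    )["PrefixLists"]
    return state, private_ip, [pl["PrefixListId"] for pl in prefix_lists]
```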

By maintaining a Prefix List and leveraging this pattern in your solutions, your solutions may potentially benefit in the following ways:

Reusability of configurations which will reduce the operational burden and improve consistency.
Re-use of Prefix Lists by sharing them with other AWS accounts using Resource Access Manager
Creates an automated mechanism to track and maintain a definitive list of Private IP addresses of a similarly grouped set of EC2 instances with non-deterministic IP addresses
High cohesion and low coupling designs: reduce manual flow-on changes when a change is implemented
Leverage programmatic mechanisms for automatic changes and maintenance: minimise deployments and/or manual tasks
Improve Security posture: this may reduce occurrences of overly broad CIDR values used in rules or route tables to encompass a small number of IP addresses within a wide IP range

Deploying the solution

Here we will walk-through the steps involved to deploy a SAM project of this solution hosted in my GitHub repository: https://github.com/chiwaichan/prefix-list-of-ec2-private-ip-addresses-using-eventbridge

Prerequisites:

Run the following commands to check out the code

git clone git@github.com:chiwaichan/prefix-list-of-ec2-private-ip-addresses-using-eventbridge.git

cd prefix-list-of-ec2-private-ip-addresses-using-eventbridge/

Run the following command to configure the SAM deploy

sam deploy --guided

Enter the following arguments in the prompt:

Stack Name: prefix-list-of-ec2-private-ip-addresses-using-eventbridge
AWS Region: ap-southeast-2 or the value of your preferred Region
Parameter ImageID: ami-0c6120f461d6b39e9 (the Amazon Linux AMI ID in ap-southeast-2), you can use any AMI ID for your Region
Parameter SecurityGroupId: the Security Group ID to use for the EC2 instance provisioned, e.g. sg-0123456789
Parameter SubnetId: the Subnet ID of the Subnet to deploy the EC2 instance in, e.g. subnet-0123456678

Confirm the deployment

Let's check to see that everything has been deployed correctly in our AWS account.

Here we can see the list of AWS resources deployed in the CloudFormation Stack

Here we can see the details of the EC2 instance provisioned in a "Running" state. Take note of the Private IPv4 address.

This is the Prefix List provisioned; here we can see the Private IPv4 address of the EC2 instance in the Prefix list entries. Also, note that the Max Entries is currently set to 1.

Stopping the running EC2 Instance

Let's stop the EC2 instance

We should see the Private IP address of the EC2 instance removed from the Prefix List Entries. The Max Entries remains at 1, because the minimum value is 1 even when there are no Entries in the Prefix List.

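This is the snippet of Python code in the Lambda function that removes the Private IP address from the Prefix List, adapted into a self-contained function whose parameters are values the handler looks up earlier:

import time


def wait_for_modify_complete(client, prefix_list_id, timeout=30):
    # A Prefix List cannot be modified while a previous change is still in
    # progress, so poll its State rather than sleeping for a fixed time
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = client.describe_managed_prefix_lists(
            PrefixListIds=[prefix_list_id]
        )["PrefixLists"][0]["State"]
        if state == "modify-complete":
            return
        if state == "modify-failed":
            raise RuntimeError(f"modifying {prefix_list_id} failed")
        time.sleep(1)
    raise TimeoutError(f"{prefix_list_id} was still being modified after {timeout}s")


# if the instance state change is 'stopping', we remove the private IP CIDR from the Prefix List
def handle_stopping(client, prefix_list_id, current_prefix_list_version,
                    current_entries, is_in_list, private_ip_address):
    if not is_in_list:
        print("not in list so no action")
        return

    print("remove")
    client.modify_managed_prefix_list(
        PrefixListId=prefix_list_id,
        CurrentVersion=current_prefix_list_version,
        RemoveEntries=[{'Cidr': private_ip_address + "/32"}],
    )

    # Entries and Max Entries cannot be changed in the same request, and Max
    # Entries cannot drop below 1, so only shrink the list if entries remain
    if len(current_entries) != 1:
        wait_for_modify_complete(client, prefix_list_id)
        client.modify_managed_prefix_list(
            PrefixListId=prefix_list_id,
            MaxEntries=len(current_entries) - 1,
        )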

Starting the stopped EC2 Instance

Let's start the EC2 instance

We should see the Private IP address of the EC2 instance added back to the Prefix List Entries. Note the description is different from what it was when we first saw it earlier.

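This is the snippet of Python code in the Lambda function that adds the Private IP address to the Prefix List, adapted into a self-contained function whose parameters are values the handler looks up earlier:

import time


def wait_for_modify_complete(client, prefix_list_id, timeout=30):
    # A Prefix List cannot be modified while a previous change is still in
    # progress, so poll its State rather than sleeping for a fixed time
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = client.describe_managed_prefix_lists(
            PrefixListIds=[prefix_list_id]
        )["PrefixLists"][0]["State"]
        if state == "modify-complete":
            return
        if state == "modify-failed":
            raise RuntimeError(f"modifying {prefix_list_id} failed")
        time.sleep(1)
    raise TimeoutError(f"{prefix_list_id} was still being modified after {timeout}s")


# if the instance state change is 'running', we add the private IP CIDR to the Prefix List
def handle_running(client, prefix_list_id, prefix_list, current_prefix_list_version,
                   current_entries, is_in_list, private_ip_address):
    if is_in_list:
        print("already in list so no action")
        return

    print("add")

    # Resize Max Entries first so the new entry fits; entries and Max Entries
    # cannot be changed in the same request
    if len(current_entries) + 1 != prefix_list["MaxEntries"]:
        client.modify_managed_prefix_list(
            PrefixListId=prefix_list_id,
            MaxEntries=len(current_entries) + 1,
        )
        wait_for_modify_complete(client, prefix_list_id)

    client.modify_managed_prefix_list(
        PrefixListId=prefix_list_id,
        CurrentVersion=current_prefix_list_version,
        AddEntries=[{
            'Cidr': private_ip_address + "/32",
            'Description': 'added by EventBridge Lambda',
        }],
    )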

Manually create an EC2 instance with a Prefix List Tag

Let's launch a new EC2 instance (using any AMI, in any Subnet, with any Security Group) with a value of "eventbridge-managed-prefix-list" for the "prefix-list" Tag; EventBridge and the Lambda function will register the Private IP address of this newly created instance into the Prefix List "eventbridge-managed-prefix-list".

Here we see the Private IP address of the new manually created EC2 instance appear in the Prefix List Entries; also, the Max Entries has been updated to 2 by the Lambda function.

FYI, you can adapt this pattern and Lambda function to add or remove Private IP addresses based on the EC2 instance state change values of your choosing.

Clean up

Delete the manually created EC2 instance; afterwards, you can see it removed from the Prefix List and the Prefix List's Max Entries decreased back down to 1 by the Lambda function
Delete the CloudFormation stack with the name "prefix-list-of-ec2-private-ip-addresses-using-eventbridge"

This solution complements the networking solutions in other blogs I have written:

Work-around for cross-account Transit Gateway Security Group Reference

May 28, 2022 · 8 min read

Chiwai Chan

Tinkerer

Have you ever tried to create a Security Group with a Source or Destination rule that references another Security Group? How about referencing a Security Group from another AWS account to allow ingress network traffic over a Transit Gateway architecture? If this question piqued your interest then you should keep reading.

In this blog we will go into:

Prerequisites
What we like to have
What we probably end up doing most of the time
What we could do instead using AWS Customer-managed Prefix Lists
Considerations

This blog builds on top of the Prefix List patterns I described in this blog: AWS Prefix List, so have a read of it to provide you with a better context as you read on.

What we like to have

How many of us have tried to implement the following architecture but realised it was not technically possible?

I myself tried to implement this a couple of years ago, but to no avail; recently, a client said they had also tried to implement this very same pattern, so as per usual I did a bit of googling and confirmed that it is still the case today.

What we probably end up doing most of the time

This is probably what most of us do to allow cross-account network traffic to ingress into an EC2 instance over a Transit Gateway architecture.

In VPC A, because we cannot reference a Security Group from outside AWS account A (i.e. from account B, C or D) as the Source of traffic to an ENI attached to the EC2 instance in VPC A, one of the current methods is to add the CIDR blocks of the source traffic as Source rules in the Security Group in VPC A. The CIDR value could be the entire VPC CIDR block (of VPC B, C or D) to allow all traffic from a VPC; a Subnet's CIDR block, to narrow the ingress traffic down to a sub-section of a source VPC; or the specific Private IP addresses of the source EC2 instances (e.g. 172.20.15.1/32).

The approach you decide for this pattern depends on the level of security posture you are comfortable with implementing into your network architecture:

VPC CIDR block values: this will allow ingress traffic wide open from the entire source externally VPC, if you intend for all resources from a source VPC to send traffic to your target resources then this option is fine
Subnet CIDR block values: this provides a narrower approach with a slightly tighter level of network security than above, if you intend for all resources from a source Subnet to send traffic to your target resources then this option is fine
Specific CIDR values of a Private IP addresses: this option provides the tightest network security control of the 3 options, however, maintaining a list Private IP addresses of EC2 instances outside of your AWS account (whether you or a 3rd party owns the account) will require a some operational effort. The solution proposed below will provide an automated mechanism to solve this particular problem
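To make the three options concrete, here is a minimal C++ sketch of the CIDR containment check that a Security Group source rule effectively performs. The VPC and Subnet CIDRs used below (172.20.0.0/16 and 172.20.15.0/24) are assumed example values; only 172.20.15.1/32 comes from the text above, and the function names are hypothetical. It illustrates the matching logic only; it is not how AWS implements Security Groups.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// An IPv4 CIDR block stored as a 32-bit network address plus a prefix length.
struct Cidr {
    uint32_t network;
    int prefixLen;
};

// Parse "a.b.c.d/len" into a Cidr; returns false on malformed input.
bool parseCidr(const std::string& text, Cidr& out) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    int len = -1;
    if (std::sscanf(text.c_str(), "%u.%u.%u.%u/%d", &a, &b, &c, &d, &len) != 5) return false;
    if (a > 255 || b > 255 || c > 255 || d > 255 || len < 0 || len > 32) return false;
    out.network = (a << 24) | (b << 16) | (c << 8) | d;
    out.prefixLen = len;
    return true;
}

// Netmask for a prefix length. /0 is special-cased because shifting a
// 32-bit value by 32 bits is undefined behaviour in C++.
uint32_t maskFor(int prefixLen) {
    return prefixLen == 0 ? 0u : 0xFFFFFFFFu << (32 - prefixLen);
}

// True if a single source address falls inside the rule's CIDR block,
// i.e. the check a Security Group source rule effectively performs.
bool ruleAllows(const std::string& ruleCidr, const std::string& sourceIp) {
    Cidr rule{}, src{};
    if (!parseCidr(ruleCidr, rule) || !parseCidr(sourceIp + "/32", src)) return false;
    const uint32_t mask = maskFor(rule.prefixLen);
    return (src.network & mask) == (rule.network & mask);
}
```

For example, a source at 172.20.99.7 is allowed by the VPC-wide rule 172.20.0.0/16 but rejected by both the Subnet rule 172.20.15.0/24 and the /32 rule 172.20.15.1/32: each step down the list shrinks the set of source hosts that match.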

Network security controls can be tightened further by coupling them with NACLs; have a read of my blog for an example of incorporating NACLs into your network architecture: Swiss Cheese Network Security: Factorising Security Group Rules into NACLs and Security Group Rules

An example scenario that could be problematic for this architecture is when the Source Private IP addresses (for resources outside of the account) need to be constantly added to or removed from the Security Group in VPC A, for example as pet EC2 instances are provisioned and terminated. This is a burden for the operations team, as they would constantly need to update the Security Group rules to reflect changes happening outside of the AWS account. This would not be a problem if we were able to reference Security Groups from other AWS accounts in our rules; perhaps one day AWS will have this ability. It is especially burdensome when you have to co-ordinate changes with 3rd party owners of AWS accounts outside of your control; imagine having to maintain changes from a dozen external AWS accounts.

What we could do instead using AWS Customer-managed Prefix Lists

Here we propose a pattern that achieves the same outcome by leveraging Prefix Lists instead. The management of CIDR blocks is externalised to the AWS accounts (B, C and D) where the network traffic originates; each of these accounts shares its Prefix List with AWS account A using AWS Resource Access Manager (RAM), and the Security Group rules in account A then reference the shared Prefix Lists.

In the diagram above, we have 3 options for the CIDR values maintained in these Prefix Lists outside of AWS account A. These are similar to the 3 options described earlier for rules defined directly in the Security Group in VPC A, and the same principle applies in terms of how tight the network security controls are.

This pattern achieves the outcome we would get if Security Groups could be referenced over a Transit Gateway (this is not supported by AWS at the time of writing), but it does have its drawbacks: the Max Entries (not the actual number of entries) of a Prefix List counts towards the rule Quota of the Security Group that references it. The example illustrated above results in 3 rules (1 for each account's Prefix List) in the Security Group in VPC A, but each of those rules consumes as much of the quota as its Prefix List's Max Entries. This pattern has merit when you want to allow inversion of control, enabling external AWS accounts to control which network traffic is allowed to enter by using Prefix Lists shared through RAM. Remember that control is essentially delegated to the external AWS accounts, so you have to trust that the CIDR entries are scoped appropriately in those accounts.
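The key property of this pattern is indirection: a rule in account A stores only a Prefix List ID, and the entries behind it are maintained by the account that owns the list. Below is a minimal C++ model of that behaviour; the types, the made-up list ID "pl-b" and the single-port, /32-only entries are hypothetical simplifications (real Security Group rules also carry a protocol and a port range, and Prefix List entries can be any CIDR).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a customer-managed Prefix List owned by a spoke
// account (B, C or D) and shared to account A through RAM. Entries are
// single hosts (/32) stored as 32-bit integers to keep the sketch short.
struct PrefixList {
    std::string id;               // made-up identifier, e.g. "pl-b"
    std::vector<uint32_t> hosts;  // maintained by the owning account
};

// A Security Group rule in account A: it references a Prefix List by ID
// instead of listing CIDR blocks itself.
struct SgRule {
    int port;
    std::string prefixListId;
};

// Ingress is allowed if any rule matches the port and the referenced list,
// looked up at evaluation time, currently contains the source host.
bool ingressAllowed(const std::vector<SgRule>& rules,
                    const std::map<std::string, PrefixList>& lists,
                    uint32_t sourceHost, int port) {
    for (const auto& rule : rules) {
        if (rule.port != port) continue;
        auto it = lists.find(rule.prefixListId);
        if (it == lists.end()) continue;  // list not shared with this account
        for (uint32_t host : it->second.hosts) {
            if (host == sourceHost) return true;
        }
    }
    return false;
}
```

When account B appends a new host to its list, the same, unchanged rule in account A starts allowing it; that is exactly the operational burden the pattern removes from account A.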
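The quota arithmetic is worth checking before adopting this pattern. The C++ sketch below simply sums Max Entries across the referenced Prefix Lists; the Max Entries values in the example are made up, and the quota of 60 is assumed to be the default number of inbound rules per Security Group at the time of writing (it is adjustable, so check the Service Quotas console for your account).

```cpp
#include <vector>

// A Prefix List referenced by a Security Group rule. Only maxEntries counts
// towards the Security Group's rule quota; currentEntries is kept for contrast.
struct PrefixListRef {
    int maxEntries;
    int currentEntries;
};

// Rules of quota consumed: plain CIDR rules count once each, and each
// Prefix List reference counts as its Max Entries.
int quotaConsumed(const std::vector<PrefixListRef>& refs, int plainCidrRules) {
    int total = plainCidrRules;
    for (const auto& ref : refs) total += ref.maxEntries;
    return total;
}

// True if the Security Group still fits within the given rule quota.
bool fitsQuota(const std::vector<PrefixListRef>& refs, int plainCidrRules, int quota) {
    return quotaConsumed(refs, plainCidrRules) <= quota;
}
```

With three lists sized at 10, 10 and 20 Max Entries, the Security Group consumes 40 rules of quota even if the lists currently hold only 8 entries between them, leaving 20 rules for anything else.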

Bonus - Extra tight network security controls

The pattern above solves a small to medium sized problem on its own, but if we combine it with the patterns detailed in these two blogs, Leveraging AWS Prefix Lists and Maintaining a Prefix List of EC2 Private IP Addresses using AWS EventBridge, we can achieve the following:

By combining the 3 patterns we will end up with a network architecture that achieves the following:

A work-around for cross-account Security Group reference over a Transit Gateway.
List of Private IP addresses of similar EC2 instances (any grouping of your choosing) automatically tracked and managed in a Prefix List within each spoke account based on a Tagged value on the EC2 instances.
The same Prefix List in each spoke account can be referenced (via Resource Access Manager) in the Subnet Route Table in VPC A to route return traffic back to the originating Transit Gateway. This could potentially fully automate the routing of traffic to the Transit Gateway, which is great for scenarios where you only want return traffic for one or two IP addresses (especially when they are pets) in account B, C or D.
The same Prefix List in each spoke account can be referenced (via Resource Access Manager) in a Transit Gateway Route Table to route return traffic back to the originating Transit Gateway Attachment. This could potentially be used to automate routing where static or propagated routes are not used in a Transit Gateway Route Table. We can narrow the allowed return traffic for a spoke attachment down to a very small, distributed set of IP blocks, so that only a subset of return traffic is allowed back to the originating TGW spoke.
Depending on how narrow the CIDR values in the Prefix Lists are (e.g. a few distributed /32 IPs in the Prefix List for a source VPC whose CIDR block has 1024 addresses), least privilege for network security can be achieved with this pattern if it is used effectively.
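To see why a few /32 entries give such a tight return path, here is a minimal C++ sketch of longest-prefix-match route selection, the rule VPC route tables use to pick the most specific matching route. It models a route that references a Prefix List as if each entry were its own route; that simplification, the "tgw-b" target label and the addresses are assumptions for illustration, so check AWS's route priority documentation for exact tie-breaking between Prefix List routes and CIDR routes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One route table entry: a destination CIDR and a target label.
struct Route {
    uint32_t network;
    int prefixLen;
    std::string target;  // e.g. "local" or a Transit Gateway (made-up labels)
};

uint32_t maskFor(int prefixLen) {
    return prefixLen == 0 ? 0u : 0xFFFFFFFFu << (32 - prefixLen);
}

// Longest prefix match: among routes whose CIDR contains the destination,
// pick the one with the longest prefix. Returns "" when nothing matches.
std::string lookup(const std::vector<Route>& table, uint32_t dest) {
    int bestLen = -1;
    std::string best;
    for (const auto& route : table) {
        const uint32_t mask = maskFor(route.prefixLen);
        if ((dest & mask) == (route.network & mask) && route.prefixLen > bestLen) {
            bestLen = route.prefixLen;
            best = route.target;
        }
    }
    return best;
}
```

With VPC A's local route 10.0.0.0/16 and two /32 entries from account B's Prefix List (172.20.15.1 and 172.20.16.9) pointing at the Transit Gateway, traffic to 172.20.15.1 goes to the Transit Gateway, traffic to 10.0.1.5 stays local, and 172.20.15.2 has no route at all, even though it sits in the same source VPC.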

Considerations

As with any patterns, services or components, the pros and cons of each need to be weighed against each other and thought through in the interest of the long-term benefits for your solution and, most importantly, for your organisation. Restructuring existing networking and migrating workloads into it can be difficult and time consuming, especially if manual steps are required to deploy your infrastructure. Use Prefix Lists economically so that you do not reserve far more Max Entries than you actually use, for example by leveraging Lambda functions to update Max Entries automatically; check out my blog on Maintaining a Prefix List of EC2 Private IP Addresses using AWS EventBridge.

This solution complements the networking solutions in other blogs I have written:
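As a rough illustration of what "economical" could mean, the C++ sketch below is a hypothetical sizing policy such a Lambda might apply: keep Max Entries at the current number of entries plus a small headroom. The headroom value and the function names are assumptions for illustration; the actual resize would be made through the EC2 ModifyManagedPrefixList API, which accepts a MaxEntries parameter.

```cpp
#include <algorithm>

// Hypothetical sizing policy: current entries plus a small headroom, never
// below 1. The headroom is an assumption, not an AWS default.
int desiredMaxEntries(int currentEntries, int headroom) {
    return std::max(1, currentEntries + headroom);
}

// Only resize when the stored Max Entries differs from what the policy
// wants, so most invocations of the function are no-ops.
bool needsResize(int storedMaxEntries, int currentEntries, int headroom) {
    return storedMaxEntries != desiredMaxEntries(currentEntries, headroom);
}
```

For example, a list holding 7 entries with a headroom of 2 would be sized at 9 Max Entries, so a list originally created with 50 Max Entries would release 41 rules of Security Group quota.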

Swiss Cheese Network Security: Factorising Security Group Rules into NACLs and Security Group Rules

May 6, 2022 · 9 min read

Chiwai Chan

Tinkerer

Introduction

Lately I've been doing some networking configuration reviews for some of the projects I've been put on. To balance out the #crazycatlady blogs, I'll be blogging about some network patterns and components that don't often get much attention, or don't get used at all.

Today I'll be talking about Network Access Control List (NACL) and examples of how it could be used; and most importantly why it should be used.

NACLs are firewall rules for your Subnets, just as Security Groups (SGs) are firewall rules for your ENIs: SGs control what traffic is allowed to enter your ENIs, and NACLs control what network traffic is allowed to enter your Subnet. Think of an onion and its layers: the NACL is the outer layer around your SGs, so if traffic is blocked by the NACL rules (outer layer) it cannot get into your Subnet, and therefore it cannot reach your ENIs (the next layer in).

I've only reviewed a small handful of AWS network configurations, but one thing I've noticed is that I have only ever seen the same single default NACL rule, which allows traffic from all sources to all ports into a Subnet.

Problem

We've reached the maximum allowable limit for rules in a Security Group and attached as many Security Groups to an ENI as we are allowed to.

Short summary of the solution

Reduce the number of rules: incorporating some NACL rules into a network design can reduce the overall number of Security Group rules, if used effectively, by pulling firewall rules out into the Subnet layer. At the same time it improves security posture, as traffic is checked and blocked before it enters a Subnet, rather than being checked and blocked at the resource layer by Security Groups after it has entered the Subnet; this is effectively a defence in layers approach.

Example of the problem

[Figure: NACL rules in the problem example]

We commonly open up All Ports, Protocols and Sources/Destinations into and out of a Subnet using NACLs without leveraging Deny rules.

[Figure: Security Group rules in the problem example]

We commonly apply all firewall rules at the resource’s ENI layer via Security Groups, since all traffic routed into the Subnet is allowed to enter it.

[Figure: intersection of the NACL and Security Group rules in the problem example]

The network traffic allowed into an AWS resource depends on the combination of the rules applied to the Subnet’s NACL and the rules applied at the Security Group layer: the Intersection of the 2 rule sets is what allows network traffic to enter an ENI. Think of it like the intersection of a Venn Diagram, or the well-known “Swiss Cheese” model.

[Figure: Venn diagram]

This is the net result of the network traffic sources and ports allowed to reach an AWS resource through the 2 layers of rules; as you would expect, it is exactly the set of rules applied at the Security Group layer. Below we show the equivalent configuration in the AWS Console, as depicted by the diagrams above.

[Figure: NACL rules in the AWS Console (problem)]

[Figure: Security Group rules in the AWS Console (problem)]

Note that we have 1 Allow rule in the NACL for all Protocols, Ports and Sources, and 9 Security Group rules made up of 3 CIDR blocks, each allowed to enter on the same 3 Ports.
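The intersection can be expressed in a few lines of code. This C++ sketch is a deliberately simplified model with hypothetical types: protocols, NACL Deny rules and rule numbering, and the fact that NACLs are stateless while Security Groups are stateful are all ignored. Traffic reaches the ENI only if at least one NACL rule and at least one Security Group rule both allow it.

```cpp
#include <cstdint>
#include <vector>

// Simplified allow rule used for both layers: a CIDR source and an
// inclusive port range.
struct AllowRule {
    uint32_t network;
    int prefixLen;
    int fromPort;
    int toPort;
};

uint32_t maskFor(int prefixLen) {
    return prefixLen == 0 ? 0u : 0xFFFFFFFFu << (32 - prefixLen);
}

// True if the source address and port fall inside this rule.
bool matches(const AllowRule& r, uint32_t src, int port) {
    const uint32_t mask = maskFor(r.prefixLen);
    return (src & mask) == (r.network & mask) && port >= r.fromPort && port <= r.toPort;
}

// True if any rule in the set allows the traffic.
bool anyMatches(const std::vector<AllowRule>& rules, uint32_t src, int port) {
    for (const auto& r : rules) {
        if (matches(r, src, port)) return true;
    }
    return false;
}

// Inbound traffic reaches the ENI only if BOTH layers allow it: the
// intersection of the two rule sets (the overlapping holes in the cheese).
bool reachesEni(const std::vector<AllowRule>& nacl, const std::vector<AllowRule>& sg,
                uint32_t src, int port) {
    return anyMatches(nacl, src, port) && anyMatches(sg, src, port);
}
```

With the problem configuration (a NACL that allows everything), the outcome is decided entirely by the Security Group rules; as soon as the NACL is narrowed, it starts removing traffic that the Security Group alone would have let through.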

Solution

Here we have a solution that achieves the same outcome as the example described in the problem, but we will achieve it with the use of NACLs.

[Figure: NACL rules in the solution]

In the NACL, instead of using a single Allow rule for network traffic for all Protocols, Ports and Sources/Destinations, we have the following 3 rules:

Allow traffic from any source (0.0.0.0/0) to enter the Subnet on Port 22
Allow traffic from any source (0.0.0.0/0) to enter the Subnet on Port 80
Allow traffic from any source (0.0.0.0/0) to enter the Subnet on Port 443

[Figure: Security Group rules in the solution]

In the Security Group, we have the following rules:

Allow traffic from 10.0.0.0/8 to reach the ENI on all Ports
Allow traffic from 172.16.0.0/12 to reach the ENI on all Ports
Allow traffic from 192.168.0.0/16 to reach the ENI on all Ports

At first glance you may think the Security Group rules are overly permissive, because all Ports are open to the 3 CIDR blocks. However, if we apply the logic of a Venn Diagram Intersection to the 2 rule sets made up of the NACL and Security Group rules, you will realise that the net result of traffic Sources and Ports allowed into an ENI is identical to the problem example without NACLs.

[Figure: intersection of the NACL and Security Group rules in the solution]

[Figure: net result of the intersection in the solution]

Here is what the NACL and Security Group configuration looks like in the AWS Console for the proposed pattern:

[Figure: NACL rules in the AWS Console (solution)]

[Figure: Security Group rules in the AWS Console (solution)]

The net result of the 2 rule sets is identical, and the traffic allowed to enter an ENI remains the same; but notice that in this pattern we have 3 Allow rules in the NACL and 3 rules in the Security Group (a total of 6, versus 10 previously). In effect, we’ve reduced the number of rules in the Security Group by a factor of 3 while achieving the same outcome by leveraging NACLs. This pattern is useful if you constantly find yourself hitting the AWS Quota limits for the number of rules in a Security Group, or even the limit for the number of Security Groups attached to an ENI.

Now let’s consider a more problematic example, with many more Ports spread out with gaps in between and many specific CIDR values. Under the current pattern, imagine a Security Group with 60 rules made up of every combination of 6 different Ports and 10 different Sources:
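To check the claim that both designs allow exactly the same traffic, the C++ sketch below builds the problem and solution rule sets from this example and compares them for every port and a few sample sources inside and outside the three private ranges. It uses a simplified, hypothetical model of both layers (protocols, Deny rules and NACL statelessness are ignored), so it is a sanity check of the set logic rather than of AWS behaviour.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t ip(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Simplified allow rule: CIDR source plus inclusive port range.
struct AllowRule { uint32_t network; int prefixLen; int fromPort; int toPort; };

static uint32_t maskFor(int len) { return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len); }

static bool anyMatches(const std::vector<AllowRule>& rules, uint32_t src, int port) {
    for (const auto& r : rules) {
        const uint32_t m = maskFor(r.prefixLen);
        if ((src & m) == (r.network & m) && port >= r.fromPort && port <= r.toPort) return true;
    }
    return false;
}

struct Design { std::vector<AllowRule> nacl, sg; };

// Traffic reaches the ENI only if both layers allow it.
static bool reaches(const Design& d, uint32_t src, int port) {
    return anyMatches(d.nacl, src, port) && anyMatches(d.sg, src, port);
}

// The 3 CIDR blocks (all Ports) and 3 Ports from the example.
const AllowRule kBlocks[] = {{ip(10, 0, 0, 0), 8, 0, 65535},
                             {ip(172, 16, 0, 0), 12, 0, 65535},
                             {ip(192, 168, 0, 0), 16, 0, 65535}};
const int kPorts[] = {22, 80, 443};

// Problem: NACL allows everything; SG has one rule per (CIDR, Port) pair.
Design problemDesign() {
    Design d;
    d.nacl = {{0u, 0, 0, 65535}};
    for (const auto& b : kBlocks)
        for (int p : kPorts) d.sg.push_back({b.network, b.prefixLen, p, p});
    return d;
}

// Solution: NACL allows the 3 Ports from anywhere; SG allows the 3 CIDRs on all Ports.
Design solutionDesign() {
    Design d;
    for (int p : kPorts) d.nacl.push_back({0u, 0, p, p});
    for (const auto& b : kBlocks) d.sg.push_back(b);
    return d;
}

// Number of (source, port) pairs on which the two designs disagree.
int countMismatches() {
    const Design problem = problemDesign(), solution = solutionDesign();
    const uint32_t samples[] = {ip(10, 1, 2, 3),   ip(172, 31, 0, 1), ip(192, 168, 1, 1),
                                ip(172, 32, 0, 1), ip(8, 8, 8, 8),    ip(192, 169, 1, 1)};
    int mismatches = 0;
    for (uint32_t src : samples)
        for (int port = 0; port <= 65535; ++port)
            if (reaches(problem, src, port) != reaches(solution, src, port)) ++mismatches;
    return mismatches;
}
```

The problem design holds 10 rules (1 NACL + 9 Security Group) and the solution 6 (3 + 3), and countMismatches() finds no (source, port) pair that the two designs treat differently.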

Port	Source
310	10.1.0.1/32
310	10.3.0.1/32
310	10.9.0.1/32
310	172.16.1.0/32
310	172.16.4.0/32
310	172.16.8.0/32
310	192.168.1.1/32
310	192.168.4.1/32
310	192.168.8.1/32
310	192.168.9.1/32
320	10.1.0.1/32
320	10.3.0.1/32
320	10.9.0.1/32
320	172.16.1.0/32
320	172.16.4.0/32
320	172.16.8.0/32
320	192.168.1.1/32
320	192.168.4.1/32
320	192.168.8.1/32
320	192.168.9.1/32
322	10.1.0.1/32
322	10.3.0.1/32
322	10.9.0.1/32
322	172.16.1.0/32
322	172.16.4.0/32
322	172.16.8.0/32
322	192.168.1.1/32
322	192.168.4.1/32
322	192.168.8.1/32
322	192.168.9.1/32
400	10.1.0.1/32
400	10.3.0.1/32
400	10.9.0.1/32
400	172.16.1.0/32
400	172.16.4.0/32
400	172.16.8.0/32
400	192.168.1.1/32
400	192.168.4.1/32
400	192.168.8.1/32
400	192.168.9.1/32
420	10.1.0.1/32
420	10.3.0.1/32
420	10.9.0.1/32
420	172.16.1.0/32
420	172.16.4.0/32
420	172.16.8.0/32
420	192.168.1.1/32
420	192.168.4.1/32
420	192.168.8.1/32
420	192.168.9.1/32
500	10.1.0.1/32
500	10.3.0.1/32
500	10.9.0.1/32
500	172.16.1.0/32
500	172.16.4.0/32
500	172.16.8.0/32
500	192.168.1.1/32
500	192.168.4.1/32
500	192.168.8.1/32
500	192.168.9.1/32

When we factorise the 60 Security Group rules into NACL rules and Security Group rules, we get the following two tables:

Port	Source
ALL or 310-500	10.1.0.1/32
ALL or 310-500	10.3.0.1/32
ALL or 310-500	10.9.0.1/32
ALL or 310-500	172.16.1.0/32
ALL or 310-500	172.16.4.0/32
ALL or 310-500	172.16.8.0/32
ALL or 310-500	192.168.1.1/32
ALL or 310-500	192.168.4.1/32
ALL or 310-500	192.168.8.1/32
ALL or 310-500	192.168.9.1/32

Port	Source
310	0.0.0.0/0
320	0.0.0.0/0
322	0.0.0.0/0
400	0.0.0.0/0
420	0.0.0.0/0
500	0.0.0.0/0

We have gone from 61 rules (60 SG rules + the NACL Allow all rule) down to 16 rules across the NACL and Security Group, and the net result is identical. I have not stated which of the 2 tables above is for the NACL rules and which is for the Security Group rules, because it does not matter which attribute is used to factorise the rules into the NACL (remember the Intersection of a Venn Diagram). However, I suggest choosing between Port and Source based on the network construct where you are most likely to hit rule limits, i.e. the area where you want to leave wiggle room. If we use table 2 for the Security Group rules, then we’ve effectively reduced the Security Group rules by 90% (from 60 to 6).

To fully take advantage of this pattern, careful consideration needs to be given at the beginning of any VPC and Subnet design to how resources are grouped within a VPC, and especially within a Subnet: grouping too many dissimilar resources (in terms of Source traffic, Protocols and Ports) can result in too many rules, which is the topic of a blog in the pipeline. Of course, it is best practice to implement security at all layers, so if there is room left in your Security Groups you should lock down your rules by Port and Source as much as you can.

This solution complements the networking solutions in other blogs I have written:
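The same check scales to the larger example. This C++ sketch uses a simplified, hypothetical model of the two layers (protocols, Deny rules and NACL statelessness are ignored): it builds the 60-rule Security Group plus the allow-all NACL, then the factorised version with table 1 in the NACL (ports 310-500 for each source) and table 2 in the Security Group, and confirms the rule counts and that no (source, port) pair is treated differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t ip(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Simplified allow rule: CIDR source plus inclusive port range.
struct AllowRule { uint32_t network; int prefixLen; int fromPort; int toPort; };

static uint32_t maskFor(int len) { return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len); }

static bool anyMatches(const std::vector<AllowRule>& rules, uint32_t src, int port) {
    for (const auto& r : rules) {
        const uint32_t m = maskFor(r.prefixLen);
        if ((src & m) == (r.network & m) && port >= r.fromPort && port <= r.toPort) return true;
    }
    return false;
}

// The 10 sources (/32) and 6 ports from the tables above.
const std::vector<uint32_t> kSources = {
    ip(10, 1, 0, 1),    ip(10, 3, 0, 1),    ip(10, 9, 0, 1),
    ip(172, 16, 1, 0),  ip(172, 16, 4, 0),  ip(172, 16, 8, 0),
    ip(192, 168, 1, 1), ip(192, 168, 4, 1), ip(192, 168, 8, 1), ip(192, 168, 9, 1)};
const std::vector<int> kPorts = {310, 320, 322, 400, 420, 500};

struct Design { std::vector<AllowRule> nacl, sg; };

// Before: NACL allows everything; SG has one rule per (port, source) pair.
Design beforeDesign() {
    Design d;
    d.nacl = {{0u, 0, 0, 65535}};
    for (int p : kPorts)
        for (uint32_t s : kSources) d.sg.push_back({s, 32, p, p});
    return d;
}

// After: table 1 (each source on ports 310-500) in the NACL and table 2
// (each port from 0.0.0.0/0) in the Security Group.
Design afterDesign() {
    Design d;
    for (uint32_t s : kSources) d.nacl.push_back({s, 32, 310, 500});
    for (int p : kPorts) d.sg.push_back({0u, 0, p, p});
    return d;
}

bool reaches(const Design& d, uint32_t src, int port) {
    return anyMatches(d.nacl, src, port) && anyMatches(d.sg, src, port);
}

std::size_t ruleCount(const Design& d) { return d.nacl.size() + d.sg.size(); }

// Disagreements over every port, for each listed source plus two
// addresses that appear in neither table.
int countMismatches() {
    const Design b = beforeDesign(), a = afterDesign();
    std::vector<uint32_t> probes = kSources;
    probes.push_back(ip(10, 1, 0, 2));
    probes.push_back(ip(8, 8, 8, 8));
    int mismatches = 0;
    for (uint32_t src : probes)
        for (int port = 0; port <= 65535; ++port)
            if (reaches(b, src, port) != reaches(a, src, port)) ++mismatches;
    return mismatches;
}
```

Swapping the tables (table 2 in the NACL, table 1 in the Security Group) leaves the intersection unchanged, which is why the choice can be driven by where you expect to hit quota limits.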

Smart Cat Feeder – Part 4

April 3, 2022 · 5 min read

Chiwai Chan

Tinkerer

This is Part 4, the final blog of the series where I detail my journey in learning to build an IoT solution.

Please have a read of my previous blogs to get the full context leading up to this point before continuing.

Part 1: I talked about setting up a Seeed AWS IoT Button
Part 2: I talked about publishing events to an Arduino Micro-controller from AWS
Part 3: I talked about my experience of using a 3D Printer for the first time to print a Cat Feeder

Why am I building this Feeder?

I've always wanted to dip my toes into building IoT solutions beyond what a typical tutorial teaches, which is usually just turning on LEDs; I wanted to build something that would be used every day. Plus, I often forget to feed the cats while I am away from home for the day, so it would be nice to come home to non-grumpy cats by feeding them remotely, any time and from anywhere in the world, over the internet.

What was used to build this Feeder?

A 3D Printer using PLA as the filament material.
An Arduino based micro-controller - in this case a Seeed Studio XIAO ESP32C3
A couple of motors and controllers
AWS Services
Seeed AWS IoT Button
Some code
and some cat food

So how does it work and how is it put together?

Put simply, the Feeder uses an IoT button click to trigger events over the internet that instruct the feeder to dispense food into one or both food bowls.

[Figure: cat feeder]

Here are some diagrams describing the architecture of the solution: the technical things that happen in between the IoT button and the Cat Feeder.

[Figure: architecture diagram]

[Figure: sequence diagram]

When the Feeder receives an MQTT message from the AWS IoT Core service, it runs a motor for 10 seconds to dispense food into one of the food bowls; if the message contains the event value for dispensing food into both bowls, both motors run concurrently via the L298N motor controller.

Here's a timelapse video of pictures captured during the 3 weeks it took to 3D print the feeder.

The Feeder is made up of a small handful of basic hardware components; below is a breadboard diagram depicting the components used and how they are wired together. A regular 12V 2A DC power adapter is used to power all the components.

[Figure: breadboard diagram]

The code to start and stop a motor is only about 10 lines, as shown below. This is the completed version of the Arduino sketch that was only partially written at the time of Part 2 of this blog series.

#include "secrets.h"
#include <WiFiClientSecure.h>
#include <MQTTClient.h>
#include <ArduinoJson.h>
#include "WiFi.h"

// The MQTT topics that this device should publish/subscribe
#define AWS_IOT_PUBLISH_TOPIC   "cat-feeder/states"
#define AWS_IOT_SUBSCRIBE_TOPIC "cat-feeder/action"

WiFiClientSecure net = WiFiClientSecure();
MQTTClient client = MQTTClient(256);

int motor1pin1 = 32;
int motor1pin2 = 33;
int motor2pin1 = 16;
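int motor2pin2 = 17;

// Forward declarations. The Arduino IDE generates these automatically for
// .ino files, but declaring them keeps the sketch valid as plain C++ too.
void messageHandler(String &topic, String &payload);
void feedMe(String event);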

void connectAWS()
{
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);

  Serial.println("Connecting to Wi-Fi");
  Serial.println(AWS_IOT_ENDPOINT);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  // Configure WiFiClientSecure to use the AWS IoT device credentials
  net.setCACert(AWS_CERT_CA);
  net.setCertificate(AWS_CERT_CRT);
  net.setPrivateKey(AWS_CERT_PRIVATE);

  // Connect to the MQTT broker on the AWS endpoint we defined earlier
  client.begin(AWS_IOT_ENDPOINT, 8883, net);

  // Create a message handler
  client.onMessage(messageHandler);

  Serial.println("Connecting to AWS IOT");
  Serial.println(THINGNAME);

  while (!client.connect(THINGNAME)) {
    Serial.print(".");
    delay(100);
  }

  if (!client.connected()) {
    Serial.println("AWS IoT Timeout!");
    return;
  }

  Serial.println("About to subscribe");
  // Subscribe to a topic
  client.subscribe(AWS_IOT_SUBSCRIBE_TOPIC);

  Serial.println("AWS IoT Connected!");
}

void publishMessage()
{
  StaticJsonDocument<200> doc;
  doc["time"] = millis();
  doc["state_1"] = millis();
  doc["state_2"] = 2 * millis();
  char jsonBuffer[512];
  serializeJson(doc, jsonBuffer); // serialise the JSON document into jsonBuffer

  client.publish(AWS_IOT_PUBLISH_TOPIC, jsonBuffer);

  Serial.println("publishMessage states to AWS IoT" );
}

void messageHandler(String &topic, String &payload) {
  Serial.println("incoming: " + topic + " - " + payload);

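  StaticJsonDocument<200> doc;
  DeserializationError error = deserializeJson(doc, payload);
  if (error) {
    Serial.print("deserializeJson() failed: ");
    Serial.println(error.c_str());
    return;
  }

  // Ignore messages that do not carry an "event" field
  const char* event = doc["event"];
  if (event == nullptr) {
    Serial.println("No event in payload, ignoring");
    return;
  }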

  Serial.println(event);

  feedMe(event);  
}

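void setup() {
  Serial.begin(9600);

  // Configure the motor pins and make sure both motors are stopped
  // before connecting, so the motor controller inputs are in a known state.
  pinMode(motor1pin1, OUTPUT);
  pinMode(motor1pin2, OUTPUT);
  pinMode(motor2pin1, OUTPUT);
  pinMode(motor2pin2, OUTPUT);
  digitalWrite(motor1pin1, LOW);
  digitalWrite(motor1pin2, LOW);
  digitalWrite(motor2pin1, LOW);
  digitalWrite(motor2pin2, LOW);

  connectAWS();
}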

void feedMe(String event) {
  Serial.println(event);

  bool feedLeft = false;
  bool feedRight = false;

  if (event == "SINGLE") {
    feedLeft = true;
  }
  if (event == "DOUBLE") {
    feedRight = true;
  }
  if (event == "LONG") {
    feedLeft = true;
    feedRight = true;
  }

  if (feedLeft) {
    Serial.println("run left");
    digitalWrite(motor1pin1, HIGH);
    digitalWrite(motor1pin2, LOW);
  }

  if (feedRight) {
    Serial.println("run right");
    digitalWrite(motor2pin1, HIGH);
    digitalWrite(motor2pin2, LOW);
  }

  delay(10000);
  digitalWrite(motor1pin1, LOW);
  digitalWrite(motor1pin2, LOW);
  digitalWrite(motor2pin1, LOW);
  digitalWrite(motor2pin2, LOW);
  delay(2000);

  Serial.println("fed");
}

void loop() {
  publishMessage();
  client.loop();
  delay(3000);
}

Demo Time

The Seeed AWS IoT Button can detect 3 different types of click events: Long, Single and Double. We leverage this all the way through to the feeder, so that it performs a different action based on the click event type.

The video below demonstrates the following scenarios:

Long Click: this will dispense food into both cat bowls
Single Click: this will dispense food into Ebok's cat bowl
Double Click: this will dispense food into Queenie's cat bowl
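Because the mapping from click type to bowls lives in feedMe(), it can be pulled out into a pure function and checked with a desktop compiler before flashing the board. The C++ sketch below is a hypothetical refactor, not the code running on the feeder: it uses std::string instead of Arduino's String, and which motor feeds which cat is inferred by matching the sketch (SINGLE runs motor 1, DOUBLE runs motor 2) to the demo list above.

```cpp
#include <string>

// Which motors to run for a click event, mirroring the branching in feedMe().
struct FeedPlan {
    bool left;   // motor 1: Ebok's bowl (inferred from the demo list)
    bool right;  // motor 2: Queenie's bowl (inferred from the demo list)
};

FeedPlan planFor(const std::string& event) {
    if (event == "SINGLE") return {true, false};
    if (event == "DOUBLE") return {false, true};
    if (event == "LONG") return {true, true};
    return {false, false};  // unknown or missing event: dispense nothing
}
```

On the device, feedMe() would then only need to translate the plan into digitalWrite() calls, keeping the decision logic separate from the pin handling.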

What's next?

Build the nervous system of an ultimate nerd project I have in mind, which would allow me to use my voice to control servos, LEDs and audio outputs, using a mesh of Seeed XIAO BLE Sense micro-controllers and TinyML Machine Learning.
