
2 posts tagged with "Athena"



Rockit Apple Payslip Analyzer with GenAI Chatbot using Bedrock and Streamlit

· 4 min read
Chiwai Chan
Tinkerer

It's the time of year when I normally have to start doing taxes, not for myself but for my parents. Mum works at various fruit picking and packing places in Hawke's Bay throughout the year, so there are all sorts of Payslips from different employers for the last financial year. Occasionally Mum asks me about specific details in her weekly payslips, and that usually means: download a PDF from an email -> open up the PDF -> look for what she's asking about -> can't find it, so ask Mum what she meant -> find the answer -> explain it to her.

Solution & Goal

The usual challenge: create a Generative AI conversational chat that lets Mum ask, in her natural language, about specific details of her Payslips.

And the goal: outsource the work to AI = more time to play. :-)

Success Criteria

  • Automatically extract details from Payslips - I've only tested it on Payslips from Rockit Apple.
  • Enable the end-user to ask for details of a Payslip in Cantonese
  • Retrieve data from an Athena Table where the extracted Payslip details are stored
  • Create a Chatbot that receives questions in Cantonese about the user's Payslips stored in the Athena Table, and generates a response back to the user in Cantonese

So what's the Architecture?

Architecture

Note

I've only tried it for Payslips generated by this employer: Rockit Apple

Deploy it for yourself to try out

Prerequisites

  • Python 3.12 installed - the only version I've validated
  • Pip installed
  • Node.js and npm Installed
  • CDK installed - using npm install -g aws-cdk
  • AWS CLI Profile configured

Deployment CLI Commands

  • Open up a terminal
  • And run the following commands
git clone git@github.com:chiwaichan/rockitapple-payslip-analyzer-with-genai-chatbot-using-bedrock-streamlit.git 
cd rockitapple-payslip-analyzer-with-genai-chatbot-using-bedrock-streamlit
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cdk deploy

If all goes well

You should see this as a result of calling the cdk deploy command

CDK Deploy

Check that the CloudFormation Stack is being created in the AWS Console

CloudFormation Create

Click on it to see the Events, Resources and Output for the Stack

CloudFormation Create Events

Find the link to the S3 Bucket to upload Payslip PDFs into: in the Stack's Resources, find the S3 Bucket with a Logical ID that starts with "sourcepayslips" and click on its Physical ID link.

S3 Buckets

Upload your PDF Payslips into this bucket.

S3 Source Payslip PDFs

Find the link to the S3 Bucket where the extracted data will be stored for the Athena Table: in the Stack's Resources, find the S3 Bucket with a Logical ID that starts with "PayslipAthenaDataBucket" and click on its Physical ID link.

CloudFormation S3 Buckets

There you will find a JSON file; it should take a few minutes to appear after you upload the PDF.

Athena Table JSON file in S3 Bucket

It was created by the Lambda shown in the architecture diagram we saw earlier. The Lambda uses Amazon Textract to extract the data from each Payslip using OCR, via Textract's Queries feature, which lets us configure what to extract from a PDF using questions written in natural language. Find the "app.py" file shown in the folder structure in the screenshot below: you can modify the wording of the Questions the Lambda function uses to extract details from the Payslip, to suit the wording of your own Payslip; the result of each Question is saved to the Athena table using the column name shown next to the Question.

Textract Queries
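As a rough illustration of how such a Lambda might call Textract's Queries feature, here is a minimal sketch using boto3. The question wording, aliases, and the assumption of a single-page PDF are all illustrative, not the project's actual `app.py` (multi-page PDFs would need the asynchronous `start_document_analysis` API instead):

```python
# Sketch: extract payslip fields with Amazon Textract's Queries feature.
# The questions and aliases below are illustrative assumptions.

def build_queries_config(questions: dict) -> dict:
    """Map {column_alias: natural-language question} to a Textract QueriesConfig."""
    return {"Queries": [{"Text": q, "Alias": alias} for alias, q in questions.items()]}

PAYSLIP_QUESTIONS = {
    "gross_pay": "What is the gross pay?",
    "net_pay": "What is the net pay?",
    "pay_period_end": "What is the pay period end date?",
}

def analyze_payslip(bucket: str, key: str) -> dict:
    """Run Textract against a single-page payslip in S3; return alias -> answer."""
    import boto3  # imported here so the pure helper above is testable offline
    textract = boto3.client("textract")
    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["QUERIES"],
        QueriesConfig=build_queries_config(PAYSLIP_QUESTIONS),
    )
    # QUERY blocks link to their QUERY_RESULT blocks via ANSWER relationships.
    blocks = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] == "QUERY":
            for rel in block.get("Relationships", []):
                if rel["Type"] == "ANSWER":
                    for rid in rel["Ids"]:
                        answers[block["Query"]["Alias"]] = blocks[rid].get("Text")
    return answers
```

The alias of each query becomes the key of the extracted answer, which maps naturally onto an Athena column name.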

What it looks like in action

Go to the CloudFormation Stack's Outputs to get the URL to open the Streamlit Application's frontend.

Click the value for the Key "StreamlitFargateServiceServiceURL"

Streamlit URL

That will take you to a Streamlit App hosted in the Fargate Container shown in the architecture diagram

Streamlit App

Let's try out some examples

Example 1 Example 2 Example 3 Example 4 1 payslip

Things don't always go well

Error

You can tweak the Athena Queries generated by the LLM by providing specific examples tailored to your Athena Table and its column names and values - a technique known as Few-Shot Learning. Modify this file to tweak the Queries fed into the Few-shot examples used by Bedrock and the Streamlit app.

Few Shot Examples
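To give a feel for what such few-shot prompting looks like, here is a minimal sketch. The table name, column names, example questions, and model id are all assumptions; adapt them to your own schema and the few-shot file in the repo:

```python
# Sketch: few-shot text-to-SQL prompting for Bedrock.
# Table/column names and the model id below are illustrative assumptions.

FEW_SHOT_EXAMPLES = [
    {
        "question": "What was my net pay last week?",
        "sql": "SELECT net_pay FROM payslips ORDER BY pay_period_end DESC LIMIT 1",
    },
    {
        "question": "How much tax did I pay in March?",
        "sql": "SELECT SUM(tax) FROM payslips WHERE month(pay_period_end) = 3",
    },
]

def build_prompt(user_question: str) -> str:
    """Prepend worked question->SQL pairs so the model imitates the pattern."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql']}" for ex in FEW_SHOT_EXAMPLES
    )
    return f"{shots}\n\nQuestion: {user_question}\nSQL:"

def generate_sql(user_question: str) -> str:
    """Send the few-shot prompt to a Bedrock model and return the generated SQL."""
    import json, boto3  # inside the function so build_prompt is testable offline
    bedrock = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": build_prompt(user_question)}],
    })
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body
    )
    return json.loads(resp["body"].read())["content"][0]["text"].strip()
```

The more closely the example questions and SQL mirror your real table's columns and values, the less often the generated queries fail.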

Thanks to this repo

I was able to learn and build my first GenAI app: AWS Samples - genai-quickstart-pocs

I based my app on their Athena example: I wrapped the Streamlit app in a Fargate Container and added Textract to extract Payslip details from PDFs, and this app was the result.

Storing Electricity Price Data on AWS

· 2 min read
Chiwai Chan
Tinkerer

For a personal project of mine, I want to be able to analyse the pattern of New Zealand's electricity Spot Prices: to identify the cheapest hours of the day to pull power from the grid, as well as the best time of day to sell back to the grid.

I will be creating a series of blogs as I build out the fragments of my project, and over time I will integrate the individual fragments into a bigger overall solution. One of the drivers for analysing New Zealand Spot Prices is the aim of reducing the payback period of my Solar and Tesla Powerwall purchase; I have had the two systems for over a year at the time this blog was published.

In this blog, I will explain how I will be collecting Spot Prices from electricityinfo.co.nz using one of their APIs; each Spot Price reading will then be stored as a JSON file in an S3 bucket, where it will be queryable using SQL. Using the same pattern, I will also track the actual per-unit cost of the power I pull from the grid; my electricity provider is Flick Electric, and I will leverage their APIs to retrieve that pricing data.

power price reporting

An architecture diagram of the solution: the orchestration of retrieval and storage of the data using AWS serverless components.
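A minimal sketch of what the scheduled retrieval Lambda could look like. The endpoint URL, response shape, bucket name, and key layout are all assumptions for illustration, not the actual electricityinfo.co.nz API:

```python
# Sketch: fetch a spot-price reading and store it as JSON in S3 under a
# date-partitioned key so Athena can prune partitions by date.
import json
from datetime import datetime, timezone
from urllib.request import urlopen

def price_object_key(reading_time: datetime, node: str) -> str:
    """Partitioned S3 key layout, e.g. spot-prices/year=2023/month=05/day=01/..."""
    return (
        f"spot-prices/year={reading_time:%Y}/month={reading_time:%m}/"
        f"day={reading_time:%d}/{node}-{reading_time:%H%M}.json"
    )

def handler(event, context):
    """Scheduled Lambda entry point (untested sketch)."""
    import boto3  # inside the handler so the helper above is testable offline
    endpoint = "https://api.electricityinfo.co.nz/prices/latest"  # hypothetical URL
    with urlopen(endpoint) as resp:
        reading = json.load(resp)
    now = datetime.now(timezone.utc)
    key = price_object_key(now, reading.get("node", "unknown"))
    boto3.client("s3").put_object(
        Bucket="spot-prices-bucket",  # assumed bucket name
        Key=key,
        Body=json.dumps(reading),
        ContentType="application/json",
    )
```

The `year=`/`month=`/`day=` key segments follow Hive-style partitioning, which keeps Athena scans (and costs) small as the data grows.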

query-athena

Querying price data stored as JSON files in an S3 bucket using SQL in Athena.
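As a taste of the kind of analysis this enables, here is a hedged sketch of an Athena query run via boto3. The table name, column names, and the assumption of an ISO-8601 `trading_time` field are illustrative, not the project's actual schema:

```python
# Sketch: rank the hours of a given day by average spot price in Athena.
# Table and column names below are illustrative assumptions.

def cheapest_hours_sql(table: str, day: str) -> str:
    """SQL that averages the spot price per hour for one day, cheapest first."""
    return (
        f"SELECT hour(from_iso8601_timestamp(trading_time)) AS hr, "
        f"avg(price) AS avg_price "
        f"FROM {table} "
        f"WHERE date(from_iso8601_timestamp(trading_time)) = date '{day}' "
        f"GROUP BY 1 ORDER BY avg_price ASC"
    )

def run_query(sql: str, database: str, output_s3: str) -> str:
    """Submit the query to Athena and return its execution id."""
    import boto3  # inside the function so cheapest_hours_sql is testable offline
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```

Athena queries are asynchronous; in practice you would poll `get_query_execution` until the state is `SUCCEEDED` before fetching results.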

The source code for this AWS SAM project can be found in my Github repository: https://github.com/chiwaichan/athena-spot-prices

In order for this solution to work you must have a set of credentials for Flick Electric; otherwise, you can modify the SAM template to disable the scheduler that triggers the Lambda to retrieve data. The Lambda function retrieves the credentials from AWS Secrets Manager, so you will need to create a Secret before deploying this solution, as demonstrated in the AWS CLI command shown in the screenshot below.

create secrets manager value
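The same secret creation can be sketched in Python with boto3. The secret name and the key names inside the payload are assumptions; they must match whatever the Lambda actually reads:

```python
# Sketch: create the Flick Electric credentials secret before deployment.
# Secret name and payload keys are illustrative assumptions.
import json

def flick_secret_payload(username: str, password: str) -> str:
    """Serialise the credentials the Lambda is assumed to expect."""
    return json.dumps({"username": username, "password": password})

def create_flick_secret(name: str, username: str, password: str) -> None:
    """Store the credentials in AWS Secrets Manager."""
    import boto3  # inside the function so the payload helper is testable offline
    boto3.client("secretsmanager").create_secret(
        Name=name,
        SecretString=flick_secret_payload(username, password),
    )
```

Keeping credentials in Secrets Manager rather than in the SAM template means they never land in source control or CloudFormation parameters.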

In a follow up blog, I will demonstrate the use of these Athena tables using a reporting service called QuickSight.