Query S3 Data Through Athena Using Cypress Task

Ahmed Alsaab
2 min read · Apr 28, 2023

Going to try and keep this one short. Code for this project can be found on the repository here.

Use Case

If you’re reading this, you probably have your own use case. In my case, we store some data in S3 after a user has interacted with our web services, and that data is processed before it lands in S3. I figured it would be worth automating checks that 1) our data lands in S3 as expected and 2) the integrity of the data is upheld after processing.

AthenaExpress

You can query S3 with Athena using the base AWS SDK. However, I opted to use the AthenaExpress package as it makes it a bit more streamlined. AthenaExpress is a wrapper for the SDK and you can read up about it here.

Cypress Config

As described in the comments within this code snippet, you need to figure out how you want to authenticate before you send any queries. Either use the AWS CLI to create temporary credentials for an eligible AWS account, or open up the AWS console and get your access and secret keys directly.

The athenaExpressConfig object takes the aws instance, which is used to authenticate the request, along with an optional s3 bucket location where you want the query results to be dumped in AWS itself, and db, which is simply the database you want to use.

One thing to note here is that the aws instance will by default attempt to read credentials from the ~/.aws directory on your machine unless you override this using aws.config.update. In my example, I use temp credentials, so no extra code was needed, but I’ve left a comment in to demonstrate how to set credentials manually if you need to.
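To make the shape of that config concrete, here is a minimal sketch rather than the exact code from the repository. The helper names (buildAthenaExpressConfig, createAthenaExpress), the bucket, the database and the region are all placeholders I've made up for illustration, and the require style follows the athena-express README as I understand it, so check it against the version you have installed.

```javascript
// Sketch of the Athena setup that would live alongside cypress.config.js.
// All names and values here are illustrative assumptions.

// Builds the athenaExpressConfig object:
//   aws -> the AWS SDK instance used to authenticate the request
//   s3  -> optional location where Athena writes the query results
//   db  -> the database the queries run against
function buildAthenaExpressConfig(aws, { resultsBucket, database }) {
  const config = { aws, db: database };
  if (resultsBucket) {
    config.s3 = resultsBucket; // e.g. "s3://my-athena-results" (placeholder)
  }
  return config;
}

// Creates the client. The packages are required lazily so the helper above
// can be exercised on its own without them installed.
function createAthenaExpress(options) {
  const AWS = require("aws-sdk");
  const AthenaExpress = require("athena-express");

  // Only needed if you are NOT relying on credentials the SDK picks up
  // by itself (environment variables or the shared credentials file):
  // AWS.config.update({
  //   region: "eu-west-1", // placeholder region
  //   accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  //   secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  //   sessionToken: process.env.AWS_SESSION_TOKEN, // temp credentials only
  // });

  return new AthenaExpress(buildAthenaExpressConfig(AWS, options));
}
```

Calling createAthenaExpress({ resultsBucket: "s3://my-athena-results", database: "my_database" }) would then give you the client that the task uses.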

Cypress Task

Nothing too fancy here. It’s a Cypress task which accepts a query as its parameter and uses the .query method from athenaExpress to fire off the request, along with the config that was declared earlier. Do note that the task returns a promise, so you could extend the time it takes to resolve using something like setTimeout(), especially if you’re sending large or complex queries that take a while. Keep in mind that Cypress only waits for the taskTimeout (60 seconds by default) before failing a task, and cy.task accepts a timeout option if you need longer.

Alternatively, this could be moved to a custom command if you wanted to, and you could then use Cypress.Promise instead.
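For reference, a task along these lines could be sketched as follows. The task name athenaQuery and the two helper functions are hypothetical, and athenaExpress stands for the client built from athenaExpressConfig. It is passed in as an argument here only so the sketch can be tried with a stub.

```javascript
// Sketch of the task definition. `athenaQuery` is a made-up task name.
function createAthenaTasks(athenaExpress) {
  return {
    // `query` is whatever the spec passes to cy.task, e.g. { sql: "..." }.
    athenaQuery(query) {
      // Returning the promise lets Cypress wait for Athena to finish.
      // A task must not resolve to undefined, so fall back to null.
      return athenaExpress.query(query).then((result) => result ?? null);
    },
  };
}

// Registers the task. Intended to be called from setupNodeEvents in
// cypress.config.js, for example:
//   setupNodeEvents(on, config) {
//     registerAthenaTasks(on, athenaExpress);
//   }
function registerAthenaTasks(on, athenaExpress) {
  on("task", createAthenaTasks(athenaExpress));
}
```

Keeping the task body this thin means all of the Athena-specific behaviour stays in the config, and the spec decides what to query.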

Usage

The first argument to cy.task is always the task name, followed by any argument you’ve set up for the task. Make sure you include sql in the object argument to tell athenaExpress that this is the query you would like to execute.
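As a rough illustration of how a spec might use the result, the sketch below checks the two things from the use case: that rows came back at all, and that each row has the fields you expect. The helper names, the table and the column names are placeholders of mine, and it assumes the rows come back under Items, which is how athena-express returns them as far as I'm aware.

```javascript
// Hypothetical integrity check: returns the rows missing any required column.
function findIncompleteRows(items, requiredFields) {
  return items.filter((row) =>
    requiredFields.some(
      (field) => row[field] === undefined || row[field] === null || row[field] === ""
    )
  );
}

// Assertion to hand to .then(); throwing inside it fails the Cypress test.
function assertDataLanded(result, requiredFields) {
  const items = (result && result.Items) || [];
  if (items.length === 0) {
    throw new Error("Athena returned no rows: did the data land in S3?");
  }
  const incomplete = findIncompleteRows(items, requiredFields);
  if (incomplete.length > 0) {
    throw new Error(`${incomplete.length} row(s) are missing required fields`);
  }
  return items;
}

// Inside a spec (task, table and column names are placeholders):
//
//   cy.task(
//     "athenaQuery",
//     { sql: "SELECT user_id, event_type FROM user_events LIMIT 10" },
//     { timeout: 120000 } // optional: allow slow queries more time
//   ).then((result) => assertDataLanded(result, ["user_id", "event_type"]));
```

Plain thrown errors are used instead of chai assertions so the helpers can be checked outside Cypress too. Inside a spec you could just as well use expect.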
