Skip to content

AWS Athena Integration

k9 Security Logical Integration Architecture - Athena
Figure 1. Logical Integration Architecture for Athena

You can create custom reporting and automation workflows by querying k9 Security's reports with SQL using AWS Athena. AWS Athena is a service that enables you to query structured data in S3 using SQL. k9 Security's parquet data formats are a robust and performant structured data integration option.

In this guide, we'll create Athena tables that model the latest principal access summary and resource access summary data so that you can answer questions like:

  • which principals have stale API access key or password credentials?
  • which principals can administer IAM?
  • who can delete data in RDS?
  • which data resources are overly-accessible
  • which principals have excessive access to data

Note: Automation to deploy these tables is currently in private preview in k9 Security's compliance solution. Contact for access.

Getting Started

Athena reads all data in a folder into a table using Hive-style partitions. So k9 Security delivers the latest report files using S3 object keys with this format:


The parameters within the key format are:

  • secure_inbox_bucket_name: the name of the customer or partner's secure inbox in S3 where reports are delivered, e.g. safeco-k9-reports
  • customer_id: the k9 Customer ID, e.g. C123456
  • csp: the cloud service provider name, currently only aws
  • account_id: the id of the aws account being analyzed, e.g. 123456789012
  • file_type: the type of k9 report file:
    • ic: The Identity Center report
    • pas: the Principal Access Summary report
    • principals: the Principals report
    • resources: the Resources report
    • ras: The Resource Access Summary report

For example, the key for customer C123456's latest principal access summary for AWS account 123456789012 is: /hive/parquet/pas/customer_id=C123456/csp=aws/account_id=123456789012/principal-access-summaries.latest.parquet

Create 'k9' database

Create an Athena database which will contain the Athena tables used with k9 Security. We suggest creating a k9 database within the AwsDataCatalog.

You can do this using the console or CLI. Feel free to use another catalog or database name. We'll use the k9 database name in the SQL statements that follow.

Create tables

Now create an principal_access_summaries_latest table within the k9 database with:

CREATE EXTERNAL TABLE `principal_access_summaries_latest`(
  `analysis_time` timestamp,
  `principal_name` string,
  `principal_arn` string,
  `principal_type` string,
  `principal_tags` map<string,string>,
  `service_name` string,
  `access_capability` string,
  `resource_arn` string,
  `resource_tag_confidentiality` string)
  `customer_id` string,
  `csp` string,
  `account_id` string)

Run a table repair to load the partitions:

MSCK REPAIR TABLE k9.principal_access_summaries_latest;

See the create_k9_report_tables_athena.sql for a script that automates this process for the principals, principal access summary, and resource access summary tables.

Now you can query!

Query the access reports

Now you can query the latest k9 access summary data with SQL.

Identify IAM administrators

First, lets see who can administer IAM:

-- query for IAM admins in account 
SELECT pas.principal_name, pas.principal_arn, pas.service_name, pas.access_capability
FROM k9.principal_access_summaries_latest as pas
  AND service_name='IAM' 
  AND access_capability='administer-resource'
  -- AND customer_id='C123456' AND account_id='123456789012'
ORDER BY pas.principal_name ASC

Note: the where clause does not specify either a customer_id nor account_id so the query will return results for all accounts monitored by k9.

This query will return results like:

principal_name principal_arn service_name access_capability
AWS-CodePipeline-Service arn:aws:iam::123456789012:role/AWS-CodePipeline-Service IAM administer-resource
AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f arn:aws:iam::123456789012:role/aws-reserved/ IAM administer-resource
AWSServiceRoleForOrganizations arn:aws:iam::123456789012:role/aws-service-role/ IAM administer-resource
AccountAdminAccessRole-Sandbox arn:aws:iam::123456789012:role/AccountAdminAccessRole-Sandbox IAM administer-resource
cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 arn:aws:iam::123456789012:role/cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 IAM administer-resource
ci arn:aws:iam::123456789012:user/ci IAM administer-resource
demo-admin arn:aws:iam::123456789012:role/demo-admin IAM administer-resource
k9-dev-appeng arn:aws:iam::123456789012:role/k9-dev-appeng IAM administer-resource
skuenzli arn:aws:iam::123456789012:user/skuenzli IAM administer-resource
training arn:aws:iam::123456789012:user/training IAM administer-resource

Identify principals that can delete data in RDS

Now let's identify which principals can delete data in RDS with this query:

-- who can delete RDS DB clusters
SELECT ras.service_name, ras.resource_name, ras.principal_name, ras.access_capability
FROM k9.resource_access_summaries_latest as ras
  AND service_name='RDS'
  AND access_capability='delete-data'
  AND customer_id='C123456' AND account_id='123456789012'

This query will return results like:

service_name resource_name principal_name access_capability
RDS int-test-pg-01 ci delete-data
RDS int-test-pg-01 skuenzli delete-data
RDS int-test-pg-01 training delete-data
RDS int-test-pg-01 AccountAdminAccessRole-Sandbox delete-data
RDS int-test-pg-01 AWS-CodePipeline-Service delete-data
RDS int-test-pg-01 AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f delete-data
RDS int-test-pg-01 cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 delete-data
RDS int-test-pg-01 k9-dev-appeng delete-data

If you have any questions getting set up or querying the data, please reach out to


Q: Should I serve access analysis data to my users with direct queries to S3 using Athena?

That depends on your data integration and performance requirements.

Querying data from Athena is convenient for ad-hoc analysis and joining with other data in S3. Performance is good at 1-2 seconds per query for most queries. Querying Athena directly is a good option when many people aren't waiting for the results.

But if you are integrating access analysis data for viewing in another system, you will probably want to load the data into that system's data store and query from there. This will make it easier to join access analysis data with that system's data and probably decrease query times to 10-100 milliseconds.

Last update: May 20, 2024