Skip to content

AWS Athena Integration

k9 Security Logical Integration Architecture - Athena
Figure 1. Logical Integration Architecture for Athena

You can create custom reporting and automation workflows by querying k9 Security's reports with SQL using AWS Athena. AWS Athena is a service that enables you to query structured data in S3 using SQL. k9 Security's csv data formats work well for this.

In this guide, we'll create Athena tables that model the latest principal access summary and resource access summary data so that you can answer questions like:

  • which principals can administer IAM?
  • who can delete data in RDS?

Getting Started

Athena will read all data in a folder into a table using Hive-style partitions. So k9 Security built a special delivery mechanism to deliver the latest report files using S3 object keys in this format:

s3://<secure_inbox_bucket_name>/hive/customer_id=<customer_id>/csp=<csp_name>/account_id=<account_id>/file_type=<file_type>/<fully_qualified_file_type>.latest.csv

The parameters within that format are:

  • secure_inbox_bucket_name: the name of the customer or partner's secure inbox in S3 where reports are delivered, e.g. safeco-k9-reports
  • customer_id: the k9 Customer ID, e.g. C123456
  • csp: the cloud service provider name, currently only aws
  • account_id: the id of the aws account being analyzed, e.g. 123456789012
  • file_type: the type of k9 report file:
    • csv_principals: the Principals report
    • csv_pas: the Principal Access Summary report
    • csv_resources: the Resources report
    • csv_ras: The Resource Access Summary report

Examples:

For example, the key for customer C123456's latest principal access summary for AWS account 123456789012 is: hive/customer_id=C10001/csp=aws/account_id=123456789012/file_type=csv_pas/principal-access-summaries.latest.csv

Create 'iam' database

Create an Athena database which will contain the Athena tables used with k9 Security. We suggest creating an iam database within the AwsDataCatalog.

You can do this using the console or CLI. Feel free to use another catalog or database name. We'll use the iam database name in the SQL statements that follow.

Create tables

Now create an k9_principal_access_summaries_latest table within the iam database with:

-- NOTE: set your own <SECURE_INBOX_BUCKET_NAME> in LOCATION  
CREATE EXTERNAL TABLE `k9_principal_access_summaries_latest`(
  `analysis_time` timestamp, 
  `principal_name` string, 
  `principal_arn` string, 
  `principal_type` string, 
  `principal_tags` map<string,string>, 
  `service_name` string, 
  `access_capability` string, 
  `resource_arn` string, 
  `resource_tag_confidentiality` string)
PARTITIONED BY ( 
  `customer_id` string,
  `csp` string,
  `account_id` string, 
  `file_type` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<SECURE_INBOX_BUCKET_NAME>/hive'
TBLPROPERTIES (
  'has_encrypted_data'='false')

Run a table repair to load the partitions:

MSCK REPAIR TABLE iam.k9_principal_access_summaries_latest;

See the create_k9_report_tables_athena.sql for a script that automates this process for the principal and resource access summary tables.

Now you can query!

Query the access reports

Now you can query the latest k9 access summary data with SQL.

Identify IAM administrators

First, lets see who can administer IAM:

-- query for IAM admins in account 
SELECT pas.principal_name, pas.principal_arn, pas.service_name, pas.access_capability
FROM iam.k9_principal_access_summaries_latest as pas
WHERE 1=1
  AND service_name='IAM' 
  AND access_capability='administer-resource'
  -- AND customer_id='C123456' AND account_id='123456789012'
ORDER BY pas.principal_name ASC
;

Note: the where clause does not specify either a customer_id nor account_id so the query will return results for all accounts monitored by k9.

This query will return results like:

principal_name principal_arn service_name access_capability
AWS-CodePipeline-Service arn:aws:iam::123456789012:role/AWS-CodePipeline-Service IAM administer-resource
AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f arn:aws:iam::123456789012:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f IAM administer-resource
AWSServiceRoleForOrganizations arn:aws:iam::123456789012:role/aws-service-role/organizations.amazonaws.com/AWSServiceRoleForOrganizations IAM administer-resource
AccountAdminAccessRole-Sandbox arn:aws:iam::123456789012:role/AccountAdminAccessRole-Sandbox IAM administer-resource
cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 arn:aws:iam::123456789012:role/cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 IAM administer-resource
ci arn:aws:iam::123456789012:user/ci IAM administer-resource
demo-admin arn:aws:iam::123456789012:role/demo-admin IAM administer-resource
k9-dev-appeng arn:aws:iam::123456789012:role/k9-dev-appeng IAM administer-resource
skuenzli arn:aws:iam::123456789012:user/skuenzli IAM administer-resource
training arn:aws:iam::123456789012:user/training IAM administer-resource

Identify principals that can delete data in RDS

Now let's identify which principals can delete data in RDS with this query:

-- who can delete RDS DB clusters
SELECT ras.service_name, ras.resource_name, ras.principal_name, ras.access_capability
FROM iam.k9_resource_access_summaries_latest as ras
WHERE 1=1
  AND file_type='csv_ras'
  AND service_name='RDS'
  AND access_capability='delete-data'
  AND customer_id='C123456' AND account_id='123456789012'
;

This query will return results like:

service_name resource_name principal_name access_capability
RDS int-test-pg-01 ci delete-data
RDS int-test-pg-01 skuenzli delete-data
RDS int-test-pg-01 training delete-data
RDS int-test-pg-01 AccountAdminAccessRole-Sandbox delete-data
RDS int-test-pg-01 AWS-CodePipeline-Service delete-data
RDS int-test-pg-01 AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f delete-data
RDS int-test-pg-01 cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 delete-data
RDS int-test-pg-01 k9-dev-appeng delete-data

If you have any questions getting set up or querying the data, please reach out to support@k9security.io

FAQ

Q: Should I serve access analysis data to my users with direct queries to S3 using Athena?

That depends on your data integration and performance requirements.

Querying data from Athena is convenient for ad-hoc analysis and joining with other data in S3. Performance is good at 1-2 seconds per query for most queries. Querying Athena directly is a good option when many people aren't waiting for the results.

But if you are integrating access analysis data for viewing in another system, you will probably want to load the data into that system's data store and query from there. This will make it easier to join access analysis data with that system's data and probably decrease query times to 10-100 milliseconds.


Last update: May 30, 2023