AWS Athena Integration

You can create custom reporting and automation workflows by querying k9 Security's reports with SQL using AWS Athena. AWS Athena is a service that enables you to query structured data in S3 using SQL. k9 Security's csv data formats work well for this.
In this guide, we'll create Athena tables that model the latest principal access summary and resource access summary data so that you can answer questions like:
- which principals can administer IAM?
- who can delete data in RDS?
Getting Started
Athena will read all data in a folder into a table using Hive-style partitions. So k9 Security built a special delivery mechanism to deliver the latest report files using S3 object keys in this format:
s3://<secure_inbox_bucket_name>/hive/customer_id=<customer_id>/csp=<csp_name>/account_id=<account_id>/file_type=<file_type>/<fully_qualified_file_type>.latest.csv
The parameters within that format are:
secure_inbox_bucket_name
: the name of the customer or partner's secure inbox in S3 where reports are delivered, e.g.safeco-k9-reports
customer_id
: the k9 Customer ID, e.g.C123456
csp
: the cloud service provider name, currently onlyaws
account_id
: the id of the aws account being analyzed, e.g.123456789012
file_type
: the type of k9 report file:csv_principals
: the Principals reportcsv_pas
: the Principal Access Summary reportcsv_resources
: the Resources reportcsv_ras
: The Resource Access Summary report
Examples:
For example, the key for customer C123456
's latest principal access summary for AWS account 123456789012
is: hive/customer_id=C10001/csp=aws/account_id=123456789012/file_type=csv_pas/principal-access-summaries.latest.csv
Create 'iam' database
Create an Athena database which will contain the Athena tables used with k9 Security. We suggest creating an iam
database within the AwsDataCatalog
.
You can do this using the console or CLI. Feel free to use another catalog or database name. We'll use the iam
database name in the SQL statements that follow.
Create tables
Now create an k9_principal_access_summaries_latest
table within the iam
database with:
-- NOTE: set your own <SECURE_INBOX_BUCKET_NAME> in LOCATION
CREATE EXTERNAL TABLE `k9_principal_access_summaries_latest`(
`analysis_time` timestamp,
`principal_name` string,
`principal_arn` string,
`principal_type` string,
`principal_tags` map<string,string>,
`service_name` string,
`access_capability` string,
`resource_arn` string,
`resource_tag_confidentiality` string)
PARTITIONED BY (
`customer_id` string,
`csp` string,
`account_id` string,
`file_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<SECURE_INBOX_BUCKET_NAME>/hive'
TBLPROPERTIES (
'has_encrypted_data'='false')
Run a table repair to load the partitions:
MSCK REPAIR TABLE iam.k9_principal_access_summaries_latest;
See the create_k9_report_tables_athena.sql for a script that automates this process for the principal and resource access summary tables.
Now you can query!
Query the access reports
Now you can query the latest k9 access summary data with SQL.
Identify IAM administrators
First, lets see who can administer IAM:
-- query for IAM admins in account
SELECT pas.principal_name, pas.principal_arn, pas.service_name, pas.access_capability
FROM iam.k9_principal_access_summaries_latest as pas
WHERE 1=1
AND service_name='IAM'
AND access_capability='administer-resource'
-- AND customer_id='C123456' AND account_id='123456789012'
ORDER BY pas.principal_name ASC
;
Note: the where clause does not specify either a customer_id
nor account_id
so the query will return results for all accounts monitored by k9.
This query will return results like:
principal_name | principal_arn | service_name | access_capability |
---|---|---|---|
AWS-CodePipeline-Service | arn:aws:iam::123456789012:role/AWS-CodePipeline-Service | IAM | administer-resource |
AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | arn:aws:iam::123456789012:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | IAM | administer-resource |
AWSServiceRoleForOrganizations | arn:aws:iam::123456789012:role/aws-service-role/organizations.amazonaws.com/AWSServiceRoleForOrganizations | IAM | administer-resource |
AccountAdminAccessRole-Sandbox | arn:aws:iam::123456789012:role/AccountAdminAccessRole-Sandbox | IAM | administer-resource |
cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | arn:aws:iam::123456789012:role/cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | IAM | administer-resource |
ci | arn:aws:iam::123456789012:user/ci | IAM | administer-resource |
demo-admin | arn:aws:iam::123456789012:role/demo-admin | IAM | administer-resource |
k9-dev-appeng | arn:aws:iam::123456789012:role/k9-dev-appeng | IAM | administer-resource |
skuenzli | arn:aws:iam::123456789012:user/skuenzli | IAM | administer-resource |
training | arn:aws:iam::123456789012:user/training | IAM | administer-resource |
Identify principals that can delete data in RDS
Now let's identify which principals can delete data in RDS with this query:
-- who can delete RDS DB clusters
SELECT ras.service_name, ras.resource_name, ras.principal_name, ras.access_capability
FROM iam.k9_resource_access_summaries_latest as ras
WHERE 1=1
AND file_type='csv_ras'
AND service_name='RDS'
AND access_capability='delete-data'
AND customer_id='C123456' AND account_id='123456789012'
;
This query will return results like:
service_name | resource_name | principal_name | access_capability |
---|---|---|---|
RDS | int-test-pg-01 | ci | delete-data |
RDS | int-test-pg-01 | skuenzli | delete-data |
RDS | int-test-pg-01 | training | delete-data |
RDS | int-test-pg-01 | AccountAdminAccessRole-Sandbox | delete-data |
RDS | int-test-pg-01 | AWS-CodePipeline-Service | delete-data |
RDS | int-test-pg-01 | AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | delete-data |
RDS | int-test-pg-01 | cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | delete-data |
RDS | int-test-pg-01 | k9-dev-appeng | delete-data |
If you have any questions getting set up or querying the data, please reach out to support@k9security.io
FAQ
Q: Should I serve access analysis data to my users with direct queries to S3 using Athena?
That depends on your data integration and performance requirements.
Querying data from Athena is convenient for ad-hoc analysis and joining with other data in S3. Performance is good at 1-2 seconds per query for most queries. Querying Athena directly is a good option when many people aren't waiting for the results.
But if you are integrating access analysis data for viewing in another system, you will probably want to load the data into that system's data store and query from there. This will make it easier to join access analysis data with that system's data and probably decrease query times to 10-100 milliseconds.