AWS Athena Integration
You can create custom reporting and automation workflows by querying k9 Security's reports with SQL using AWS Athena. AWS Athena is a service that enables you to query structured data in S3 using SQL. k9 Security's parquet data formats are a robust and performant structured data integration option.
In this guide, we'll create Athena tables that model the latest principal access summary and resource access summary data so that you can answer questions like:
- which principals have stale API access key or password credentials?
- which principals can administer IAM?
- who can delete data in RDS?
- which data resources are overly-accessible
- which principals have excessive access to data
Note: Automation to deploy these tables is currently in private preview in k9 Security's compliance solution. Contact support@k9security.io for access.
Getting Started
Athena reads all data in a folder into a table using Hive-style partitions. So k9 Security delivers the latest report files using S3 object keys with this format:
s3://<secure_inbox_bucket_name>/hive/parquet/<file_type>/customer_id=<customer_id>/csp=<csp_name>/account_id=<account_id>/<fully_qualified_file_type>.latest.parquet
The parameters within the key format are:
secure_inbox_bucket_name
: the name of the customer or partner's secure inbox in S3 where reports are delivered, e.g.safeco-k9-reports
customer_id
: the k9 Customer ID, e.g.C123456
csp
: the cloud service provider name, currently onlyaws
account_id
: the id of the aws account being analyzed, e.g.123456789012
file_type
: the type of k9 report file:ic
: The Identity Center reportpas
: the Principal Access Summary reportprincipals
: the Principals reportresources
: the Resources reportras
: The Resource Access Summary report
For example, the key for customer C123456
's latest principal access summary for AWS account 123456789012
is: /hive/parquet/pas/customer_id=C123456/csp=aws/account_id=123456789012/principal-access-summaries.latest.parquet
Create 'k9' database
Create an Athena database which will contain the Athena tables used with k9 Security. We suggest creating a k9
database within the AwsDataCatalog
.
You can do this using the console or CLI. Feel free to use another catalog or database name. We'll use the k9
database name in the SQL statements that follow.
Create tables
Now create an principal_access_summaries_latest
table within the k9
database with:
-- NOTE: set your own <SECURE_INBOX_BUCKET_NAME> in LOCATION
CREATE EXTERNAL TABLE `principal_access_summaries_latest`(
`analysis_time` timestamp,
`principal_name` string,
`principal_arn` string,
`principal_type` string,
`principal_tags` map<string,string>,
`service_name` string,
`access_capability` string,
`resource_arn` string,
`resource_tag_confidentiality` string)
PARTITIONED BY (
`customer_id` string,
`csp` string,
`account_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://<SECURE_INBOX_BUCKET_NAME>/hive/parquet/pas'
TBLPROPERTIES (
'parquet.compression'='SNAPPY');
Run a table repair to load the partitions:
MSCK REPAIR TABLE k9.principal_access_summaries_latest;
See the create_k9_report_tables_athena.sql for a script that automates this process for the principals, principal access summary, and resource access summary tables.
Now you can query!
Query the access reports
Now you can query the latest k9 access summary data with SQL.
Identify IAM administrators
First, lets see who can administer IAM:
-- query for IAM admins in account
SELECT pas.principal_name, pas.principal_arn, pas.service_name, pas.access_capability
FROM k9.principal_access_summaries_latest as pas
WHERE 1=1
AND service_name='IAM'
AND access_capability='administer-resource'
-- AND customer_id='C123456' AND account_id='123456789012'
ORDER BY pas.principal_name ASC
;
Note: the where clause does not specify either a customer_id
nor account_id
so the query will return results for all accounts monitored by k9.
This query will return results like:
principal_name | principal_arn | service_name | access_capability |
---|---|---|---|
AWS-CodePipeline-Service | arn:aws:iam::123456789012:role/AWS-CodePipeline-Service | IAM | administer-resource |
AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | arn:aws:iam::123456789012:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | IAM | administer-resource |
AWSServiceRoleForOrganizations | arn:aws:iam::123456789012:role/aws-service-role/organizations.amazonaws.com/AWSServiceRoleForOrganizations | IAM | administer-resource |
AccountAdminAccessRole-Sandbox | arn:aws:iam::123456789012:role/AccountAdminAccessRole-Sandbox | IAM | administer-resource |
cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | arn:aws:iam::123456789012:role/cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | IAM | administer-resource |
ci | arn:aws:iam::123456789012:user/ci | IAM | administer-resource |
demo-admin | arn:aws:iam::123456789012:role/demo-admin | IAM | administer-resource |
k9-dev-appeng | arn:aws:iam::123456789012:role/k9-dev-appeng | IAM | administer-resource |
skuenzli | arn:aws:iam::123456789012:user/skuenzli | IAM | administer-resource |
training | arn:aws:iam::123456789012:user/training | IAM | administer-resource |
Identify principals that can delete data in RDS
Now let's identify which principals can delete data in RDS with this query:
-- who can delete RDS DB clusters
SELECT ras.service_name, ras.resource_name, ras.principal_name, ras.access_capability
FROM k9.resource_access_summaries_latest as ras
WHERE 1=1
AND service_name='RDS'
AND access_capability='delete-data'
AND customer_id='C123456' AND account_id='123456789012'
;
This query will return results like:
service_name | resource_name | principal_name | access_capability |
---|---|---|---|
RDS | int-test-pg-01 | ci | delete-data |
RDS | int-test-pg-01 | skuenzli | delete-data |
RDS | int-test-pg-01 | training | delete-data |
RDS | int-test-pg-01 | AccountAdminAccessRole-Sandbox | delete-data |
RDS | int-test-pg-01 | AWS-CodePipeline-Service | delete-data |
RDS | int-test-pg-01 | AWSReservedSSO_AdministratorAccess_437be9d757c9ea2f | delete-data |
RDS | int-test-pg-01 | cdk-hnb659fds-cfn-exec-role-123456789012-us-east-1 | delete-data |
RDS | int-test-pg-01 | k9-dev-appeng | delete-data |
If you have any questions getting set up or querying the data, please reach out to support@k9security.io
FAQ
Q: Should I serve access analysis data to my users with direct queries to S3 using Athena?
That depends on your data integration and performance requirements.
Querying data from Athena is convenient for ad-hoc analysis and joining with other data in S3. Performance is good at 1-2 seconds per query for most queries. Querying Athena directly is a good option when many people aren't waiting for the results.
But if you are integrating access analysis data for viewing in another system, you will probably want to load the data into that system's data store and query from there. This will make it easier to join access analysis data with that system's data and probably decrease query times to 10-100 milliseconds.