This article is for anyone who wants to query Amazon Athena and analyze the results in pandas. It is especially handy when you are working in a Jupyter notebook.
Install PyAthena.
pip install PyAthena
Create a connection with the connect function. Specify your AWS access key pair, the S3 path where Athena writes its query results, and the region. If you then pass that connection to pd.read_sql, the query runs on Athena and the result comes back as a pandas DataFrame.
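from pyathena import connect
import pandas as pd

# Replace the placeholders with your own credentials.
aws_access_key_id = 'Your aws access key id'
aws_secret_access_key = 'Your aws secret access key'

# s3_staging_dir is the S3 location where Athena stores query results.
conn = connect(aws_access_key_id=aws_access_key_id,
               aws_secret_access_key=aws_secret_access_key,
               s3_staging_dir='Your s3 path',
               region_name='ap-northeast-1')

# Run the query and load the result into a DataFrame.
df = pd.read_sql("SELECT * FROM sample", conn)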
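Writing the access keys directly into a notebook makes them easy to leak, so here is a minimal sketch of reading the same settings from environment variables. The helper names athena_connect_kwargs and query_to_dataframe, and the variable name ATHENA_S3_STAGING_DIR, are made up for this example; AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION are the usual AWS variable names. The sketch assumes that when no keys are passed, the underlying AWS SDK falls back to its normal credential lookup.

```python
import os


def athena_connect_kwargs(env=os.environ):
    """Build keyword arguments for pyathena.connect from a mapping of settings.

    ATHENA_S3_STAGING_DIR is a hypothetical variable name chosen for this sketch.
    """
    kwargs = {
        "s3_staging_dir": env["ATHENA_S3_STAGING_DIR"],
        # Fall back to the region used in the article if none is set.
        "region_name": env.get("AWS_DEFAULT_REGION", "ap-northeast-1"),
    }
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    # Pass the keys explicitly only when both are present; otherwise leave
    # credential lookup to the AWS SDK (assumed behaviour, see text above).
    if key_id and secret:
        kwargs["aws_access_key_id"] = key_id
        kwargs["aws_secret_access_key"] = secret
    return kwargs


def query_to_dataframe(sql, env=os.environ):
    """Run a query on Athena and return the result as a pandas DataFrame."""
    # Imported here so athena_connect_kwargs works without these packages.
    import pandas as pd
    from pyathena import connect

    conn = connect(**athena_connect_kwargs(env))
    try:
        return pd.read_sql(sql, conn)
    finally:
        conn.close()
```

With the variables exported in the shell that starts Jupyter, the notebook cell could then shrink to df = query_to_dataframe("SELECT * FROM sample"), and no secret is stored in the notebook file.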