A note from when I got a little stuck trying to gzip a Pandas DataFrame and save it as a CSV file in an Amazon S3 bucket.
import gzip
from io import BytesIO
import pandas as pd
import boto3

def save_to_s3(df: pd.DataFrame, bucket: str, key: str):
    """Save a Pandas DataFrame to Amazon S3 as csv.gz."""
    buf = BytesIO()
    # gzip.open accepts a file-like object; "wt" lets to_csv write text into it
    with gzip.open(buf, mode="wt") as f:
        df.to_csv(f)
    # buf now holds the gzip-compressed CSV bytes
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())
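For illustration, a call might look like this (the bucket name, key, and DataFrame contents are placeholders, not from the original post):

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
save_to_s3(df, bucket="my-example-bucket", key="data/df.csv.gz")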
The key points are as follows.

- Pass a BytesIO() as the first argument of gzip.open, since it accepts a file-like object that will receive the gzip-formatted output.
- Since the output of pandas.DataFrame.to_csv is a string, specify "write text" (wt) as the mode of gzip.open.
At first I thought that specifying compression="gzip" in pandas.DataFrame.to_csv would make explicit compression unnecessary, but when a file-like object is passed to to_csv, the compression option appears to be ignored, so it could not be used that way.
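If writing to a local file is acceptable, one workaround (a sketch, not what this post does) is to give to_csv a path, in which case compression="gzip" is honored, and then upload the resulting file; the local path here is a placeholder:

import boto3
import pandas as pd

def save_to_s3_via_file(df: pd.DataFrame, bucket: str, key: str):
    path = "/tmp/df.csv.gz"  # placeholder local path
    # With a path (not a file object), to_csv applies compression="gzip" itself
    df.to_csv(path, compression="gzip")
    boto3.client("s3").upload_file(path, bucket, key)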