My boss told me that FastAPI looked good, so I tried it out. Just answering a GET request with a string would be boring, so I built an API that converts a PDF file into a TIFF image.
FastAPI is a Python web framework similar to Flask.
root
├─app.py
├─Dockerfile
├─requirements.txt
└─test.pdf
Dockerfile
FROM python:3.8
# Install poppler, which is required for PDF conversion
RUN apt-get update && \
    apt-get install -y poppler-utils
# Install the Python modules
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt
# Create a folder to temporarily store the converted files
RUN rm -rf /app && \
    mkdir -p /app/data/
# Place the program
COPY app.py /app/app.py
EXPOSE 8000
WORKDIR /app
CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8000"]
This time I used the python:3.8 image, but any image is fine as long as Python runs on it and poppler can be installed.
requirements.txt
fastapi
uvicorn
python-multipart
pdf2image
fastapi and uvicorn are required to use FastAPI.
python-multipart is required to upload files.
pdf2image is required to convert PDF files to images.
app.py
import os
from base64 import b64encode

import uvicorn
from fastapi import FastAPI, File, UploadFile
from pdf2image import convert_from_bytes
from PIL import Image

api = FastAPI()


@api.post("/")
async def post(file: UploadFile = File(...)):
    # Read the uploaded PDF and return the converted TIFF, base64-encoded
    pdf_file = await file.read()
    tif_file = convert(pdf_file)
    return tif_file


def convert(pdf_file):
    output_folder = "./data"
    file_name = "temporary"
    output_file_path = f"{output_folder}/{file_name}.tif"
    # Convert every page of the PDF to a jpg and save it
    image_path = convert_from_bytes(
        pdf_file=pdf_file,
        thread_count=5,
        fmt="jpg",
        output_folder=output_folder,
        output_file=file_name,
        paths_only=True,
    )
    # Load all the jpg images
    images = [Image.open(image) for image in image_path]
    # Combine all the jpg images into one multi-page TIFF and save it
    images[0].save(
        output_file_path, format="TIFF", save_all=True, append_images=images[1:],
    )
    # Read the TIFF file and base64-encode it
    with open(output_file_path, "rb") as f:
        tif_file = b64encode(f.read())
    # Delete all the saved images and return the base64-encoded TIFF
    for image in image_path:
        os.remove(image)
    os.remove(output_file_path)
    return tif_file


if __name__ == "__main__":
    uvicorn.run(api)
Note that if you do not set paths_only=True in convert_from_bytes, it will consume a lot of memory.
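One more thing to watch in convert: it always writes to the same path, ./data/temporary.tif, so two requests arriving at the same time could overwrite each other's files. Below is a minimal sketch of one way around that, giving each call its own directory via the standard tempfile module. The names fake_convert and convert_in_private_dir are hypothetical, and fake_convert is only a stand-in for the pdf2image and Pillow steps, not the real conversion.

```python
import tempfile
from pathlib import Path


def fake_convert(pdf_bytes: bytes, work_dir: Path) -> Path:
    """Hypothetical stand-in for the pdf2image + Pillow steps in convert()."""
    out_path = work_dir / "temporary.tif"
    # The real code would save the multi-page TIFF here;
    # this sketch just writes the input bytes back out.
    out_path.write_bytes(pdf_bytes)
    return out_path


def convert_in_private_dir(pdf_bytes: bytes) -> bytes:
    # Every call gets a fresh directory, so concurrent calls never share
    # file names. The directory and everything in it is deleted when the
    # with-block exits, so no explicit os.remove calls are needed.
    with tempfile.TemporaryDirectory() as tmp:
        out_path = fake_convert(pdf_bytes, Path(tmp))
        return out_path.read_bytes()


result = convert_in_private_dir(b"dummy pdf bytes")
print(len(result))
```

In the real app, the temporary directory would be passed as output_folder to convert_from_bytes in place of "./data".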
Build
docker build -t fastapi .
Run
docker run --rm -it -p 8000:8000 fastapi
> curl -X POST -F 'file=@./test.pdf' http://localhost:8000 | base64 -di > ./test.tif
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 206M 100 206M 100 309k 27.0M 41409 0:00:07 0:00:07 --:--:-- 47.2M
The response comes back base64-encoded, so you need to decode it before writing it to a file.
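If you want to do the decoding in Python instead of piping through base64 -di, something like the following should work. This is a rough sketch, not a tested client: decode_tif_response is a hypothetical helper name, and it assumes the body is base64 text that may arrive wrapped in JSON double quotes. It only handles the decoding step, not the upload itself.

```python
import json
from base64 import b64decode, b64encode


def decode_tif_response(body: bytes) -> bytes:
    """Turn the API's response body back into raw TIFF bytes."""
    text = body.decode("ascii").strip()
    # Assumption: the body may be a JSON string, i.e. wrapped in quotes.
    if text.startswith('"'):
        text = json.loads(text)
    return b64decode(text)


# Round-trip check using made-up bytes in place of a real TIFF.
sample = b"II*\x00not a real tiff"
fake_body = json.dumps(b64encode(sample).decode("ascii")).encode("ascii")
print(decode_tif_response(fake_body) == sample)
```

The returned bytes can then be written out with open("test.tif", "wb").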
I stumbled a bit along the way, for example over needing python-multipart to upload files, but I found FastAPI very easy to write.