Create an API to convert PDF files to TIF images with FastAPI and Docker

Introduction

My boss mentioned that FastAPI looks promising, so I tried it out. Just returning a string from a GET request would be boring, so I built an API that converts a PDF file to a TIF image.

What is FastAPI

FastAPI is a Python web framework similar to Flask.

Development environment

Implementation

Directory structure

root
├─app.py
├─Dockerfile
├─requirements.txt
└─test.pdf

Dockerfile

FROM python:3.8

#Install poppler required for PDF conversion
RUN apt-get update && \
    apt-get install -y poppler-utils

#Python module installation
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm requirements.txt

#Create a folder to temporarily save the converted file
RUN rm -rf /app && \
    mkdir -p /app/data/

#Place the program
COPY app.py /app/app.py

EXPOSE 8000
WORKDIR /app
CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8000"]

This time I used the python:3.8 image, but any image works as long as it runs Python and poppler can be installed.
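If you swap the base image, it may help to confirm that the poppler command-line tools are actually on the PATH before starting the server. The following is a minimal sketch, not part of the original app: the helper name missing_poppler_tools is hypothetical, and it assumes the tools of interest are pdfinfo and pdftoppm from poppler-utils.

```python
import shutil


def missing_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Return the names of the given command-line tools not found on PATH.

    Hypothetical helper: the default tool names are an assumption about
    which poppler-utils binaries the PDF conversion relies on.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Missing poppler tools:", ", ".join(missing))
    else:
        print("poppler tools found")
```

Running this inside the container (for example with docker run and python) gives a quick signal that the image is usable before any PDF is uploaded.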

requirements.txt

fastapi
uvicorn
python-multipart
pdf2image

fastapi and uvicorn are required to use FastAPI. python-multipart is required to upload files. pdf2image is required to convert PDF files to images.

app.py

import os
from base64 import b64encode

import uvicorn
from fastapi import FastAPI, File, UploadFile
from pdf2image import convert_from_bytes
from PIL import Image

api = FastAPI()


@api.post("/")
async def post(file: UploadFile = File(...)):
    pdf_file = await file.read()
    tif_file = convert(pdf_file)
    return tif_file


def convert(pdf_file):
    output_folder = "./data"
    file_name = "temporary"
    output_file_path = f"{output_folder}/{file_name}.tif"

    #Convert all pages of PDF to jpg and save
    image_path = convert_from_bytes(
        pdf_file=pdf_file,
        thread_count=5,
        fmt="jpg",
        output_folder=output_folder,
        output_file=file_name,
        paths_only=True,
    )

    #Load all jpg images
    images = [Image.open(image) for image in image_path]

    #Convert all jpg images to one TIF image and save
    images[0].save(
        output_file_path, format="TIFF", save_all=True, append_images=images[1:],
    )

    #Read the combined TIFF file and base64 encode it
    with open(output_file_path, "rb") as f:
        tif_file = b64encode(f.read())

    #Delete the intermediate jpg files and the TIFF file, then return the base64-encoded TIFF
    for image in image_path:
        os.remove(image)
    os.remove(output_file_path)
    return tif_file


if __name__ == "__main__":
    uvicorn.run(api)

Note that paths_only=True makes convert_from_bytes return the paths of the saved files instead of PIL Image objects. If you leave it out, the function consumes a lot of memory.
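One thing to watch in convert: every request writes to the same ./data/temporary.tif, so two uploads processed at the same time could overwrite each other's files. A possible way around this, shown only as a sketch and not as what the app above does, is to give each request its own temporary directory with the standard tempfile module. The name request_workdir and the file name output.tif are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def request_workdir(base_dir=None):
    """Yield (folder, tif_path) inside a directory unique to one request.

    The directory and everything in it are deleted when the block exits,
    so the explicit os.remove calls would no longer be needed.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as folder:
        yield folder, os.path.join(folder, "output.tif")


if __name__ == "__main__":
    with request_workdir() as (folder, tif_path):
        # In convert(), folder would be passed as output_folder and
        # tif_path would take the place of output_file_path.
        with open(tif_path, "wb") as f:
            f.write(b"placeholder")
        print(os.path.exists(tif_path))
    print(os.path.exists(folder))
```

With this approach the base64 encoding has to happen inside the with block, because the TIFF file is gone once the block exits.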

Run

Start Docker

  1. Build

    docker build -t fastapi .
    
  2. Run

    docker run --rm -it -p 8000:8000 fastapi
    

API request

 > curl -X POST -F 'file=@./test.pdf' http://localhost:8000 | base64 -di > ./test.tif
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  100  206M  100  206M  100  309k  27.0M  41409  0:00:07  0:00:07 --:--:-- 47.2M

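The API returns the TIF base64 encoded, so decode the response before writing it to a file. The -i option tells base64 to ignore characters outside the base64 alphabet, such as the double quotes around the JSON string in the response.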
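The same decoding step can also be done from Python instead of the shell. This is a rough sketch under one assumption: that the response body is a JSON string literal, that is, the base64 text wrapped in double quotes. The function name decode_tif_response is hypothetical, and the sample body is made up for the demonstration.

```python
import base64
import json


def decode_tif_response(body: bytes) -> bytes:
    """Turn the API response body back into raw TIFF bytes.

    Assumes the body is a JSON string literal containing base64 text.
    """
    encoded = json.loads(body)  # strips the surrounding JSON quotes
    return base64.b64decode(encoded)


if __name__ == "__main__":
    # Simulated response: a 4-byte little-endian TIFF header, base64
    # encoded and then serialized as a JSON string.
    header = b"II*\x00"
    fake_body = json.dumps(base64.b64encode(header).decode("ascii")).encode("ascii")
    print(decode_tif_response(fake_body) == header)
```

In a real client, body would be the raw bytes of the HTTP response, and the returned bytes would be written to test.tif in binary mode.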

Summary

I stumbled in a few places, such as needing python-multipart to upload files, but I found FastAPI very easy to write.
