camelot handles dotted lines poorly and often fails to recognize them as table borders. While looking into this, I found the reference article below.
Because camelot detects table lines with OpenCV, the dotted lines can be redrawn before detection. I extracted them with the Hough transform, overwrote them with solid lines, and the tables were then extracted correctly.
[Process the dotted line as a solid line with camelot]( % e3% 82% 92% e5% ae% 9f% e7% b7% 9a% e3% 81% a8% e3% 81% 97% e3% 81% a6% e5% 87% a6% e7% 90% 86% e3 % 81% 99% e3% 82% 8b /)
As sample data, I use the dotted-line PDF from that article.
Line detection with the OpenCV Hough transform
Straight line extraction with Hough transform
Extract only horizontal straight lines by Hough transform
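To make the idea concrete before the patch itself, here is a toy one-dimensional sketch of why a gap tolerance turns a dotted line into a solid one. It is only an illustration under simplifying assumptions: `bridge_gaps` and its `max_gap` parameter are hypothetical names, and this is not how OpenCV's `HoughLinesP` works internally. It only mimics the effect of allowing gaps (the role `maxLineGap` plays) on a single row of pixels.

```python
def bridge_gaps(row, max_gap):
    """Fill runs of 0s no longer than max_gap that lie between two 1s.

    `row` is one row of a binary image (1 = ink, 0 = background).
    Hypothetical helper for illustration only.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, value in enumerate(row):
        if value:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= max_gap:
                # The gap is short enough: paint it over with ink
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


dashed = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
solid = bridge_gaps(dashed, max_gap=2)
unchanged = bridge_gaps(dashed, max_gap=1)
```

With `max_gap=2` the dashes merge into one continuous run, while `max_gap=1` leaves the row as it was. The same trade-off applies to the real parameter: too small and the dotted line stays broken, too large and unrelated marks may be joined.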
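import cv2
import numpy as np
import camelot


# Patch: replacement for camelot's adaptive_threshold that redraws dotted lines as solid
def my_threshold(imagename, process_background=False, blocksize=15, c=-2):
    img = cv2.imread(imagename)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect line segments; maxLineGap lets the dashes of a dotted line join up
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    lines = cv2.HoughLinesP(
        edges, rho=1, theta=np.pi / 180, threshold=80, minLineLength=3000, maxLineGap=50
    )

    # HoughLinesP returns None when no line is found
    for line in lines if lines is not None else []:
        x1, y1, x2, y2 = line[0]
        # To keep only horizontal lines, filter with `if y1 == y2`; for vertical, `if x1 == x2`
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 0), 1)

    # Recompute the grayscale image so the redrawn lines are included in the threshold
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    if process_background:
        threshold = cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blocksize, c
        )
    else:
        # As in camelot's default behavior, threshold the inverted image
        threshold = cv2.adaptiveThreshold(
            np.invert(gray), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blocksize, c
        )
    return img, threshold


# Apply the patch, then read the PDF as usual
camelot.parsers.lattice.adaptive_threshold = my_threshold
tables = camelot.read_pdf("data.pdf", pages="all")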
Without the patch, the dotted lines are not detected, so the cells they separate end up merged vertically.
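The comment inside the loop mentions filtering segments by orientation (`y1 == y2` for horizontal, `x1 == x2` for vertical). A minimal sketch of that filter is shown here, kept free of OpenCV so it can be tried on its own. The names `is_axis_aligned` and `filter_segments` and the `tol` parameter are assumptions made for illustration. The tolerance is an addition of mine, since detected segments may be a pixel off from perfectly straight.

```python
def is_axis_aligned(x1, y1, x2, y2, orientation, tol=0):
    """Return True if the segment is horizontal or vertical within `tol` pixels."""
    if orientation == "horizontal":
        return abs(y1 - y2) <= tol
    if orientation == "vertical":
        return abs(x1 - x2) <= tol
    raise ValueError("orientation must be 'horizontal' or 'vertical'")


def filter_segments(segments, orientation, tol=0):
    """Keep only the (x1, y1, x2, y2) segments with the requested orientation."""
    return [s for s in segments if is_axis_aligned(*s, orientation, tol)]


segments = [
    (0, 10, 100, 10),  # exactly horizontal
    (5, 0, 5, 80),     # exactly vertical
    (0, 0, 50, 40),    # diagonal
    (0, 20, 100, 21),  # horizontal but one pixel off
]
strict_horizontal = filter_segments(segments, "horizontal")
loose_horizontal = filter_segments(segments, "horizontal", tol=1)
vertical = filter_segments(segments, "vertical")
```

In the patch, the same check could be applied to the unpacked values of `line[0]` before calling `cv2.line`, so that only lines of the wanted orientation are redrawn.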