Problem with table extraction #589
IrinaOganezova
started this conversation in
Ask for help with specific PDFs
Replies: 3 comments
-
Hi @IrinaOganezova, and thanks for providing the PDF. Parsing this in an automated way might be tricky (though possible, if you programmatically identify the location of the columns and headers), but a more manual approach like this could work:
table_settings = {
    "horizontal_strategy": "text",
    "vertical_strategy": "explicit",
    "explicit_vertical_lines": [70, 280, 325, 370, 410, 450, 488]
}
That will identify a table that contains your data, but which you might also want to clean up by first using
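For the automated route mentioned above, here is a minimal sketch of one way to guess the column positions instead of typing them by hand. It assumes the columns are separated by empty vertical bands that no word crosses, which may not hold if a header spans several columns. The helper names (`merge_intervals`, `guess_vertical_lines`, `extract_with_guessed_columns`) and the `min_gap` threshold are hypothetical, not part of pdfplumber; the only pdfplumber calls used are `pdfplumber.open`, `page.extract_words()` and `page.extract_table()`.

```python
def merge_intervals(intervals):
    """Merge overlapping (x0, x1) intervals into sorted, disjoint spans."""
    merged = []
    for x0, x1 in sorted(intervals):
        if merged and x0 <= merged[-1][1]:
            # Overlaps the previous span: extend it if needed.
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    return merged


def guess_vertical_lines(words, min_gap=10):
    """Guess column boundaries from horizontal gaps between words.

    `words` is a list of dicts with "x0" and "x1" keys, the shape that
    page.extract_words() returns. `min_gap` (an assumed threshold, in
    PDF points) is the narrowest empty band treated as a column break.
    """
    spans = merge_intervals((w["x0"], w["x1"]) for w in words)
    if not spans:
        return []
    lines = [spans[0][0]]  # left edge of the leftmost text
    for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
        if right_start - left_end >= min_gap:
            # Put the boundary in the middle of the empty band.
            lines.append((left_end + right_start) / 2)
    lines.append(spans[-1][1])  # right edge of the rightmost text
    return lines


def extract_with_guessed_columns(pdf_path, page_number=0, min_gap=10):
    """Extract a table using guessed explicit vertical lines."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        table_settings = {
            "horizontal_strategy": "text",
            "vertical_strategy": "explicit",
            "explicit_vertical_lines": guess_vertical_lines(
                page.extract_words(), min_gap
            ),
        }
        return page.extract_table(table_settings)
```

If the guessed lines come out wrong for a given page, adjusting `min_gap` or falling back to hand-picked x positions is probably the simplest fix. Note that a page with no words yields an empty list of lines, which the "explicit" strategy will not accept.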
-
Thank you very much for the help!
-
The lines have different lengths. Extracting the PDF with the following parameters
{"vertical_strategy": "lines",
"horizontal_strategy": "text",
"intersection_tolerance": 15,
"snap_tolerance": 3}
gives this result:
0 None 24
1 None
Please help with a solution.