Spacing between columns is converted to 1 space #417
EntiusGJ started this conversation in "Ask for help with specific PDFs"
-
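Hi @EntiusGJ, I appreciate your interest in the library. Please share the PDF so that I can look into this in more detail. Before sharing the PDF, please make sure that you redact any sensitive information from it. Looking at the screenshot, I would recommend 2 options:
1. Use the table extraction strategy
   {
       "horizontal_strategy": "lines",
       "vertical_strategy": "text"
   }
   You may also try "vertical_strategy": "lines" in case there are any hidden vertical separators in the table.
2. Use the table extraction strategy
   {
       "horizontal_strategy": "lines",
       "vertical_strategy": "explicit",
       "explicit_vertical_lines": []  # list of x coordinates of the vertical separators
   }
   in case the position of the vertical separators in the table never changes.
Have a look at https://github.com/jsvine/pdfplumber#table-extraction-methods to learn more about how to use pdfplumber for table extraction.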
-
Good day to you all 😊
Sorry for the delay, I have been busy lately.
I am fairly new to Python and not yet comfortable with the syntax, so I am asking for help again.
Below is my simple program, in which I tried to incorporate the suggested strategy.
import os
import pdfplumber

# InFilename and OutFilename are set elsewhere in the program (not shown here)
ts = {"vertical_strategy": "lines", "horizontal_strategy": "text"}
PdfFilename = os.path.basename(InFilename)
(file, ext) = os.path.splitext(PdfFilename)
#username = input("Hit any key......")
with open(OutFilename, 'w') as f:
    f.write('....... \n')
    f.write('')
    with pdfplumber.open(InFilename) as pdf:
        pages = pdf.pages
        for page in pdf.pages:
            #print(page.number)
            text = page.extract_text(table_settings=ts)
            #print(text)
            f.write(text)
        # end for pdf.pages
    # end with PDF file
When I add the 'ts' variable, the program executes the two f.write commands and then bails out.
Any suggestions on how to tackle this?
Thanks in advance and stay healthy.
With kind regards,
Gerard Entius
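A possible explanation for the failure described above, offered as an assumption rather than a confirmed diagnosis: to my knowledge, `page.extract_text()` does not take a `table_settings` argument, so the call likely raises a `TypeError` after the two `f.write` calls have already run. Table settings belong to the table methods such as `page.extract_table()`. The sketch below uses that method and writes each row tab-separated; `rows_to_tsv` and `pdf_tables_to_tsv` are hypothetical helper names, not part of pdfplumber.

```python
def rows_to_tsv(rows):
    """Join table rows (lists of cell strings or None) into tab-separated lines."""
    lines = []
    for row in rows:
        # Empty cells come back as None; line breaks inside a cell would break the TSV.
        cells = ["" if cell is None else str(cell).replace("\n", " ") for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines) + ("\n" if lines else "")


def pdf_tables_to_tsv(in_filename, out_filename, table_settings):
    """Write the first table found on each page as tab-separated text."""
    import pdfplumber  # imported here so rows_to_tsv works without pdfplumber installed

    with pdfplumber.open(in_filename) as pdf, \
            open(out_filename, "w", encoding="utf-8") as out:
        for page in pdf.pages:
            # extract_table returns None when no table is detected on the page
            rows = page.extract_table(table_settings=table_settings)
            if rows:
                out.write(rows_to_tsv(rows))


# Usage sketch, with the settings from the post above (file names are placeholders):
# ts = {"vertical_strategy": "lines", "horizontal_strategy": "text"}
# pdf_tables_to_tsv("invoice.pdf", "invoice.tsv", ts)
```

If no rows come back at all, the chosen strategies probably do not match the page (for example, "lines" finds nothing when the PDF has no ruled lines), and a different combination is worth trying.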
From: Samkit Jain
Sent: 18 April, 2021 15:20
To: jsvine/pdfplumber
Cc: Gerard Entius; Mention
Subject: Re: [jsvine/pdfplumber] Spacing between columns is converted to 1 space (#417)
Hi @EntiusGJ Appreciate your interest in the library. Request you to share the PDF so that I can look into this in more detail. Before sharing the PDF, please ensure that you redact any sensitive information from it.
Looking at the screenshot, I will recommend 2 options:
1. Use the table extraction strategy as
   {
       "horizontal_strategy": "lines",
       "vertical_strategy": "text"
   }
   You may also try with "vertical_strategy": "lines" in case there are any hidden vertical separators in the table.
2. Use the table extraction strategy as
   {
       "horizontal_strategy": "lines",
       "vertical_strategy": "explicit",
       "explicit_vertical_lines": []  # < List of X coordinates of the vertical separators
   }
   in case the position of the vertical separators in the table never changes.
Have a look at https://github.com/jsvine/pdfplumber#table-extraction-methods to know more on how to use pdfplumber for table extraction.
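For the second option, the x coordinates have to come from somewhere. One way to estimate them, sketched here under assumptions, is to project all words onto the x axis and place a separator in the middle of every horizontal gap that no word crosses. The input is assumed to be the list of dicts returned by `page.extract_words()` (keys `x0` and `x1`); `column_separators`, `explicit_settings` and the `min_gap` value are hypothetical and would need tuning for a real invoice.

```python
def column_separators(words, min_gap=5.0):
    """Return x coordinates in the middle of horizontal gaps that no word crosses."""
    spans = sorted((w["x0"], w["x1"]) for w in words)
    separators = []
    if not spans:
        return separators
    current_right = spans[0][1]  # right edge of the merged span seen so far
    for x0, x1 in spans[1:]:
        if x0 - current_right >= min_gap:
            separators.append((current_right + x0) / 2)
        current_right = max(current_right, x1)
    return separators


def explicit_settings(words, min_gap=5.0):
    """Build a table_settings dict from the estimated separators plus the outer edges."""
    if not words:
        return None
    left = min(w["x0"] for w in words)
    right = max(w["x1"] for w in words)
    return {
        "horizontal_strategy": "lines",
        "vertical_strategy": "explicit",
        "explicit_vertical_lines": [left] + column_separators(words, min_gap) + [right],
    }


# Usage sketch (requires pdfplumber and a real page object):
# words = page.extract_words()
# table = page.extract_table(table_settings=explicit_settings(words))
```

The projection only works if the words passed in belong to the table body; header text elsewhere on the page would fill the gaps, so it is probably worth filtering the words to the item rows first (for example by their `top` value).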
-
Hi,
Sorry for this late reply, but it is a project with no high priority.
Attached is the PDF that I want to decode so I can store the data in a database. I have also attached the text as I can retrieve it now. I added the first and last lines for testing purposes, and the personal info is masked with "?".
Thanks for your help, and please let me know how to get an improved (TAB-separated?) output.
Stay healthy 😊
With kind regards,
Gerard Entius
.......
Makro SA / A Division of Masstores (Pty) Ltd
Corner Eland Str & N8,Kwagga Fontein,Bloemfontein,
Company Reg No: 1991/006805/07
VAT Reg No : 4300119155
NLA Reg No : RG0000488 Reprint Date : Mon 19/10/2020
Registered Status : Distributor Reprint Time : 19:33
Liquor Store Lic : FSGL 02/10/28/12 Reprint Store : 23 MAKRO
Grocers Wine Lic : FSGL 02/12/10/12 Page : 1 of 2
COPY TAX INVOICE
????????? POS No : 59
????????? Invoice No : 24
Sales Date : Mon 19/10/2020
Sales Str : 23 MAKRO BLOEMFONTEIN
VAT Reg No : NOT APPLICABLE Cashier ID : 30
Unique Ref : 0590242319102020
????????? Cust. Ref :
?????????
Order ID :
?????????
????????? Orig Inv Ref :
QTY UNIT/PK WEIGHT (Kg)
BARCODE DESCRIPTION DIS SGL INC PK INC VAT CD TOTAL EXC TOTAL INC
2 1
06001241006862 ILLOVO WHITE SUGAR 10KG 02 284.00 284.00 2 493.92 568.00
1 1
06009900265322 MOSTRA DI CAFE C/BEAN 1KG, FORZA 11 162.88 162.88 2 141.63 162.88
1 1
06001069036096 BOBTAIL DOG FD 7KG PUPPY MIN CHNK 02 134.00 134.00 2 116.52 134.00
1 1
06001156920550 BOKOMO PRONUTRO 1.5KG, 104.95 104.95 2 91.26 104.95
1 1
06001019000252 BABYSOFT TOILET ROLLS 2-PLY 18'S 02 99.95 99.95 2 86.91 99.95
1 1
06004612001008 SUPABAKE CAKE FLOUR 10KG 11 87.07 87.07 0 87.07 87.07
1 1
06001324011189 BROOKES OROS ORANGE SQUASH 5LT 11 85.45 85.45 2 74.31 85.45
1 1
07613033677724 NESCAFE CLASSIC COFFEE 200G 02 70.00 70.00 2 60.87 70.00
1 1
06001087004695 SUNLIGHT AUTO WASH PWD 2KG, 11 56.00 56.00 2 48.70 56.00
3 1
06009629660347 DENNY MUSHROOM WHTE 250G 02 15.00 15.00 0 45.00 45.00
1 1
06009606260324 NUTRIFIC CEREAL 900G 11 38.71 38.71 2 33.66 38.71
2 1
06009682952595 ROYCO REG PASTA SAUCE 45G, SOUR 11 15.63 15.63 2 27.18 31.26
1 1
06009522301262 C&B MAYONNAISE 750G, REGULAR 11 30.16 30.16 2 26.23 30.16
2 1
06001571126155 PASTA GRANDE SCREWS 500G 11 13.25 13.25 2 23.04 26.50
2 1
06009880412587 TWIZZA 2LT, LEMON/LIME 11 10.40 10.40 2 18.10 20.80
Please Note : This is a copy of your original invoice and is for record keeping purposes only.Makro SA / A Division of Masstores (Pty) Ltd
Corner Eland Str & N8,Kwagga Fontein,Bloemfontein,
Company Reg No: 1991/006805/07
VAT Reg No : 4300119155
NLA Reg No : RG0000488 Reprint Date : Mon 19/10/2020
Registered Status : Distributor Reprint Time : 19:33
Liquor Store Lic : FSGL 02/10/28/12 Reprint Store : 23 MAKRO
Grocers Wine Lic : FSGL 02/12/10/12 Page : 2 of 2
COPY TAX INVOICE
??????????? POS No : 59
??????????? Invoice No : 24
Sales Date : Mon 19/10/2020
Sales Str : 23 MAKRO BLOEMFONTEIN
VAT Reg No : NOT APPLICABLE Cashier ID : 30
Unique Ref : 0590242319102020
???????????? Cust. Ref
????????????
Order ID :
????????????
???????????? Orig Inv Ref :
QTY UNIT/PK WEIGHT (Kg)
BARCODE DESCRIPTION DIS SGL INC PK INC VAT CD TOTAL EXC TOTAL INC
TOTALS Total VAT 186.33 1 374.37 1 560.70
21 ARTICLES ON THIS INVOICE Including invoice rounding of -0.03
VAT SUMMARY
Vat Code Vat % Goods Amount Vat Amount
0 0.00 132.07 0.00
2 15.00 1 242.30 186.33
PAYMENT SUMMARY
CARD PAYMENT 528497******3294 CREDIT CARD 1 560.70
You Saved 96.32
Please Note : This is a copy of your original invoice and is for record keeping purposes only.
Converter 1.0<->OK<->0<->no-error
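If the item rows stay as regular as in the text above, the already extracted text could also be split after the fact. The following is only a sketch built from the sample rows: it assumes a 14-digit barcode, an optional two-digit DIS code, amounts with two decimals and no thousands separator, and a quantity line (such as "2 1") directly before each item row. `parse_items`, `to_tsv` and the field names are made up for illustration.

```python
import re

# Quantity line printed above each item row, e.g. "2 1" (QTY, UNIT/PK)
QTY_RE = re.compile(r"^(?P<qty>\d+)\s+(?P<unit_pk>\d+)$")

# Item row: barcode, free-text description, optional DIS code, then five numbers
ITEM_RE = re.compile(
    r"^(?P<barcode>\d{14})\s+(?P<description>.+?)"
    r"(?:\s+(?P<dis>\d{2}))?"
    r"\s+(?P<sgl_inc>\d+\.\d{2})\s+(?P<pk_inc>\d+\.\d{2})"
    r"\s+(?P<vat_cd>\d)"
    r"\s+(?P<total_exc>\d+\.\d{2})\s+(?P<total_inc>\d+\.\d{2})$"
)

FIELDS = ["qty", "unit_pk", "barcode", "description", "dis",
          "sgl_inc", "pk_inc", "vat_cd", "total_exc", "total_inc"]


def parse_items(lines):
    """Return one dict per item row, combined with the quantity line before it."""
    items = []
    pending = {"qty": "", "unit_pk": ""}
    for raw in lines:
        line = raw.strip()
        qty_match = QTY_RE.match(line)
        if qty_match:
            pending = qty_match.groupdict()
            continue
        item_match = ITEM_RE.match(line)
        if item_match:
            row = {**pending, **item_match.groupdict()}
            row["dis"] = row["dis"] or ""  # the DIS column is empty on some rows
            items.append(row)
        # A quantity line only applies to the line directly after it.
        pending = {"qty": "", "unit_pk": ""}
    return items


def to_tsv(items):
    """Render parsed items as tab-separated lines in FIELDS order."""
    return "\n".join("\t".join(item[field] for field in FIELDS) for item in items)
```

Lines that match neither pattern (headers, totals, the VAT summary) are skipped, so the same function can be fed the whole text file. Splitting on text patterns is fragile compared with using word positions from the PDF, so it is best treated as a fallback.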
-
Good day,
I am new to pdfplumber but have got one application working. I have started on the next one, which involves tabular text, but in the extracted text the space between the columns is only 1 space. The text in one of the columns also contains spaces, so it is impossible to distinguish the separate columns.
Does pdfplumber recognize tabular formatted lines, or can I enable this? Is there an option to separate the columns by a Tab character or another non-printable one?
This is an example of the body of an invoice:
This is the text:
04015400264613 PAMPERS ACTIVE JUMBO PACK MINI 94'S 02 185.00 185.00 2 321.74 370.00
06009900265322 MOSTRA DI CAFE C/BEAN 1KG, FORZA 11 162.88 162.88 2 141.63 162.88
Any suggestions are welcome :-)
Thanks in advance,
Gerard
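One way to get tab-separated columns, sketched here rather than taken from the pdfplumber documentation: `page.extract_words()` returns each word with its position (`x0`, `x1`, `top`), so words on the same line can be joined with a tab wherever the horizontal gap is wide and with a single space otherwise. `words_to_tabbed_lines` and both thresholds are hypothetical and need tuning per document. Newer pdfplumber releases also appear to offer `page.extract_text(layout=True)`, which tries to preserve the horizontal spacing.

```python
def words_to_tabbed_lines(words, column_gap=8.0, line_tolerance=3.0):
    """Group words into lines by 'top' and join them, using a tab for wide gaps."""
    lines = []  # each entry: [top of the line's first word, list of words]
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(word["top"] - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([word["top"], [word]])

    result = []
    for _, line_words in lines:
        line_words.sort(key=lambda w: w["x0"])
        text = line_words[0]["text"]
        for prev, cur in zip(line_words, line_words[1:]):
            # A gap at least column_gap wide is treated as a column boundary.
            separator = "\t" if cur["x0"] - prev["x1"] >= column_gap else " "
            text += separator + cur["text"]
        result.append(text)
    return result


def page_to_tabbed_text(page, column_gap=8.0):
    """Apply the grouping to a pdfplumber page object (assumed input)."""
    return "\n".join(words_to_tabbed_lines(page.extract_words(), column_gap))
```

The threshold has to sit between the width of an ordinary space inside a description and the narrowest gap between two columns; if those overlap in a given PDF, explicit column positions are probably the more reliable route.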