
Inconsistent results between servers with the same code and PDF file #260

Open
zekriHichem opened this issue Mar 22, 2023 · 2 comments


@zekriHichem

zekriHichem commented Mar 22, 2023

Describe the bug
I am running the same Python 3.8.6 code, with the same PDF file and the same pdf2image version (1.16.2), on two different servers. On one server the code works and produces the expected output (a list of PIL image objects); on the other, it returns an empty list.

I have checked the versions of Python and all dependencies (including pdf2image) on both servers, and they are the same, but the issue persists. I have also run the code on both servers with a different PDF file to see whether the issue is specific to this PDF, but that did not help.
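One way to make that comparison mechanical is to print a version fingerprint on each server and diff the two outputs. A minimal sketch using the standard library's importlib.metadata (available from Python 3.8); the helper name version_fingerprint is hypothetical:

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def version_fingerprint(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


# Run on both servers and compare the printed dictionaries.
print(version_fingerprint(["pdf2image", "Pillow"]))
```

Note that this only covers Python distributions installed in the interpreter's environment.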

I am not seeing any error messages or logs that indicate what might be causing the issue on the server that returns an empty list.

Can you provide any guidance on how to troubleshoot this issue further? Is there anything specific about the pdf2image library or the environment that might be causing inconsistencies in the output between servers?
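Pip-level version checks do not see the container's resource limits, which can differ between pods running identical code. A hedged sketch for collecting those limits on each server, assuming a Linux container with cgroup v2 or v1 mounted at the standard /sys/fs/cgroup paths (container_memory_limit and resource_report are hypothetical helper names):

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Standard cgroup files holding a container's memory limit (v2 first, then v1).
CGROUP_MEMORY_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit(paths: Iterable[Path] = CGROUP_MEMORY_FILES) -> Optional[int]:
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file absent or unreadable: try the next location
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        return int(raw)
    return None


def resource_report() -> dict:
    return {
        # os.cpu_count() may report the node's CPUs, not the pod's CPU limit.
        "cpu_count": os.cpu_count(),
        "memory_limit_bytes": container_memory_limit(),
    }


print(resource_report())
```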

To Reproduce

Unfortunately, I cannot reproduce the issue consistently. With the same code, the same PDF file, and the same pdf2image version, one server returns the expected list of PIL image objects while the other returns an empty list.

If you have any guidance on how to troubleshoot this issue further, or any ideas as to what might be causing inconsistencies in the output between servers, I would greatly appreciate it.

Expected behavior
The code should produce the same output (a list of PIL image objects) on both servers.

Actual behavior
On one server, the code works perfectly and produces the expected output. On the other server, the code returns an empty list.

Code
The b64_pdf input is not empty.

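import base64
import logging
from io import BytesIO

from pdf2image import convert_from_bytes
from PIL.Image import DecompressionBombError  # assumed: Pillow's exception class

logger = logging.getLogger(__name__)
UTF_8 = "utf-8"  # assumed value of the constant used below

# b64_pdf: base64-encoded PDF string received from the caller
file = BytesIO(base64.b64decode(b64_pdf.encode(UTF_8)))
logger.info(f"------------01-----------{file is None}")
try:
    images = convert_from_bytes(file.read(), fmt="JPEG", dpi=300, thread_count=4)
    logger.info(f"-----------02----------{images is None}")
    logger.info(f"-----------02----------{images == []}")
except Exception as e:
    logger.exception(f"Error on given input : {e}")
    # Built-in exceptions reject keyword arguments, so pass the message positionally.
    raise DecompressionBombError("Exceeded the max pixel count for Pillow") from e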
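Because an empty list is never a valid result for a non-empty PDF, one option is to fail loudly instead of logging images == [] and carrying on. A minimal sketch, in which convert_or_raise and EmptyConversionError are hypothetical names and the converter (for example pdf2image's convert_from_bytes) is passed in so the check can be exercised without poppler installed:

```python
from typing import Any, Callable, List


class EmptyConversionError(RuntimeError):
    """Hypothetical error type: the converter returned no pages."""


def convert_or_raise(
    pdf_bytes: bytes, convert: Callable[..., List[Any]], **kwargs: Any
) -> List[Any]:
    """Call convert(pdf_bytes, **kwargs) and treat an empty result as a failure."""
    if not pdf_bytes:
        raise ValueError("empty PDF payload")
    images = convert(pdf_bytes, **kwargs)
    if not images:
        # A non-empty PDF should render to at least one page.
        raise EmptyConversionError(
            f"converter returned no pages for a {len(pdf_bytes)}-byte PDF; "
            "check the worker's memory limit and poppler version"
        )
    return images


# Usage with the snippet above (requires pdf2image and poppler):
#   images = convert_or_raise(file.read(), convert_from_bytes,
#                             fmt="JPEG", dpi=300, thread_count=4)
```

A stricter variant could compare len(images) with the "Pages" entry returned by pdf2image's pdfinfo_from_bytes, at the cost of running one more poppler process.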

Environment:

  • Kubernetes cluster
  • Python 3.8.6
  • pdf2image version 1.16.2
@zekriHichem
Author

It is possible that the issue you are experiencing is related to a difference in the amount of memory allocated on the two servers. Converting a PDF to images with pdf2image can take a considerable amount of memory, depending on the page size, the DPI, and the number of pages to render.

It is important to check that the servers have similar memory specifications and that the same amount of memory is allocated to the Python process when running your code. If one of the servers has less memory or if the amount allocated to the Python process is lower, this could explain why the code fails on that server.

That was my problem: I increased the memory, and that fixed it.
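A rough way to size the memory limit is to estimate the raw page raster each renderer has to hold. This sketch assumes 3 bytes per pixel (RGB) and that up to thread_count pdftoppm processes render at the same time; real peak usage will be higher (PDF parsing, fonts, encoded output, and the PIL images kept in Python). The function names are hypothetical:

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Bytes for one uncompressed page raster at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


def peak_raster_bytes(
    width_in: float, height_in: float, dpi: int, workers: int, channels: int = 3
) -> int:
    """Rough lower bound when `workers` renderers each hold one page at once."""
    return raster_bytes(width_in, height_in, dpi, channels) * workers


per_page = raster_bytes(8.5, 11, dpi=300)           # US Letter at 300 dpi
print(per_page)                                     # 25245000 (about 24 MiB)
print(peak_raster_bytes(8.5, 11, 300, workers=4))   # 100980000 (about 96 MiB)
```

Doubling the DPI quadruples these numbers, so dpi=300 with thread_count=4 can already need on the order of 100 MiB just for page rasters.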

However, no error or exception is raised; the call just returns an empty list. I would normally expect a memory problem to raise an out-of-memory exception or something similar.
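One plausible explanation for the silent empty list: pdf2image renders pages by running poppler's pdftoppm as a child process, and in Kubernetes a container that exceeds its memory limit can have a process SIGKILLed by the kernel's OOM killer. A killed child produces no output and raises nothing in the parent unless the parent checks its exit status. Whether pdf2image 1.16.2 checks pdftoppm's exit status is an assumption here, not something verified; the sketch below (POSIX only; run_child and describe_exit are hypothetical helpers) only demonstrates the subprocess behavior:

```python
import signal
import subprocess
import sys


def run_child(code: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process and capture its output.

    No check=True, so a crashed or killed child does not raise here.
    """
    return subprocess.run([sys.executable, "-c", code], capture_output=True)


def describe_exit(proc: subprocess.CompletedProcess) -> str:
    # subprocess reports "killed by signal N" as returncode == -N on POSIX.
    if proc.returncode < 0:
        return f"killed by signal {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


# Simulate an OOM kill: the child SIGKILLs itself before printing anything.
proc = run_child(
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL); print('page data')"
)
print(repr(proc.stdout))    # b'' : empty output, and no exception was raised
print(describe_exit(proc))  # killed by signal SIGKILL
```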

@andrew-cybsafe

I've also come across this. Looking through the poppler bug list, it could be related to https://gitlab.freedesktop.org/poppler/poppler/-/issues/1403.
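Poppler is a system package rather than a Python dependency, so identical pip versions do not guarantee identical poppler builds, and a poppler-side bug would only affect servers with an affected version. A hedged sketch for printing the local poppler version on each server, assuming pdftoppm is on PATH and prints a banner such as "pdftoppm version 22.02.0" (the helper names are hypothetical):

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_poppler_version(text: str) -> Optional[str]:
    """Extract a dotted version such as '22.02.0' from a pdftoppm -v banner."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def installed_poppler_version() -> Optional[str]:
    """Return the local poppler version, or None if pdftoppm is not on PATH."""
    exe = shutil.which("pdftoppm")
    if exe is None:
        return None
    proc = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # pdftoppm usually prints its version banner to stderr; check both streams.
    return parse_poppler_version(proc.stderr + proc.stdout)


print(installed_poppler_version())
```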
