
Tika-Tesseract for Windows #18

Open
lemoncalamitous opened this issue Feb 28, 2019 · 2 comments

Comments

@lemoncalamitous

Hi,

This is not an issue but rather a question. I am currently using this Dockerfile and have managed to get a containerized Tika Server running successfully with Docker for Windows.

My question is: can the steps in the Dockerfile be replicated with a purely Windows approach?

I cannot extract text from images using tika-server-1.20.jar on my machine, although this works with the containerized Tika Server. I have set up Tesseract correctly with its data file(s) on Windows, but only Tesseract itself works; my Tika Server running from the .jar file does not.

Pardon my ignorance but please enlighten me. Thank you!

@epugh

epugh commented Oct 8, 2019

Have you looked at the https://github.com/KevM/tikaondotnet project? One of my colleagues uses it.

I'd love to hear how you installed Tesseract on Windows!

@dameikle
Contributor

Hi @lemoncalamitous.

Whilst I've not tried this on Windows personally, knowing the TesseractOCRParser, it should be possible. The default configuration assumes tesseract is on the system path; if it isn't, or you don't want it there, you'll need to specify its path using a custom configuration [1].
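Since the default configuration relies on the system path, a quick first check is whether the JVM can actually see a tesseract executable there. The sketch below is a hypothetical helper (the class name `FindTesseract` is made up for illustration) that scans each `PATH` entry for `tesseract.exe` on Windows or `tesseract` elsewhere, using only the JDK. It assumes it is run with the same environment the Tika server is started with, since a `PATH` change made after a console window was opened is not picked up by that console.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FindTesseract {

    // Returns every file named tesseract(.exe) found in the directories of the given PATH string.
    static List<File> findOnPath(String path, boolean windows) {
        String exe = windows ? "tesseract.exe" : "tesseract";
        List<File> hits = new ArrayList<>();
        if (path == null || path.isEmpty()) {
            return hits;
        }
        for (String dir : path.split(File.pathSeparator)) {
            // Windows PATH entries are sometimes wrapped in quotes; strip them before resolving.
            dir = dir.replace("\"", "");
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, exe);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        List<File> hits = findOnPath(System.getenv("PATH"), windows);
        if (hits.isEmpty()) {
            System.out.println("tesseract not found on PATH; point Tika at it with a custom config");
        } else {
            hits.forEach(f -> System.out.println("found: " + f.getAbsolutePath()));
        }
    }
}
```

If this prints nothing found while `tesseract` works from your own shell, the server is probably being launched with a different environment, which matches the symptom of Tesseract working on its own but not through the jar.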

If you still have issues, either drop me a mail directly or jump on the Tika Users mailing list.

[1] See 'Overriding Default Configuration' in https://cwiki.apache.org/confluence/display/tika/TikaOCR
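One way to apply the custom configuration with the 1.x server, assuming the classpath properties-file mechanism described on that wiki page, is a `TesseractOCRConfig.properties` file placed under `org/apache/tika/parser/ocr/` on the classpath. The sketch below writes such a file with `java.util.Properties`, which escapes Windows backslashes correctly (a common pitfall when editing the file by hand). The class name `WriteTesseractConfig`, the `tika-config` directory, and the install path `C:\Program Files\Tesseract-OCR\` are all hypothetical; substitute your real Tesseract location and check the key names against the wiki for your Tika version.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class WriteTesseractConfig {

    // Writes <root>/org/apache/tika/parser/ocr/TesseractOCRConfig.properties and returns its path.
    static Path write(Path root, String tesseractPath) throws IOException {
        Path dir = root.resolve(Paths.get("org", "apache", "tika", "parser", "ocr"));
        Files.createDirectories(dir);

        Properties props = new Properties();
        // Directory containing tesseract.exe (assumed key name, per the TikaOCR wiki).
        props.setProperty("tesseractPath", tesseractPath);
        props.setProperty("language", "eng");

        Path file = dir.resolve("TesseractOCRConfig.properties");
        // store(OutputStream) writes ISO 8859-1 and escapes backslashes, matching Properties.load.
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "Custom Tesseract config for Tika");
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical install location; adjust to wherever Tesseract actually lives.
        Path file = write(Paths.get("tika-config"), "C:\\Program Files\\Tesseract-OCR\\");
        System.out.println("wrote " + file.toAbsolutePath());
    }
}
```

The `tika-config` directory then has to be on the classpath ahead of the jar, which means launching with `-cp "tika-config;tika-server-1.20.jar"` plus the server's main class instead of `-jar` (which ignores `-cp`). For 1.x that main class is, as far as I know, `org.apache.tika.server.TikaServerCli`, but confirm it against the jar's `META-INF/MANIFEST.MF`.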
