Seems pretty self-explanatory. I have the pytesseract module installed on my computer. When running the Python shell or a separate Python program I can import it just fine, but when I try to import it in an Inkscape extension I get this:
Traceback (most recent call last):
File "DocToTXT.py", line 3, in <module>
import pytesseract
ModuleNotFoundError: No module named 'pytesseract'
Any ideas on what I've done wrong?
I'm using Ubuntu 21.10. With
pip3 install pytesseract
I was able to use pytesseract in a basic Inkscape extension with no problem.
Perhaps try running python3 from the terminal and doing import pytesseract there first?
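One way to make that check more informative is to ask Python where it would load the module from, and which interpreter is answering. This is a minimal sketch using only the standard library; the helper name describe_module is made up for illustration. Running it both from the terminal and from inside the extension should show whether the two are looking in the same place.

```python
import importlib.util
import sys


def describe_module(name):
    """Return a one-line report on whether `name` is importable and from where."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: NOT FOUND by {sys.executable}"
    return f"{name}: found at {spec.origin} by {sys.executable}"


if __name__ == "__main__":
    print(describe_module("pytesseract"))
```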
I'm on Ubuntu 20.04.4. Seems like it can't be found in the terminal either... but IDLE can find it no problem. Running pip3 install again just gives me a "Requirement already satisfied" message. Maybe I just messed up the installation somehow...
UPDATE: I forgot to add it to PYTHONPATH. That explains why it wasn't working in the terminal, but it still doesn't want to load in Inkscape. Any advice? Is it possible I need to add something to my .inx file?
<?xml version="1.0" encoding="UTF-8"?>
<inkscape-extension xmlns="http://www.inkscape.org/namespace/inkscape/extension">
<name>Tesseract</name>
<id>org.inkscape.ink_tesseract</id>
<!-- Parameters Here -->
<effect>
<object-type>path</object-type>
<effects-menu>
<submenu name="AAA"/>
</effects-menu>
</effect>
<script>
<command location="inx" interpreter="python">ink_tesseract.py</command>
</script>
</inkscape-extension>
Success :)
Still doesn't work, even copying your inx word for word & changing file names to reflect that. Must be something with my pytesseract installation then. Frustrating...
Is it still the module failing to load, or is it another problem?
You do have to feed a PIL Image into pytesseract.
That means base64 > PIL Image for an embedded image, or the path from the image tag > PIL Image for a linked image.
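The two conversions above could look roughly like this. It is a sketch, not the code from any particular extension: the function names are hypothetical, it assumes an embedded image's href is a base64 data URI, and it assumes a linked image's href is a plain file path rather than a file:// URL. Pillow is imported inside the second function so the decoder can be tried without it.

```python
import base64
import io


def decode_data_uri(href):
    """Return the raw image bytes from a base64 'data:' URI."""
    header, _, payload = href.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    # b64decode ignores line breaks that SVG writers put in the payload
    return base64.b64decode(payload)


def image_from_href(href):
    """Build a PIL Image from an embedded (data URI) or linked (path) href."""
    from PIL import Image  # lazy import: only needed when an image is opened

    if href.startswith("data:"):
        return Image.open(io.BytesIO(decode_data_uri(href)))
    return Image.open(href)
```

The resulting image can then be passed to pytesseract.image_to_string.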
Yeah, it's still a module not found error. I know you need to use PIL, and that imports just fine; it's just pytesseract that seems to be throwing a fuss. I'm wondering if I should just reinstall Ubuntu and see if that helps any...
So, I did a bit more testing, and it absolutely seems like Inkscape just isn't finding my installed libraries. I decided "screw it" and just directly plopped pytesseract into the Inkscape extensions folder, and it worked without issue. I tried using sys.path.append but that didn't seem to work either, though it's entirely possible I'm just messing up the Ubuntu file system.
I have other libraries I would like to use, any suggestions on how I can fix this?
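For reference, extending the search path from inside an extension script is usually done along these lines. This is only a sketch under one loud assumption: that the libraries were installed with pip3 install --user and therefore sit in the per-user site-packages directory. If they live elsewhere, that directory would have to be passed instead. The helper name ensure_on_path is hypothetical.

```python
import site
import sys


def ensure_on_path(directory):
    """Append `directory` to sys.path once; return True if it was added."""
    if directory not in sys.path:
        sys.path.append(directory)
        return True
    return False


# Assumption: packages were installed per-user (pip3 install --user).
ensure_on_path(site.getusersitepackages())

# Third-party imports must come after the path fix, for example:
# import pytesseract
```

The order matters: an import placed above the sys.path change is resolved before the extra directory is known.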
I wrote this:
https://gitlab.com/inklinea/ink-tesseract
It has the module included, and I was able to get it to run on a clean Windows 10 machine at one of my relatives' just by extracting it into the extensions folder.
So, question: how did you get around the issue of it being unable to find Tesseract? Every time I test outside of Inkscape I can get pytesseract to work fine, but inside I always get a TesseractNotFoundError, or, when directly setting tesseract_cmd to where my tesseract binary is, a
PermissionError: [Errno 13] Permission denied:
error. And yes, I included tesseract itself in the path. I know this error occurs if you pass a folder path instead of a file path, but I get the error either way.
Are you using the extension I posted above?
On Ubuntu you shouldn't need to specify a path. Setting
pytesseract.pytesseract.tesseract_cmd
is only needed on Windows.
No, I'm using my own extension, but I ran into the same problems with both. I understand I shouldn't need to specify a path, but it seems I'm cursed to have everything that shouldn't be a problem be a problem. I also seem to consistently get errors with libraries like Cairo and Reportlab that have .so files, kicking up a... well, okay, I don't remember the exact error, and I'm not somewhere I can reproduce it at the moment.
Worth noting: most of these functions work completely fine outside of an Inkscape extension. It's specifically when running inside Inkscape that they have these issues.
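To narrow down whether the path handed to tesseract_cmd is the problem, a small check like the following can be run from inside the extension. It is a sketch with a hypothetical helper name, and it only distinguishes the cases mentioned above (nothing there, a folder instead of a file, a file without execute permission); it does not explain why Inkscape's environment would differ from the terminal's.

```python
import os
import shutil
import sys


def diagnose_command(path):
    """Classify `path` as 'missing', 'directory', 'not executable' or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # shutil.which searches the PATH of *this* process, which may differ
    # between a terminal session and a script launched by Inkscape.
    found = shutil.which("tesseract")
    print("interpreter:", sys.executable)
    print("tesseract on PATH:", found)
    if found is not None:
        print("status:", diagnose_command(found))
```

If the status is ok, the same path can be assigned to pytesseract.pytesseract.tesseract_cmd.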
I think it's more a case that the Python environment Inkscape uses (you can have several venvs on one machine) does not have access to the module.
On Ubuntu you can type whereis python (or python3) to see which interpreters are installed.
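whereis lists the interpreters on the machine, but not which one Inkscape actually launched. A sketch of how the extension itself could record that, using only the standard library; the function names and the report filename are invented for this example.

```python
import json
import os
import sys
import tempfile


def interpreter_report():
    """Collect the facts that decide which installed modules are visible."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # True when running inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
        "sys_path": list(sys.path),
    }


def dump_report(filename="inkscape_python_report.json"):
    """Write the report to the temp directory and return the file's path."""
    target = os.path.join(tempfile.gettempdir(), filename)
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(interpreter_report(), handle, indent=2)
    return target
```

Calling dump_report() at the top of the extension and again from a terminal session gives two files that can be compared directly.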
I did actually get pytesseract working on Windows 11. However, it has to be installed using pip for the Python that ships with Inkscape, not just system-wide.
That would be very difficult for a normal Windows user to do.
I ended up just putting the tiny pytesseract.py file and a required dependency (packaging) into the extension folder.
The extension should work on Windows and Ubuntu without installing the pytesseract module system-wide.
However, looking at tesseract itself, from my point of view it may be simpler to use subprocess.run (Python standard library) to run a tesseract command line.
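A rough sketch of that subprocess approach follows. It assumes the tesseract binary is on the PATH of the process Inkscape starts, and that the image is already saved as a file; the function names are hypothetical. Passing stdout as the output base makes tesseract print the recognised text instead of writing a .txt file.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", executable="tesseract"):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    return [executable, str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng", executable="tesseract"):
    """Run tesseract on an image file and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang, executable),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

This removes pytesseract and its packaging dependency from the extension, at the cost of handling the temporary image file yourself.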