Tesseract Training Data
tesseract_download(lang, datapath = NULL, progress = interactive())
lang | three-letter code for language, see tessdata repository
datapath | destination directory in which to store the downloaded file
progress | print progress while downloading
Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR performance for other languages, you can install the training data from your distribution. For example, to install the Spanish training data:
tesseract-ocr-spa (Debian, Ubuntu)
tesseract-langpack-spa (Fedora, EPEL)
On Windows and macOS you can install languages using the tesseract_download function, which downloads training data directly from GitHub and stores it in the path on disk given by the TESSDATA_PREFIX environment variable.
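When the default location is not writable, the datapath argument can point the download at another directory. The following is a minimal sketch, not part of the package: the helpers traineddata_path and engine_from_dir are hypothetical names, and the sketch assumes that the downloaded file is stored directly in datapath as "<lang>.traineddata" and that tesseract() accepts the same datapath argument.

```r
# Assumption: training data files are named "<lang>.traineddata"
# and sit directly inside the data directory.
traineddata_path <- function(lang, datapath) {
  file.path(datapath, paste0(lang, ".traineddata"))
}

# Hypothetical wrapper: download `lang` into `datapath` unless the file is
# already present, then create an engine that reads from that directory.
# Needs the tesseract package, plus network access when a download happens.
engine_from_dir <- function(lang, datapath) {
  dir.create(datapath, showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(traineddata_path(lang, datapath))) {
    tesseract::tesseract_download(lang, datapath = datapath, progress = FALSE)
  }
  tesseract::tesseract(lang, datapath = datapath)
}

# Example call (not run here):
# spanish <- engine_from_dir("spa", file.path(tempdir(), "tessdata"))
```

Checking for the file before downloading avoids fetching the same data on every call. The directory currently used by default should be visible in tesseract_info()$datapath.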
## Not run: 
if(is.na(match("fra", tesseract_info()$available)))
  tesseract_download("fra")
french <- tesseract("fra")
text <- ocr("https://jeroen.github.io/images/french_text.png", engine = french)
cat(text)
## End(Not run)