
pdf_ocr

OCR text extraction


Description

Perform OCR text extraction. This requires the tesseract package to be installed.

Usage

pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)

Arguments

pdf

file path or raw vector with pdf data

pages

which pages of the pdf file to extract

opw

string with owner password to open pdf

upw

string with user password to open pdf

language

passed to tesseract to specify the language of the engine.

dpi

resolution, in dots per inch, at which each page is rendered to an image before it is passed to tesseract::ocr.
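
Examples

The arguments above can be combined as in the following minimal sketch. The helper name ocr_pages and the file name "scan.pdf" are hypothetical and not part of pdftools. The sketch assumes that pdf_ocr_text returns one character string per requested page and that pdf_ocr_data returns one data frame of recognized words per page. A dpi of 300 is used here only to keep rendering fast; the default is 600.

```r
# Hypothetical helper (not part of pdftools): OCR selected pages of a PDF.
# Returns NULL when the file or the required packages are unavailable,
# so the sketch can be sourced safely on any machine.
ocr_pages <- function(path, pages = 1, language = "eng", dpi = 300) {
  have_pkgs <- requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE)
  if (!have_pkgs || !file.exists(path)) {
    return(NULL)
  }
  list(
    # Assumed: one string of recognized text per requested page
    text = pdftools::pdf_ocr_text(path, pages = pages,
                                  language = language, dpi = dpi),
    # Assumed: one data frame of recognized words per requested page
    data = pdftools::pdf_ocr_data(path, pages = pages,
                                  language = language, dpi = dpi)
  )
}

# "scan.pdf" is a placeholder path; substitute a real scanned document.
result <- ocr_pages("scan.pdf", pages = 1)
if (!is.null(result)) {
  cat(result$text[1])
  str(result$data[[1]])
}
```

For password-protected files, opw or upw would be passed through to the same calls. Languages other than "eng" need the matching tesseract training data installed first, for example with tesseract::tesseract_download().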

See Also

Other pdftools: pdftools, qpdf, rendering


pdftools

Text Extraction, Rendering and Converting of PDF Documents

v3.0.1
MIT + file LICENSE
Authors
Jeroen Ooms [aut, cre] (<https://orcid.org/0000-0002-4035-0289>)
Initial release
