pdftools: pdf_ocr – R documentation

pdftools

pdf_ocr

OCR text extraction

Description

Perform OCR text extraction. This requires the tesseract package to be installed.

Usage

pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  language = "eng",
  dpi = 600
)

Arguments

`pdf`	file path or raw vector with pdf data
`pages`	which pages of the pdf file to extract
`opw`	string with owner password to open pdf
`upw`	string with user password to open pdf
`language`	passed to tesseract to specify the language of the engine.
`dpi`	resolution at which each page is rendered to an image before it is passed to tesseract::ocr.
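
Examples

The arguments above can be combined as in the following minimal sketch. It is an illustration, not part of the package documentation: the file name `scanned.pdf` is a hypothetical placeholder, `dpi = 300` is an arbitrary choice below the default of 600, and the return shapes noted in the comments (one string per page, one data frame per page) are what the functions are expected to produce in current pdftools releases. The code skips the OCR step when pdftools, tesseract, or the example file is missing.

```r
# Sketch: OCR the first page of a scanned PDF, if the prerequisites are present.
# `scan_path` is a hypothetical file name used only for illustration.
scan_path <- "scanned.pdf"

# Both packages are needed: pdftools renders the page, tesseract reads it.
ocr_available <- requireNamespace("pdftools", quietly = TRUE) &&
  requireNamespace("tesseract", quietly = TRUE)

if (ocr_available && file.exists(scan_path)) {
  # Plain text: expected to be a character vector with one element per page
  txt <- pdftools::pdf_ocr_text(scan_path, pages = 1, language = "eng", dpi = 300)
  cat(txt[1])

  # Word-level results: expected to be a list with one data frame per page
  words <- pdftools::pdf_ocr_data(scan_path, pages = 1, language = "eng", dpi = 300)
  print(head(words[[1]]))
} else {
  message("Skipping OCR: pdftools/tesseract or the example file is not available.")
}
```

A lower `dpi` generally renders faster but may reduce recognition accuracy on small print; for password-protected files, supply `opw` or `upw` as described above.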
