PDF utilities
Utilities based on libpoppler for extracting text, fonts, attachments and metadata from a PDF file.
pdf_info(pdf, opw = "", upw = "")
pdf_text(pdf, opw = "", upw = "")
pdf_data(pdf, font_info = FALSE, opw = "", upw = "")
pdf_fonts(pdf, opw = "", upw = "")
pdf_attachments(pdf, opw = "", upw = "")
pdf_toc(pdf, opw = "", upw = "")
pdf_pagesize(pdf, opw = "", upw = "")
pdf | file path or raw vector with pdf data
opw | string with owner password to open pdf
upw | string with user password to open pdf
font_info | if TRUE, extract font-data for each box. Be careful, this requires a very recent version of poppler and will error otherwise.
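As a rough illustration of the pdf argument, the sketch below passes the same document once as a file path and once as a raw vector. It assumes the sample NEWS.pdf from the examples exists on the current R installation (it may not), and the variable names are only illustrative. The commented-out call at the end uses a hypothetical file name and password.

```r
library(pdftools)

# Same sample file as in the examples; it may be missing on some R installations
pdf_file <- file.path(R.home("doc"), "NEWS.pdf")

if (file.exists(pdf_file)) {
  # Read the whole file into a raw vector and pass that instead of a path
  pdf_raw <- readBin(pdf_file, what = "raw", n = file.size(pdf_file))
  text_from_path <- pdf_text(pdf_file)
  text_from_raw <- pdf_text(pdf_raw)
  # Both inputs describe the same document, so the page counts should agree
  print(length(text_from_path) == length(text_from_raw))
}

# For a protected file, supply a password (hypothetical file name and password):
# info <- pdf_info("protected.pdf", upw = "secret")
```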
Note that pdf_data requires a recent version of libpoppler which might not be available on all Linux systems. When using pdf_data in R packages, condition use on poppler_config()$has_pdf_data which shows if this function can be used on the current system. For Ubuntu 16.04 (Xenial) and 18.04 (Bionic) you can use the PPA with backports of Poppler 0.74.0.
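One possible way to condition use on poppler_config()$has_pdf_data is a small wrapper like the one below. The function name pdf_data_if_available is hypothetical and not part of pdftools; returning NULL as the fallback is just one choice among several.

```r
library(pdftools)

# Hypothetical wrapper (not part of pdftools): call pdf_data() only when the
# installed poppler supports it, otherwise return NULL with a message.
pdf_data_if_available <- function(pdf, ...) {
  if (isTRUE(poppler_config()$has_pdf_data)) {
    pdf_data(pdf, ...)
  } else {
    message("pdf_data() is not supported by this build of poppler")
    NULL
  }
}
```

Package code can then check the result with is.null() and fall back to pdf_text() where word-level boxes are not essential.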
Poppler is pretty verbose when encountering minor errors in PDF files, especially in pdf_text. These messages are usually safe to ignore; use suppressMessages to hide them altogether.
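A minimal sketch of hiding these messages for a single call, again assuming the sample NEWS.pdf from the examples is present:

```r
library(pdftools)

# Same sample file as in the examples; it may be missing on some R installations
pdf_file <- file.path(R.home("doc"), "NEWS.pdf")

if (file.exists(pdf_file)) {
  # Suppress poppler's diagnostics for this call only; the text is still returned
  text <- suppressMessages(pdf_text(pdf_file))
}
```

Wrapping individual calls rather than silencing messages globally keeps unrelated diagnostics visible.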
Other pdftools: pdf_ocr_text(), qpdf, rendering
# Just a random pdf file
pdf_file <- file.path(R.home("doc"), "NEWS.pdf")
info <- pdf_info(pdf_file)
text <- pdf_text(pdf_file)
fonts <- pdf_fonts(pdf_file)
files <- pdf_attachments(pdf_file)