scrapenames

Resolve names using Global Names Recognition and Discovery.


Description

Uses the Global Names Recognition and Discovery service, see http://gnrd.globalnames.org/

Note: this function sometimes returns data and sometimes does not. The API that this function uses is extremely buggy.

Usage

scrapenames(
  url = NULL,
  file = NULL,
  text = NULL,
  engine = NULL,
  unique = NULL,
  verbatim = NULL,
  detect_language = NULL,
  all_data_sources = NULL,
  data_source_ids = NULL,
  return_content = FALSE,
  ...
)

Arguments

url

An encoded URL for a web page, PDF, Microsoft Office document, or image file, see examples

file

When using multipart/form-data as the content-type, a file may be sent. This should be a path to your file on your machine.

text

Type: string. Text content; best used with a POST request, see examples

engine

(optional) (integer) Default: 0. Either 1 for TaxonFinder, 2 for NetiNeti, or 0 for both. If absent, both engines are used.

unique

(optional) (logical) If TRUE (default), response has unique names without offsets.

verbatim

(optional) Type: boolean. If TRUE, response excludes verbatim strings. Default: FALSE.

detect_language

(optional) Type: boolean, When TRUE (default), NetiNeti is not used if the language of incoming text is determined not to be English. When FALSE, NetiNeti will be used if requested.

all_data_sources

(optional) Type: boolean. Resolve found names against all available Data Sources.

data_source_ids

(optional) Type: string. Pipe separated list of data source ids to resolve found names against. See list of Data Sources http://resolver.globalnames.org/data_sources

return_content

(logical) If TRUE, return the OCR'ed text as a text string in the x$meta$content slot. Default: FALSE

...

Further args passed to crul::verb-GET
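The data_source_ids description above asks for a pipe-separated string. As a minimal sketch in base R, such a string can be built from a vector of ids. The ids 1 and 169 are taken from the examples on this page; whether scrapenames() also accepts a bare vector is not stated here, so building the string explicitly is the conservative assumption.

```r
# Build a pipe-separated id string from a vector of data source ids.
# The ids are illustrative; see the Data Sources list for real values.
ids <- c(1, 169)
id_string <- paste(ids, collapse = "|")
id_string
```

The resulting string could then be passed as data_source_ids = id_string.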

Details

Exactly one of url, file, or text must be specified.
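The rule above can be checked before calling the service. The following is a sketch only: check_one_input is a hypothetical helper, not part of taxize, and it does not claim to reproduce how scrapenames() validates its arguments internally.

```r
# Hypothetical helper (not part of taxize): confirms that exactly one of
# url, file, or text is non-NULL and returns the name of the one supplied.
check_one_input <- function(url = NULL, file = NULL, text = NULL) {
  inputs <- list(url = url, file = file, text = text)
  supplied <- names(inputs)[!vapply(inputs, is.null, logical(1))]
  if (length(supplied) != 1) {
    stop("Specify exactly one of url, file, or text", call. = FALSE)
  }
  supplied
}

check_one_input(text = "A spider named Pardosa moesta Banks, 1892")
```

Calling the helper with no arguments, or with more than one, raises an error instead of sending an ambiguous request.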

Value

A list of length two: the first element is metadata, the second is the data as a data.frame.
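Because the return value is described only by position, a cautious way to take it apart is by index. The sketch below assumes nothing about element names: split_result is a hypothetical helper, and the mock object (including its column name) is made-up stand-in content so the code runs without contacting the service.

```r
# Hypothetical helper (not part of taxize): splits a scrapenames()-style
# result into its two parts by position.
split_result <- function(res) {
  stopifnot(is.list(res), length(res) == 2)
  list(metadata = res[[1]], names_found = res[[2]])
}

# Stand-in object with made-up contents; a real call would replace this.
mock <- list(
  list(note = "placeholder metadata"),
  data.frame(name = "Pardosa moesta", stringsAsFactors = FALSE)
)

parts <- split_result(mock)
is.data.frame(parts$names_found)
```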

Author(s)

Scott Chamberlain

Examples

## Not run: 
# Get data from a website using its URL
scrapenames('https://en.wikipedia.org/wiki/Spider')
scrapenames('https://en.wikipedia.org/wiki/Animal')
scrapenames('https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0095068')
scrapenames('https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080498')
scrapenames('http://ucjeps.berkeley.edu/cgi-bin/get_JM_treatment.pl?CARYOPHYLLACEAE')

# Scrape names from a pdf at a URL
url <- 'https://journals.plos.org/plosone/article/file?id=
10.1371/journal.pone.0058268&type=printable'
scrapenames(url = sub('\n', '', url))

# With arguments
scrapenames(url = 'https://www.mapress.com/zootaxa/2012/f/z03372p265f.pdf',
  unique=TRUE)
scrapenames(url = 'https://en.wikipedia.org/wiki/Spider',
  data_source_ids=c(1, 169))

# Get data from a file
speciesfile <- system.file("examples", "species.txt", package = "taxize")
scrapenames(file = speciesfile)

nms <- paste0(names_list("species"), collapse="\n")
file <- tempfile(fileext = ".txt")
writeLines(nms, file)
scrapenames(file = file)

# Get data from text string
scrapenames(text='A spider named Pardosa moesta Banks, 1892')

# return OCR content
scrapenames(url='https://www.mapress.com/zootaxa/2012/f/z03372p265f.pdf',
  return_content = TRUE)

## End(Not run)

taxize

Taxonomic Information from Around the Web

v0.9.100
MIT + file LICENSE
Authors
Scott Chamberlain [aut] (<https://orcid.org/0000-0003-1444-9135>), Eduard Szoecs [aut], Zachary Foster [aut, cre], Zebulun Arendsee [aut], Carl Boettiger [ctb], Karthik Ram [ctb], Ignasi Bartomeus [ctb], John Baumgartner [ctb], James O'Donnell [ctb], Jari Oksanen [ctb], Bastian Greshake Tzovaras [ctb], Philippe Marchand [ctb], Vinh Tran [ctb], Maëlle Salmon [ctb], Gaopeng Li [ctb], Matthias Grenié [ctb], rOpenSci [fnd] (https://ropensci.org/)
Initial release
