solrium: update_csv – R documentation

Update documents with CSV data

Description

Update documents with CSV data

Usage

update_csv(conn, files, name, separator = ",", header = TRUE,
  fieldnames = NULL, skip = NULL, skipLines = 0, trim = FALSE,
  encapsulator = NULL, escape = NULL, keepEmpty = FALSE,
  literal = NULL, map = NULL, split = NULL, rowid = NULL,
  rowidOffset = NULL, overwrite = NULL, commit = NULL, wt = "json",
  raw = FALSE, ...)

Arguments

`conn`	A solrium connection object, see SolrClient
`files`	Path to a single file to load into Solr
`name`	(character) Name of the core or collection
`separator`	Specifies the character to act as the field separator. Default: ','
`header`	TRUE if the first line of the CSV input contains field or column names. Default: `TRUE`. If the fieldnames parameter is absent, these field names will be used when adding documents to the index.
`fieldnames`	Specifies a comma separated list of field names to use when adding documents to the Solr index. If the CSV input already has a header, the names specified by this parameter will override them. Example: fieldnames=id,name,category
`skip`	A comma separated list of field names to skip in the input. An alternate way to skip a field is to specify its name as a zero length string in fieldnames. For example, `fieldnames=id,name,category&skip=name` skips the name field, and is equivalent to `fieldnames=id,,category`.
`skipLines`	Specifies the number of lines in the input stream to discard before the CSV data starts (including the header, if present). Default: `0`
`trim`	If `TRUE`, remove leading and trailing whitespace from values. CSV parsing already ignores leading whitespace by default, but there may be trailing whitespace, or leading whitespace that is encapsulated by quotes and is thus not removed. This may be specified globally, or on a per-field basis. Default: `FALSE`
`encapsulator`	The character optionally used to surround values to preserve characters such as the CSV separator or whitespace. This standard CSV format handles the encapsulator itself appearing in an encapsulated value by doubling the encapsulator.
`escape`	The character used for escaping CSV separators or other reserved characters. If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both.
`keepEmpty`	Keep and index empty (zero length) field values. This may be specified globally, or on a per-field basis. Default: `FALSE`
`literal`	Adds a fixed field name/value pair to all documents. Example: `literal.datasource=products` adds a "datasource" field with the value "products" to every document indexed from the CSV.
`map`	Specifies a mapping between one value and another. The string on the left-hand side of the colon is replaced with the string on the right-hand side. This parameter can be specified globally or on a per-field basis. Example: `map=Absolutely:true` replaces "Absolutely" with "true" in every field. Example: `f.foo.map=RemoveMe:&f.foo.keepEmpty=false` removes any values of "RemoveMe" in the field "foo".
`split`	If TRUE, the field value is split into multiple values by another CSV parser. The CSV parsing rules such as separator and encapsulator may be specified as field parameters. See https://wiki.apache.org/solr/UpdateCSV#split for examples.
`rowid`	If not null, add a new field to the document, where the passed in parameter value is the name of the field to be added and the current line/rowid is its value. This is useful if your CSV doesn't already have a unique id in it and you want to use the line number as one. It is also useful if you simply want to index where exactly in the original CSV file the row came from.
`rowidOffset`	In conjunction with the rowid parameter, this integer value will be added to the rowid before adding it to the field.
`overwrite`	If true (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the solr schema. If you know the documents you are indexing do not contain any duplicates then you may see a considerable speed up with &overwrite=false.
`commit`	Commit changes after all records in this request have been indexed. The default is commit=false to avoid the potential performance impact of frequent commits.
`wt`	(character) One of json (default) or xml. If json, uses `jsonlite::fromJSON()` to parse. If xml, uses `xml2::read_xml()` to parse
`raw`	(logical) If `TRUE`, returns raw data in format specified by `wt` param
`...`	curl options passed on to crul::HttpClient
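
As a minimal sketch of how `header`, `fieldnames` and `skip` fit together, the snippet below writes a CSV file without a header row and then defines a helper that would load it. The helper name `load_headerless()`, the `category` column and the `helloWorld` collection are illustrative assumptions, not part of solrium. The helper is only defined, not called, because calling it needs a running Solr instance; the local `read.csv()` check works without one.

```r
# Write a small CSV file that has no header row.
df <- data.frame(
  id = 1:2,
  name = c("red", "blue"),
  category = c("warm", "cool"),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.table(df, file = csv_path, sep = ",", col.names = FALSE,
            row.names = FALSE, quote = FALSE)

# Local sanity check: the same column names we intend to pass as `fieldnames`.
field_names <- c("id", "name", "category")
parsed <- read.csv(csv_path, header = FALSE, col.names = field_names,
                   stringsAsFactors = FALSE)
print(parsed)

# Hypothetical helper (an assumption for illustration): `conn` is a solrium
# connection object and `collection` names an existing core or collection.
load_headerless <- function(conn, path, collection = "helloWorld") {
  solrium::update_csv(
    conn, files = path, name = collection,
    header = FALSE,                                # first line is data
    fieldnames = paste(field_names, collapse = ","), # "id,name,category"
    skip = "category",                             # do not index this column
    commit = TRUE                                  # commit after the request
  )
}
```

Per the argument descriptions above, passing `fieldnames = "id,name,"` instead of `skip = "category"` should have the same effect, since a zero length name skips that column.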

Note

Solr v1.2 was the first version to support CSV. See https://issues.apache.org/jira/browse/SOLR-66

Examples

## Not run: 
# start Solr: bin/solr start -f -c -p 8983

# connect
(conn <- SolrClient$new())

if (!conn$collection_exists("helloWorld")) {
  conn$collection_create(name = "helloWorld", numShards = 2)
}

df <- data.frame(id=1:3, name=c('red', 'blue', 'green'))
write.csv(df, file="df.csv", row.names=FALSE, quote = FALSE)
conn$update_csv("df.csv", "helloWorld", verbose = TRUE)

# parse the response as xml (add raw = TRUE to get the raw xml text instead)
conn$update_csv("df.csv", "helloWorld", wt = "xml")
## raw json
conn$update_csv("df.csv", "helloWorld", wt = "json", raw = TRUE)

## End(Not run)
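
The example above writes the file with `quote = FALSE`, which is fine for simple values but breaks once a value contains the separator. The sketch below shows what the encapsulated form described under `encapsulator` looks like when produced by base R: `write.csv()` surrounds character values with double quotes and doubles any embedded double quote. The helper `load_quoted()` is a hypothetical name and is only defined, not run; it assumes a running Solr instance and an existing `helloWorld` collection, and passing `encapsulator = '"'` explicitly is a choice made here for clarity rather than a documented requirement.

```r
# Values containing the separator and the encapsulator itself.
df <- data.frame(
  id = 1:2,
  name = c("red, dark", "the \"blue\" one"),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")

# quote = TRUE wraps character columns in double quotes;
# write.csv() escapes embedded quotes by doubling them.
write.csv(df, file = csv_path, row.names = FALSE, quote = TRUE)
csv_lines <- readLines(csv_path)
print(csv_lines)

# Reading the file back locally recovers the original values.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
stopifnot(identical(round_trip$name, df$name))

# Hypothetical helper (an assumption for illustration): `conn` is a
# SolrClient object as created in the example above.
load_quoted <- function(conn, path, collection = "helloWorld") {
  conn$update_csv(path, collection, encapsulator = "\"")
}
```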

solrium

General Purpose R Interface to 'Solr'

v1.1.4

MIT + file LICENSE

Authors

Scott Chamberlain [aut, cre] (<https://orcid.org/0000-0003-1444-9135>), rOpenSci [fnd] (https://ropensci.org/)
