ff: splitPathFile – R documentation


splitPathFile

Analyze pathfile-strings

Description

splitPathFile splits a vector of pathfile strings into path and file components without loss of information. unsplitPathFile restores the original pathfile-string vector. standardPathFile standardizes a vector of pathfile strings: backslashes are replaced by slashes, except for the first two leading backslashes, which indicate a network share. tempPathFile returns, similar to tempfile, a vector of filenames given path(s), file prefix(es) and an optional extension. fftempfile returns, similar to tempPathFile, a vector of filenames following a vector of pathfile patterns that are interpreted in an ff-specific way.

Usage

splitPathFile(x)
unsplitPathFile(splitted)
standardPathFile(x)
tempPathFile(splitted=NULL, path=splitted$path, prefix=splitted$file, extension=NULL)
fftempfile(x)

Arguments

`x`	a character vector of pathfile strings
`splitted`	a return value from `splitPathFile`
`path`	a character vector of path components
`prefix`	a character vector of file components
`extension`	optional extension like "csv" (or NULL)

Details

dirname and basename remove trailing file separators and therefore cannot distinguish a pathfile string that contains ONLY a path from one that contains a path AND a file. Consequently, file.path(dirname(pathfile), basename(pathfile)) cannot always restore the original pathfile string.
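The loss of information can be seen with base R alone. The following minimal sketch uses two illustrative strings (chosen here, not taken from the Examples) that differ only in a trailing file separator:

```r
# Two pathfile strings that differ only in the trailing file separator
x <- c("a/b", "a/b/")

# dirname() and basename() both ignore the trailing separator ...
d <- dirname(x)
b <- basename(x)

# ... so both strings are rebuilt as the same string
rebuilt <- file.path(d, b)
data.frame(x = x, dirname = d, basename = b, rebuilt = rebuilt,
           restored = rebuilt == x)
```

The second row is not restored, because the trailing separator has been dropped.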
splitPathFile decomposes each pathfile string into three parts: the path BEFORE the last file separator, the file separator itself, and the filename component AFTER the last file separator. If the string contains no file separator, splitPathFile tries to guess whether it is a path or a file component: ".", ".." and "~" are recognized as path components. No tilde expansion is done, see path.expand. splitPathFile converts backslashes to the current .Platform$file.sep, except for the first two leading backslashes indicating a network share.
unsplitPathFile restores the original pathfile-string vector up to translated backslashes.
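As a hedged sketch of this round trip, the following assumes that the ff package is installed and exports splitPathFile and unsplitPathFile as listed under Usage. The input strings are illustrative, and the variable roundtrip_ok is introduced here only for the demonstration. Backslashes are normalized before comparing, in the same way as in the Examples section.

```r
roundtrip_ok <- NA  # stays NA when ff is not available

if (requireNamespace("ff", quietly = TRUE)) {
  x <- c("a/b", "a/b/", "\\\\a/b")
  s <- ff::splitPathFile(x)      # list with components path, fsep, file
  str(s)
  r <- ff::unsplitPathFile(s)    # rebuild the pathfile strings
  # compare up to translated backslashes
  roundtrip_ok <- all(gsub("\\\\", "/", r) == gsub("\\\\", "/", x))
}
roundtrip_ok
```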
tempPathFile internally uses tempfile to create its filenames. If an extension is given, it repeats filename creation until none of the filenames corresponds to an existing file.
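The retry idea can be pictured with base R only. This is a sketch of the described behaviour, not ff's actual implementation; temp_with_ext is a hypothetical helper name introduced purely for illustration.

```r
# Hypothetical helper (not part of ff): create temporary filenames with an
# optional extension and retry until none of them names an existing file.
temp_with_ext <- function(path = tempdir(), prefix = "file", extension = NULL) {
  repeat {
    f <- tempfile(pattern = prefix, tmpdir = path)
    if (is.null(extension))
      return(f)                    # tempfile() already avoids existing names
    f <- paste(f, extension, sep = ".")
    if (!any(file.exists(f)))
      return(f)                    # all candidate names are unused
  }
}

f <- temp_with_ext(prefix = c("a", "b"), extension = "csv")
f
```

The repeated check is needed because tempfile only guarantees that the name without the extension is unused.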
fftempfile takes a path-prefix pattern as input, splits it, replaces an empty path with getOption("fftempdir"), and uses getOption("ffextension") as the extension.

Value

A list with components

`path`	a character vector of path components
`fsep`	a character vector of file separators or ""
`file`	a character vector of file components

Note

There is no guarantee that the path and file components contain valid path or file names. Like basename, splitPathFile can return ".", ".." or even ""; however, all of these make sense as a prefix in tempPathFile.

Author(s)

Jens Oehlschlägel

Examples

pathfile <- c("", ".", "/.", "./", "./.", "/"
  , "a", "a/", "/a", "a/a", "./a", "a/.", "c:/a/b/c", "c:/a/b/c/"
  , "..", "../", "/..", "../..", "//", "\\\\a\\", "\\\\a/"
  , "\\\\a/b", "\\\\a/b/", "~", "~/", "~/a", "~/a/")
  splitted <- splitPathFile(pathfile)
  restored <- unsplitPathFile(splitted)
  stopifnot(all(gsub("\\\\","/",restored)==gsub("\\\\","/",pathfile)))

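  # dirname + basename cannot always restore the original string
  dirnam <- dirname(pathfile)
  basnam <- basename(pathfile)

  db  <- file.path(dirnam,basnam)
  ident <- gsub("\\\\","/",db) == gsub("\\\\","/",pathfile)
  sum(!ident)  # number of pathfile strings that were not restored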

  do.call("data.frame", c(list(ident=ident, pathfile=pathfile
   , dirnam=dirnam, basnam=basnam), splitted))

  ## Not run: 
    message("show the difference between tempfile and fftempfile")
    do.call("data.frame", c(list(ident=ident, pathfile=pathfile, dirnam=dirnam, basnam=basnam)
      , splitted, list(filename=tempPathFile(splitted), fftempfile=fftempfile(pathfile))))

    message("for a single string splitPathFile is slower, 
for vectors of strings it scales much better than dirname+basename")

    system.time(for (i in 1:1000){
      d <- dirname(pathfile)
      b <- basename(pathfile)
    })
    system.time(for (i in 1:1000){
      s <- splitPathFile(pathfile)
    })

    len <- c(1,10,100,1000)
    timings <- matrix(0, 2, length(len), dimnames=list(c("dir.base.name", "splitPathFile"), len))
    for (j in seq_along(len)){
      l <- len[j]
      r <- 10000 / l
      x <- rep("\\\\a/b/", l)
      timings[1,j] <- system.time(for (i in 1:r){
          d <- dirname(x)
          b <- basename(x)
        })[3]
      timings[2,j] <- system.time(for (i in 1:r){
          s <- splitPathFile(x)
        })[3]
    }
    timings
  
## End(Not run)

ff

Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

v4.0.4

GPL-2 | GPL-3 | file LICENSE

Authors

Daniel Adler [aut], Christian Gläser [aut], Oleg Nenadic [aut], Jens Oehlschlägel [aut, cre], Martijn Schuemie [aut], Walter Zucchini [aut]

Initial release

2020-10-13