Character String Editing and Miscellaneous Character Handling Functions
This suite of functions was written to implement many of the features of the UNIX sed program entirely within S (function sedit). The substring.location function returns the first and last position numbers that a substring occupies in a larger string. The substring2<- function does the opposite of the built-in function substring. It is named substring2 because S-Plus has a built-in function substring, but it does not handle multiple replacements in a single string.
replace.substring.wild edits character strings in the fashion of "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", if the "ANYTHING" passes an optional user-specified test function. Here, the "yyyy" string is searched for from right to left to handle balancing parentheses, etc. numeric.string and all.digits are two examples of test functions; they check, respectively, whether each of a vector of strings is a legal numeric and whether it contains only the digits 0-9. For the case where old="*$" or "^*", or for replace.substring.wild with the same values of old or with front=TRUE or back=TRUE, sedit (if wild.literal=FALSE) and replace.substring.wild will edit the largest substring satisfying test.
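The right-to-left search for the trailing string can be illustrated with a small base R sketch. This is not the package's implementation: the names wild_replace and is_numeric are hypothetical, the leading and trailing literals are passed as separate arguments instead of being parsed from a "*" pattern, and only a single string is handled.

```r
# Hypothetical test function: TRUE when the string parses as a number.
is_numeric <- function(s) !is.na(suppressWarnings(as.numeric(s)))

# Hypothetical helper: change <old_pre>ANYTHING<old_post> to
# <new_pre>ANYTHING<new_post>, provided ANYTHING passes `test`.
wild_replace <- function(text, old_pre, old_post, new_pre, new_post,
                         test = function(s) TRUE) {
  # First occurrence of the leading literal (searched left to right).
  start <- as.integer(regexpr(old_pre, text, fixed = TRUE))
  # All occurrences of the trailing literal; the rightmost one is used,
  # which mimics a right-to-left search.
  ends <- as.integer(gregexpr(old_post, text, fixed = TRUE)[[1]])
  if (start < 0 || ends[1] < 0) return(text)
  post <- max(ends)
  mid_start <- start + nchar(old_pre)
  if (post < mid_start) return(text)
  middle <- substr(text, mid_start, post - 1)
  # Leave the text unchanged when the wildcard part fails the test.
  if (!isTRUE(test(middle))) return(text)
  paste0(substr(text, 1, start - 1), new_pre, middle, new_post,
         substr(text, post + nchar(old_post), nchar(text)))
}

wild_replace("this is a cat", "this", "cat", "that", "dog")
# "that is a dog"
wild_replace("log(f(x))", "log(", ")", "ln[", "]")
# "ln[f(x)]"
wild_replace("He won 1.5 million $", "won", "million", "lost", "million",
             test = is_numeric)
# "He lost 1.5 million $"
```

In the second call, taking the rightmost ")" is what keeps the inner parentheses of f(x) inside the wildcard part.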
substring2 is just a copy of substring so that substring2<- will work.
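For readers unfamiliar with replacement functions, the following base R sketch shows the general mechanism by which a call of the form f(x, ...) <- value modifies x. The name splice<- is hypothetical and the function handles only a single string with numeric positions; it is an illustration of the idea, not the code behind substring2<-.

```r
# Hypothetical replacement function: replace characters first..last of x
# with value, which may differ in length from the span it replaces.
"splice<-" <- function(x, first, last = nchar(x), value) {
  paste0(substr(x, 1, first - 1), value, substr(x, last + 1, nchar(x)))
}

x <- "this string"
splice(x, 3, 4) <- "IS"   # R evaluates x <- `splice<-`(x, 3, 4, value = "IS")
x
# "thIS string"
splice(x, 7) <- ""        # last defaults to the end of the string
x
# "thIS s"
```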
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto   # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last)
substring2(text, first, last) <- value
text |
a vector of character strings for |
from |
a vector of character strings to translate from, for |
to |
a vector of character strings to translate to, for |
string |
a single character string, for |
first |
a vector of integers specifying the first position to replace for
|
last |
a vector of integers specifying the ending positions of the character
substrings to be replaced. The default is to go to the end of
the string. When |
setto |
a character string or vector of character strings used as replacements,
in |
old |
a character string to translate from for |
new |
a character string to translate to for |
test |
a function of a vector of character strings returning a logical vector
whose elements are |
wild.literal |
set to |
restrict |
a vector of two integers for |
front |
specifying |
back |
specifying |
value |
a character vector |
sedit returns a vector of character strings the same length as text. substring.location returns a list with components named first and last, each specifying a vector of character positions corresponding to matches. replace.substring.wild returns a single character string. numeric.string and all.digits return a single logical value. substring2<- modifies its first argument.
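The shape of the list returned by substring.location can be pictured with a base R sketch built on gregexpr. The function name locate is hypothetical, the restrict argument is not handled, matches are non-overlapping, and returning zeros when nothing matches is an assumption of this sketch, not a statement about the package.

```r
# Hypothetical helper: first and last character positions of every
# non-overlapping literal match of `string` in `text`.
locate <- function(text, string) {
  m <- gregexpr(string, text, fixed = TRUE)[[1]]
  # Assumption of this sketch: report zeros when there is no match.
  if (m[1] < 0) return(list(first = 0L, last = 0L))
  first <- as.integer(m)
  list(first = first, last = first + nchar(string) - 1L)
}

loc <- locate("abcdefgabc", "ab")
loc$first
# 1 8
loc$last
# 2 9
```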
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
fh@fharrell.com
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x

substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))

replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')

qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
                       'lost*million', test=numeric.string)

x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))

sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)

replace.substring.wild("abcdefabcdef", "d*f", "xy")

x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x