Inner and Terminal Nodes
A class for representing inner and terminal nodes in trees and functions for data partitioning.
partynode(id, split = NULL, kids = NULL, surrogates = NULL, info = NULL)
kidids_node(node, data, vmatch = 1:ncol(data), obs = NULL, perm = NULL)
fitted_node(node, data, vmatch = 1:ncol(data), obs = 1:nrow(data), perm = NULL)
id_node(node)
split_node(node)
surrogates_node(node)
kids_node(node)
info_node(node)
formatinfo_node(node, FUN = NULL, default = "", prefix = NULL, ...)
id | integer, a unique identifier for a node.
split | an object of class partysplit.
kids | a list of partynode objects.
surrogates | a list of partysplit objects.
info | additional information.
node | an object of class partynode.
data | a list or data.frame.
vmatch | a permutation of the variable numbers in data.
obs | a logical or integer vector indicating a subset of the observations in data.
perm | a vector of integers specifying the variables to be permuted prior to splitting (i.e., for computing permutation variable importances). The default NULL does not alter the data.
FUN | function for formatting the info; for the default see below.
default | a character used if the info in node is NULL.
prefix | an optional prefix to be added to the returned character.
... | further arguments passed to capture.output.
A node represents both inner and terminal nodes in a tree structure. Each node has a unique identifier id. A node consisting only of such an identifier (and possibly additional information in info) is a terminal node. Inner nodes consist of a primary split (an object of class partysplit) and at least two kids (daughter nodes). Kid nodes are themselves objects of class partynode, so the tree structure is defined recursively. In addition, a list of partysplit objects offering surrogate splits can be supplied. Like partysplit objects, partynode objects aren't connected to the actual data.
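As a minimal sketch of this recursive structure (assuming the partykit package is installed; the object names leaf and root are illustrative and not part of the package), a terminal node carries only an id, while an inner node adds a split and a list of kids:

```r
library("partykit")

## terminal node: an identifier only (no split, no kids)
leaf <- partynode(id = 3L)

## inner node: primary split in variable 1 at 5, plus two kid nodes
root <- partynode(id = 1L,
    split = partysplit(varid = 1L, breaks = 5),
    kids = list(partynode(id = 2L), leaf))

is.terminal(leaf)                  # TRUE
is.terminal(root)                  # FALSE
sapply(kids_node(root), id_node)   # identifiers of the kids: 2 3
```

Note that no data set is involved at this point: the split refers to a variable only by its number.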
Function kidids_node() determines how the observations in data[obs,] are partitioned into the kid nodes and returns, for each observation, the position of its kid node in the list kids (and not the kid's identifier). This is done by evaluating split (and possibly all surrogate splits) on data using kidids_split.
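The difference between list positions and node identifiers can be seen in a small sketch (assuming partykit is installed; the toy data frame d and the object names are hypothetical):

```r
library("partykit")

## toy data (hypothetical): one numeric variable
d <- data.frame(x = c(1, 4, 6, 9))

## the kids carry the identifiers 2 and 3 but sit at list positions 1 and 2
node <- partynode(id = 1L,
    split = partysplit(varid = 1L, breaks = 5),
    kids = list(partynode(id = 2L), partynode(id = 3L)))

pos <- kidids_node(node, data = d)
pos   # 1 1 2 2: positions in kids, not node identifiers

## look up the identifier of the kid each observation falls into
sapply(kids_node(node)[pos], id_node)   # 2 2 3 3
```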
Function fitted_node() performs all splits recursively and returns the identifier id of the terminal node each observation in data[obs,] belongs to. Arguments vmatch, obs and perm are passed to kidids_split.
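A sketch of the recursion on a two-level tree (assuming partykit is installed; the data frame d and the object tree are made up for illustration):

```r
library("partykit")

## toy data (hypothetical): two numeric variables
d <- data.frame(x = c(1, 6, 6), y = c(0, -1, 1))

## root splits on x at 5; its right kid splits again on y at 0
tree <- partynode(id = 1L,
    split = partysplit(varid = 1L, breaks = 5),
    kids = list(
        partynode(id = 2L),
        partynode(id = 3L,
            split = partysplit(varid = 2L, breaks = 0),
            kids = list(partynode(id = 4L), partynode(id = 5L)))))

## identifiers of the terminal nodes for all observations
fitted_node(tree, data = d)               # 2 4 5

## restrict to a subset of the observations via obs
fitted_node(tree, data = d, obs = 2:3)    # 4 5
```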
Function formatinfo_node() extracts the info from node and formats it to a character vector using the following strategy: If is.null(info), the default is returned. Otherwise, FUN is applied for formatting. The default function uses as.character for atomic objects and applies capture.output to print(info) for other objects. Optionally, a prefix can be added to the computed character string.
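The formatting strategy might be exercised as follows (a sketch assuming partykit is installed; the node n2 and the formatter passed as FUN are hypothetical):

```r
library("partykit")

## node without info: the default string is returned
formatinfo_node(partynode(id = 1L), default = "*")   # "*"

## atomic info is converted with as.character(); a prefix is optional
n2 <- partynode(id = 2L, info = 42)
formatinfo_node(n2)
formatinfo_node(n2, prefix = "n = ")

## a user-supplied FUN (hypothetical formatter) replaces the default
formatinfo_node(n2, FUN = function(x, ...) sprintf("%.1f", x))
```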
All other functions are accessor functions for extracting information from objects of class partynode.
The constructor partynode() returns an object of class partynode:
id | a unique integer identifier for a node.
split | an object of class partysplit.
kids | a list of partynode objects.
surrogates | a list of partysplit objects.
info | additional information.
kidids_node() returns an integer vector describing the partition of the observations into kid nodes by their position in list kids.
fitted_node() returns the node identifiers (id) of the terminal nodes each observation belongs to.
Hothorn T, Zeileis A (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
data("iris", package = "datasets")

## a stump defined by a binary split in Sepal.Length
stump <- partynode(id = 1L,
    split = partysplit(which(names(iris) == "Sepal.Length"), breaks = 5),
    kids = lapply(2:3, partynode))

## textual representation
print(stump, data = iris)

## list element number and node id of the two terminal nodes
table(kidids_node(stump, iris),
    fitted_node(stump, data = iris))

## assign terminal nodes with probability 0.5
## to observations with missing `Sepal.Length'
iris_NA <- iris
iris_NA[sample(1:nrow(iris), 50), "Sepal.Length"] <- NA
table(fitted_node(stump, data = iris_NA,
    obs = !complete.cases(iris_NA)))

## a stump defined by a primary split in `Sepal.Length'
## and a surrogate split in `Sepal.Width' which
## determines terminal nodes for observations with
## missing `Sepal.Length'
stump <- partynode(id = 1L,
    split = partysplit(which(names(iris) == "Sepal.Length"), breaks = 5),
    kids = lapply(2:3, partynode),
    surrogates = list(partysplit(
        which(names(iris) == "Sepal.Width"), breaks = 3)))
f <- fitted_node(stump, data = iris_NA,
    obs = !complete.cases(iris_NA))
tapply(iris_NA$Sepal.Width[!complete.cases(iris_NA)], f, range)