Get Vector with Counts for Positional Attribute.
The return value is an integer vector. Its length equals the number of unique tokens in the corpus, i.e. the number of unique token ids. The counts are ordered by token id: because ids are zero-based, the count for id i is stored at position i + 1 of the vector.
get_count_vector(corpus, p_attribute, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus | a CWB corpus
p_attribute | a positional attribute
registry | registry directory
an integer vector
registry <- use_tmp_registry()

# one count per token id of the p-attribute "word"
y <- get_count_vector(
  corpus = "REUTERS", p_attribute = "word", registry = registry
)

# token ids are zero-based, so position i of y belongs to id i - 1
df <- data.frame(token_id = 0:(length(y) - 1), count = y)

# decode the ids to their string values
df[["token"]] <- cl_id2str(
  "REUTERS", p_attribute = "word",
  id = df[["token_id"]], registry = registry
)
df <- df[,c("token", "token_id", "count")] # reorder columns
df <- df[order(df[["count"]], decreasing = TRUE),] # most frequent first
head(df)
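To make the layout of the return value concrete, the following is a minimal base R sketch that mimics the described structure without a CWB corpus. The `ids` vector is invented toy data and `tabulate()` is only a stand-in for illustration; it is not a description of how `get_count_vector()` computes its result.

```r
# Hypothetical stream of zero-based token ids (toy data, not from a corpus)
ids <- c(0L, 1L, 2L, 1L, 0L, 1L)

# Number of unique ids, assuming ids run from 0 to max(ids) without gaps
n_ids <- max(ids) + 1L

# tabulate() counts positive integers, so shift the zero-based ids by one
counts <- tabulate(ids + 1L, nbins = n_ids)

# Look up the count of a single id: id i sits at position i + 1
count_of_id <- function(id) counts[id + 1L]

length(counts)   # one entry per unique id
count_of_id(1L)  # count for token id 1
```

The `+ 1L` shift is the same adjustment the example above makes in the other direction when it builds `token_id` as `0:(length(y) - 1)`.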