keras: dataset_reuters – R documentation


dataset_reuters

Reuters newswire topics classification

Description

Dataset of 11,228 newswires from Reuters, each labeled with one of 46 topics. As with dataset_imdb(), each wire is encoded as a sequence of word indexes (using the same conventions).
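
Under the dataset_imdb() conventions, a word's index reflects how often it occurs, so a lower index means a more frequent word. The base-R sketch below builds such a frequency-rank index from a tiny made-up token vector and encodes one short wire. The corpus and the names `tokens`, `ranked` and `wire` are hypothetical. The sketch shows only raw ranks; the loaded data also shifts them by `index_from` and prepends `start_char` (see Arguments).

```r
# Hypothetical toy corpus standing in for the newswire training text
tokens <- c("oil", "prices", "oil", "opec", "oil",
            "prices", "rose", "oil", "prices", "rose")

# Rank words by frequency: rank 1 = most frequent word
ranked <- names(sort(table(tokens), decreasing = TRUE))

# Word index in the same shape as dataset_reuters_word_index(): word -> rank
word_index <- as.list(setNames(seq_along(ranked), ranked))

# Encode one (hypothetical) wire as a sequence of raw frequency ranks
wire <- c("opec", "oil", "prices", "rose")
encoded <- match(wire, ranked)
print(encoded)  # [1] 4 1 2 3
```

The toy counts are all distinct (4, 3, 2, 1) so the ranking is unambiguous. Real corpora have ties, and how those are broken is not specified here.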

Usage

dataset_reuters(
  path = "reuters.npz",
  num_words = NULL,
  skip_top = 0L,
  maxlen = NULL,
  test_split = 0.2,
  seed = 113L,
  start_char = 1L,
  oov_char = 2L,
  index_from = 3L
)

dataset_reuters_word_index(path = "reuters_word_index.pkl")

Arguments

`path`	Where to cache the data (relative to `~/.keras/datasets`).
`num_words`	Maximum number of words to include. Words are ranked by how often they occur in the training set, and only the most frequent words are kept.
`skip_top`	Skip the top N most frequently occurring words (which may not be informative).
`maxlen`	Truncate sequences after this length.
`test_split`	Fraction of the dataset to be used as test data.
`seed`	Random seed for sample shuffling.
`start_char`	The start of a sequence is marked with this character. Defaults to 1 because 0 is usually the padding character.
`oov_char`	Words that were cut out because of the `num_words` or `skip_top` limits are replaced with this character.
`index_from`	Index actual words starting from this value. Lower values are reserved (by default 0 for padding, 1 for `start_char` and 2 for `oov_char`).
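
The interplay of `start_char`, `oov_char`, `index_from`, `num_words` and `skip_top` can be illustrated with a small base-R sketch. `encode_wire()` is a hypothetical helper, not part of keras. It assumes that `num_words` and `skip_top` apply to a word's frequency rank, which is how the argument descriptions read. The real loader may compare against the already-shifted indexes, so edge cases can differ by `index_from`.

```r
# Hypothetical helper (not a keras function): applies the documented index
# conventions to one sequence of raw frequency ranks (1 = most frequent word).
# Assumption: num_words / skip_top are compared with the unshifted ranks.
encode_wire <- function(ranks, num_words = NULL, skip_top = 0L,
                        start_char = 1L, oov_char = 2L, index_from = 3L) {
  # A word is kept if it is not among the skipped top-N words...
  keep <- ranks > skip_top
  # ...and, when num_words is given, falls within the num_words most frequent
  if (!is.null(num_words)) keep <- keep & ranks <= num_words
  # Kept words are shifted up by index_from so indexes below it stay reserved;
  # dropped words are replaced with oov_char
  body <- ifelse(keep, ranks + index_from, oov_char)
  # Mark the start of the sequence
  c(start_char, body)
}

# Rank 12 exceeds num_words, so it becomes oov_char (2)
print(encode_wire(c(1L, 5L, 12L, 2L), num_words = 10L))
# [1] 1 4 8 2 5

# Additionally skipping the single most frequent word (rank 1)
print(encode_wire(c(1L, 5L, 12L, 2L), num_words = 10L, skip_top = 1L))
# [1] 1 2 8 2 5
```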

Value

A list of training and test data (train$x, train$y, test$x, test$y) in the same format as dataset_imdb(). The dataset_reuters_word_index() function returns a named list in which the names are words and the values are integer indexes; for example, word_index[["giraffe"]] might return 1234.
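
To read a wire back as text, invert the word index and undo the offset. The sketch below assumes the default `index_from = 3L`, so a stored index `i` corresponds to the word whose word-index value is `i - 3`. Reserved indexes (padding, start, out-of-vocabulary) and unknown values print as "?". `decode_wire()` and the four-word `word_index` are hypothetical stand-ins for a real wire and the list returned by dataset_reuters_word_index().

```r
# Hypothetical stand-in for dataset_reuters_word_index(): word -> integer index
word_index <- list(the = 1L, oil = 2L, prices = 3L, rose = 4L)

# Hypothetical helper: map a stored sequence back to words
decode_wire <- function(seq, word_index, index_from = 3L, unknown = "?") {
  # Invert the index: position = word-index value, element = word
  values <- unlist(word_index)
  reverse <- character(max(values))
  reverse[values] <- names(word_index)
  vapply(seq, function(i) {
    pos <- i - index_from  # undo the index_from offset
    if (pos >= 1 && pos <= length(reverse) && nzchar(reverse[pos])) {
      reverse[pos]
    } else {
      unknown  # reserved index or word not in the index
    }
  }, character(1))
}

# 1 = start_char, 2 = oov_char, 4..7 = real words
words <- decode_wire(c(1L, 4L, 5L, 6L, 2L, 7L), word_index)
print(paste(words, collapse = " "))  # [1] "? the oil prices ? rose"
```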
