Reuters newswire topics classification
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with
dataset_imdb()
, each wire is encoded as a sequence of word indexes (same
conventions).
dataset_reuters( path = "reuters.npz", num_words = NULL, skip_top = 0L, maxlen = NULL, test_split = 0.2, seed = 113L, start_char = 1L, oov_char = 2L, index_from = 3L ) dataset_reuters_word_index(path = "reuters_word_index.pkl")
path |
Where to cache the data (relative to |
num_words |
Max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept |
skip_top |
Skip the top N most frequently occuring words (which may not be informative). |
maxlen |
Truncate sequences after this length. |
test_split |
Fraction of the dataset to be used as test data. |
seed |
Random seed for sample shuffling. |
start_char |
The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character. |
oov_char |
words that were cut out because of the |
index_from |
index actual words with this index and higher. |
Lists of training and test data: train$x, train$y, test$x, test$y
with same format as dataset_imdb()
. The dataset_reuters_word_index()
function returns a list where the names are words and the values are
integer. e.g. word_index[["giraffe"]]
might return 1234
.
Other datasets:
dataset_boston_housing()
,
dataset_cifar100()
,
dataset_cifar10()
,
dataset_fashion_mnist()
,
dataset_imdb()
,
dataset_mnist()
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.