BreastCancer

Wisconsin Breast Cancer Database


Description

The objective is to classify each sample as benign or malignant. Samples arrive periodically as Dr. Wolberg reports his clinical cases, so the database reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself. Each variable except for the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10. There are 16 missing attribute values. See the references cited below for more details.

Usage

data(BreastCancer)

Format

A data frame with 699 observations on 11 variables, one being a character variable, 9 being ordered or nominal, and 1 target class.

[,1]   Id               Sample code number
[,2]   Cl.thickness     Clump Thickness
[,3]   Cell.size        Uniformity of Cell Size
[,4]   Cell.shape       Uniformity of Cell Shape
[,5]   Marg.adhesion    Marginal Adhesion
[,6]   Epith.c.size     Single Epithelial Cell Size
[,7]   Bare.nuclei      Bare Nuclei
[,8]   Bl.cromatin      Bland Chromatin
[,9]   Normal.nucleoli  Normal Nucleoli
[,10]  Mitoses          Mitoses
[,11]  Class            Class

Source

  • Creator: Dr. William H. Wolberg (physician), University of Wisconsin Hospital, Madison, Wisconsin, USA

  • Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)

  • Received: David W. Aha (aha@cs.jhu.edu)

These data have been taken from the UCI Repository of Machine Learning Databases (see References) and were converted to R format by Evgenia Dimitriadou.

References

1. Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 87, 9193-9196.
- Size of data set: only 369 instances (at that point in time)
- Collected classification results: 1 trial only
- Two pairs of parallel hyperplanes were found to be consistent with 50% of the data
- Accuracy on remaining 50% of dataset: 93.5%
- Three pairs of parallel hyperplanes were found to be consistent with 67% of data
- Accuracy on remaining 33% of dataset: 95.9%

2. Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proceedings of the Ninth International Machine Learning Conference (pp. 470-479). Aberdeen, Scotland: Morgan Kaufmann.
- Size of data set: only 369 instances (at that point in time)
- Applied 4 instance-based learning algorithms
- Collected classification results averaged over 10 trials
- Best accuracy result:
  - 1-nearest neighbor: 93.7%
  - trained on 200 instances, tested on the other 169
- Also of interest:
  - Using only typical instances: 92.2% (storing only 23.1 instances)
  - trained on 200 instances, tested on the other 169

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Examples

data(BreastCancer)
summary(BreastCancer)
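
The Description mentions 16 missing attribute values, and the Format section notes that nine variables are stored as ordered or nominal factors. A minimal sketch of one way to inspect the missing values and obtain numeric scores follows. Note that prepare_bc is a hypothetical helper written for this illustration, not part of mlbench, and dropping incomplete rows is only one possible way to handle the missing values.

```r
# Hypothetical helper (not part of mlbench): convert every column except
# Id and Class from factor to numeric, then drop rows with missing values.
prepare_bc <- function(df) {
  feature_cols <- setdiff(names(df), c("Id", "Class"))
  out <- df
  # Go through as.character() so the factor labels are recovered,
  # not the internal integer codes.
  out[feature_cols] <- lapply(
    df[feature_cols],
    function(x) as.numeric(as.character(x))
  )
  # Assumption: incomplete rows are simply discarded.
  out[complete.cases(out[feature_cols]), , drop = FALSE]
}

# Only run the demonstration when mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(BreastCancer, package = "mlbench")
  print(colSums(is.na(BreastCancer)))  # missing values per column
  bc <- prepare_bc(BreastCancer)
  print(dim(bc))                       # rows remaining after the drop
  print(table(bc$Class))               # class balance among complete rows
}
```

Going through as.character() before as.numeric() matters because calling as.numeric() directly on a factor returns its internal level codes rather than the recorded scores.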

Package: mlbench (Machine Learning Benchmark Problems)
Version: 2.1-3
License: GPL-2
Authors: Friedrich Leisch and Evgenia Dimitriadou
Initial release: 2021-01-21
