Olive oil data
This data set contains eight chemical measurements on different specimens of olive oil produced in various regions of Italy (northern Apulia,
southern Apulia, Calabria, Sicily, inland Sardinia and coastal Sardinia, eastern and western Liguria, Umbria), which can be further grouped
into three macro-areas: Centre-North, South, Sardinia.
The data set is used to evaluate the ability of pdfCluster
to reconstruct macro-area membership.
data(oliveoil)
This data frame contains 572 rows, each corresponding to a different specimen of olive oil, and 10 columns. The first and second columns give the macro-area and the region of origin of the olive oils, respectively; here, the term "region" refers to a geographical area and only partially to administrative borders. Columns 3-10 contain the following eight chemical measurements on the fatty acid components of the oil specimens: palmitic, palmitoleic, stearic, oleic, linoleic, linolenic, arachidic, eicosenoic.
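Because the labels and the measurements sit in fixed column positions, it can help to separate them before any analysis. The sketch below uses a hypothetical helper, `split_oliveoil()`, which is not part of pdfCluster; it relies only on the column positions described above, not on the actual column names, which are assumed unknown here.

```r
# Hypothetical helper (not part of pdfCluster): split a data frame laid out
# like oliveoil into its two label columns and the eight chemical measurements,
# using only the column positions described above (not the column names).
split_oliveoil <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) == 10)
  list(
    macro_area   = df[[1]],               # column 1: macro-area of origin
    region       = df[[2]],               # column 2: region of origin
    measurements = as.matrix(df[, 3:10])  # columns 3-10: palmitic ... eicosenoic
  )
}

# Usage with the real data (requires the pdfCluster package to be installed):
# library(pdfCluster)
# data(oliveoil)
# parts <- split_oliveoil(oliveoil)
# dim(parts$measurements)   # number of specimens and of chemical measurements
# table(parts$macro_area)   # specimens per macro-area
```

Keeping the macro-area labels apart from the measurements makes it straightforward to cluster only the chemical variables and then compare the result with the known macro-areas.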
Since the raw data are compositional, with each row ideally summing to 10000, some preliminary transformation of the data is advisable. In particular, Azzalini and Torelli (2007) adopt an additive log-ratio (ALR) transformation. If x_j denotes the j-th chemical measurement (j = 1, …, 8), the ALR transformation is y_j = \log(x_j/x_k), j \neq k, where k is an arbitrary but fixed reference variable; this yields seven transformed variables. However, in this data set the raw data do not always sum exactly to 10000, because of measurement errors. Moreover, the data contain some zeros, corresponding to measurements below the instrument sensitivity level, for which the logarithm is undefined. Therefore, it is suggested to add 1 to all raw data and normalize them by dividing each entry by the corresponding row sum ∑_j (x_j + 1).
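The preprocessing and the ALR transformation described above can be sketched as follows. The function name `alr_transform()` and its default reference column (the last one) are assumptions made for illustration; neither is part of pdfCluster, and any other fixed k may be chosen.

```r
# Hypothetical helper (not part of pdfCluster): preprocess raw compositional
# measurements as described above, then apply the additive log-ratio (ALR).
alr_transform <- function(x, k = ncol(x)) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), k >= 1, k <= ncol(x))
  # Add 1 to every entry so that zeros (below instrument sensitivity) can be logged
  x1 <- x + 1
  # Normalise each row by its row sum sum_j (x_j + 1), so that each row sums to 1
  p <- x1 / rowSums(x1)
  # ALR: y_j = log(p_j / p_k) for every j != k; the reference column k is dropped.
  # Dividing the matrix by the vector p[, k] divides row i by p[i, k].
  log(p[, -k, drop = FALSE] / p[, k])
}

# Usage with the real data (requires the pdfCluster package to be installed):
# library(pdfCluster)
# data(oliveoil)
# y <- alr_transform(oliveoil[, 3:10])   # 7 log-ratio columns, last column as reference
```

Since y_j depends only on the ratio (x_j + 1)/(x_k + 1), the row normalisation cancels out inside the ALR itself; it matters when the normalised compositions are used directly, while adding 1 is what makes the logarithm defined for the zero entries.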
Forina, M., Lanteri, S., Armanino, C., Casolino, C., Casale, M., Oliveri, P. (2008). V-PARVUS. An Extendible Package of programs for explorative data analysis, classification and regression analysis. Dip. Chimica e Tecnologie Farmaceutiche ed Alimentari, Università di Genova.
Azzalini, A., Torelli, N. (2007). Clustering via nonparametric density estimation. Statistics and Computing, 17, 71-80.