STATIS, a three-way method for data analysis. Application to environmental data

https://doi.org/10.1016/j.chemolab.2004.03.005

Abstract

The present paper deals with the exploration of three-way environmental data using “Structuration des Tableaux A Trois Indices de la Statistique” (STATIS). The performance of the method is compared with that of Tucker3 and PARAFAC2, two methods more commonly used in chemometric N-way data analysis. The features of STATIS are demonstrated on real data sets. Owing to its robustness, the absence of special data-preprocessing requirements and its ability to handle sets of two-way tables (matrices) whose row or column dimensions differ, STATIS appears to be a very attractive three-way exploratory tool.

Introduction

Environmental data sets can be multidimensional and have a complex structure. Usually, they are collected as sets of tables of objects and variables obtained under different experimental conditions, for various sampling periods, etc. Putting all tables together results in data with a three-way structure [1]. An example of such data is the concentrations of several chemical components measured in samples collected at different sampling sites over a certain period of time (sites×parameters×time). Many tools are available for exploring and interpreting three-way or higher-way data structures. The most popular ones in chemometrics are PARAFAC [2], [3], PARAFAC2 [4], [5] and Tucker3 [6], [7]. A software tool for applying these methods, called CUBATCH, was recently presented [8].

The aim of this paper is to present a method called STATIS [9], [10], [11], which can also be applied to the exploratory analysis of three-way data sets, and to compare its performance with that of N-way methods in the analysis of environmental data. The abbreviation STATIS stands for “Structuration des Tableaux A Trois Indices de la Statistique”, which could be translated into English as “structuring three-way data sets in statistics”.


STATIS

STATIS is an exploratory tool for three-way data analysis. Its main idea is to compare different data tables (matrices) obtained under various experimental conditions but sharing the same number of rows and/or columns [12]. By analogy with N-way methods, the three-way data set is denoted by X with dimensions I, J and K, corresponding to the numbers of rows, columns and tables, respectively [1]. Thus, an element of X is $x_{ijk}$, where $i = 1, \ldots, I$, $j = 1, \ldots, J$ and $k = 1, \ldots, K$.
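To illustrate this notation, a three-way array can be held in NumPy with rows, columns and tables along axes 0, 1 and 2. The following is a minimal sketch with hypothetical sizes and random values. The axis order and the list-of-matrices fallback for tables with different column counts are assumptions of the sketch, not part of the method description above.

```python
import numpy as np

# Hypothetical sizes: I rows (objects), J columns (variables), K tables.
I, J, K = 4, 3, 5
rng = np.random.default_rng(0)

# X stored as an I x J x K array; table k is the frontal slab X[:, :, k].
X = rng.random((I, J, K))

# Element x_ijk (1-based in the text) is X[i-1, j-1, k-1] in 0-based indexing,
# so x_123 corresponds to X[0, 1, 2].
x_123 = X[0, 1, 2]

# The same data viewed as a list of K matrices of shape I x J.
tables = [X[:, :, k] for k in range(K)]

# If the tables do not share a column dimension (J_k varies), one array
# cannot hold them, but a list of matrices with a common row mode can.
J_k = [3, 2, 4, 3, 5]  # hypothetical column counts
ragged = [rng.random((I, j)) for j in J_k]
```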

Each direction is called a

PARAFAC2

In some cases, the slabs (tables) constituting X do not have the same number of rows or columns. An N-way method that can deal with this problem is PARAFAC2. The objective of the method is to model new data Y containing the covariance matrices of the set of two-way matrices $X_k$ of X. If the $X_k$ matrices are arranged as frontal slabs and have different column dimensions, Y has dimensions I×I×K. After unfolding Y into Y(K×II), the PARAFAC2 model can be written as follows:Y(K×II)=(C|⊗|C
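The construction of Y and its K×I² unfolding can be sketched in NumPy as follows. The sketch makes these assumptions:

- Each $X_k$ is stored as an I×J_k array sharing the row mode.
- It uses the uncentered cross-product $X_k X_k^T$, which is a covariance matrix only after centering and scaling.
- The helper names `parafac2_cross_products` and `unfold_k_by_ii` are hypothetical.

It covers only the input to PARAFAC2, not the model fitting itself.

```python
import numpy as np

def parafac2_cross_products(tables):
    """Stack Y_k = X_k X_k^T (I x I) for tables X_k of shape I x J_k."""
    I = tables[0].shape[0]
    # All tables must share the row mode; the column counts J_k may differ.
    if any(Xk.shape[0] != I for Xk in tables):
        raise ValueError("all tables must have the same number of rows")
    # Result has shape I x I x K: one cross-product matrix per frontal slab.
    return np.stack([Xk @ Xk.T for Xk in tables], axis=2)

def unfold_k_by_ii(Y):
    """Unfold an I x I x K array into K x I^2: row k is slab Y_k flattened."""
    K = Y.shape[2]
    return np.stack([Y[:, :, k].ravel() for k in range(K)], axis=0)

rng = np.random.default_rng(1)
# Hypothetical tables with 6 rows and differing column counts J_k.
tables = [rng.random((6, j)) for j in (3, 5, 2, 4)]
Y = parafac2_cross_products(tables)
Y_unf = unfold_k_by_ii(Y)
```

Because each $X_k X_k^T$ is symmetric, the row-major and column-major orderings of its I² entries give the same row of the unfolded matrix. The vec convention therefore does not matter at this step.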

Data

The data set consists of the annual mean concentrations of nine chemical components (H⁺, NH₄⁺, Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻, NO₃⁻ and SO₄²⁻). They were monitored during 12 years at six sampling sites (Reutte, Kufstein, Innervillgraten, Sonnblick, Nasswald and Lobau), 15 years at Haunsberg and Werfenweng, 10 years at Litschau and Lunz, and 9 years at the Nassfeld site [18]. The data do not have a perfect trilinear structure, since for one mode (years) the dimensions are not the same for all data tables.
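One possible layout for this ragged data set is one table per site, with years as rows and the nine components as columns. The sketch below uses the year counts listed above, but it fills the tables with random placeholder values, not the measured concentrations. The dictionary layout is an assumption of the sketch.

```python
import numpy as np

# Number of years monitored at each site, as listed in the text.
years_per_site = {
    "Reutte": 12, "Kufstein": 12, "Innervillgraten": 12,
    "Sonnblick": 12, "Nasswald": 12, "Lobau": 12,
    "Haunsberg": 15, "Werfenweng": 15,
    "Litschau": 10, "Lunz": 10,
    "Nassfeld": 9,
}
components = ["H+", "NH4+", "Na+", "K+", "Ca2+", "Mg2+", "Cl-", "NO3-", "SO4 2-"]

# Placeholder values only: random numbers stand in for the measured
# annual mean concentrations, which are not reproduced here.
rng = np.random.default_rng(2)
tables = {site: rng.random((n_years, len(components)))
          for site, n_years in years_per_site.items()}

# The years mode is ragged, so these tables cannot be stacked into one array.
# Each site's 9 x 9 cross-product matrix over the components has the same
# shape, however, so these matrices can be compared across sites.
cross_products = {site: Xk.T @ Xk for site, Xk in tables.items()}
```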

Results and discussion

First, STATIS is performed on the non-preprocessed data set with a perfect trilinear structure. Each data table $X_k$ ($I \times J_k$) contains the sampling sites ($I = 11$) as rows, characterized by the chemical components ($J = 9$) measured in year $k$. The K tables are arranged as frontal slabs in X. A PCA of the RV matrix, which reflects the similarity between the tables, is presented in Fig. 2a.
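The RV matrix can be computed with Escoufier's RV coefficient, $RV(X_a, X_b) = \mathrm{tr}(W_a W_b) / \sqrt{\mathrm{tr}(W_a^2)\,\mathrm{tr}(W_b^2)}$ with $W_k = X_k X_k^T$. The sketch below is a minimal NumPy version with these assumptions:

- The tables are random stand-ins of size 11×9.
- No centering is applied, mirroring the non-preprocessed analysis. Many STATIS implementations column-center each table first.
- The function names are hypothetical.

```python
import numpy as np

def rv_coefficient(Xa, Xb):
    """Escoufier's RV coefficient between two tables sharing the row mode."""
    Wa, Wb = Xa @ Xa.T, Xb @ Xb.T  # I x I cross-product matrices
    num = np.trace(Wa @ Wb)
    den = np.sqrt(np.trace(Wa @ Wa) * np.trace(Wb @ Wb))
    return num / den

def rv_matrix(tables):
    """Symmetric K x K matrix of pairwise RV coefficients (ones on the diagonal)."""
    K = len(tables)
    R = np.ones((K, K))
    for a in range(K):
        for b in range(a + 1, K):
            R[a, b] = R[b, a] = rv_coefficient(tables[a], tables[b])
    return R

rng = np.random.default_rng(3)
# Hypothetical stand-ins for the year tables: 11 sites x 9 components.
tables = [rng.random((11, 9)) for _ in range(6)]
R = rv_matrix(tables)

# PCA of the RV matrix: eigendecomposition sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates of the K tables on the first two axes, for a 2-D plot.
coords = eigvecs[:, :2] * np.sqrt(np.clip(eigvals[:2], 0, None))
```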

Tables 1 (year 1990), 2 (year 1991), 3 (year 1992) and 6 (year 1995) are the most different from the mean covariance (see Fig. 2a

Conclusions

STATIS is a three-way method for exploratory data analysis. It is best understood starting from an unfolded two-way table. For an I×J×K data set this is obtained by juxtaposition of K (I×J) two-way tables. To analyze the resulting table by PCA, the variance–covariance matrix is used. This is normally obtained by summing the K variance–covariance matrices of the K individual tables constituting the unfolded table. STATIS first weights the variance–covariance matrices of each table according to
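The contrast between the plain sum and a weighted combination can be sketched as follows. The sentence above is truncated before it states the weighting rule, so the scheme used here is an assumption. It is the common textbook STATIS choice: each $W_k = X_k X_k^T$ is normalized to unit Frobenius norm, and the weights are proportional to the first eigenvector of the resulting inner-product (RV) matrix. Sizes and data are hypothetical, and uncentered cross-products stand in for variance–covariance matrices.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K = 8, 5, 4  # hypothetical sizes
tables = [rng.random((I, J)) for _ in range(K)]

# Juxtaposing the K tables side by side gives an I x JK unfolded table.
X_unf = np.hstack(tables)
W = [Xk @ Xk.T for Xk in tables]  # per-table I x I cross-product matrices

# The cross-product of the unfolded table equals the plain sum of the per-table ones.
assert np.allclose(X_unf @ X_unf.T, sum(W))

# Assumed STATIS weighting: normalize each W_k (Frobenius norm), build the
# K x K inner-product matrix S, and take its leading eigenvector.
W_norm = [Wk / np.linalg.norm(Wk) for Wk in W]
S = np.array([[np.trace(Wa @ Wb) for Wb in W_norm] for Wa in W_norm])
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
u = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
u = u * np.sign(u.sum())              # fix the arbitrary sign
alpha = u / u.sum()                   # weights summing to 1

# Compromise: weighted sum of the normalized cross-product matrices.
W_comp = sum(a * Wk for a, Wk in zip(alpha, W_norm))
comp_vals, comp_vecs = np.linalg.eigh(W_comp)
# Coordinates of the I rows on the two leading compromise axes.
row_coords = comp_vecs[:, ::-1][:, :2] * np.sqrt(np.clip(comp_vals[::-1][:2], 0, None))
```

With this weighting, tables that resemble the others more closely (higher RV coefficients) receive larger weights. In the plain sum, by contrast, tables with larger overall magnitude dominate.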

References (18)

