Recent developments in sensor technology have put affordable NIR spectrometers into the hands of end consumers. Some manufacturers promote their products as a quick and easy way to obtain nutritional information about all kinds of foodstuffs. Such applications rely on data-driven methods that need large, representative datasets to derive their models. Collecting such a dataset from scratch is time-consuming and hard to crowd-source, because the data must meet certain quality standards. Hyperspectral imaging devices, on the other hand, allow rapid acquisition of high-quality data. In addition, large-scale databases of spectral signatures already exist. This raises the question of whether hyperspectral image recordings and existing databases can be combined to serve as a basis for data-driven methods. The underlying question is: can spectral measurements from one source be compared to measurements from another source?
To approach this question, we conducted a comparative study with hyperspectral data from several research groups working in spectral data analysis. Each group measured a defined set of food samples and sent the raw data back to us. We then homogenized the data for the subsequent analysis using standard data processing techniques. The processed dataset is available to the broader research community and can be downloaded from this website. Exploratory data analysis indicates that quantitative chemometrics across sensors is not yet possible. However, application-specific classification experiments show that classification across sensors is possible within reasonable limits. Overall, our study shows the need for a standard procedure for data acquisition and sharing, as well as for a common, sensor-independent representation of spectral signatures.
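The text does not specify which standard processing techniques were used to homogenize the data. Purely as an illustrative sketch, and not the study's actual pipeline, one common approach for making spectra from different sensors comparable is to interpolate every spectrum onto a shared wavelength grid restricted to the range all sensors cover, and then apply standard normal variate (SNV) scaling to remove per-spectrum offset and gain. The function names (`common_grid`, `resample_to_grid`, `snv`, `homogenize`) and the 5 nm default grid step are hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical homogenization sketch: the study does not state its exact steps.

def common_grid(measurements, step=5.0):
    """Build a shared wavelength grid (nm) over the range covered by every sensor.

    `measurements` is a list of (wavelengths, spectrum) pairs, one per recording.
    """
    lo = max(np.min(w) for w, _ in measurements)
    hi = min(np.max(w) for w, _ in measurements)
    if lo >= hi:
        raise ValueError("sensors have no overlapping wavelength range")
    # Small epsilon so that `hi` itself is included when it falls on the grid.
    return np.arange(np.ceil(lo), hi + 1e-9, step)

def resample_to_grid(wavelengths, spectrum, grid):
    """Linearly interpolate one spectrum onto `grid`.

    np.interp requires `wavelengths` to be increasing.
    """
    return np.interp(grid, wavelengths, spectrum)

def snv(spectrum):
    """Standard normal variate: shift and scale to zero mean and unit standard deviation."""
    spectrum = np.asarray(spectrum, dtype=float)
    std = spectrum.std()
    if std == 0:
        raise ValueError("constant spectrum cannot be SNV-scaled")
    return (spectrum - spectrum.mean()) / std

def homogenize(measurements, grid):
    """Resample all recordings onto `grid`, then apply SNV to each one.

    Returns an array of shape (n_measurements, len(grid)).
    """
    return np.vstack([snv(resample_to_grid(w, s, grid)) for w, s in measurements])
```

Under these assumptions, two sensors that record the same underlying signal with different wavelength sampling, offset, and positive gain end up with identical rows after `homogenize`. Real cross-sensor differences (detector response, illumination, calibration) are generally not affine, which is consistent with the finding that quantitative comparisons across sensors remain difficult.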