Experimental data

1. Introduction

After the execution of an experiment, the experimented should have collected a set of information:

raw data, as provided by the sensors used in the experiment;
pre-processed data, according to the format mentioned in the Eurobench data format;
a report document, providing informal description of the test conducted.

We focus here on the structure and organization of the pre-processed data, as they are the input information that will be used to compute automatically the Performance Indicator metrics.

2. Datafile naming

Although each centre may have a well-defined data organization, we are required to define a Eurobench-compatible structure, to make sure the automatic scoring mechanism can handle the data submitted for benchmarking.

The structure proposed only constraints the pre-processed data, as the other ones (raw data, report document) are not used within the benchmarking process.

Generally speaking pre-processed data should follow the following pattern:

subject_X_cond_Y_run_Z_[type].csv

Such contextualized format provides the following information:

[type] is a string related to the type of information stored in the file. This is the root name of the file. All possible types are described in the Eurobench data format.
X, Y, Z are integer respectively associated to the number of subjects involved, the number of different conditions being tested, and the number of repetitions per condition.
The subject / cond / run namespaces are only used when it makes sense:
- An experiment involving a unique subject (or only an humanoid) would have have pre-processed files following the pattern: cond_Y_run_Z_[type].csv.
- An experiment with various subjects but unique condition would follow the pattern subject_X_run_Z_[type].csv.
- An experiment in which there is a single repetition of a trial would have datafiles following the pattern : subject_X_cond_Y_[type].csv.
- At the extreme, an experiment involving a unique subject (or an humanoid), with single condition and no repetition will be described by files with pattern: [type].csv.

Unless if specified different, all datafile submitted for benchmarking will have a name using this pattern.

Aside the pre-processed data file, we can foresee the following additional required files:

files containing information (anthropometric and or other aspects) about the subject involved: subject_i_info.yaml.
files containing information on the robotic device used: robot_info.yaml (urdf format may be used, the protocol should state it).
files containing the settings of the controlled variables: condition_i.yaml.

These files can be repeated using the following patterns:

user information: if a single human subject is involved, the data file can be named subject_info.yaml. If N subject are involved, then we should provide N files following the pattern: subject_i_info.yaml, where i < N identifies the subject.
robot device information: we assume only one robotic device is used per experiments, so that the file can be directly named robot_info.yaml. Any change of the control setting of the robot should be mentioned in the condition file.
Condition setting file: we identified three cases:
- if no parameter is set, no file is required.
- if the condition setting contains information specific to the user (like an adjustment of a pushing force that may differ per subject), then we should get the : subject_i_condition_j.yaml, where j should be an integer referring to the Y different condition setting being considered.
- if the conditional settings are common for all subjects, then we should get only Y different files, named condition_j.yaml.

Such model gives us a contextualized filename approach: based on the filename, we can deduce all the context of a data file: its content, the subject involved, the condition setting used, the repetition id. A contextualized filename is unique in the complete experiment.

2.1. illustration

Case 1: An experiment is conducted with 4 human subjects and a robotic device. The experiment involves 3 condition settings (for instance, same testbed but (1): without the robotic device, (2) with the robotic device totally passive, and (3) with the robotic device in active mode). These 3 settings are common across subjects. Each setting has been tested and repeated 4 times, except for the second subject that could only perform two repetitions. The files expected are the following:

subject_{1,2,3,4}_info.yaml
robot_info.urdf
condition_{1,2,3}.yaml
subject_{1,3,4}_cond_{1,2,3}_run_{1,2,3,4}_[type].csv
subject_{1,3,4}_cond_{1,2,3}_run_{1,2}_[type].csv

In the two last groups of files, the [type] string should refer to one of the pre-processed format described in X.

Case 2: Another experiment involves 2 subjects and a robotic device. The experiment contains 3 different condition settings. One of the configuration parameters has to be adjusted to the subject involved. Each condition is only repeated once.

subject_{1,2}_info.yaml
robot_info.urdf
subject_{1,2}_condition_{1,2,3}.yaml
subject_{1,2}_cond_{1,2,3}_[type].csv

Case 3: an experiment compares two different parameter settings of an humanoid. The robot has been asked to repeat three times each configuration.

robot_info.urdf
condition_{1,2}.yaml
cond_{1,2}_run_{1,2,3}_[type].csv

3. Data file organization

Given the previous filename pattern, the simplest file organization is to place all the data-files into a single folder.

For personal matters the experimenter can prefer organizing the data into folders. We envision two possibilities:

The contextualized filename pattern is maintained across folder: this means that by copying all files in folders into a single folder, and keeping their name, we would get all data still organized following the previous filename pattern. This means the folder organization does not affect the naming of the file, and the possibility of understanding the context of a file given its name.
The filename is simplified according to the folder organisation. In that case, the contextualized filename could be obtained using the folder hierarchy.

We propose focusing on the first approach for now.