Annotating the Behavior of Scientific Modules Using Data Examples: A Practical Approach

Authors: Khalid Belhajjame
Year: 2014
Venue: EDBT
Product of the Action: No

A major issue that arises when designing scientific experiments (i.e., workflows) is identifying the modules, which are often “black boxes”, that are suitable for performing the steps of the experiment. To assist scientists in this task, semantic annotations have been proposed and used to describe scientific modules. Different facets of a module can be described using semantic annotations. Our experience with scientists from modern sciences such as bioinformatics, biodiversity and astronomy, however, suggests that most of the semantic annotations that are available are confined to describing the domain of the input and output parameters of modules. Annotations specifying the behavior of modules, that is, the tasks they perform, are rarely provided. To address this issue, we argue in this paper that data examples are an intuitive and effective means for understanding the behavior of scientific modules. We present a heuristic for automatically generating data examples that annotate scientific modules without relying on the existence of module specifications, and we show its effectiveness through an empirical evaluation that uses real-world scientific modules. The data examples generated can be used in a range of scientific module management operations. To demonstrate this, we present the results of two real-world exercises that show that (i) data examples are an intuitive means for human users to understand the behavior of scientific modules, and (ii) data examples are an effective ingredient for matching scientific modules.