A workflow for replicable data synthesis and simulation research in mental health

A key goal of scientific studies is that they are replicable, i.e. they can be repeated with different data. A step towards this goal is reproducibility, which means enabling other researchers to repeat a study using the same data. In principle, data synthesis and simulation studies have fewer potential barriers to reproducibility and replicability than many experimental designs, as they are predominantly performed within scientific computing environments. However, in practice mental health projects of this type frequently have low levels of both reproducibility and replicability. Inadequately reported and inaccessible code and data, sometimes enclosed within restrictive licensing and custodianship arrangements, make it hard for other researchers to reproduce simulation studies. Even when these barriers are overcome, these projects are rarely implemented in a way that facilitates generalisation to other source data or decision contexts by abstracting datasets to data-structures and converting (single-purpose) programs to (multiple-purpose) code libraries. The Health Services and Outcomes Research team at Orygen have sought to address these issues by developing software to comprehensively document, standardise and partially automate a workflow that spans the life-cycle of synthesis and simulation projects, from initial ingest of raw data to dissemination of results. This presentation outlined the key features of that workflow and demonstrated the novel toolkit we have developed, with worked examples using both toy and real project data. Some of the legal, technical, funding and skills issues involved in implementing the workflow were also discussed.

Matthew Hamilton

PhD Candidate