Publications
For an up-to-date list of my peer-reviewed publications and preprints, see Google Scholar.
PEtab: Interoperable specification of parameter estimation problems in systems biology
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, so far there has been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as the parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of PEtab problems, as well as currently 20 example parameter estimation problems based on recent studies.
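As a rough illustration of how such a problem definition is consumed, here is a minimal sketch using the petab Python library mentioned above; "problem.yaml" is a placeholder and function names follow recent petab releases, so details may differ between versions.

```python
# Minimal sketch (placeholder file name; API per recent petab versions):
# load a PEtab problem from its YAML descriptor and run the validator on it.
import petab

problem = petab.Problem.from_yaml("problem.yaml")

# lint_problem logs any issues found and returns a truthy value if there are errors
if petab.lint_problem(problem):
    print("The PEtab problem contains issues.")
else:
    print(f"Valid problem with {len(problem.parameter_df)} parameter table rows.")
```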
AMICI: high-performance sensitivity analysis for large ordinary differential equation models
Summary: Ordinary differential equation models facilitate the understanding of cellular signal transduction and other biological processes. However, for large and comprehensive models, the computational cost of simulation and calibration can be limiting. AMICI is a modular toolbox implemented in C++/Python/MATLAB that provides efficient simulation and sensitivity analysis routines tailored for scalable, gradient-based parameter estimation and uncertainty quantification. Availability and implementation: AMICI is published under the permissive BSD-3-Clause license, with source code publicly available at https://github.com/AMICI-dev/AMICI. Citeable releases are archived on Zenodo. Supplementary information: Supplementary data are available at Bioinformatics online.
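A hedged sketch of the typical Python workflow (SBML import, code generation, simulation with forward sensitivities); file and model names are placeholders, and enum spellings may differ between AMICI versions.

```python
# Hedged sketch: import an SBML model with AMICI and simulate with forward sensitivities.
# "model.xml" and "my_model" are placeholders; enum names follow recent AMICI versions.
import amici

sbml_importer = amici.SbmlImporter("model.xml")
sbml_importer.sbml2amici("my_model", output_dir="amici_my_model")  # generate and compile C++ code

model_module = amici.import_model_module("my_model", "amici_my_model")
model = model_module.getModel()
model.setTimepoints([0.0, 1.0, 2.0, 5.0, 10.0])

solver = model.getSolver()
solver.setSensitivityMethod(amici.SensitivityMethod.forward)
solver.setSensitivityOrder(amici.SensitivityOrder.first)

rdata = amici.runAmiciSimulation(model, solver)
print(rdata.x.shape, rdata.sx.shape)  # state trajectories and their parameter sensitivities
```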
Benchmarking of numerical integration methods for ODE models of biological systems
Ordinary differential equation (ODE) models are a key tool to understand complex mechanisms in systems biology. These models are studied using various approaches, including stability and bifurcation analysis, but most frequently by numerical simulations. The number of required simulations is often large, e.g., when unknown parameters need to be inferred. This renders efficient and reliable numerical integration methods essential. However, these methods depend on various hyperparameters, which strongly impact the ODE solution. Despite this, and although hundreds of published ODE models are freely available in public databases, a thorough study that quantifies the impact of hyperparameters on the ODE solver in terms of accuracy and computation time is still missing. In this manuscript, we investigate which choices of algorithms and hyperparameters are generally favorable when dealing with ODE models arising from biological processes. To ensure a representative evaluation, we considered 142 published models. Our study provides evidence that most ODEs in computational biology are stiff, and we give guidelines for the choice of algorithms and hyperparameters. We anticipate that our results will help researchers in systems biology to choose appropriate numerical methods when dealing with ODE models.
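The practical takeaway (most biological ODE models are stiff, so implicit solvers are usually the safer default) can be illustrated with a generic SciPy toy example; this is not the benchmark code of the study.

```python
# Generic illustration, not the study's benchmark: on a stiff toy system, an explicit
# Runge-Kutta method needs far more steps than implicit methods such as BDF or LSODA.
from scipy.integrate import solve_ivp

def rhs(t, x, k_fast=1e3, k_slow=1.0):
    # two-state reaction with widely separated rate constants (stiff)
    return [-k_fast * x[0] + k_slow * x[1], k_fast * x[0] - k_slow * x[1]]

for method in ("RK45", "BDF", "LSODA"):
    sol = solve_ivp(rhs, (0.0, 100.0), [1.0, 0.0], method=method, rtol=1e-8, atol=1e-12)
    print(f"{method}: {sol.t.size} accepted steps, success={sol.success}")
```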
Prevalence and Risk Factors of Infection in the Representative COVID-19 Cohort Munich
Given the large number of mild or asymptomatic SARS-CoV-2 cases, only population-based studies can provide reliable estimates of the magnitude of the pandemic. We therefore aimed to assess the sero-prevalence of SARS-CoV-2 in the Munich general population after the first wave of the pandemic. For this purpose, we drew a representative sample of 2994 private households and invited household members 14 years and older to complete questionnaires and to provide blood samples. SARS-CoV-2 seropositivity was defined as Roche N pan-Ig ≥ 0.4218. We adjusted the prevalence for the sampling design, sensitivity, and specificity. We investigated risk factors for SARS-CoV-2 seropositivity and geospatial transmission patterns by generalized linear mixed models and permutation tests. Seropositivity for SARS-CoV-2-specific antibodies was 1.82% (95% confidence interval (CI) 1.28–2.37%) as compared to 0.46% PCR-positive cases officially registered in Munich. Loss of the sense of smell or taste was associated with seropositivity (odds ratio (OR) 47.4; 95% CI 7.2–307.0) and infections clustered within households. By this first population-based study on SARS-CoV-2 prevalence in a large German municipality not affected by a superspreading event, we could show that at least one in four cases in private households was reported and known to the health authorities. These results will help authorities to estimate the true burden of disease in the population and to take evidence-based decisions on public health measures.
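The adjustment for assay sensitivity and specificity mentioned above commonly corresponds to the standard Rogan-Gladen correction; the sketch below shows that formula with purely illustrative numbers, not the study's weighted estimates.

```python
# Standard Rogan-Gladen correction of an apparent (test-based) prevalence for assay
# sensitivity and specificity; the numbers below are illustrative only.
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    return (apparent_prevalence + specificity - 1.0) / (sensitivity + specificity - 1.0)

print(rogan_gladen(apparent_prevalence=0.02, sensitivity=0.89, specificity=0.997))
```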
pyPESTO: a modular and scalable tool for parameter estimation for dynamic models
Mechanistic models are important tools to describe and understand biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. pyPESTO is a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. Besides its own implementations, it provides a unified interface to various popular simulation and inference methods. Availability and implementation: pyPESTO is implemented in Python and open source under a 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).
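A minimal sketch of the multi-start optimization workflow on a stand-in objective (SciPy's Rosenbrock function); attribute names follow recent pyPESTO versions and may differ.

```python
# Minimal sketch: multi-start local optimization with pyPESTO on a stand-in objective
# (the Rosenbrock function from SciPy); attribute names per recent pyPESTO versions.
import numpy as np
import pypesto
import pypesto.optimize as optimize
from scipy.optimize import rosen, rosen_der

objective = pypesto.Objective(fun=rosen, grad=rosen_der)
problem = pypesto.Problem(objective=objective, lb=-5 * np.ones(5), ub=5 * np.ones(5))

# 20 local optimizations from random start points within the bounds
result = optimize.minimize(problem=problem, n_starts=20)

best = result.optimize_result.list[0]  # results are sorted by final objective value
print(best.fval, best.x)
```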
BayesFlow: Amortized Bayesian workflows with neural networks
Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.
pyABC: Efficient and robust easy-to-use approximate Bayesian computation
The Python package pyABC provides a framework for approximate Bayesian computation (ABC), a likelihood-free parameter inference method popular in many research areas. At its core, it implements a sequential Monte-Carlo (SMC) scheme, with various algorithms to adapt to the problem structure and automatically tune hyperparameters. To scale to computationally expensive problems, it provides efficient parallelization strategies for multi-core and distributed systems. The package is highly modular and designed to be easily usable. In this major update to pyABC, we implement several advanced algorithms that facilitate efficient and robust inference on a wide range of data and model types. In particular, we implement algorithms to accurately account for measurement noise, to adaptively scale-normalize distance metrics, to robustly handle data outliers, to elucidate informative data points via regression models, to circumvent summary statistics via optimal transport based distances, and to avoid local optima in acceptance threshold sequences by predicting acceptance rate curves. Further, we provide, besides previously existing support of Python and R, interfaces in particular to the Julia language, the COPASI simulator, and the PEtab standard.
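A minimal sketch of an ABC-SMC analysis with pyABC on a toy model; the model, prior, and settings are illustrative placeholders.

```python
# Minimal sketch: ABC-SMC inference for a toy model with pyABC (illustrative settings).
import os
import tempfile
import numpy as np
import pyabc

def model(parameters):
    # stochastic simulator: a noisy observation of the parameter "mu"
    return {"y": parameters["mu"] + 0.1 * np.random.randn()}

prior = pyabc.Distribution(mu=pyabc.RV("uniform", -5, 10))
distance = pyabc.PNormDistance(p=2)

abc = pyabc.ABCSMC(model, prior, distance, population_size=100)
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "pyabc_demo.db")
abc.new(db_path, {"y": 2.0})

history = abc.run(minimum_epsilon=0.05, max_nr_populations=8)
posterior_df, weights = history.get_distribution()
print(posterior_df["mu"].describe())
```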
Efficient parameterization of large-scale dynamic models based on relative measurements
Motivation: Mechanistic models of biochemical reaction networks facilitate the quantitative understanding of biological processes and the integration of heterogeneous datasets. However, some biological processes require the consideration of comprehensive reaction networks and therefore large-scale models. Parameter estimation for such models poses great challenges, in particular when the data are on a relative scale. Results: Here, we propose a novel hierarchical approach combining (i) the efficient analytic evaluation of optimal scaling, offset, and error model parameters with (ii) the scalable evaluation of objective function gradients using adjoint sensitivity analysis. We evaluate the properties of the methods by parameterizing a pan-cancer ordinary differential equation model (>1000 state variables, >4000 parameters) using relative protein, phospho-protein and viability measurements. The hierarchical formulation improves optimizer performance considerably. Furthermore, we show that this approach allows estimating error model parameters with negligible computational overhead when no experimental estimates are available, providing an unbiased way to weight heterogeneous data. Overall, our hierarchical formulation is applicable to a wide range of models, and allows for the efficient parameterization of large-scale models based on heterogeneous relative measurements. Contact: jan.hasenauer@helmholtz-muenchen.de Supplementary information: Supplementary information is available at bioRxiv online. Supplementary code and data are available online at http://doi.org/10.5281/zenodo.2593839 and http://doi.org/10.5281/zenodo.2592186.
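The core of the hierarchical idea can be illustrated with a generic least-squares sketch: for relative data modelled as y ≈ s·h + b, the optimal scaling s and offset b are available in closed form and therefore do not need to be handled by the numerical optimizer. This is a conceptual illustration only; the paper additionally derives analytic error-model parameters and combines the scheme with adjoint gradients.

```python
# Generic illustration (not the paper's code): closed-form inner optimization of
# scaling s and offset b for relative measurements y ~ s * h + b, given simulated
# observables h. In the hierarchical approach these parameters are eliminated
# analytically inside every objective function evaluation.
import numpy as np

def optimal_scaling_offset(h: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Minimize sum_i (y_i - s * h_i - b)^2 over (s, b) via linear least squares."""
    design = np.column_stack([h, np.ones_like(h)])
    (s, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(s), float(b)

h = np.array([0.1, 0.5, 1.0, 2.0])                      # simulated observable
y = 3.0 * h + 0.5 + 0.01 * np.random.randn(h.size)      # relative measurement
print(optimal_scaling_offset(h, y))                     # close to (3.0, 0.5)
```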
From first to second wave: follow-up of the prospective COVID-19 cohort (KoCo19) in Munich (Germany)
Background: In the 2nd year of the COVID-19 pandemic, knowledge about the dynamics of the infection in the general population is still limited. Such information is essential for health planners, as many of those infected show no or only mild symptoms and thus escape the surveillance system. We therefore aimed to describe the course of the pandemic in the Munich general population living in private households from April 2020 to January 2021. Methods: The KoCo19 baseline study took place from April to June 2020, including 5313 participants (age 14 years and above). From November 2020 to January 2021, we could again measure SARS-CoV-2 antibody status in 4433 of the baseline participants (response 83%). Participants were offered a self-sampling kit to take a capillary blood sample (dry blood spot; DBS). Blood was analysed using the Elecsys® Anti-SARS-CoV-2 assay (Roche). Questionnaire information on socio-demographics and potential risk factors assessed at baseline was available for all participants. In addition, follow-up information on health-risk taking behaviour and number of personal contacts outside the household (N = 2768) as well as leisure time activities (N = 1263) was collected in summer 2020. Results: Weighted and adjusted (for specificity and sensitivity) SARS-CoV-2 sero-prevalence at follow-up was 3.6% (95% CI 2.9–4.3%) as compared to 1.8% (95% CI 1.3–3.4%) at baseline. 91% of those tested positive at baseline were also antibody-positive at follow-up. While sero-prevalence increased from early November 2020 to January 2021, no indication of geospatial clustering across the city of Munich was found, although cases clustered within households. Taking baseline result and time to follow-up into account, men and participants in the age group 20–34 years were at the highest risk of sero-positivity. In the sensitivity analyses, differences in health-risk taking behaviour, number of personal contacts and leisure time activities partly explained these differences. Conclusion: The number of citizens in Munich with SARS-CoV-2 antibodies was still below 5% during the 2nd wave of the pandemic. Antibodies remained present in the majority of SARS-CoV-2 sero-positive baseline participants. Besides age and sex, potentially confounded by differences in behaviour, no major risk factors could be identified. Non-pharmaceutical public health measures are thus still important.
Efficient exact inference for dynamical systems with noisy measurements using sequential approximate Bayesian computation
Motivation: Approximate Bayesian Computation (ABC) is an increasingly popular method for likelihood-free parameter inference in systems biology and other fields of research, since it allows analysing complex stochastic models. However, the introduced approximation error is often not clear. It has been shown that ABC actually gives exact inference under the implicit assumption of a measurement noise model. Noise being common in biological systems, it is intriguing to exploit this insight. But this is difficult in practice, since ABC is in general highly computationally demanding. Thus, the question we want to answer here is how to efficiently account for measurement noise in ABC. Results: We illustrate with examples how ABC yields erroneous parameter estimates when neglecting measurement noise. Then, we discuss practical ways of correctly including the measurement noise in the analysis. We present an efficient adaptive sequential importance sampling-based algorithm applicable to various model types and noise models. We test and compare it on several models, including ordinary and stochastic differential equations, Markov jump processes, and stochastically interacting agents, and noise models including normal, Laplace, and Poisson noise. We conclude that the proposed algorithm could improve the accuracy of parameter estimates for a broad spectrum of applications. Availability: The developed algorithms are made publicly available as part of the open-source Python toolbox pyABC (https://github.com/icb-dcm/pyabc). Contact: jan.hasenauer@uni-bonn.de Supplementary information: Supplementary information is available at bioRxiv online. Supplementary code and data are available online at http://doi.org/10.5281/zenodo.3631120.
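A hedged sketch of how noise-aware ("exact") ABC is exposed in pyABC: instead of a distance function, a noise kernel is combined with a stochastic acceptor and a temperature scheme. Class and argument names follow recent pyABC versions and may differ; the model and settings are illustrative.

```python
# Hedged sketch (class/argument names per recent pyABC versions): exact, noise-aware
# ABC by combining a normal noise kernel with a stochastic acceptor and a temperature.
import pyabc

def model(parameters):
    # deterministic model output; measurement noise is handled by the kernel below
    return {"y": parameters["theta"] ** 2}

prior = pyabc.Distribution(theta=pyabc.RV("uniform", 0, 5))
kernel = pyabc.IndependentNormalKernel(var=0.5 ** 2)  # assumed measurement noise model

abc = pyabc.ABCSMC(
    model, prior, kernel,
    acceptor=pyabc.StochasticAcceptor(),
    eps=pyabc.Temperature(),
)
abc.new("sqlite:///exact_abc_demo.db", {"y": 2.5})
history = abc.run(max_nr_populations=8)
```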
Head-to-head evaluation of seven different seroassays including direct viral neutralisation in a representative cohort for SARS-CoV-2
A number of seroassays are available for SARS-CoV-2 testing; yet, head-to-head evaluations of different testing principles are limited, especially using raw values rather than categorical data. In addition, identifying correlates of protection is of utmost importance, and comparisons of available testing systems with functional assays, such as direct viral neutralisation, are needed. We analysed 6658 samples consisting of true-positives (n=193), true-negatives (n=1091), and specimens of unknown status (n=5374). For primary testing, we used Euroimmun-Anti-SARS-CoV-2-ELISA-IgA/IgG and Roche-Elecsys-Anti-SARS-CoV-2. Subsequently, virus-neutralisation, GeneScriptcPass, VIRAMED-SARS-CoV-2-ViraChip, and Mikrogen-recomLine-SARS-CoV-2-IgG were applied for confirmatory testing. Statistical modelling generated optimised assay cut-off thresholds. Sensitivity of Euroimmun-anti-S1-IgA was 64.8%, specificity 93.3% (manufacturer’s cut-off); for Euroimmun-anti-S1-IgG, sensitivity was 77.2/79.8% (manufacturer’s/optimised cut-offs), specificity 98.0/97.8%; Roche-anti-N sensitivity was 85.5/88.6%, specificity 99.8/99.7%. In true-positives, mean and median Euroimmun-anti-S1-IgA and -IgG titres decreased 30/90 days after RT-PCR-positivity, whereas Roche-anti-N titres decreased significantly later. Virus-neutralisation was 80.6% sensitive, 100.0% specific (≥1:5 dilution). Neutralisation surrogate tests (GeneScriptcPass, Mikrogen-recomLine-RBD) were >94.9% sensitive and >98.1% specific. Optimised cut-offs improved the performance of several tests. Confirmatory testing with virus-neutralisation might be complemented with GeneScriptcPass™ or recomLine-RBD for certain applications. Head-to-head comparisons given here aim to contribute to the refinement of testing strategies for individual and public health use.
HCV spread kinetics reveal varying contributions of transmission modes to infection dynamics
Hepatitis C virus (HCV) is capable of spreading within a host by two different transmission modes: cell-free and cell-to-cell. Although viral dissemination and diffusion of viral particles facilitates the infection of distant cells, direct cell-to-cell transmission to uninfected neighboring cells is thought to shield the virus from immune recognition. However, the contribution of each of these transmission mechanisms to HCV spread is unknown. To dissect the contribution of these different transmission modes to HCV spread, we measured HCV lifecycle kinetics and used an in vitro spread assay to monitor HCV spread kinetics after low multiplicity of infection in the absence and presence of a neutralizing antibody that blocks cell-free spread. By analyzing these data with a spatially-explicit mathematical model that describes viral spread on a single-cell level, we quantified the contribution of cell-free and cell-to-cell spread to the overall infection dynamics and show that both transmission modes act synergistically to enhance the spread of infection. Thus, the simultaneous occurrence of both transmission modes likely represents an advantage for HCV that may contribute to the efficient establishment of chronic infection. Notably, the relative contribution of each viral transmission mode appeared to vary depending on the experimental conditions, suggesting that viral spread is optimized according to the environment. Together, our analyses provide insight into the transmission dynamics of HCV and reveal how different transmission modes impact each other. Importance: Hepatitis C virus can spread within a host by diffusing viral particles or direct cell-to-cell transfer of viral material between infected and uninfected cells. To what extent these cell-free and cell-to-cell transmission modes contribute to HCV spread, establishment of chronicity and antiviral escape is still unknown. By combining in vitro experimental HCV spread data with a multi-scale mathematical model, we have disentangled the contribution and interplay of cell-free and cell-to-cell transmission modes during HCV infection. Our analysis revealed synergistic effects between the two transmission modes, with the relative contribution of each transmission mode varying depending on the experimental conditions. This highlights the adaptability of the virus and suggests that transmission modes might be optimized depending on the environment, which could contribute to viral persistence.
Missing data in amortized simulation-based neural posterior estimation
Amortized simulation-based neural posterior estimation provides a novel machine learning-based approach for solving parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet, the available approach cannot handle missing data, which are ubiquitous in experimental studies, and might therefore provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of the presence or absence of values performs most robustly. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for their handling, which is relevant for a broad spectrum of applications.
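The encoding that was found to be most robust can be sketched generically: fill missing entries with a constant and append binary availability indicators, so the network input always has a fixed length. This is a conceptual illustration, not the paper's implementation.

```python
# Generic sketch of the most robust encoding described above: constant-fill missing
# values and augment the data vector with binary present/absent indicators.
import numpy as np

def augment_with_missingness(x: np.ndarray, fill_value: float = 0.0) -> np.ndarray:
    """x is a 1D data vector with NaNs marking missing values; returns a vector of
    twice the length: [filled data, availability mask]."""
    mask = (~np.isnan(x)).astype(float)           # 1 = observed, 0 = missing
    filled = np.where(np.isnan(x), fill_value, x)
    return np.concatenate([filled, mask])

x = np.array([0.3, np.nan, 1.2, np.nan])
print(augment_with_missingness(x))  # [0.3, 0.0, 1.2, 0.0, 1.0, 0.0, 1.0, 0.0]
```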
Inferring the effect of interventions on COVID-19 transmission networks
Countries around the world implement nonpharmaceutical interventions (NPIs) to mitigate the spread of COVID-19. The design of efficient NPIs requires identification of the structure of the disease transmission network. Here, we identify the key parameters of the COVID-19 transmission network for time periods before, during, and after the application of strict NPIs for the first wave of COVID-19 infections in Germany, combining Bayesian parameter inference with an agent-based epidemiological model. We assume a Watts–Strogatz small-world network, which allows us to distinguish between contacts within clustered cliques and unclustered, random contacts in the population, which have been shown to be crucial in sustaining the epidemic. In contrast to other works, which use coarse-grained network structures from anonymized data, such as cell phone data, we consider the contacts of individual agents explicitly. We show that NPIs drastically reduced random contacts in the transmission network, increased network clustering, and resulted in a previously unappreciated transition from an exponential to a constant regime of new cases. In this regime, the disease spreads like a wave with a finite wave speed that depends on the number of contacts in a nonlinear fashion, which we can predict by mean-field theory.
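For illustration, the assumed contact structure can be generated with networkx; the parameters below are placeholders, not the values inferred in the study.

```python
# Illustrative only: a Watts-Strogatz small-world contact network of the kind assumed
# in the agent-based model; parameter values are placeholders, not inferred estimates.
import networkx as nx

n_agents = 1000     # number of agents
k_ring = 10         # each agent starts with k nearest neighbors on a ring (clustered cliques)
p_rewire = 0.1      # probability of rewiring an edge to a random, unclustered contact

graph = nx.watts_strogatz_graph(n=n_agents, k=k_ring, p=p_rewire)
print(f"clustering coefficient: {nx.average_clustering(graph):.3f}, "
      f"edges: {graph.number_of_edges()}")
```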
FitMultiCell: simulating and parameterizing computational models of multi-scale and multi-cellular processes
Motivation: Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyze and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as Approximate Bayesian Computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. Results: Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure makes it possible to scale to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will in particular benefit image-based systems biology. Availability: FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit. Contact: jan.hasenauer@uni-bonn.de Supplementary information: Supplementary data are available at https://doi.org/10.5281/zenodo.7646287.
Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
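The sensitivity-weight idea can be sketched generically: fit an inverse regression from (simulated) data to parameters and use normalized absolute regression coefficients as per-coordinate weights, so that uninformative data points contribute little to the distance. This is a conceptual illustration, not the pyABC implementation.

```python
# Conceptual sketch (not the pyABC implementation): derive sensitivity weights from an
# inverse regression of parameters on simulated data; uninformative coordinates get
# small weights.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(500, 2))                    # sampled parameters
y = np.column_stack([
    3.0 * theta[:, 0] + 0.05 * rng.standard_normal(500),    # informative coordinate
    rng.standard_normal(500),                               # uninformative coordinate
])

regression = LinearRegression().fit(y, theta)               # data -> parameters
sensitivity = np.abs(regression.coef_).sum(axis=0)          # aggregate over parameters
weights = sensitivity / sensitivity.sum()
print(weights)  # weight of the uninformative coordinate is close to zero
```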
Robust adaptive distance functions for approximate Bayesian inference on outlier-corrupted data
Approximate Bayesian Computation (ABC) is a likelihood-free parameter inference method for complex stochastic models in systems biology and other research areas. While conceptually simple, its practical performance relies on the ability to efficiently compare relevant features in simulated and observed data via distance functions. Complications can arise particularly from the presence of outliers in the data, which can severely impair the inference. Thus, robust methods are required that provide reliable estimates also from outlier-corrupted data. We illustrate how established ABC distance functions are highly sensitive to outliers, and can in practice yield erroneous or highly uncertain parameter estimates and model predictions. We introduce self-tuned outlier-insensitive distance functions, based on a popular adaptive distance weighting concept, complemented by a simulation-based online outlier detection and downweighting routine. We evaluate and compare the presented methods on six test models covering different model types, problem features, and outlier scenarios. Our evaluation demonstrates substantial improvements on outlier-corrupted data, while giving at least comparable performance on outlier-free data. The developed methods have been made available as part of the open-source Python package pyABC (https://github.com/icb-dcm/pyabc).
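A generic sketch of the underlying idea: per-coordinate weights are set adaptively from a robust scale estimate (here the median absolute deviation) over simulated data, and coordinates where the observed value lies far outside the simulations are additionally down-weighted. This is a conceptual illustration, not pyABC's routine.

```python
# Conceptual sketch (not pyABC's routine): robust adaptive weights from the MAD over
# simulations, plus down-weighting of coordinates that look like outliers.
import numpy as np

def robust_weights(simulations: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """simulations: array of shape (n_simulations, n_coordinates)."""
    median = np.median(simulations, axis=0)
    mad = np.median(np.abs(simulations - median), axis=0) + 1e-12
    weights = 1.0 / mad                                   # scale normalization
    z = np.abs(observed - median) / mad                   # robust z-score of the data
    outlier = z > 3.0
    weights[outlier] = weights[outlier] / z[outlier]      # damp outlier contributions
    return weights

def weighted_distance(x: np.ndarray, y: np.ndarray, weights: np.ndarray) -> float:
    return float(np.sum(weights * np.abs(x - y)))

sims = np.random.default_rng(1).normal(size=(200, 3))
obs = np.array([0.1, -0.2, 25.0])                         # third coordinate is an outlier
w = robust_weights(sims, obs)
print(w, weighted_distance(sims[0], obs, w))
```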
An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation
Non-linear mixed-effects models are a powerful tool for studying heterogeneous populations in various fields, including biology, medicine, economics, and engineering. Here, the aim is to find a distribution over the parameters that describe the whole population using a model that can generate simulations for an individual of that population. However, fitting these distributions to data is computationally challenging if the description of individuals is complex and the population is large. To address this issue, we propose a novel machine learning-based approach: We exploit neural density estimation based on conditional normalizing flows to approximate individual-specific posterior distributions in an amortized fashion, thereby allowing for efficient inference of population parameters. Applying this approach to problems from cell biology and pharmacology, we demonstrate its flexibility and scalability to large data sets compared to established methods.
A Serology Strategy for Epidemiological Studies Based on the Comparison of the Performance of Seven Different Test Systems
Background. Serosurveys are essential to understand SARS-CoV-2 exposure and enable population-level surveillance, but currently available tests need further in-depth evaluation. We aimed to identify testing-strategies by comparing seven seroassays in a population-based cohort. Methods. We analysed 6,658 samples consisting of true-positives (n=193), true-negatives (n=1,091), and specimens of unknown status (n=5,374). For primary testing, we used Euroimmun-Anti-SARS-CoV-2-ELISA-IgA/IgG and Roche-Elecsys-Anti-SARS-CoV-2; and virus-neutralisation, GeneScript®cPass™, VIRAMED-SARS-CoV-2-ViraChip®, and Mikrogen-recomLine-SARS-CoV-2-IgG, including common-cold CoVs, for confirmatory testing. Statistical modelling generated optimised assay cut-off thresholds. Findings. Sensitivity of Euroimmun-anti-S1-IgA was 64.8%, specificity 93.3%; for Euroimmun-anti-S1-IgG, sensitivity was 77.2/79.8% (manufacturer's/optimised cut-offs), specificity 98.0/97.8%; Roche-anti-N sensitivity was 85.5/88.6%, specificity 99.8/99.7%. In true-positives, mean and median titres remained stable for at least 90-120 days after RT-PCR-positivity. Of true-positives with positive RT-PCR (<30 days), 6.7% did not mount detectable seroresponses. Virus-neutralisation was 73.8% sensitive, 100.0% specific (1:10 dilution). Neutralisation surrogate tests (GeneScript®cPass™, Mikrogen-recomLine-RBD) were >94.9% sensitive, >98.1% specific. Seasonality had limited effects; cross-reactivity with common-cold CoVs 229E and NL63 in SARS-CoV-2 true-positives was significant. Conclusion. Optimised cut-offs improved test performances of several tests. Non-reactive serology in true-positives was uncommon. For epidemiological purposes, confirmatory testing with virus-neutralisation may be replaced with GeneScript®cPass™ or recomLine-RBD. Head-to-head comparisons given here aim to contribute to the refinement of testing-strategies for individual and public health use.
The representative COVID-19 cohort Munich (KoCo19): from the beginning of the pandemic to the Delta virus variant
Background: Population-based serological studies make it possible to estimate the prevalence of SARS-CoV-2 infections despite a substantial number of mild or asymptomatic disease courses. This became even more relevant for decision making after vaccination started. The KoCo19 cohort has tracked the progress of the pandemic in the Munich general population for over two years, setting it apart in Europe. Methods: Recruitment occurred during the initial pandemic wave, including 5313 participants above 13 years of age from private households in Munich. Four follow-ups were held at crucial times of the pandemic, with response rates of at least 70%. Participants filled in questionnaires on socio-demographics and potential risk factors of infection. From Follow-up 2, information on SARS-CoV-2 vaccination was added. SARS-CoV-2 antibody status was measured using the Roche Elecsys® Anti-SARS-CoV-2 anti-N assay (indicating previous infection) and the Roche Elecsys® Anti-SARS-CoV-2 anti-S assay (indicating previous infection and/or vaccination). This allowed us to distinguish between sources of acquired antibodies. Results: The estimated cumulative SARS-CoV-2 sero-prevalence increased from 1.6% (1.1-2.1%) in May 2020 to 14.5% (12.7-16.2%) in November 2021. Underreporting with respect to official numbers fluctuated with testing policies and capacities, becoming a factor of more than two during the second half of 2021. Simultaneously, the vaccination campaign against the SARS-CoV-2 virus increased the percentage of the Munich population having antibodies, with 86.8% (85.5-87.9%) having developed anti-S and/or anti-N in November 2021. Incidence rates for infections after (BTI) and without previous vaccination (INS) differed (ratio INS/BTI of 2.1, 0.7-3.6). However, the prevalence of infections was higher in the non-vaccinated population than in the vaccinated one. Considering the whole follow-up time, being born outside Germany, working in a high-risk job and living area per inhabitant were identified as risk factors for infection, while other socio-demographic and health-related variables were not. Although we observed significant within-household clustering of SARS-CoV-2 cases, no further geospatial clustering was found. Conclusions: Vaccination increased the share of the Munich population presenting SARS-CoV-2 antibodies, but breakthrough infections contribute to community spread. As underreporting remains relevant over time, infections can go undetected, so non-pharmaceutical measures are crucial, particularly for highly contagious strains like Omicron.
yaml2sbml: Human-readable and -writable specification of ODE models and their conversion to SBML
Ordinary differential equation (ODE) models are used throughout the natural sciences to describe dynamic processes. In systems biology, ODEs are mostly stored and exchanged using the Systems Biology Markup Language (SBML), a widely adopted community standard based on XML. The Parameter Estimation table (PEtab) format extends SBML to parameter estimation problems. A large number of software tools support simulation of SBML models and parameter estimation for PEtab problems. Specifying ODE models in SBML and parameter estimation problems in PEtab provides access to these tools. However, SBML is considered to be neither human-readable nor human-writable. An easy-to-use approach to constructing SBML/PEtab problems tailored to ODE models therefore facilitates model generation. In this contribution, we present yaml2sbml, a Python tool for converting ODE models specified in an easy-to-read and -write YAML file into SBML/PEtab. yaml2sbml comes with a format validator for the input YAML, a command-line interface (CLI), and a model editor that allows, among other things, creating an ODE model programmatically in Python that can then be saved as SBML, PEtab or YAML.
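A hedged sketch of the intended conversion workflow; "model.yaml" is a placeholder, and function names and signatures follow the yaml2sbml documentation, so they may differ between versions.

```python
# Hedged sketch ("model.yaml" is a placeholder; function names per the yaml2sbml docs
# and may differ between versions): validate a YAML model and convert it to SBML/PEtab.
import yaml2sbml

yaml2sbml.validate_yaml("model.yaml")                           # check the YAML input format
yaml2sbml.yaml2sbml("model.yaml", "model.xml")                  # ODE model -> SBML
yaml2sbml.yaml2petab("model.yaml", "petab_files", "model.xml")  # -> PEtab problem files
```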
Massively Parallel Likelihood-Free Parameter Inference for Biological Multi-Scale Systems
A wall-time minimizing parallelization strategy for approximate Bayesian computation
Approximate Bayesian Computation (ABC) is a widely applicable and popular approach to estimating unknown parameters of mechanistic models. As ABC analyses are computationally expensive, parallelization on high-performance infrastructure is often necessary. However, existing parallelization strategies leave computing resources unused at times and thus do not leverage them optimally. We present look-ahead scheduling, a wall-time minimizing parallelization strategy for ABC Sequential Monte Carlo algorithms, which avoids idle times of computing units by preemptively sampling subsequent generations. This allows all available resources to be utilized. The strategy can be integrated with, e.g., adaptive distance function and summary statistic selection schemes, which is essential in practice. Our key contribution is the theoretical assessment of the preemptive sampling strategy and the proof of its unbiasedness. In addition, we provide an implementation and evaluate the strategy on different problems and numbers of parallel cores, showing speed-ups of typically 10-20% and up to 50% compared to the best established approach, with some variability. Thus, the proposed strategy improves the cost and run-time efficiency of ABC methods on high-performance infrastructure.