Overview
Model-based Design of Experiments (MBDoE) is a technique to maximize the information
gain from experiments by directly using science-based models with physically meaningful
parameters. It is one key component in the model calibration and uncertainty
quantification workflow shown below:
The parameter estimation, uncertainty analysis, and MBDoE are
combined into an iterative framework to select, refine, and calibrate science-based
mathematical models with quantified uncertainty. Currently, Pyomo.DoE focuses on
increasing parameter precision.
Pyomo.DoE provides the exploratory analysis and MBDoE capabilities to the
Pyomo ecosystem. The user provides one Pyomo model, a set of parameter nominal values,
the allowable design spaces for design variables, and the postulated observation error model.
During exploratory analysis, Pyomo.DoE checks experimental design conditions that will provide
the most informative experimental outputs (i.e., data) to estimate the model parameters.
Parameter estimation packages such as Parmest can perform
parameter estimation using the available data to infer values for parameters,
and facilitate an uncertainty analysis to approximate the parameter covariance matrix.
If the parameter uncertainties are sufficiently small (i.e., at a level acceptable to the user),
the workflow terminates and returns the final model with quantified parametric uncertainty.
If not, MBDoE recommends optimized experimental conditions to generate new data
that will maximize information gain and eventually reduce parameter uncertainty.
Below is an overview of the type of optimization models Pyomo.DoE can accommodate:
Pyomo.DoE is suitable for optimization models of continuous variables
Pyomo.DoE can handle equality constraints defining state variables
Pyomo.DoE supports (Partial) Differential-Algebraic Equations (PDAE) models via Pyomo.DAE
Pyomo.DoE also supports models with only algebraic equations
The general form of a DAE problem that can be passed into Pyomo.DoE is shown below:
\[\begin{array}{l}
\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\theta}) \\
\mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta})=\mathbf{0} \\
\mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta}) \\
\mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0} \\
\mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0}\\
\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)
\end{array}\]
where:
\(\boldsymbol{\theta} \in \mathbb{R}^{N_p}\) are unknown model parameters.
\(\mathbf{x} \subseteq \mathcal{X}\) are dynamic state variables which characterize trajectory of the system, \(\mathcal{X} \in \mathbb{R}^{N_x \times N_t}\).
\(\mathbf{z} \subseteq \mathcal{Z}\) are algebraic state variables, \(\mathcal{Z} \in \mathbb{R}^{N_z \times N_t}\).
\(\mathbf{u} \subseteq \mathcal{U}\) are time-varying decision variables, \(\mathcal{U} \in \mathbb{R}^{N_u \times N_t}\).
\(\overline{\mathbf{w}} \in \mathbb{R}^{N_w}\) are time-invariant decision variables.
\(\mathbf{y} \subseteq \mathcal{Y}\) are measured variables (i.e., experimental outputs), \(\mathcal{Y} \in \mathbb{R}^{N_r \times N_t}\).
\(\mathbf{f}(\cdot)\) are differential equations.
\(\mathbf{g}(\cdot)\) are algebraic equations.
\(\mathbf{h}(\cdot)\) are measurement functions.
\(\mathbf{t} \in \mathbb{R}^{N_t \times 1}\) is a union of all time sets.
Note
Parameters and design variables should be defined as Pyomo Var components
when building the model using the Experiment class so that users can use both
Parmest and Pyomo.DoE seamlessly.
Based on the above notation, the form of the MBDoE problem addressed in Pyomo.DoE is shown below:
\begin{equation}
\begin{aligned}
\underset{\boldsymbol{\varphi}}{\max} \quad & \Psi (\mathbf{M}(\boldsymbol{\hat{\theta}}, \boldsymbol{\varphi})) \\
\text{s.t.} \quad & \mathbf{M}(\boldsymbol{\hat{\theta}}, \boldsymbol{\varphi}) = \sum_r^{N_r} \sum_{r'}^{N_r} \tilde{\sigma}_{(r,r')}\mathbf{Q}_r^\mathbf{T} \mathbf{Q}_{r'} + \mathbf{M}_0 \\
& \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\hat{\theta}}) \\
& \mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\hat{\theta}})=\mathbf{0} \\
& \mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\hat{\theta}}) \\
& \mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\hat{\theta}})\right)=\mathbf{0} \\
& \mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\hat{\theta}}\right)=\mathbf{0}\\
&\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\hat{\theta}}\right)
\end{aligned}
\end{equation}
where:
\(\boldsymbol{\varphi}\) are design variables, which are manipulated to maximize the information content of experiments. It should consist of one or more of \(\mathbf{u}(t), \mathbf{y}^{\mathbf{0}}({t_0}),\overline{\mathbf{w}}\). With a proper model formulation, the timepoints for control or measurements \(\mathbf{t}\) can also be degrees of freedom.
\(\mathbf{M}\) is the Fisher information matrix (FIM), approximated as the inverse of the covariance matrix of parameter estimates \(\boldsymbol{\hat{\theta}}\). A large FIM indicates more information contained in the experiment for parameter estimation.
\(\mathbf{Q}\) is the dynamic sensitivity matrix, containing the partial derivatives of \(\mathbf{y}\) with respect to \(\boldsymbol{\theta}\).
\(\Psi\) is the scalar design criteria to measure the information content in the FIM.
\(\mathbf{M}_0\) is the sum of all the FIMs from previous experiments.
Pyomo.DoE provides five design criteria \(\Psi\) to measure the information in the FIM.
The covariance matrix of parameter estimates is approximated as the inverse of the FIM,
i.e., \(\mathbf{V} \approx \mathbf{M}^{-1}\).
We can use the FIM or the covariance matrix to define the design criteria.
Table 5 Pyomo.DoE design criteria
Design criterion |
Computation |
Geometrical meaning |
A-optimality |
\(\text{trace}(\mathbf{V}) = \text{trace}(\mathbf{M}^{-1})\) |
Minimizing this is equivalent to minimizing the enclosing box of the confidence ellipse |
Pseudo A-optimality |
\(\text{trace}(\mathbf{M})\) |
Maximizing this is equivalent to maximizing the dimensions of the enclosing box of the Fisher Information Matrix |
D-optimality |
\(\det(\mathbf{M}) = 1/\det(\mathbf{V})\) |
Maximizing this is equivalent to minimizing confidence-ellipsoid volume |
E-optimality |
\(\lambda_{\min}(\mathbf{M}) = 1/\lambda_{\max}(\mathbf{V})\) |
Maximizing this is equivalent to minimizing the longest axis of the confidence ellipse |
Modified E-optimality |
\(\text{cond}(\mathbf{M}) = \text{cond}(\mathbf{V})\) |
Minimizing this is equivalent to minimizing the ratio of the longest axis to the shortest axis of the confidence ellipse |
Note
A confidence ellipse is a geometric representation of the uncertainty in parameter
estimates. It is derived from the covariance matrix \(\mathbf{V}\).
In order to solve problems of the above, Pyomo.DoE implements the
2-stage stochastic program. Please see [PyomoDOE-paper] for details.