Pyomo.DoE

Pyomo.DoE (Pyomo Design of Experiments) is a Python library for model-based design of experiments using science-based models.

Pyomo.DoE was developed by Jialu Wang and Alexander W. Dowling at the University of Notre Dame as part of the Carbon Capture Simulation for Industry Impact (CCSI2). project, funded through the U.S. Department Of Energy Office of Fossil Energy.

If you use Pyomo.DoE, please cite:

[Wang and Dowling, 2022] Wang, Jialu, and Alexander W. Dowling. “Pyomo.DOE: An open‐source package for model‐based design of experiments in Python.” AIChE Journal 68.12 (2022): e17813. https://doi.org/10.1002/aic.17813

Methodology Overview

Model-based Design of Experiments (MBDoE) is a technique to maximize the information gain of experiments by directly using science-based models with physically meaningful parameters. It is one key component in the model calibration and uncertainty quantification workflow shown below:

../../_images/flowchart.png — Fig. 2 The exploratory analysis, parameter estimation, uncertainty analysis, and MBDoE are combined into an iterative framework to select, refine, and calibrate science-based mathematical models with quantified uncertainty. Currently, Pyomo.DoE focuses on increasing parameter precision.

Pyomo.DoE provides the exploratory analysis and MBDoE capabilities to the Pyomo ecosystem. The user provides one Pyomo model, a set of parameter nominal values, the allowable design spaces for design variables, and the assumed observation error model. During exploratory analysis, Pyomo.DoE checks if the model parameters can be inferred from the postulated measurements or preliminary data. MBDoE then recommends optimized experimental conditions for collecting more data. Parameter estimation packages such as Parmest can perform parameter estimation using the available data to infer values for parameters, and facilitate an uncertainty analysis to approximate the parameter covariance matrix. If the parameter uncertainties are sufficiently small, the workflow terminates and returns the final model with quantified parametric uncertainty. If not, MBDoE recommends optimized experimental conditions to generate new data.

Below is an overview of the type of optimization models Pyomo.DoE can accommodate:

Pyomo.DoE is suitable for optimization models of continuous variables
Pyomo.DoE can handle equality constraints defining state variables
Pyomo.DoE supports (Partial) Differential-Algebraic Equations (PDAE) models via Pyomo.DAE
Pyomo.DoE also supports models with only algebraic constraints

The general form of a DAE problem that can be passed into Pyomo.DoE is shown below:

\[\begin{split}\begin{align*} & \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\theta}) \\ & \mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta})=\mathbf{0} \\ & \mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta}) \\ & \mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta})\right)=\mathbf{0} \\ & \mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0}\\ &\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right) \end{align*}\end{split}\]

where:

\(\boldsymbol{\theta} \in \mathbb{R}^{N_p}\) are unknown model parameters.
\(\mathbf{x} \subseteq \mathcal{X}\) are dynamic state variables which characterize trajectory of the system, \(\mathcal{X} \in \mathbb{R}^{N_x \times N_t}\).
\(\mathbf{z} \subseteq \mathcal{Z}\) are algebraic state variables, \(\mathcal{Z} \in \mathbb{R}^{N_z \times N_t}\).
\(\mathbf{u} \subseteq \mathcal{U}\) are time-varying decision variables, \(\mathcal{U} \in \mathbb{R}^{N_u \times N_t}\).
\(\overline{\mathbf{w}} \in \mathbb{R}^{N_w}\) are time-invariant decision variables.
\(\mathbf{y} \subseteq \mathcal{Y}\) are measurement response variables, \(\mathcal{Y} \in \mathbb{R}^{N_r \times N_t}\).
\(\mathbf{f}(\cdot)\) are differential equations.
\(\mathbf{g}(\cdot)\) are algebraic equations.
\(\mathbf{h}(\cdot)\) are measurement functions.
\(\mathbf{t} \in \mathbb{R}^{N_t \times 1}\) is a union of all time sets.

Note

Parameters and design variables should be defined as Pyomo Var components on the model to use direct_kaug mode, and can be defined as Pyomo Param object if not using direct_kaug.

Based on the above notation, the form of the MBDoE problem addressed in Pyomo.DoE is shown below:

\[\begin{split}\begin{equation} \begin{aligned} \underset{\boldsymbol{\varphi}}{\max} \quad & \Psi (\mathbf{M}(\mathbf{\hat{y}}, \boldsymbol{\varphi})) \\ \text{s.t.} \quad & \mathbf{M}(\boldsymbol{\hat{\theta}}, \boldsymbol{\varphi}) = \sum_r^{N_r} \sum_{r'}^{N_r} \tilde{\sigma}_{(r,r')}\mathbf{Q}_r^\mathbf{T} \mathbf{Q}_{r'} + \mathbf{V}^{-1}_{\boldsymbol{\theta}}(\boldsymbol{\hat{\theta}}) \\ & \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\theta}) \\ & \mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta})=\mathbf{0} \\ & \mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta}) \\ & \mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta})\right)=\mathbf{0} \\ & \mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0}\\ &\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right) \end{aligned} \end{equation}\end{split}\]

where:

\(\boldsymbol{\varphi}\) are design variables, which are manipulated to maximize the information content of experiments. It should consist of one or more of \(\mathbf{u}(t), \mathbf{y}^{\mathbf{0}}({t_0}),\overline{\mathbf{w}}\). With a proper model formulation, the timepoints for control or measurements \(\mathbf{t}\) can also be degrees of freedom.
\(\mathbf{M}\) is the Fisher information matrix (FIM), estimated as the inverse of the covariance matrix of parameter estimates \(\boldsymbol{\hat{\theta}}\). A large FIM indicates more information contained in the experiment for parameter estimation.
\(\mathbf{Q}\) is the dynamic sensitivity matrix, containing the partial derivatives of \(\mathbf{y}\) with respect to \(\boldsymbol{\theta}\).
\(\Psi\) is the design criteria to measure FIM.
\(\mathbf{V}_{\boldsymbol{\theta}}(\boldsymbol{\hat{\theta}})^{-1}\) is the FIM of previous experiments.

Pyomo.DoE provides four design criteria \(\Psi\) to measure the size of FIM:

Table 2 Pyomo.DoE design criteria
Design criterion	Computation	Geometrical meaning
A-optimality	\(\text{trace}({\mathbf{M}})\)	Dimensions of the enclosing box of the confidence ellipse
D-optimality	\(\text{det}({\mathbf{M}})\)	Volume of the confidence ellipse
E-optimality	\(\text{min eig}({\mathbf{M}})\)	Size of the longest axis of the confidence ellipse
Modified E-optimality	\(\text{cond}({\mathbf{M}})\)	Ratio of the longest axis to the shortest axis of the confidence ellipse

In order to solve problems of the above, Pyomo.DoE implements the 2-stage stochastic program. Please see Wang and Dowling (2022) for details.

Pyomo.DoE Required Inputs

The required input to the Pyomo.DoE solver is an Experiment object. The experiment object must have a get_labeled_model function which returns a Pyomo model with four Suffix components identifying the parts of the model used in MBDoE analysis. This is in line with the convention used in the parameter estimation tool, Parmest. The four Suffix components are:

experiment_inputs - The experimental design decisions
experiment_outputs - The values measured during the experiment
measurement_error - The error associated with individual values measured during the experiment
unknown_parameters - Those parameters in the model that are estimated using the measured values during the experiment

An example Experiment object that builds and labels the model is shown in the next few sections.

Pyomo.DoE Usage Example

We illustrate the use of Pyomo.DoE using a reaction kinetics example (Wang and Dowling, 2022). The Arrhenius equations model the temperature dependence of the reaction rate coefficient \(k_1, k_2\). Assuming a first-order reaction mechanism gives the reaction rate model. Further, we assume only species A is fed to the reactor.

\[\begin{split}\begin{equation} \begin{aligned} k_1 & = A_1 e^{-\frac{E_1}{RT}} \\ k_2 & = A_2 e^{-\frac{E_2}{RT}} \\ \frac{d{C_A}}{dt} & = -k_1{C_A} \\ \frac{d{C_B}}{dt} & = k_1{C_A} - k_2{C_B} \\ C_{A0}& = C_A + C_B + C_C \\ C_B(t_0) & = 0 \\ C_C(t_0) & = 0 \\ \end{aligned} \end{equation}\end{split}\]

\(C_A(t), C_B(t), C_C(t)\) are the time-varying concentrations of the species A, B, C, respectively. \(k_1, k_2\) are the rates for the two chemical reactions using an Arrhenius equation with activation energies \(E_1, E_2\) and pre-exponential factors \(A_1, A_2\). The goal of MBDoE is to optimize the experiment design variables \(\boldsymbol{\varphi} = (C_{A0}, T(t))\), where \(C_{A0},T(t)\) are the initial concentration of species A and the time-varying reactor temperature, to maximize the precision of unknown model parameters \(\boldsymbol{\theta} = (A_1, E_1, A_2, E_2)\) by measuring \(\mathbf{y}(t)=(C_A(t), C_B(t), C_C(t))\). The observation errors are assumed to be independent both in time and across measurements with a constant standard deviation of 1 M for each species.

Step 0: Import Pyomo and the Pyomo.DoE module and create an `Experiment` class

>>> # === Required import ===
>>> import pyomo.environ as pyo
>>> from pyomo.contrib.doe import DesignOfExperiments
>>> import numpy as np

class ReactorExperiment(Experiment):
    def __init__(self, data, nfe, ncp):
        """
        Arguments
        ---------
        data: object containing vital experimental information
        nfe: number of finite elements
        ncp: number of collocation points for the finite elements
        """
        self.data = data
        self.nfe = nfe
        self.ncp = ncp
        self.model = None

        #############################

Step 1: Define the Pyomo process model

The process model for the reaction kinetics problem is shown below. We build the model without any data or discretization.

    def create_model(self):
        """
        This is an example user model provided to DoE library.
        It is a dynamic problem solved by Pyomo.DAE.

        Return
        ------
        m: a Pyomo.DAE model
        """

        m = self.model = pyo.ConcreteModel()

        # Model parameters
        m.R = pyo.Param(mutable=False, initialize=8.314)

        # Define model variables
        ########################
        # time
        m.t = ContinuousSet(bounds=[0, 1])

        # Concentrations
        m.CA = pyo.Var(m.t, within=pyo.NonNegativeReals)
        m.CB = pyo.Var(m.t, within=pyo.NonNegativeReals)
        m.CC = pyo.Var(m.t, within=pyo.NonNegativeReals)

        # Temperature
        m.T = pyo.Var(m.t, within=pyo.NonNegativeReals)

        # Arrhenius rate law equations
        m.A1 = pyo.Var(within=pyo.NonNegativeReals)
        m.E1 = pyo.Var(within=pyo.NonNegativeReals)
        m.A2 = pyo.Var(within=pyo.NonNegativeReals)
        m.E2 = pyo.Var(within=pyo.NonNegativeReals)

        # Differential variables (Conc.)
        m.dCAdt = DerivativeVar(m.CA, wrt=m.t)
        m.dCBdt = DerivativeVar(m.CB, wrt=m.t)

        ########################
        # End variable def.

        # Equation definition
        ########################

        # Expression for rate constants
        @m.Expression(m.t)
        def k1(m, t):
            return m.A1 * pyo.exp(-m.E1 * 1000 / (m.R * m.T[t]))

        @m.Expression(m.t)
        def k2(m, t):
            return m.A2 * pyo.exp(-m.E2 * 1000 / (m.R * m.T[t]))

        # Concentration odes
        @m.Constraint(m.t)
        def CA_rxn_ode(m, t):
            return m.dCAdt[t] == -m.k1[t] * m.CA[t]

        @m.Constraint(m.t)
        def CB_rxn_ode(m, t):
            return m.dCBdt[t] == m.k1[t] * m.CA[t] - m.k2[t] * m.CB[t]

        # algebraic balance for concentration of C
        # Valid because the reaction system (A --> B --> C) is equimolar
        @m.Constraint(m.t)
        def CC_balance(m, t):
            return m.CA[0] == m.CA[t] + m.CB[t] + m.CC[t]

        ########################

Step 2: Finalize the Pyomo process model

Here we add data to the model and finalize the discretization. This step is required before the model can be labeled.

    def finalize_model(self):
        """
        Example finalize model function. There are two main tasks
        here:
            1. Extracting useful information for the model to align
               with the experiment. (Here: CA0, t_final, t_control)
            2. Discretizing the model subject to this information.
        """
        m = self.model

        # Unpacking data before simulation
        control_points = self.data["control_points"]

        # Set initial concentration values for the experiment
        m.CA[0].value = self.data["CA0"]
        m.CB[0].fix(self.data["CB0"])

        # Update model time `t` with time range and control time points
        m.t.update(self.data["t_range"])
        m.t.update(control_points)

        # Fix the unknown parameter values
        m.A1.fix(self.data["A1"])
        m.A2.fix(self.data["A2"])
        m.E1.fix(self.data["E1"])
        m.E2.fix(self.data["E2"])

        # Add upper and lower bounds to the design variable, CA[0]
        m.CA[0].setlb(self.data["CA_bounds"][0])
        m.CA[0].setub(self.data["CA_bounds"][1])

        m.t_control = control_points

        # Discretizing the model
        discr = pyo.TransformationFactory("dae.collocation")
        discr.apply_to(m, nfe=self.nfe, ncp=self.ncp, wrt=m.t)

        # Initializing Temperature in the model
        cv = None
        for t in m.t:
            if t in control_points:
                cv = control_points[t]
            m.T[t].setlb(self.data["T_bounds"][0])
            m.T[t].setub(self.data["T_bounds"][1])
            m.T[t] = cv

        # Make a constraint that holds temperature constant between control time points
        @m.Constraint(m.t - control_points)
        def T_control(m, t):
            """
            Piecewise constant temperature between control points
            """
            neighbour_t = max(tc for tc in control_points if tc < t)
            return m.T[t] == m.T[neighbour_t]

        #########################

Step 3: Label the information needed for DoE analysis

We label the four important groups as defined before.

    def label_experiment(self):
        """
        Example for annotating (labeling) the model with a
        full experiment.
        """
        m = self.model

        # Set measurement labels
        m.experiment_outputs = pyo.Suffix(direction=pyo.Suffix.LOCAL)
        # Add CA to experiment outputs
        m.experiment_outputs.update((m.CA[t], None) for t in m.t_control)
        # Add CB to experiment outputs
        m.experiment_outputs.update((m.CB[t], None) for t in m.t_control)
        # Add CC to experiment outputs
        m.experiment_outputs.update((m.CC[t], None) for t in m.t_control)

        # Adding error for measurement values (assuming no covariance and constant error for all measurements)
        m.measurement_error = pyo.Suffix(direction=pyo.Suffix.LOCAL)
        concentration_error = 1e-2  # Error in concentration measurement
        # Add measurement error for CA
        m.measurement_error.update((m.CA[t], concentration_error) for t in m.t_control)
        # Add measurement error for CB
        m.measurement_error.update((m.CB[t], concentration_error) for t in m.t_control)
        # Add measurement error for CC
        m.measurement_error.update((m.CC[t], concentration_error) for t in m.t_control)

        # Identify design variables (experiment inputs) for the model
        m.experiment_inputs = pyo.Suffix(direction=pyo.Suffix.LOCAL)
        # Add experimental input label for initial concentration
        m.experiment_inputs[m.CA[m.t.first()]] = None
        # Add experimental input label for Temperature
        m.experiment_inputs.update((m.T[t], None) for t in m.t_control)

        # Add unknown parameter labels
        m.unknown_parameters = pyo.Suffix(direction=pyo.Suffix.LOCAL)
        # Add labels to all unknown parameters with nominal value as the value
        m.unknown_parameters.update((k, pyo.value(k)) for k in [m.A1, m.A2, m.E1, m.E2])

        #########################

Step 4: Implement the `get_labeled_model` method

This method utilizes the previous 3 steps and is used by Pyomo.DoE to build the model to perform optimal experimental design.

    def get_labeled_model(self):
        if self.model is None:
            self.create_model()
            self.finalize_model()
            self.label_experiment()
        return self.model

Step 5: Exploratory analysis (Enumeration)

Exploratory analysis is suggested to enumerate the design space to check if the problem is identifiable, i.e., ensure that D-, E-optimality metrics are not small numbers near zero, and Modified E-optimality is not a big number.

Pyomo.DoE can perform exploratory sensitivity analysis with the compute_FIM_full_factorial function. The compute_FIM_full_factorial function generates a grid over the design space as specified by the user. Each grid point represents an MBDoE problem solved using compute_FIM method. In this way, sensitivity of the FIM over the design space can be evaluated.

The following code executes the above problem description:

    DATA_DIR = Path(__file__).parent
    file_path = DATA_DIR / "result.json"

    with open(file_path) as f:
        data_ex = json.load(f)

    # Put temperature control time points into correct format for reactor experiment
    data_ex["control_points"] = {
        float(k): v for k, v in data_ex["control_points"].items()
    }

    # Create a ReactorExperiment object; data and discretization information are part
    # of the constructor of this object
    experiment = ReactorExperiment(data=data_ex, nfe=10, ncp=3)

    # Use a central difference, with step size 1e-3
    fd_formula = "central"
    step_size = 1e-3

    # Use the determinant objective with scaled sensitivity matrix
    objective_option = "determinant"
    scale_nominal_param_value = True

    # Create the DesignOfExperiments object
    # We will not be passing any prior information in this example
    # and allow the experiment object and the DesignOfExperiments
    # call of ``run_doe`` perform model initialization.
    doe_obj = DesignOfExperiments(
        experiment,
        fd_formula=fd_formula,
        step=step_size,
        objective_option=objective_option,
        scale_constant_value=1,
        scale_nominal_param_value=scale_nominal_param_value,
        prior_FIM=None,
        jac_initial=None,
        fim_initial=None,
        L_diagonal_lower_bound=1e-7,
        solver=None,
        tee=False,
        get_labeled_model_args=None,
        _Cholesky_option=True,
        _only_compute_fim_lower=True,
    )

    # Make design ranges to compute the full factorial design
    design_ranges = {"CA[0]": [1, 5, 9], "T[0]": [300, 700, 9]}

    # Compute the full factorial design with the sequential FIM calculation
    doe_obj.compute_FIM_full_factorial(design_ranges=design_ranges, method="sequential")

    # Plot the results
    doe_obj.draw_factorial_figure(
        sensitivity_design_variables=["CA[0]", "T[0]"],
        fixed_design_variables={
            "T[0.125]": 300,
            "T[0.25]": 300,
            "T[0.375]": 300,
            "T[0.5]": 300,
            "T[0.625]": 300,
            "T[0.75]": 300,
            "T[0.875]": 300,
            "T[1]": 300,
        },
        title_text="Reactor Example",
        xlabel_text="Concentration of A (M)",
        ylabel_text="Initial Temperature (K)",
        figure_file_name="example_reactor_compute_FIM",
        log_scale=False,
    )

    ###########################

An example output of the code above, a design exploration for the initial concentration and temperature as experimental design variables with 9 values, produces the four figures summarized below:

A heatmap shows the change of the objective function, a.k.a. the experimental information content, in the design space. Horizontal and vertical axes are the two experimental design variables, while the color of each grid shows the experimental information content. For A optimality (top left subfigure), the figure shows that the most informative region is around \(C_{A0}=5.0\) M, \(T=300.0\) K, while the least informative region is around \(C_{A0}=1.0\) M, \(T=700.0\) K.

Step 6: Performing an optimal experimental design

In step 5, the DoE object was constructed to perform an exploratory sensitivity analysis. The same object can be used to design an optimal experiment with a single line of code.

    ####################
    doe_obj.run_doe()

When run, the optimal design is an initial concentration of 5.0 mol/L and an initial temperature of 494 K with all other temperatures being 300 K. The corresponding log-10 determinant of the FIM is 13.75