Pyomo.DoE

../../_images/PyomoDoE-md.png

Pyomo.DoE (Pyomo Design of Experiments) is a Python library for model-based design of experiments using science-based models.

Pyomo.DoE was developed by Jialu Wang and Alexander Dowling at the University of Notre Dame as part of the Carbon Capture Simulation for Industry Impact (CCSI2) project, funded through the U.S. Department of Energy Office of Fossil Energy and Carbon Management. Special thank you to John Siirola and Bethany Nicholson for extensive code reviews, suggestions, and improvements to Pyomo.DoE.

If you use Pyomo.DoE, please cite:

Jialu Wang and Alexander W. Dowling (2022). “Pyomo.DoE: An open‐source package for model‐based design of experiments in Python.” AIChE Journal 68(12), e17813. doi: 10.1002/aic.17813

Methodology Overview

Model-based Design of Experiments (MBDoE) is a technique to maximize the information gain of experiments by directly using science-based models with physically meaningful parameters. It is one key component in the model calibration and uncertainty quantification workflow shown below:

../../_images/flowchart.png

Fig. 2 Pyomo.DoE integrates exploratory analysis, parameter estimation, uncertainty analysis, and MBDoE into an iterative framework to select, refine, and calibrate science-based mathematical models with quantified uncertainty. Currently, Pyomo.DoE focuses on increasing parameter precision.

Pyomo.DoE provides science-based MBDoE capabilities to the Pyomo ecosystem. The user provides one Pyomo model, a set of parameter nominal values, the allowable design spaces for design variables, and the assumed observation error structure (e.g., covariance matrix). During exploratory analysis, Pyomo.DoE checks if the model parameters can be inferred from the proposed measurements or preliminary data. Pyomo.DoE then recommends optimized experimental conditions for collecting more data. Parameter estimation via Parmest can estimate uncertainty parameters from data and compute a parameter covariance matrix. If the parameter uncertainties are sufficiently small, the workflow terminates and returns the final model with quantified parametric uncertainty. Otherwise, Pyomo.DoE recommends the best next set of experimental conditions to generate new data.

Below is an overview of the type of optimization models Pyomo.DoE can accomodate:

  • Pyomo.DoE is suitable for optimization models of continuous variables
  • Pyomo.DoE can handle equality constraints defining state variables
  • Pyomo.DoE supports (Partial) Differential-Algebraic Equations (PDAE) models via Pyomo.DAE
  • Pyomo.DoE also supports models with only algebraic constraints

Pyomo.DoE considers the following DAE model:

\[\begin{split}\begin{align*} & \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\theta}) \\ & \mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta})=\mathbf{0} \\ & \mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta}) \\ & \mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta})\right)=\mathbf{0} \\ & \mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0}\\ &\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right) \end{align*}\end{split}\]

where:

  • \(\boldsymbol{\theta} \in \mathbb{R}^{N_p}\) are unknown model parameters.
  • \(\mathbf{x} \subseteq \mathcal{X}\) are dynamic state variables which characterize trajectory of the system, \(\mathcal{X} \in \mathbb{R}^{N_x \times N_t}\).
  • \(\mathbf{z} \subseteq \mathcal{Z}\) are algebraic state variables, \(\mathcal{Z} \in \mathbb{R}^{N_z \times N_t}\).
  • \(\mathbf{u} \subseteq \mathcal{U}\) are time-varying decision variables, \(\mathcal{U} \in \mathbb{R}^{N_u \times N_t}\).
  • \(\overline{\mathbf{w}} \in \mathbb{R}^{N_w}\) are time-invariant decision variables.
  • \(\mathbf{y} \subseteq \mathcal{Y}\) are measurement response variables, \(\mathcal{Y} \in \mathbb{R}^{N_r \times N_t}\).
  • \(\mathbf{f}(\cdot)\) are differential equations.
  • \(\mathbf{g}(\cdot)\) are algebraic equations.
  • \(\mathbf{h}(\cdot)\) are measurement functions.
  • \(\mathbf{t} \in \mathbb{R}^{N_t \times 1}\) is a union of all time sets.

Note

  • Process models provided to Pyomo.DoE should define an extra scenario index for all state variables and all parameters, as the first index before any other index. The next version of Pyomo.DoE will remove this requirement.
  • Process models must include an index for time, named t. For steady-state models, t should be [0].
  • Measurements can have an extra index (e.g., spatial domain) besides time.
  • Parameters and design variables should be defined as Pyomo var components on the model to use direct_kaug mode. Other modes allow these to be defined as Pyomo Param objects.
  • Create model function should take scenarios as the first argument of this function.
  • Design variables are defined with and only with a time index.

Pyomo.DoE solves the following DAE-constrainted optimization problem:

\[\begin{split}\begin{equation} \begin{aligned} \underset{\boldsymbol{\varphi}}{\max} \quad & \Psi (\mathbf{M}(\mathbf{\hat{y}}, \boldsymbol{\varphi})) \\ \text{s.t.} \quad & \mathbf{M}(\boldsymbol{\hat{\theta}}, \boldsymbol{\varphi}) = \sum_r^{N_r} \sum_{r'}^{N_r} \tilde{\sigma}_{(r,r')}\mathbf{Q}_r^\mathbf{T} \mathbf{Q}_{r'} + \mathbf{V}^{-1}_{\boldsymbol{\theta}}(\boldsymbol{\hat{\theta}}) \\ & \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}}, \boldsymbol{\theta}) \\ & \mathbf{g}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{y}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta})=\mathbf{0} \\ & \mathbf{y} =\mathbf{h}(\mathbf{x}(t), \mathbf{z}(t), \mathbf{u}(t), \overline{\mathbf{w}},\boldsymbol{\theta}) \\ & \mathbf{f}^{\mathbf{0}}\left(\dot{\mathbf{x}}\left(t_{0}\right), \mathbf{x}\left(t_{0}\right), \mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta})\right)=\mathbf{0} \\ & \mathbf{g}^{\mathbf{0}}\left( \mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{y}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right)=\mathbf{0}\\ &\mathbf{y}^{\mathbf{0}}\left(t_{0}\right)=\mathbf{h}\left(\mathbf{x}\left(t_{0}\right),\mathbf{z}(t_0), \mathbf{u}\left(t_{0}\right), \overline{\mathbf{w}}, \boldsymbol{\theta}\right) \end{aligned} \end{equation}\end{split}\]

where:

  • \(\boldsymbol{\varphi}\) are design variables, which are manipulated to maximize the information content of experiments. \(\boldsymbol{\varphi}\) should consist of one or more of \(\mathbf{u}(t), \mathbf{y}^{\mathbf{0}}({t_0}),\overline{\mathbf{w}}\). With a proper model formulation, the timepoints for control or measurements \(\mathbf{t}\) can also be degrees of freedom.
  • \(\mathbf{M}\) is the Fisher information matrix (FIM), estimated as the inverse of the covariance matrix of parameter estimates \(\boldsymbol{\hat{\theta}}\). A large FIM indicates more information contained in the experiment for parameter estimation.
  • \(\mathbf{Q}\) is the dynamic sensitivity matrix, containing the partial derivatives of \(\mathbf{y}\) with respect to \(\boldsymbol{\theta}\).
  • \(\Psi(\cdot)\) is the design criteria computed from the FIM.
  • \(\mathbf{V}_{\boldsymbol{\theta}}(\boldsymbol{\hat{\theta}})^{-1}\) is the FIM of previous experiments.

Pyomo.DoE provides four design criteria \(\Psi(\cdot)\) to measure the size of FIM:

Table 1 Pyomo.DoE design criteria
Design criterion Computation Geometrical meaning
A-optimality \(\text{trace}({\mathbf{M}})\) Dimensions of the enclosing box of the confidence ellipse
D-optimality \(\text{det}({\mathbf{M}})\) Volume of the confidence ellipse
E-optimality \(\text{min eig}({\mathbf{M}})\) Size of the longest axis of the confidence ellipse
Modified E-optimality \(\text{cond}({\mathbf{M}})\) Ratio of the longest axis to the shortest axis of the confidence ellipse

Pyomo.DoE reformulates the above optimization problem as a stochastic program. The scenarios are perturbations of the model parameters \(\boldsymbol{\theta}\) to compute the sensitivities \(\mathbf{Q}\) via finite difference. See Wang and Dowling (2022) for details.

Pyomo.DoE Required Inputs

Pyomo.DoE requires the following inputs:

  • A Python function that creates the process model. This is similar to interface for Parmest.
  • Dictionary of parameters and their nominal values
  • Dictionary of measurements and their measurement time points
  • Dictionary of design variables and their control time points
  • A Numpy array containing the prior FIM
  • Local and global nonlinear optimization solver object

Below is a list of arguments that Pyomo.DoE expects the user to provide.

param_initdictionary
A dictionary of parameter names and values. If they are indexed variables, put the variable names and indexes in a nested Dictionary.
design_variable_timepointsdictionary
A dictionary of design variable names and their control time points. If the design variables are time-invariant (constant), set the time to [0]
measurement_objectobject
An object of the measurements, provided by the measurement class.
create_modelfunction
A function returning a deterministic process model.
prior_FIMarray
An array defining the Fisher information matrix (FIM) for prior experiments, default is a zero matrix.

Pyomo.DoE Solver Interface

../../_images/uml.png
class pyomo.contrib.doe.doe.DesignOfExperiments(param_init, design_variable_timepoints, measurement_object, create_model, solver=None, time_set_name='t', prior_FIM=None, discretize_model=None, args=None)[source]
__init__(param_init, design_variable_timepoints, measurement_object, create_model, solver=None, time_set_name='t', prior_FIM=None, discretize_model=None, args=None)[source]

This package enables model-based design of experiments analysis with Pyomo. Both direct optimization and enumeration modes are supported. NLP sensitivity tools, e.g., sipopt and k_aug, are supported to accelerate analysis via enumeration. It can be applied to dynamic models, where design variables are controlled throughout the experiment.

Parameters:
  • param_init – A dictionary of parameter names and values. If they defined as indexed Pyomo variable, put the variable name and index, such as ‘theta[“A1”]’. Note: if sIPOPT is used, parameter shouldn’t be indexed.
  • design_variable_timepoints – A dictionary where keys are design variable names, values are its control time points. If this design var is independent of time (constant), set the time to [0]
  • measurement_object – A measurement object.
  • create_model – A function that returns the model
  • solver – A solver object that User specified, default=None. If not specified, default solver is IPOPT MA57.
  • time_set_name – A string of the name of the time set in the model. Default is “t”.
  • prior_FIM – A list of lists containing Fisher information matrix (FIM) for prior experiments.
  • discretize_model – A user-specified function that discretizes the model. Only use with Pyomo.DAE, default=None
  • args – Additional arguments for the create_model function.
compute_FIM(design_values, mode='sequential_finite', FIM_store_name=None, specified_prior=None, tee_opt=True, scale_nominal_param_value=False, scale_constant_value=1, store_output=None, read_output=None, extract_single_model=None, formula='central', step=0.001, objective_option='det')[source]

This function solves a square Pyomo model with fixed design variables to compute the FIM. It calculates FIM with sensitivity information from four modes:

  1. sequential_finite: Calculates a single-scenario model which is solved many times in series to estimate sensitivity information by finite difference
  2. sequential_sipopt: calculate sensitivity by sIPOPT [Experimental]
  3. sequential_kaug: calculate sensitivity by k_aug [Experimental]
  4. direct_kaug: calculate sensitivity by k_aug with direct sensitivity

“Simultaneous_finite” mode is not included in this function.

Parameters:
  • design_values – a dict where keys are design variable names, values are a dict whose keys are time point and values are the design variable value at that time point
  • mode – use mode=’sequential_finite’, ‘sequential_sipopt’, ‘sequential_kaug’, ‘direct_kaug’
  • FIM_store_name – if storing the FIM in a .csv or .txt, give the file name here as a string.
  • specified_prior – provide alternate prior matrix, default is no prior.
  • tee_opt – if True, IPOPT console output is printed
  • scale_nominal_param_value – if True, the parameters are scaled by its own nominal value in param_init
  • scale_constant_value – scale all elements in Jacobian matrix, default is 1.
  • store_output – if storing the output (value stored in Var ‘output_record’) as a pickle file, give the file name here as a string.
  • read_output – if reading the output (value for Var ‘output_record’) as a pickle file, give the file name here as a string.
  • extract_single_model – if True, the solved model outputs for each scenario are all recorded as a .csv file. The output file uses the name AB.csv, where string A is store_output input, B is the index of scenario. scenario index is the number of the scenario outputs which is stored.
  • formula – choose from ‘central’, ‘forward’, ‘backward’, None. This option is only used for ‘sequential_finite’ mode.
  • step – Sensitivity perturbation step size, a fraction between [0,1]. default is 0.001
  • objective_option – choose from ‘det’ or ‘trace’ or ‘zero’. Optimization problem maximizes determinant or trace or using 0 as objective function.
Returns:

FIM_analysis

Return type:

result summary object of this solve

Enumerate through full grid search for any number of design variables; solve square problems sequentially to compute FIMs. It calculates FIM with sensitivity information from four modes:

  1. sequential_finite: Calculates a one scenario model multiple times for multiple scenarios. Sensitivity info estimated by finite difference
  2. sequential_sipopt: calculate sensitivity by sIPOPT [Experimental]
  3. sequential_kaug: calculate sensitivity by k_aug [Experimental]
  4. direct_kaug: calculate sensitivity by k_aug with direct sensitivity
Parameters:
  • design_values – a dict where keys are design variable names, values are a dict whose keys are time point and values are the design variable value at that time point
  • design_ranges – a list of design variable values to go over
  • design_dimension_names – a list of design variable names of each design range
  • design_control_time – a list of control time points that should be fixed to the values in dv_ranges
  • mode – use mode=’sequential_finite’, ‘sequential_sipopt’, ‘sequential_kaug’, ‘direct_kaug’
  • tee_option – if solver console output is made
  • scale_nominal_param_value – if True, the parameters are scaled by its own nominal value in param_init
  • scale_constant_value – scale all elements in Jacobian matrix, default is 1.
  • store_name – a string of file name. If not None, store results with this name. Since there are multiple experiments, results are numbered with a scalar number, and the result for one grid is ‘store_name(count).csv’ (count is the number of count).
  • read_name – a string of file name. If not None, read result files. Since there are multiple experiments, this string should be the common part of all files; Real name of the file is “read_name(count)”, where count is the number of the experiment.
  • filename – if True, grid search results stored with this file name
  • formula – choose from ‘central’, ‘forward’, ‘backward’, None. This option is only used for ‘sequential_finite’ mode.
  • step – Sensitivity perturbation step size, a fraction between [0,1]. default is 0.001
Returns:

figure_draw_object

Return type:

a combined result object of class Grid_search_result

stochastic_program(design_values, if_optimize=True, objective_option='det', jac_involved_measurement=None, scale_nominal_param_value=False, scale_constant_value=1, optimize_opt=None, if_Cholesky=False, L_LB=1e-07, L_initial=None, jac_initial=None, fim_initial=None, formula='central', step=0.001, check=True, tee_opt=True)[source]

Optimize DOE problem with design variables being the decisions. The DOE model is formed invasively and all scenarios are computed simultaneously. The function will first run a square problem with design variable being fixed at the given initial points (Objective function being 0), then a square problem with design variables being fixed at the given initial points (Objective function being Design optimality), and then unfix the design variable and do the optimization.

Parameters:
  • design_values – a dict where keys are design variable names, values are a dict whose keys are time point and values are the design variable value at that time point
  • if_optimize – if true, continue to do optimization. else, just run square problem with given design variable values
  • objective_option – supporting maximizing the ‘det’ determinant or the ‘trace’ trace of the FIM
  • jac_involved_measurement – the measurement class involved in calculation. If None, take the overall measurement class
  • scale_nominal_param_value – if True, the parameters are scaled by its own nominal value in param_init
  • scale_constant_value – scale all elements in Jacobian matrix, default is 1.
  • optimize_opt – A dictionary, keys are design variables, values are True or False deciding if this design variable will be optimized as DOF or not
  • if_Cholesky – if True, Cholesky decomposition is used for Objective function for D-optimality.
  • L_LB – L is the Cholesky decomposition matrix for FIM, i.e. FIM = L*L.T. L_LB is the lower bound for every element in L. if FIM is positive definite, the diagonal element should be positive, so we can set a LB like 1E-10
  • L_initial – initialize the L
  • jac_initial – a matrix used to initialize jacobian matrix
  • fim_initial – a matrix used to initialize FIM matrix
  • formula – choose from ‘central’, ‘forward’, ‘backward’, None. This option is only used for ‘sequential_finite’ mode.
  • step – Sensitivity perturbation step size, a fraction between [0,1]. default is 0.001
  • check – if True, inputs are checked for consistency, default is True.
Returns:

  • analysis_square (result summary of the square problem solved at the initial point)
  • analysis_optimize (result summary of the optimization problem solved)

Note

stochastic_program() includes the following steps:
  1. Build two-stage stochastic programming optimization model where scenarios correspond to finite difference approximations for the Jacobian of the response variables with respect to calibrated model parameters
  2. Fix the experiment design decisions and solve a square (i.e., zero degrees of freedom) instance of the two-stage DoE problem. This step is for initialization.
  3. Unfix the experiment design decisions and solve the two-stage DoE problem.
class pyomo.contrib.doe.measurements.Measurements(measurement_index_time, variance=None, ind_string='_index_')[source]
__init__(measurement_index_time, variance=None, ind_string='_index_')[source]

This class stores information on which algebraic and differential variables in the Pyomo model are considered measurements.

This includes the functionality to specify indices for these measurement variables. For example, with a partial differential algebraic equation model, these measurement index sets can specify which spatial and temporal coordinates each measurement is available. Moreover, this class supports defining the covariance matrix for all measurements.

Parameters:
  • measurement_index_time

    a dict, keys are measurement variable names,

    • if there are extra indices, for e.g., Var[scenario, extra_index, time]: values are a dictionary, keys are its extra index, values are its measuring time points.
    • if there are no extra indices, for e.g., Var[scenario, time]: values are a list of measuring time point.

    For e.g., for the kinetics illustrative example, it should be {‘C’:{‘CA’:[0,1,..], ‘CB’:[0,2,…]}, ‘k’:[0,4,..]}, so the measurements are C[scenario, ‘CA’, 0]…, k[scenario, 0]….

  • variance – a dict, keys are measurement variable names, values are a dictionary, keys are its extra index, values are its variance (a scalar number), values are its variance if there is no extra index for this measurement. For e.g., for the kinetics illustrative example, it should be {‘C’:{‘CA’: 10, ‘CB’: 1, ‘CC’: 2}}. If given None, the default is {‘C’:{‘CA’: 1, ‘CB’: 1, ‘CC’: 1}}.
  • ind_string – a ‘’string’’, used to flatten the name of variables and extra index. Default is ‘_index_’. For e.g., for {‘C’:{‘CA’: 10, ‘CB’: 1, ‘CC’: 2}}, the reformulated name is ‘C_index_CA’.
check_subset(subset, throw_error=True, valid_subset=True)[source]

Check if the subset is correctly defined with right name, index and time.

Parameters:
  • subset – a ‘’dict’’ where measurement name and index are involved in jacobian calculation
  • throw_error – if the given subset is not a subset of the measurement set, throw error message
class pyomo.contrib.doe.scenario.Scenario_generator(para_dict, formula='central', step=0.001, store=False)[source]
__init__(para_dict, formula='central', step=0.001, store=False)[source]

Generate scenarios. DoE library first calls this function to generate scenarios. For sequential and simultaneous models, call different functions in this class.

Parameters:
  • para_dict – a dict of parameter, keys are names of ‘’string’’, values are their nominal value of ‘’float’’. for e.g., {‘A1’: 84.79, ‘A2’: 371.72, ‘E1’: 7.78, ‘E2’: 15.05}
  • formula – choose from ‘central’, ‘forward’, ‘backward’, None.
  • step – Sensitivity perturbation step size, a fraction between [0,1]. default is 0.001
  • store – if True, store results.
class pyomo.contrib.doe.result.FisherResults(para_name, measure_object, jacobian_info=None, all_jacobian_info=None, prior_FIM=None, store_FIM=None, scale_constant_value=1, max_condition_number=1000000000000.0, verbose=True)[source]
__init__(para_name, measure_object, jacobian_info=None, all_jacobian_info=None, prior_FIM=None, store_FIM=None, scale_constant_value=1, max_condition_number=1000000000000.0, verbose=True)[source]

Analyze the FIM result for a single run

Parameters:
  • para_name – A list of parameter names
  • measure_object – measurement information object
  • jacobian_info – the jacobian for this measurement object
  • all_jacobian_info – the overall jacobian
  • prior_FIM – if there’s prior FIM to be added
  • store_FIM – if storing the FIM in a .csv or .txt, give the file name here as a string
  • scale_constant_value – scale all elements in Jacobian matrix, default is 1.
  • max_condition_number – max condition number
  • verbose – if True, print statements are used
calculate_FIM(dv_values, result=None)[source]

Calculate FIM from Jacobian information. This is for grid search (combined models) results

Parameters:
  • dv_values – a dict where keys are design variable names, values are a dict whose keys are time point and values are the design variable value at that time point
  • result – solver status returned by IPOPT
class pyomo.contrib.doe.result.GridSearchResult(design_ranges, design_dimension_names, design_control_time, FIM_result_list, store_optimality_name=None, verbose=True)[source]
__init__(design_ranges, design_dimension_names, design_control_time, FIM_result_list, store_optimality_name=None, verbose=True)[source]

This class deals with the FIM results from grid search, providing A, D, E, ME-criteria results for each design variable. Can choose to draw 1D sensitivity curves and 2D heatmaps.

Parameters:
  • design_ranges – a dict whose keys are design variable names, values are a list of design variable values to go over
  • design_dimension_names – a list of design variables names
  • design_control_time – a list of design control timesets
  • FIM_result_list – a dict containing FIM results, keys are a tuple of design variable values, values are FIM result objects
  • store_optimality_name – a .csv file name containing all four optimalities value
  • verbose – if True, print statements

Pyomo.DoE Usage Example

We illustrate the Pyomo.DoE interface with a reaction kinetics example when feed A is converted to species B and C (Wang and Dowling, 2022). Assuming an Arrhenius temperature dependence for the reaction rates \(k_1, k_2\), first-order reaction mechanisms, and only species A is fed to the reactor gives the following DAE model:

\[\begin{split}\begin{equation} \begin{aligned} k_1 & = A_1 e^{-\frac{E_1}{RT}} \\ k_2 & = A_2 e^{-\frac{E_2}{RT}} \\ \frac{d{C_A}}{dt} & = -k_1{C_A} \\ \frac{d{C_B}}{dt} & = k_1{C_A} - k_2{C_B} \\ C_{A0}& = C_A + C_B + C_C \\ C_B(t_0) & = 0 \\ C_C(t_0) & = 0 \\ \end{aligned} \end{equation}\end{split}\]

Here \(C_A(t), C_B(t), C_C(t)\) are the time-varying concentrations of the species A, B, C, respectively. The reaction rates \(k_1, k_2\) depend on the activation energies \(E_1, E_2\) and pre-exponential factors \(A_1, A_2\). The goal of MBDoE is to optimize the experiment design variables \(\boldsymbol{\varphi} = (C_{A0}, T(t))\), where \(C_{A0},T(t)\) are the initial concentration of species A and the time-varying reactor temperature, to maximize the precision of unknown model parameters \(\boldsymbol{\theta} = (A_1, E_1, A_2, E_2)\) by measuring \(\mathbf{y}(t)=(C_A(t), C_B(t), C_C(t))\). The observation errors are assumed to be independent in both time and across measurements with a constant standard deviation of 1 M for each species. Thus the measurement covariance matrix is the identity matrix.

See this Jupyter notebook for an extended version of this example.

Step 0: Import Pyomo and the Pyomo.DoE module

>>> # === Required import ===
>>> import pyomo.environ as pyo
>>> from pyomo.dae import ContinuousSet, DerivativeVar
>>> from pyomo.contrib.doe import Measurements, DesignOfExperiments
>>> import numpy as np

Step 1: Define the Pyomo process model

The process model for the reaction kinetics problem is shown below.

>>> def create_model(scena, CA_init=5, T_initial=300,args=None):
...     # === Create model ==
...     m = pyo.ConcreteModel()
...     m.R = 8.31446261815324  # J/K/mol
...     # === Define set ===
...     m.t0 = pyo.Set(initialize=[0])
...     m.t = ContinuousSet(bounds=(0, 1))
...     m.t_con = pyo.Set(initialize=[0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1])
...     m.scena = pyo.Set(initialize=scena['scena-name'])
...     m.y_set = pyo.Set(initialize=['CA', 'CB', 'CC'])
...     # === Define variables ===
...     m.CA0 = pyo.Var(m.t0, initialize = CA_init, bounds=(1.0,5.0), within=pyo.NonNegativeReals) # mol/L
...     m.T = pyo.Var(m.t, initialize =T_initial, bounds=(300, 700), within=pyo.NonNegativeReals)
...     m.C = pyo.Var(m.scena, m.y_set, m.t, initialize=3, within=pyo.NonNegativeReals)
...     m.dCdt = DerivativeVar(m.C, wrt=m.t)
...     m.kp1 = pyo.Var(m.scena, m.t, initialize=3)
...     m.kp2 = pyo.Var(m.scena, m.t, initialize=1)
...     # === Define Param ===
...     m.A1 = pyo.Param(m.scena, initialize=scena['A1'],mutable=True)
...     m.A2 = pyo.Param(m.scena, initialize=scena['A2'],mutable=True)
...     m.E1 = pyo.Param(m.scena, initialize=scena['E1'],mutable=True)
...     m.E2 = pyo.Param(m.scena, initialize=scena['E2'],mutable=True)
...     # === Constraints ===
...     def T_control(m,t):
...         if t in m.t_con:
...             return pyo.Constraint.Skip
...         else:
...             j = -1
...             for t_con in m.t_con:
...                 if t>t_con:
...                     j+=1
...             neighbour_t = t_control[j]
...         return m.T[t] == m.T[neighbour_t]
...     def cal_kp1(m,z,t):
...         return m.kp1[z,t] == m.A1[z]*pyo.exp(-m.E1[z]*1000/(m.R*m.T[t]))
...     def cal_kp2(m,z,t):
...         return m.kp2[z,t] == m.A2[z]*pyo.exp(-m.E2[z]*1000/(m.R*m.T[t]))
...     def dCdt_control(m,z,y,t):
...         if y=='CA':
...             return m.dCdt[z,y,t] == -m.kp1[z,t]*m.C[z,'CA',t]
...         elif y=='CB':
...             return m.dCdt[z,y,t] == m.kp1[z,t]*m.C[z,'CA',t] - m.kp2[z,t]*m.C[z,'CB',t]
...         elif y=='CC':
...             return pyo.Constraint.Skip
...     def alge(m,z,t):
...         return m.C[z,'CA',t] + m.C[z,'CB',t] + m.C[z,'CC', t] == m.CA0[0]
...     m.T_rule = pyo.Constraint(m.t, rule=T_control)
...     m.k1_pert_rule = pyo.Constraint(m.scena, m.t, rule=cal_kp1)
...     m.k2_pert_rule = pyo.Constraint(m.scena, m.t, rule=cal_kp2)
...     m.dCdt_rule = pyo.Constraint(m.scena,m.y_set, m.t, rule=dCdt_control)
...     m.alge_rule = pyo.Constraint(m.scena, m.t, rule=alge)
...     for z in m.scena:
...         m.C[z,'CB',0.0].fix(0.0)
...         m.C[z,'CC',0.0].fix(0.0)
...     return m

Next we define a function to discretize the model.

>>> # === Discretization ===
>>> def disc(m, NFE=32):
...     discretizer = pyo.TransformationFactory('dae.collocation')
...     discretizer.apply_to(m, nfe=NFE, ncp=3, wrt=m.t)
...     return m

Note

The first argument of the create_model function should be scena.

Note

To use direct_kaug mode, the model parameters ( \(A_1, A_2, E_1, E_2\)) definitions should be changed from Param to Var objects.

Step 2: Define the inputs for Pyomo.DoE

>>> # === Design variables, time points
>>> t_control = [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1]    # Control time set [h]
>>> dv_pass = {'CA0': [0],'T': t_control}  # design variable and its control time set

>>> # === Measurement object ===
>>> t_measure = [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1]   # Measurement time points [h]
>>> measure_pass = {'C':{'CA': t_measure, 'CB': t_measure, 'CC': t_measure}}
>>> measure_variance = {'C': {'CA': 1, 'CB': 1, 'CC': 1}} # provide measurement uncertainty
>>> measure_class =  Measurements(measure_pass, variance=measure_variance)  # Use Pyomo.DoE.Measurements to achieve a measurement object

>>> # === Parameter dictionary ===
>>> parameter_dict = {'A1': 84.79, 'A2': 371.72, 'E1': 7.78, 'E2': 15.05}

>>> # === Define prior information ==
>>> prior_none = np.zeros((4,4))

Step 3: Compute the FIM of a square MBDoE problem

This method computes an MBDoE optimization problem with no degree of freedom.

This method can be accomplished by two modes, direct_kaug and sequential_finite. direct_kaug mode requires the installation of the solver k_aug which is available through the IDAES-PSE extensions.

>>> # === Decide mode ===
>>> sensi_opt = 'sequential_finite'
>>> # === Specify an experiment ===
>>> exp1 = {'CA0': {0: 5}, 'T': {0: 570, 0.125:300,  0.25:300,  0.375:300,  0.5:300,  0.625:300,  0.75:300,  0.875:300, 1:300}}
>>> # === Create the DoE object ===
>>> doe_object = DesignOfExperiments(parameter_dict, dv_pass, measure_class, create_model,
...                            prior_FIM=prior_none, discretize_model=disc)
>>> # === Use ``compute_FIM`` to compute one MBDoE square problem ===
>>> result = doe_object.compute_FIM(exp1,mode=sensi_opt, FIM_store_name = 'dynamic.csv',
...                            store_output = 'store_output') 
>>> # === Use ``calculate_FIM`` method of the result object to evaluate the FIM ===
>>> result.calculate_FIM(doe_object.design_values) 
>>> # === Print FIM and its trace, determinant, condition number and minimal eigen value ===
>>> result.FIM  
>>> result.trace 
>>> result.det 
>>> result.cond 
>>> result.min_eig  

Step 4: Exploratory analysis (Enumeration)

Exploratory analysis is suggested to enumerate the design space to check if the problem is identifiable, i.e., ensure that D-, E-optimality metrics are not small numbers near zero, and Modified E-optimality is not a big number.

Pyomo.DoE accomplishes the exploratory analysis with the run_grid_search function. It allows users to define any number of design decisions. Heatmaps (or lines) are used to visualize the sensitivity of the DoE criteria to changes in two (or one) design variables with any other design variables held constant. The function run_grid_search enumerates over the design space, each MBDoE problem accomplished by compute_FIM method. Therefore, run_grid_search supports only two modes: sequential_finite and direct_kaug.

>>> # === Specify inputs===
>>> design_ranges = [[1,2,3,4,5], [300,400,500,600,700]] # [CA0 [M], T [K]]
>>> dv_apply_name = ['CA0','T']
>>> dv_apply_time = [[0],t_control]
>>> exp1 = {'CA0': {0: 5}, 'T': {0: 570, 0.125:300,  0.25:300,  0.375:300,  0.5:300,  0.625:300,  0.75:300,  0.875:300, 1:300}} # CA0 in [M], T in [K]
>>> sensi_opt = 'sequential_finite'
>>> prior_all = np.zeros((4,4))
>>> prior_pass=np.asarray(prior_all)

>>> # === Run enumeration ===
>>> doe_object = DesignOfExperiments(parameter_dict, dv_pass, measure_class, create_model,
...                            prior_FIM=prior_none, discretize_model=disc) 
>>> all_fim = doe_object.run_grid_search(exp1, design_ranges, dv_apply_name, dv_apply_time, mode=sensi_opt) 

>>> # === Analyze results ===
>>> test = all_fim.extract_criteria() 
>>> # === Draw 1D sensitivity curve===
>>> fixed = {"'CA0'": 5.0} # fix a dimension
>>> all_fim.figure_drawing(fixed, ['T'], 'Reactor case','T [K]','$C_{A0}$ [M]' ) 
>>> # === Draw 2D heatmap ===
>>> fixed = {} # do not need to fix
>>> all_fim.figure_drawing(fixed, ['CA0','T'], 'Reactor case','$C_{A0}$ [M]', 'T [K]' ) 

Successful run of the above code shows the following figure:

../../_images/grid-1.png

This heatmap shows the sensitivity of the DoE criteria, i.e., measures of the experimental information content, across the two-dimensional experiment design space. Horizontal and vertical axes are two design variables, while the color of each grid shows the experimental information content. For example, A-optimality shows that the most informative region is around \(C_{A0}=5.0\) M, \(T=300.0\) K, while the least informative region is around \(C_{A0}=1.0\) M, \(T=700.0\) K.

Step 5: Gradient-based optimization

Pyomo.DoE formulates a two-stage stochastic_program to compute A- and D-optimality designs.

This function solves twice to ensure reliable intialization: first, Pyomo.DoE solves a square problem. Next, Pyomo.DoE unfixes the design variables (adds degrees of freedom) and solves again.

>>> # === Specify a starting point ===
>>> exp1 = {'CA0': {0: 5}, 'T': {0: 300, 0.125:300,  0.25:300,  0.375:300,  0.5:300,  0.625:300,  0.75:300,  0.875:300, 1:300}}
>>> # === Define DoE object ===
>>> doe_object = DesignOfExperiments(parameter_dict, dv_pass, measure_class, createmod,
...                            prior_FIM=prior_pass, discretize_model=disc) 
>>> # === Optimize ===
>>> square_result, optimize_result= doe_object.stochastic_program(exp1,
...                                                         if_optimize=True,
...                                                          if_Cholesky=True,
...                                                         scale_nominal_param_value=True,
...                                                         objective_option='det',
...                                                         L_initial=None)