Parameter Estimation

Parameter Estimation using parmest requires a Pyomo model, experimental data which defines multiple scenarios, and a list of parameter names (thetas) to estimate. parmest uses Pyomo [PyomoBookII] and (optionally) mpi-sppy [mpisppy] to solve a two-stage stochastic programming problem, where the experimental data is used to create a scenario tree. The objective function needs to be written with the Pyomo Expression for first stage cost (named “FirstStageCost”) set to zero and the Pyomo Expression for second stage cost (named “SecondStageCost”) defined as the deviation between the model and the observations (typically defined as the sum of squared deviation between model values and observed values).

If the Pyomo model is not formatted as a two-stage stochastic programming problem in this format, the user can supply a custom function to use as the second stage cost and the Pyomo model will be modified within parmest to match the required specifications. The stochastic programming callback function is also defined within parmest. The callback function returns a populated and initialized model for each scenario.

To use parmest, the user creates a Estimator object which includes the following methods:

`theta_est`	Parameter estimation using all scenarios in the data
`theta_est_bootstrap`	Parameter estimation using bootstrap resampling of the data
`theta_est_leaveNout`	Parameter estimation where N data points are left out of each sample
`objective_at_theta`	Objective value for each theta
`confidence_region_test`	Confidence region test to determine if theta values are within a rectangular, multivariate normal, or Gaussian kernel density distribution for a range of alpha values
`likelihood_ratio_test`	Likelihood ratio test to identify theta values within a confidence region using the \(\chi^2\) distribution
`leaveNout_bootstrap_test`	Leave-N-out bootstrap test to compare theta values where N data points are left out to a bootstrap analysis using the remaining data, results indicate if theta is within a confidence region determined by the bootstrap analysis

Additional functions are available in parmest to group data, plot results, and fit distributions to theta values.

`group_data`	Group data by scenario
`pairwise_plot`	Plot pairwise relationship for theta values, and optionally alpha-level confidence intervals and objective value contours
`grouped_boxplot`	Plot a grouped boxplot to compare two datasets
`grouped_violinplot`	Plot a grouped violinplot to compare two datasets
`fit_rect_dist`	Fit an alpha-level rectangular distribution to theta values
`fit_mvn_dist`	Fit a multivariate normal distribution to theta values
`fit_kde_dist`	Fit a Gaussian kernel-density distribution to theta values

A Estimator object can be created using the following code. A description of each argument is listed below. Examples are provided in the Examples Section.

>>> import pyomo.contrib.parmest.parmest as parmest
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function)

Optionally, solver options can be supplied, e.g.,

>>> solver_options = {"max_iter": 6000}
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function, solver_options)

Model function

The first argument is a function which uses data for a single scenario to return a populated and initialized Pyomo model for that scenario.

Parameters that the user would like to estimate can be defined as mutable parameters (Pyomo `Param`) or variables (Pyomo `Var`). Within parmest, any parameters that are to be estimated are converted to unfixed variables. Variables that are to be estimated are also unfixed.

The model does not have to be specifically written as a two-stage stochastic programming problem for parmest. That is, parmest can modify the objective, see Objective function below.

Data

The second argument is the data which will be used to populate the Pyomo model. Supported data formats include:

Pandas Dataframe where each row is a separate scenario and column names refer to observed quantities. Pandas DataFrames are easily stored and read in from csv, excel, or databases, or created directly in Python.
List of Pandas Dataframe where each entry in the list is a separate scenario. Dataframes store observed quantities, referenced by index and column.
List of dictionaries where each entry in the list is a separate scenario and the keys (or nested keys) refer to observed quantities. Dictionaries are often preferred over DataFrames when using static and time series data. Dictionaries are easily stored and read in from json or yaml files, or created directly in Python.
List of json file names where each entry in the list contains a json file name for a separate scenario. This format is recommended when using large datasets in parallel computing.

The data must be compatible with the model function that returns a populated and initialized Pyomo model for a single scenario. Data can include multiple entries per variable (time series and/or duplicate sensors). This information can be included in custom objective functions, see Objective function below.

Theta names

The third argument is a list of parameters or variable names that the user wants to estimate. The list contains strings with Param and/or Var names from the Pyomo model.

Objective function

The fourth argument is an optional argument which defines the optimization objective function to use in parameter estimation.

If no objective function is specified, the Pyomo model is used “as is” and should be defined with “FirstStageCost” and “SecondStageCost” expressions that are used to build an objective for the two-stage stochastic programming problem.

If the Pyomo model is not written as a two-stage stochastic programming problem in this format, and/or if the user wants to use an objective that is different than the original model, a custom objective function can be defined for parameter estimation. The objective function arguments include model and data and the objective function returns a Pyomo expression which is used to define “SecondStageCost”. The objective function can be used to customize data points and weights that are used in parameter estimation.

Parameter Estimation

Model function

Data

Theta names

Objective function

Suggested initialization procedure for parameter estimation problems