Parameter Estimation

Parameter Estimation using parmest requires a Pyomo model, experimental data which defines multiple scenarios, and a list of parameter names (thetas) to estimate. parmest uses Pyomo [PyomoBookII] and (optionally) mpi-sppy [mpisppy] to solve a two-stage stochastic programming problem, where the experimental data is used to create a scenario tree. The objective function needs to be written with the Pyomo Expression for first stage cost (named “FirstStageCost”) set to zero and the Pyomo Expression for second stage cost (named “SecondStageCost”) defined as the deviation between the model and the observations (typically defined as the sum of squared deviation between model values and observed values).

If the Pyomo model is not formatted as a two-stage stochastic programming problem in this format, the user can supply a custom function to use as the second stage cost and the Pyomo model will be modified within parmest to match the required specifications. The stochastic programming callback function is also defined within parmest. The callback function returns a populated and initialized model for each scenario.

To use parmest, the user creates a Estimator object which includes the following methods:

theta_est Parameter estimation using all scenarios in the data
theta_est_bootstrap Parameter estimation using bootstrap resampling of the data
theta_est_leaveNout Parameter estimation where N data points are left out of each sample
objective_at_theta Objective value for each theta
confidence_region_test Confidence region test to determine if theta values are within a rectangular, multivariate normal, or Gaussian kernel density distribution for a range of alpha values
likelihood_ratio_test Likelihood ratio test to identify theta values within a confidence region using the \(\chi^2\) distribution
leaveNout_bootstrap_test Leave-N-out bootstrap test to compare theta values where N data points are left out to a bootstrap analysis using the remaining data, results indicate if theta is within a confidence region determined by the bootstrap analysis

Additional functions are available in parmest to group data, plot results, and fit distributions to theta values.

group_data Group data by scenario
pairwise_plot Plot pairwise relationship for theta values, and optionally alpha-level confidence intervals and objective value contours
grouped_boxplot Plot a grouped boxplot to compare two datasets
grouped_violinplot Plot a grouped violinplot to compare two datasets
fit_rect_dist Fit an alpha-level rectangular distribution to theta values
fit_mvn_dist Fit a multivariate normal distribution to theta values
fit_kde_dist Fit a Gaussian kernel-density distribution to theta values

A Estimator object can be created using the following code. A description of each argument is listed below. Examples are provided in the Examples Section.

>>> import pyomo.contrib.parmest.parmest as parmest
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function)

Optionally, solver options can be supplied, e.g.,

>>> solver_options = {"max_iter": 6000}
>>> pest = parmest.Estimator(model_function, data, theta_names, objective_function, solver_options)

Model function

The first argument is a function which uses data for a single scenario to return a populated and initialized Pyomo model for that scenario.

Parameters that the user would like to estimate can be defined as mutable parameters (Pyomo `Param`) or variables (Pyomo `Var`). Within parmest, any parameters that are to be estimated are converted to unfixed variables. Variables that are to be estimated are also unfixed.

The model does not have to be specifically written as a two-stage stochastic programming problem for parmest. That is, parmest can modify the objective, see Objective function below.

Data

The second argument is the data which will be used to populate the Pyomo model. Supported data formats include:

  • Pandas Dataframe where each row is a separate scenario and column names refer to observed quantities. Pandas DataFrames are easily stored and read in from csv, excel, or databases, or created directly in Python.
  • List of Pandas Dataframe where each entry in the list is a separate scenario. Dataframes store observed quantities, referenced by index and column.
  • List of dictionaries where each entry in the list is a separate scenario and the keys (or nested keys) refer to observed quantities. Dictionaries are often preferred over DataFrames when using static and time series data. Dictionaries are easily stored and read in from json or yaml files, or created directly in Python.
  • List of json file names where each entry in the list contains a json file name for a separate scenario. This format is recommended when using large datasets in parallel computing.

The data must be compatible with the model function that returns a populated and initialized Pyomo model for a single scenario. Data can include multiple entries per variable (time series and/or duplicate sensors). This information can be included in custom objective functions, see Objective function below.

Theta names

The third argument is a list of parameters or variable names that the user wants to estimate. The list contains strings with Param and/or Var names from the Pyomo model.

Objective function

The fourth argument is an optional argument which defines the optimization objective function to use in parameter estimation.

If no objective function is specified, the Pyomo model is used “as is” and should be defined with “FirstStageCost” and “SecondStageCost” expressions that are used to build an objective for the two-stage stochastic programming problem.

If the Pyomo model is not written as a two-stage stochastic programming problem in this format, and/or if the user wants to use an objective that is different than the original model, a custom objective function can be defined for parameter estimation. The objective function arguments include model and data and the objective function returns a Pyomo expression which is used to define “SecondStageCost”. The objective function can be used to customize data points and weights that are used in parameter estimation.

Suggested initialization procedure for parameter estimation problems

To check the quality of initial guess values provided for the fitted parameters, we suggest solving a square instance of the problem prior to solving the parameter estimation problem using the following steps:

  1. Create Estimator object. To initialize the parameter estimation solve from the square problem solution, set optional argument solver_options = {bound_push: 1e-8}.
  2. Call objective_at_theta with optional argument (initialize_parmest_model=True). Different initial guess values for the fitted parameters can be provided using optional argument theta_values (Pandas Dataframe)
  3. Solve parameter estimation problem by calling theta_est