TempGP

The Gaussian process-based power curve that avoids temporal overfitting.

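from dswe import TempGP

# Fit on the initial training data, then predict.
model = TempGP()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

# Update the model with newly arrived data, then predict again.
model.update(X_update, y_update)
prediction = model.predict(X_test_new)
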
class dswe.tempGP.TempGP(opt_method='L-BFGS-B', limit_memory=5000, fast_computation=True, optim_control={'batch_size': 100, 'beta1': 0.9, 'beta2': 0.999, 'epsilon': 1e-08, 'learning_rate': 0.05, 'logfile': None, 'max_iter': 5000, 'tol': 1e-06})
Parameters
  • opt_method (string) – Type of solver. The best-performing solvers are ‘L-BFGS-B’ and ‘BFGS’. Default value is ‘L-BFGS-B’.

  • limit_memory (int or None) – Number of training points sampled for use during prediction, which limits the total memory requirement. Setting the value to None disables sampling, so the full training data is used for prediction. Default value is 5000.

  • fast_computation (bool) – Whether to use a fast approximation (True) or exact inference (False). Default is True.

  • optim_control (dict) –

    A dictionary of parameters passed to the Adam optimizer when fast_computation is set to True. The default values have been tested rigorously and tend to strike a balance between accuracy and speed.

    • batch_size: Number of training points sampled at each iteration of Adam. Default value is 100.

    • learning_rate: The step size for the Adam optimizer. Default value is 0.05.

    • max_iter: The maximum number of iterations to be performed by Adam. Default value is 5000.

    • tol: Gradient tolerance. Default value is 1e-6.

    • beta1: Decay rate for the first moment of the gradient. Default value is 0.9.

    • beta2: Decay rate for the second moment of the gradient. Default value is 0.999.

    • epsilon: A small number to avoid division by zero. Default value is 1e-8.

    • logfile: A string specifying the name of a file in which to store the hyperparameter values at each iteration. Default value is None.
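
To make the roles of learning_rate, beta1, beta2, epsilon, max_iter, and tol concrete, here is a minimal sketch of a generic Adam loop in plain Python. It is not the library's implementation: the function name adam_minimize and the scalar test problem are hypothetical, and it uses the full gradient at every step, whereas batch_size implies that the library samples a subset of training points per iteration.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.05, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, max_iter=5000, tol=1e-6):
    """Generic Adam loop on a scalar parameter (illustration only).

    Returns the final parameter value and the number of iterations started.
    """
    x, m, v = x0, 0.0, 0.0
    t = 0
    for t in range(1, max_iter + 1):
        g = grad(x)
        if abs(g) < tol:                      # gradient tolerance reached
            break
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
        # epsilon keeps the denominator away from zero
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x, t


# Hypothetical objective (x - 3)^2, whose gradient is 2 * (x - 3).
x_opt, n_iter = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The keyword defaults mirror the documented optim_control defaults, so passing a modified dictionary such as {'learning_rate': 0.01, 'max_iter': 10000} would correspond to changing those two arguments here.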

fit(X_train, y_train, T_train=[])

Fit the TempGP from the training dataset.

Parameters
  • X_train (np.ndarray or pd.DataFrame) – A matrix or dataframe of input variable values in the training dataset.

  • y_train (np.array) – A numeric array for response values in the training dataset.

  • T_train (np.array) – A temporal array for time indices of the data points. By default, the function assigns natural numbers starting from 1 as the time indices.

Returns

self with trained parameters.

  • thinning_number: the thinning number computed by the algorithm.

  • model_F: A dictionary containing details of the model for predicting function f(x).
    • 'X_train' is the input variable matrix used to compute the cross-covariance for predictions. It is the same as X_train unless the model is updated. See the TempGP.update() method for details on updating the model.

    • 'y_train' is the response vector, also the same as y_train unless the model is updated.

    • 'weighted_y' is the weighted response, that is, the response left-multiplied by the inverse of the covariance matrix.

  • model_G: A dictionary containing details of the model for predicting function g(t).
    • 'residuals' are the residuals left after subtracting function f(x) from the response. They are used to predict g(t). See the TempGP.update() method for updating the residuals.

    • 'T_train' are the time indices of the residuals, same as T_train.

  • optim_result: A dictionary containing optimized values of model f(x).
    • 'estimated_params' contains the estimated hyperparameters for function f(x).

    • 'obj_val' is the objective value of the hyperparameter optimization for f(x).

    • 'grad_val' is the gradient vector at the optimal objective value.

Return type

TempGP
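
As a small illustration of two details described above, the sketch below builds the default time indices (natural numbers starting from 1) and computes the residuals that remain after subtracting f(x) from the response. The helper names and the fitted values f_hat are hypothetical and stand in for whatever the trained model produces; this is not the library's internal code.

```python
def default_time_indices(n):
    """Natural numbers 1..n, the default described for T_train."""
    return list(range(1, n + 1))


def residuals_for_g(y, f_hat):
    """Response minus the fitted f(x) values, one residual per data point."""
    return [yi - fi for yi, fi in zip(y, f_hat)]


y_train = [1.2, 1.9, 3.1]
f_hat = [1.0, 2.0, 3.0]   # hypothetical fitted values of f(x) at the training inputs

T_train = default_time_indices(len(y_train))
residuals = residuals_for_g(y_train, f_hat)
```

Each residual is paired with the time index at the same position, which is the pairing that model_G stores as 'residuals' and 'T_train'.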

predict(X_test, T_test=[])

Predict the target for the provided data.

Parameters
  • X_test (np.ndarray or pd.DataFrame) – A matrix or dataframe of test input variable values to compute predictions.

  • T_test (np.array) – Temporal values of test data points.

Returns

Predicted target values.

Return type

np.array
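
The limit_memory constructor parameter caps how many training points are used during prediction. One way to picture this is the sketch below, which assumes uniform random sampling without replacement; the function name, the seed argument, and the sampling scheme are assumptions for illustration, and the library may select points differently.

```python
import random


def sample_training_indices(n_train, limit_memory=5000, seed=0):
    """Pick the row indices of the training points to use for prediction.

    With limit_memory=None, or when the data already fits within the limit,
    every index is kept.
    """
    if limit_memory is None or n_train <= limit_memory:
        return list(range(n_train))
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return sorted(rng.sample(range(n_train), limit_memory))


subset = sample_training_indices(10000)                       # capped at 5000 indices
full = sample_training_indices(10000, limit_memory=None)      # no sampling
```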

update(X_update, y_update, T_update=[], replace=True, update_model_F=False)

Update the model when a new training dataset arrives.

Parameters
  • X_update (np.ndarray or pd.DataFrame) – A matrix or dataframe of input variable values in the newly added dataset.

  • y_update (np.array) – A numeric array for response values in the newly added dataset.

  • T_update (np.array) – A temporal array for time indices of the data points. By default, the function assigns natural numbers starting from 1 as the time indices.

  • replace (bool) – A boolean to specify whether to replace the old data with the new one, or to add the new data while still keeping all the old data. Default is True, which replaces the top m rows from the old data, where m is the number of data points in the new data.

  • update_model_F (bool) – A boolean to specify whether to update model_F as well. If the original TempGP model is trained on a sufficiently large dataset (say one year), updating model_F regularly may not result in any significant improvement, but can be computationally expensive. Default is False.

Returns

self with updated trained parameter values.

Return type

TempGP
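
The effect of the replace flag can be pictured with plain Python lists. This sketch assumes that "replacing the top m rows" means dropping the first m old rows and appending the new rows at the end; the function name merge_training_data is hypothetical and the code is not taken from the library.

```python
def merge_training_data(old_rows, new_rows, replace=True):
    """Combine stored rows with newly arrived rows.

    replace=True  : drop the first m old rows (m = number of new rows),
                    then append the new rows, so the size stays the same
                    whenever m <= len(old_rows).
    replace=False : keep all old rows and append the new ones.
    """
    m = len(new_rows)
    if replace:
        return old_rows[m:] + new_rows
    return old_rows + new_rows


old = [1, 2, 3, 4]
new = [5, 6]
rolled = merge_training_data(old, new)                  # [3, 4, 5, 6]
grown = merge_training_data(old, new, replace=False)    # [1, 2, 3, 4, 5, 6]
```

With replace=True the stored data behaves like a rolling window of fixed length, which keeps memory use and update cost roughly constant as new data arrives.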

Reference

Prakash, Tuo, and Ding, 2022, “The temporal overfitting problem with applications in wind power curve modeling,” Technometrics, accepted.