ComparePCurve

Power curve comparison and uncertainty quantification.

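from dswe import ComparePCurve
model = ComparePCurve(Xlist, ylist, testcol)       # match covariates, then fit and compare the two power curves
diff = model.compute_weighted_difference(weights)  # % difference using user-supplied weights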

class dswe.comparePCurve.ComparePCurve(Xlist, ylist, testcol, testset=None, circ_pos=None, thresh=0.2, conf_level=0.95, grid_size=[50, 50], power_bins=15, baseline=1, limit_memory=True, opt_method='L-BFGS-B', sample_size={'band_size': 5000, 'optim_size': 500}, rng_seed=1)

Parameters

Xlist (list) – A list of the data sets to match. Each data set can be a matrix in which each column corresponds to one input variable.
ylist (list) – A list of target arrays, one per data set in Xlist, holding the target values of that data set.
testcol (list) – A list of the column numbers of the covariates used to generate the test set. At most two columns can be used.
testset (np.array) – Test points at which the functions are compared. Default is None, meaning the test set is generated at runtime.
circ_pos (list or int) – A list or array of the column positions of circular variables, or an integer when only one circular variable is present. Default value is None.
thresh (float or list) – The threshold(s) against which covariate matching is performed: either a single value applied to every covariate, or a list with one threshold per covariate. Default value is 0.2.
conf_level (float) – The confidence level used to construct the band. Default value is 0.95.
grid_size (list) – A list or numpy array giving the number of grid points along each test column, used to construct the test set. It should be provided when testset is None and is ignored otherwise. Default is [50, 50] for 2-dim input, which is converted internally to [1000] for 1-dim input. The total number of test points (the product of the grid_size elements) must be less than or equal to 2500.
power_bins (int) – An integer stating the number of power bins used to compute the scaled difference. Default value is 15.
baseline (int) – An integer from 0 to 2: 1 uses the power curve of the first dataset as the base for metric calculation, 2 uses the power curve of the second dataset as the base, and 0 uses the average of both power curves as the base. Default is 1.
limit_memory (bool) – A boolean (True/False) indicating whether to limit memory use. Default is True. If set to True, 5000 data points (by default; see sample_size) are randomly sampled from each dataset under comparison for inference.
opt_method (string) – A string specifying the optimization method used for hyperparameter estimation. The solvers that work best are ‘L-BFGS-B’ and ‘BFGS’. Default is ‘L-BFGS-B’.
sample_size (dict) – A dictionary with two keys, optim_size and band_size, giving the sample size drawn from each dataset for hyperparameter optimization and for confidence band computation, respectively, when limit_memory = True. Default value is {'band_size': 5000, 'optim_size': 500}.
rng_seed (int) – Random number generator (RNG) seed for sampling data when limit_memory = True. Default value is 1.
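Covariate matching, controlled by thresh and circ_pos, can be pictured with the simplified sketch below. It is not the library's implementation: it assumes each covariate difference is scaled by that covariate's standard deviation in the reference data set, that circular variables are angles in degrees, and it only filters one data set against the other. The names abs_diff and match_rows are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Union

def abs_diff(a: float, b: float, circular: bool) -> float:
    """Absolute difference; circular variables (assumed in degrees) use the shorter arc."""
    d = abs(a - b)
    if circular:
        d %= 360.0
        d = min(d, 360.0 - d)
    return d

def match_rows(
    X_ref: Sequence[Sequence[float]],
    X_cand: Sequence[Sequence[float]],
    thresh: Union[float, Sequence[float]] = 0.2,
    circ_pos: Optional[Union[int, Sequence[int]]] = None,
) -> List[int]:
    """Indices of rows in X_cand lying within thresh[j] standard deviations
    of at least one row of X_ref on every covariate j (simplified sketch)."""
    n_cov = len(X_ref[0])
    # A single threshold applies to every covariate, as with the thresh parameter.
    if isinstance(thresh, (int, float)):
        thresh = [float(thresh)] * n_cov
    # circ_pos may be an int (one circular column) or a list of columns.
    if circ_pos is None:
        circ = set()
    elif isinstance(circ_pos, int):
        circ = {circ_pos}
    else:
        circ = set(circ_pos)
    # Scale each covariate by its standard deviation in the reference set
    # (fall back to 1.0 for a constant column to avoid dividing by zero).
    sd = [statistics.stdev([row[j] for row in X_ref]) or 1.0 for j in range(n_cov)]
    matched = []
    for i, cand in enumerate(X_cand):
        for ref in X_ref:
            if all(abs_diff(cand[j], ref[j], j in circ) / sd[j] <= thresh[j]
                   for j in range(n_cov)):
                matched.append(i)
                break  # one close reference row is enough
    return matched

X1 = [[5.0, 10.0], [7.0, 350.0], [9.0, 180.0]]  # [wind speed, wind direction]
X2 = [[5.1, 355.0], [8.0, 180.0], [9.2, 200.0]]
print(match_rows(X1, X2, thresh=0.2, circ_pos=1))  # [0, 2]
print(match_rows(X1, X2, thresh=0.2))              # [2]
```

Treating column 1 as circular lets 355° match 10° (15° apart on the circle); without circ_pos they look 345° apart and the first row of X2 is dropped.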
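When testset is None, a test set is generated from testcol and grid_size. The sketch below assumes a Cartesian grid of evenly spaced points between the minimum and maximum of each test column, and enforces the 2500-point limit stated for grid_size. How the library actually chooses the grid range may differ; make_testset and linspace are hypothetical helpers.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

MAX_TEST_POINTS = 2500  # limit stated in the grid_size description

def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi, inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]

def make_testset(
    X: Sequence[Sequence[float]],
    testcol: Sequence[int],
    grid_size: Sequence[int],
) -> List[Tuple[float, ...]]:
    """Cartesian grid over the observed range of each test column (assumed)."""
    if not 1 <= len(testcol) <= 2:
        raise ValueError("testcol must name one or two columns")
    if len(grid_size) != len(testcol):
        raise ValueError("grid_size needs one entry per column in testcol")
    total = math.prod(grid_size)
    if total > MAX_TEST_POINTS:
        raise ValueError(f"{total} test points requested; at most {MAX_TEST_POINTS} allowed")
    axes = []
    for col, n in zip(testcol, grid_size):
        values = [row[col] for row in X]
        axes.append(linspace(min(values), max(values), n))
    return list(product(*axes))

X = [[0.0, 10.0, 1.0], [4.0, 20.0, 2.0]]
grid = make_testset(X, testcol=[0, 1], grid_size=[3, 2])
print(grid)  # [(0.0, 10.0), (0.0, 20.0), (2.0, 10.0), (2.0, 20.0), (4.0, 10.0), (4.0, 20.0)]
```

With the default grid_size of [50, 50] the grid has exactly 2500 points, the maximum allowed.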
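With limit_memory = True, each data set is randomly subsampled using the sizes in sample_size and the seed rng_seed. The sketch below shows one straightforward way to do reproducible subsampling without replacement with Python's random module. The helper name subsample is hypothetical, the two sizes are drawn independently here, and the library's own sampling will not select the same rows.

```python
import random
from typing import List, Sequence, Tuple

def subsample(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    size: int,
    seed: int = 1,
) -> Tuple[List[Sequence[float]], List[float]]:
    """Keep at most `size` rows, drawn without replacement and reproducibly."""
    n = len(y)
    if n <= size:
        return list(X), list(y)  # small data sets are used whole
    rng = random.Random(seed)  # local RNG: does not touch the global random state
    idx = sorted(rng.sample(range(n), size))  # sorted to preserve original row order
    return [X[i] for i in idx], [y[i] for i in idx]

sample_size = {'band_size': 5000, 'optim_size': 500}
X = [[float(i)] for i in range(20000)]
y = [2.0 * i for i in range(20000)]
X_band, y_band = subsample(X, y, sample_size['band_size'], seed=1)
X_opt, y_opt = subsample(X, y, sample_size['optim_size'], seed=1)
print(len(y_band), len(y_opt))  # 5000 500
```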

Returns

self, with the fitted results stored in the following attributes:

weighted_diff: a float, the % difference between the functions, weighted by the density of the covariates.
weighted_stat_diff: a float, the % statistically significant difference between the functions, weighted by the density of the covariates.
scaled_diff: a float, the % difference between the functions, scaled to the original data.
scaled_stat_diff: a float, the % statistically significant difference between the functions, scaled to the original data.
unweighted_diff: a float, the unweighted % difference between the functions.
unweighted_stat_diff: a float, the unweighted % statistically significant difference between the functions.
reduction_ratio: a list of the shrinkage ratios of the features used in the test set.
mu1: an array of test predictions for the first data set.
mu2: an array of test predictions for the second data set.
mu_diff: an array of pointwise differences between the predictions for the two datasets (mu2 - mu1).
band: an array of the allowed statistical difference between the functions at the test points in testset.
conf_level: a float, the confidence level used to construct the band.
estimated_params: a list of the estimated hyperparameters of the Gaussian process (GP).
testset: an array/matrix of the test points, either provided by the user or generated internally.
matched_data_X: a list of the features of the two matched datasets, as generated by covariate matching.
matched_data_y: a list of the targets of the two matched datasets, as generated by covariate matching.
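The attributes mu1, mu2 and mu_diff, together with the baseline parameter, determine what a percentage difference is measured against. The sketch below picks the base curve as baseline describes and computes an unweighted percentage difference. The formula 100 * sum(mu2 - mu1) / sum(base) is an assumption for illustration, not necessarily how unweighted_diff is computed, and both function names are hypothetical.

```python
from typing import List, Sequence

def base_curve(mu1: Sequence[float], mu2: Sequence[float], baseline: int = 1) -> List[float]:
    """Base power curve at the test points, following the baseline parameter."""
    if baseline == 1:
        return list(mu1)
    if baseline == 2:
        return list(mu2)
    if baseline == 0:
        return [(a + b) / 2.0 for a, b in zip(mu1, mu2)]  # average of both curves
    raise ValueError("baseline must be 0, 1 or 2")

def unweighted_percent_difference(
    mu1: Sequence[float], mu2: Sequence[float], baseline: int = 1
) -> float:
    """Assumed formula: 100 * sum(mu2 - mu1) / sum(base) over the test points."""
    mu_diff = [b - a for a, b in zip(mu1, mu2)]  # same sign convention as mu_diff
    return 100.0 * sum(mu_diff) / sum(base_curve(mu1, mu2, baseline))

mu1 = [100.0, 200.0, 300.0]  # predictions of the first power curve
mu2 = [110.0, 220.0, 330.0]  # predictions of the second power curve
print(unweighted_percent_difference(mu1, mu2, baseline=1))  # 10.0
```

The same difference reads as about 9.09% against the second curve (baseline = 2) and about 9.52% against the average (baseline = 0), which is why the choice of base matters when reporting results.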

Return type

ComparePCurve

compute_weighted_difference(weights, baseline=1, stat_diff=False)

Computes the percentage weighted difference between the power curves using user-provided weights instead of the weights computed from the data.

Parameters

weights (list) – A list of user-specified weights, one for each element of mu_diff. The weights can be based on any probability distribution of the user's choice, but must sum to 1.
baseline (int) – Either 1 or 2: 1 uses the mu1 predictions and 2 uses the mu2 predictions from the power curves, as obtained from ComparePCurve(). mu1 and mu2 are the test predictions for the first and second data set, respectively. Default is 1.
stat_diff (bool) – A boolean (True/False) specifying whether to compute the statistically significant difference. Default is False, i.e. the statistically significant difference is not computed. If set to True, the band generated by ComparePCurve() is used.
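The roles of weights, baseline and stat_diff can be illustrated with a sketch. It assumes the percentage weighted difference is 100 * sum(w * mu_diff) / sum(w * base), where base is mu1 or mu2 according to baseline, and that with stat_diff = True only differences whose magnitude exceeds the band count. Both the formula and the function name weighted_percent_difference are assumptions, not the library's confirmed implementation.

```python
import math
from typing import Optional, Sequence

def weighted_percent_difference(
    mu_diff: Sequence[float],
    base: Sequence[float],
    weights: Sequence[float],
    band: Optional[Sequence[float]] = None,
) -> float:
    """Assumed formula: 100 * sum(w * d) / sum(w * base).

    If band is given, a pointwise difference counts only when its magnitude
    exceeds the band; otherwise it is set to zero (not significant)."""
    if not len(mu_diff) == len(base) == len(weights):
        raise ValueError("mu_diff, base and weights must have equal length")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    diffs = list(mu_diff)
    if band is not None:
        diffs = [d if abs(d) > b else 0.0 for d, b in zip(diffs, band)]
    num = sum(w * d for w, d in zip(weights, diffs))
    den = sum(w * b for w, b in zip(weights, base))
    return 100.0 * num / den

# Raw (unnormalized) density values at four test points, scaled to sum to 1.
raw = [1.0, 2.0, 3.0, 4.0]
weights = [r / sum(raw) for r in raw]  # [0.1, 0.2, 0.3, 0.4]
mu1 = [100.0, 200.0, 300.0, 400.0]     # base curve (baseline = 1)
mu_diff = [5.0, 10.0, -2.0, 20.0]
band = [6.0, 6.0, 6.0, 6.0]
print(round(weighted_percent_difference(mu_diff, mu1, weights), 4))        # 3.3
print(round(weighted_percent_difference(mu_diff, mu1, weights, band), 4))  # 3.3333
```

Normalizing raw density values, as done for weights here, is one simple way to satisfy the sum-to-1 requirement; the list must also be as long as mu_diff.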

Returns

The percentage weighted difference, or the statistically significant percentage weighted difference, depending on whether stat_diff is False or True.

Return type

float

Reference

Ding, Kumar, Prakash, Kio, Liu, Liu, and Li, 2021, “A case study of space-time performance comparison of wind turbines on a wind farm,” Renewable Energy, Vol. 171, pp. 735-746.