pyoe.dataloaders package

Submodules

pyoe.dataloaders.base module

class pyoe.dataloaders.base.BaseDataloader(dataset_name: str, data_dir: str = './data/', reload: bool = False)

Bases: Dataset

For datasets in OEBench, dataset_name in ``['dataset_experiment_info/allstate_claims_severity', 'dataset_experiment_info/bike_sharing_demand', 'dataset_experiment_info/rssi', 'dataset_experiment_info/noaa', 'dataset_experiment_info/KDDCUP99', 'dataset_experiment_info/electricity_prices', 'dataset_experiment_info/tetouan', 'dataset_experiment_info/beijing_multisite/wanliu', 'dataset_experiment_info/beijing_multisite/wanshouxingong', 'dataset_experiment_info/beijing_multisite/gucheng', 'dataset_experiment_info/beijing_multisite/huairou', 'dataset_experiment_info/beijing_multisite/nongzhanguan', 'dataset_experiment_info/beijing_multisite/changping', 'dataset_experiment_info/beijing_multisite/dingling', 'dataset_experiment_info/beijing_multisite/aotizhongxin', 'dataset_experiment_info/beijing_multisite/dongsi', 'dataset_experiment_info/beijing_multisite/shunyi', 'dataset_experiment_info/beijing_multisite/guanyuan', 'dataset_experiment_info/beijing_multisite/tiantan', 'dataset_experiment_info/weather_indian_cities/bangalore', 'dataset_experiment_info/weather_indian_cities/lucknow', 'dataset_experiment_info/weather_indian_cities/mumbai', 'dataset_experiment_info/weather_indian_cities/Rajasthan', 'dataset_experiment_info/weather_indian_cities/Bhubhneshwar', 'dataset_experiment_info/weather_indian_cities/delhi', 'dataset_experiment_info/weather_indian_cities/chennai', 'dataset_experiment_info/insects/abrupt_imbalanced', 'dataset_experiment_info/insects/out-of-control', 'dataset_experiment_info/insects/incremental_imbalanced', 'dataset_experiment_info/insects/incremental_reoccurring_balanced', 'dataset_experiment_info/insects/incremental_balanced', 'dataset_experiment_info/insects/incremental_abrupt_balanced', 'dataset_experiment_info/insects/gradual_imbalanced', 'dataset_experiment_info/insects/abrupt_balanced', 'dataset_experiment_info/insects/incremental_abrupt_imbalanced', 'dataset_experiment_info/insects/incremental_reoccurring_imbalanced', 'dataset_experiment_info/insects/gradual_balanced', 'dataset_experiment_info/italian_city_airquality', 'dataset_experiment_info/taxi_ride_duration', 'dataset_experiment_info/room_occupancy', 'dataset_experiment_info/bitcoin', 'dataset_experiment_info/airlines', 'dataset_experiment_info/traffic_volumn', 'dataset_experiment_info/news_popularity', 'dataset_experiment_info/beijingPM2.5', 'dataset_experiment_info/energy_prediction', 'dataset_experiment_info/household', 'dataset_experiment_info/election', 'dataset_experiment_info/covtype', 'dataset_experiment_info/safe_driver', 'dataset_experiment_info/5cities/shenyang', 'dataset_experiment_info/5cities/guangzhou', 'dataset_experiment_info/5cities/beijing', 'dataset_experiment_info/5cities/shanghai', 'dataset_experiment_info/5cities/chengdu']``

For datasets in METER (outlier detection task with provided ground truth), dataset_name in ``['OD_datasets/NSL', 'OD_datasets/AT', 'OD_datasets/CPU', 'OD_datasets/MT', 'OD_datasets/NYC', 'OD_datasets/INSECTS_Abr', 'OD_datasets/INSECTS_Incr', 'OD_datasets/INSECTS_IncrGrd', 'OD_datasets/INSECTS_IncrRecr', 'OD_datasets/ionosphere', 'OD_datasets/mammography', 'OD_datasets/pima', 'OD_datasets/satellite']``
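The two name lists above differ by prefix, so the benchmark a name belongs to can be read off the string itself. The following is a minimal sketch of that check; ``benchmark_of`` and the two prefix constants are hypothetical helpers written for illustration and are not part of pyoe.

```python
# Hypothetical helper (not part of pyoe): infer the benchmark from the
# dataset_name prefix used in the two lists above.
OEBENCH_PREFIX = "dataset_experiment_info/"
METER_PREFIX = "OD_datasets/"


def benchmark_of(dataset_name: str) -> str:
    """Return 'OEBench' or 'METER' based on the prefix of dataset_name."""
    if dataset_name.startswith(OEBENCH_PREFIX):
        return "OEBench"
    if dataset_name.startswith(METER_PREFIX):
        return "METER"
    raise ValueError(f"unrecognized dataset_name prefix: {dataset_name!r}")


if __name__ == "__main__":
    print(benchmark_of("dataset_experiment_info/beijing_multisite/wanliu"))
    print(benchmark_of("OD_datasets/pima"))
```

A prefix check only confirms the family of the name. Whether a specific name exists is better answered by the static methods get_oebench_datasets() and get_meter_dataset() listed below.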

get_data() → Tensor | DataFrame

Return the data in the dataset: a pd.DataFrame for time-series data, a torch.Tensor otherwise.

Returns:

the data in the dataset.

Return type:

torch.Tensor | pd.DataFrame

static get_meter_dataset() → list[str]

get_num_columns() → int

Return the number of columns in the dataset.

Returns:

the number of columns in the dataset.

Return type:

int

get_num_samples() → int

Return the number of samples in the dataset.

Returns:

the number of samples in the dataset.

Return type:

int

static get_oebench_datasets() → list[str]

static get_oebench_representative_dataset() → list[str]

get_output_dim() → int

Return the output dimension of the dataset.

Returns:

the output dimension of the dataset.

Return type:

int

get_target() → Tensor | DataFrame

Return the target in the dataset: a pd.DataFrame for time-series data, a torch.Tensor otherwise.

Returns:

the target in the dataset.

Return type:

torch.Tensor | pd.DataFrame

get_task() → str

Return the task of the dataset.

Returns:

the task of the dataset.

Return type:

str

class pyoe.dataloaders.base.Dataloader(dataset_name: str, data_dir: str = './data/', reload: bool = False)

Bases: BaseDataloader

Loads a dataset from local files. Intended for non-time-series data only; the data is stored in a torch tensor.

get_outlier_ratio() → float

Return the outlier ratio for the dataset.

Returns:

the outlier ratio for the dataset.

Return type:

float

class pyoe.dataloaders.base.DataloaderWrapper(dataset: Dataloader, return_outlier_label=False)

Bases: Dataset

Wraps a Dataloader and delegates to it to fetch the data and target.
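The wrapper pattern described here can be illustrated without torch. The sketch below is not the pyoe implementation: ``ListBackedDataset`` and ``WrapperSketch`` are hypothetical stand-ins, the per-index (data, target) pairing is an assumption, and return_outlier_label is left out because its behavior is not described in this reference.

```python
from typing import Any, List, Tuple


class ListBackedDataset:
    """Stand-in for a loader, exposing the documented getter names."""

    def __init__(self, data: List[Any], target: List[Any]) -> None:
        if len(data) != len(target):
            raise ValueError("data and target must have the same length")
        self._data = data
        self._target = target

    def get_data(self) -> List[Any]:
        return self._data

    def get_target(self) -> List[Any]:
        return self._target

    def get_num_samples(self) -> int:
        return len(self._data)


class WrapperSketch:
    """Illustrative wrapper: delegates every lookup to the wrapped dataset."""

    def __init__(self, dataset: ListBackedDataset) -> None:
        self.dataset = dataset

    def __len__(self) -> int:
        return self.dataset.get_num_samples()

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        # Assumption: one item is a (features, target) pair at the same index.
        return self.dataset.get_data()[index], self.dataset.get_target()[index]


if __name__ == "__main__":
    wrapped = WrapperSketch(ListBackedDataset([[0.1, 0.2], [0.3, 0.4]], [0, 1]))
    print(len(wrapped), wrapped[1])
```

Implementing ``__len__`` and ``__getitem__`` is what lets an object of this shape be indexed and iterated like a map-style dataset.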

class pyoe.dataloaders.base.TimeSeriesDataloader(dataset_name: str, data_dir: str = './data/', predicted_label: str = '1. open', reload: bool = False)

Bases: BaseDataloader

Loads a time-series dataset from local files. Intended for time-series data only; the data is stored in a pandas DataFrame.

pyoe.dataloaders.pipeline module

pyoe.dataloaders.pipeline.load_data(dataset_path: str, prefix: str = '', reload: bool = False)

Load the data and return the target data without null values, the data before one-hot encoding, the data after one-hot encoding, the one-hot-encoded data without null values, the window size, the output dimension, and the task.

Parameters:
  • dataset_path (str) – the path of the dataset folder.

  • prefix (str) – the prefix of the dataset path.

Returns:
  • target_data_nonnull (pd.DataFrame) – the target data without null values.

  • data_before_onehot (pd.DataFrame) – the data before one hot encoding.

  • data_one_hot (pd.DataFrame) – the data after one hot encoding.

  • data_onehot_nonnull (pd.DataFrame) – the data after one hot encoding without null values.

  • window_size (int) – the window size.

  • output_dim (int) – the output dimension.

  • task (str) – the task of the data.

Return type:

tuple
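Because load_data returns seven values, positional unpacking is easy to get wrong. One option is to name the fields, as in the sketch below. It assumes the values come back in the order given in the Returns description; ``LoadDataResult`` is a hypothetical helper, not part of pyoe, and the tuple used here is a placeholder rather than real load_data output.

```python
from typing import Any, NamedTuple


class LoadDataResult(NamedTuple):
    """Named view of the seven documented return values (order assumed)."""

    target_data_nonnull: Any
    data_before_onehot: Any
    data_one_hot: Any
    data_onehot_nonnull: Any
    window_size: int
    output_dim: int
    task: str


if __name__ == "__main__":
    # Placeholder values standing in for an actual load_data(...) result.
    raw = ("target", "before", "one_hot", "one_hot_nonnull", 10, 1, "regression")
    result = LoadDataResult(*raw)
    print(result.window_size, result.output_dim, result.task)
```

With the real function, the same wrapping would read ``LoadDataResult(*load_data(dataset_path))``, provided the return order matches the assumption stated above.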

Module contents