pyoe.dataloaders package¶
Submodules¶
pyoe.dataloaders.base module¶
- class pyoe.dataloaders.base.BaseDataloader(dataset_name: str, data_dir: str = './data/', reload: bool = False)¶
Bases:
Dataset
For datasets in OEBench, dataset_name must be one of:
['dataset_experiment_info/allstate_claims_severity', 'dataset_experiment_info/bike_sharing_demand', 'dataset_experiment_info/rssi', 'dataset_experiment_info/noaa', 'dataset_experiment_info/KDDCUP99', 'dataset_experiment_info/electricity_prices', 'dataset_experiment_info/tetouan',
'dataset_experiment_info/beijing_multisite/wanliu', 'dataset_experiment_info/beijing_multisite/wanshouxingong', 'dataset_experiment_info/beijing_multisite/gucheng', 'dataset_experiment_info/beijing_multisite/huairou', 'dataset_experiment_info/beijing_multisite/nongzhanguan', 'dataset_experiment_info/beijing_multisite/changping', 'dataset_experiment_info/beijing_multisite/dingling', 'dataset_experiment_info/beijing_multisite/aotizhongxin', 'dataset_experiment_info/beijing_multisite/dongsi', 'dataset_experiment_info/beijing_multisite/shunyi', 'dataset_experiment_info/beijing_multisite/guanyuan', 'dataset_experiment_info/beijing_multisite/tiantan',
'dataset_experiment_info/weather_indian_cities/bangalore', 'dataset_experiment_info/weather_indian_cities/lucknow', 'dataset_experiment_info/weather_indian_cities/mumbai', 'dataset_experiment_info/weather_indian_cities/Rajasthan', 'dataset_experiment_info/weather_indian_cities/Bhubhneshwar', 'dataset_experiment_info/weather_indian_cities/delhi', 'dataset_experiment_info/weather_indian_cities/chennai',
'dataset_experiment_info/insects/abrupt_imbalanced', 'dataset_experiment_info/insects/out-of-control', 'dataset_experiment_info/insects/incremental_imbalanced', 'dataset_experiment_info/insects/incremental_reoccurring_balanced', 'dataset_experiment_info/insects/incremental_balanced', 'dataset_experiment_info/insects/incremental_abrupt_balanced', 'dataset_experiment_info/insects/gradual_imbalanced', 'dataset_experiment_info/insects/abrupt_balanced', 'dataset_experiment_info/insects/incremental_abrupt_imbalanced', 'dataset_experiment_info/insects/incremental_reoccurring_imbalanced', 'dataset_experiment_info/insects/gradual_balanced',
'dataset_experiment_info/italian_city_airquality', 'dataset_experiment_info/taxi_ride_duration', 'dataset_experiment_info/room_occupancy', 'dataset_experiment_info/bitcoin', 'dataset_experiment_info/airlines', 'dataset_experiment_info/traffic_volumn', 'dataset_experiment_info/news_popularity', 'dataset_experiment_info/beijingPM2.5', 'dataset_experiment_info/energy_prediction', 'dataset_experiment_info/household', 'dataset_experiment_info/election', 'dataset_experiment_info/covtype', 'dataset_experiment_info/safe_driver',
'dataset_experiment_info/5cities/shenyang', 'dataset_experiment_info/5cities/guangzhou', 'dataset_experiment_info/5cities/beijing', 'dataset_experiment_info/5cities/shanghai', 'dataset_experiment_info/5cities/chengdu']
For datasets in METER (outlier detection task with provided ground truth), dataset_name must be one of:
['OD_datasets/NSL', 'OD_datasets/AT', 'OD_datasets/CPU', 'OD_datasets/MT', 'OD_datasets/NYC', 'OD_datasets/INSECTS_Abr', 'OD_datasets/INSECTS_Incr', 'OD_datasets/INSECTS_IncrGrd', 'OD_datasets/INSECTS_IncrRecr', 'OD_datasets/ionosphere', 'OD_datasets/mammography', 'OD_datasets/pima', 'OD_datasets/satellite']
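The two name families above differ only in their path prefix, so a caller can tell which benchmark a dataset_name belongs to before constructing a loader. The following is a minimal sketch under that assumption; `benchmark_of` and the two prefix constants are hypothetical helpers, not part of pyoe.

```python
# Hypothetical helper: classify a dataset_name by its documented prefix.
# Neither the constants nor the function exist in pyoe itself.
OEBENCH_PREFIX = "dataset_experiment_info/"
METER_PREFIX = "OD_datasets/"


def benchmark_of(dataset_name: str) -> str:
    """Return 'OEBench' or 'METER' depending on the name's prefix."""
    if dataset_name.startswith(OEBENCH_PREFIX):
        return "OEBench"
    if dataset_name.startswith(METER_PREFIX):
        return "METER"
    raise ValueError(f"unrecognised dataset_name prefix: {dataset_name!r}")
```

A prefix check only identifies the family; it does not confirm that the full name is one of the supported datasets. Judging by their names, the static methods get_oebench_datasets() and get_meter_dataset() may be the better source for an exact membership test, although their behaviour is not documented here.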
- get_data() Tensor | DataFrame ¶
Return the data in the dataset: a pd.DataFrame for time series data, a torch.Tensor otherwise.
- Returns:
the data in the dataset.
- Return type:
out (torch.Tensor | pd.DataFrame)
- static get_meter_dataset() list[str] ¶
- get_num_columns() int ¶
Return the number of columns in the dataset.
- Returns:
the number of columns in the dataset.
- Return type:
int
- get_num_samples() int ¶
Return the number of samples in the dataset.
- Returns:
the number of samples in the dataset.
- Return type:
int
- static get_oebench_datasets() list[str] ¶
- static get_oebench_representative_dataset() list[str] ¶
- get_output_dim() int ¶
Return the output dimension of the dataset.
- Returns:
the output dimension of the dataset.
- Return type:
int
- get_target() Tensor | DataFrame ¶
Return the target in the dataset: a pd.DataFrame for time series data, a torch.Tensor otherwise.
- Returns:
the target in the dataset.
- Return type:
out (torch.Tensor | pd.DataFrame)
- get_task() str ¶
Return the task of the dataset.
- Returns:
the task of the dataset.
- Return type:
str
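Because the accessors on BaseDataloader take no arguments, a dataset's metadata can be gathered with plain duck typing. The sketch below assumes only the method names documented above; `summarize` and `_StubLoader` are hypothetical, and the stub's values are invented so the example runs without downloading any data.

```python
from typing import Any, Dict


def summarize(loader: Any) -> Dict[str, Any]:
    """Collect the documented metadata accessors into one dict."""
    return {
        "task": loader.get_task(),
        "num_samples": loader.get_num_samples(),
        "num_columns": loader.get_num_columns(),
        "output_dim": loader.get_output_dim(),
    }


class _StubLoader:
    """Hypothetical stand-in exposing the same getters; values are made up."""

    def get_task(self) -> str:
        return "classification"

    def get_num_samples(self) -> int:
        return 100

    def get_num_columns(self) -> int:
        return 8

    def get_output_dim(self) -> int:
        return 2


summary = summarize(_StubLoader())
```

With pyoe installed, the same function should accept an instance of any BaseDataloader subclass in place of the stub.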
- class pyoe.dataloaders.base.Dataloader(dataset_name: str, data_dir: str = './data/', reload: bool = False)¶
Bases:
BaseDataloader
Loads a dataset from local files. Use it for non-time-series data only; the data is stored in a torch tensor.
- get_outlier_ratio() float ¶
Return the outlier ratio for the dataset.
- Returns:
the outlier ratio for the dataset.
- Return type:
float
- class pyoe.dataloaders.base.DataloaderWrapper(dataset: Dataloader, return_outlier_label=False)¶
Bases:
Dataset
A wrapper around a Dataloader. It calls the wrapped dataset to obtain the data and the target.
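To illustrate the wrapper pattern described here, the sketch below pairs each sample with its target by calling get_data() and get_target() on the wrapped object. It is not pyoe's implementation: `PairWrapper` and `_ListSource` are hypothetical, plain lists stand in for tensors, and the return_outlier_label option is not modelled because its behaviour is not documented above.

```python
class PairWrapper:
    """Illustrative wrapper: index into data and target together."""

    def __init__(self, source):
        # Fetch both arrays once from the wrapped object.
        self._data = source.get_data()
        self._target = source.get_target()

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, idx):
        # Return one (sample, target) pair.
        return self._data[idx], self._target[idx]


class _ListSource:
    """Hypothetical stand-in for a loader; uses lists instead of tensors."""

    def get_data(self):
        return [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    def get_target(self):
        return [0, 1, 0]


wrapped = PairWrapper(_ListSource())
```

Exposing `__len__` and `__getitem__` is the map-style protocol that index-based batch loaders generally expect, which is presumably why the wrapper derives from Dataset.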
- class pyoe.dataloaders.base.TimeSeriesDataloader(dataset_name: str, data_dir: str = './data/', predicted_label: str = '1. open', reload: bool = False)¶
Bases:
BaseDataloader
Loads a time series dataset from local files. Use it for time-series data only; the data is stored in a pandas DataFrame.
pyoe.dataloaders.pipeline module¶
- pyoe.dataloaders.pipeline.load_data(dataset_path: str, prefix: str = '', reload: bool = False)¶
Load the data and return the target data, the data before and after one-hot encoding, the one-hot-encoded data without null values, the window size, the output dimension, and the task.
- Parameters:
dataset_path (str) – the path of the dataset folder.
prefix (str) – the prefix of the dataset path.
- Returns:
target_data_nonnull (pd.DataFrame): the target data without null values.
data_before_onehot (pd.DataFrame): the data before one hot encoding.
data_one_hot (pd.DataFrame): the data after one hot encoding.
data_onehot_nonnull (pd.DataFrame): the data after one hot encoding without null values.
window_size (int): the window size.
output_dim (int): the output dimension.
task (str): the task of the data.
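A seven-value return is easy to unpack in the wrong order, so callers may prefer to name the fields. The sketch below assumes that load_data returns the values as a tuple in the order they are listed in its Returns description; that order is an assumption, and `LoadedData` and `as_loaded_data` are hypothetical helpers rather than part of pyoe.

```python
from typing import Any, NamedTuple


class LoadedData(NamedTuple):
    """Named view over the seven documented return values (order assumed)."""

    target_data_nonnull: Any
    data_before_onehot: Any
    data_one_hot: Any
    data_onehot_nonnull: Any
    window_size: int
    output_dim: int
    task: str


def as_loaded_data(result) -> LoadedData:
    """Wrap a 7-tuple; raises TypeError if the arity does not match."""
    return LoadedData(*result)


# Placeholder values standing in for a real load_data(...) result.
example = as_loaded_data((None, None, None, None, 5, 1, "regression"))
```

With pyoe installed, the intended call would be along the lines of `as_loaded_data(load_data(dataset_path))`, after confirming the return order against the installed version.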