pyoe.OEBench package

Subpackages

Submodules

pyoe.OEBench.arf module

class pyoe.OEBench.arf.ARFHoeffdingTree(m, delta_w, delta_d, grace_period=50, leaf_prediction='nb', no_pre_prune=True)

Bases: HoeffdingTreeClassifier

ARFHoeffding Tree

A Hoeffding tree is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute). A theoretically appealing feature of Hoeffding trees, not shared by other incremental decision tree learners, is that they have sound guarantees of performance. Using the Hoeffding bound, one can show that their output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples.

The ARFHoeffding tree is based on the Hoeffding tree, with two main differences. First, whenever a new node is created, a subset of m random attributes is chosen and split attempts are limited to that subset. Second, there is no early tree pruning.

See for details: G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press. Implementation based on: Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer (2010); MOA: Massive Online Analysis; Journal of Machine Learning Research 11: 1601-1604

Parameters:
  • m (Int) – Number of random attributes considered for a split at each node

  • grace_period (Int) – The number of instances a leaf should observe between split attempts.

  • delta_w (float) – Warning threshold of change detection for ADWIN change detector

  • delta_d (float) – Change threshold of change detection for ADWIN change detector

  • no_pre_prune (Boolean) – If True, disable pre-pruning. Default: True

  • leaf_prediction (String) – Prediction mechanism used at leaves: ‘mc’ - Majority Class, ‘nb’ - Naive Bayes, ‘nba’ - Naive Bayes Adaptive

  • Other attributes inherited from HoeffdingTree:

  • HoeffdingTree.max_byte_size (Int) – Maximum memory consumed by the tree.

  • HoeffdingTree.memory_estimate_period (Int) – How many instances between memory consumption checks.

  • HoeffdingTree.split_criterion (String) – Split criterion to use. ‘gini’ - Gini ‘info_gain’ - Information Gain

  • HoeffdingTree.split_confidence (Float) – Allowed error in split decision, a value closer to 0 takes longer to decide.

  • HoeffdingTree.tie_threshold (Float) – Threshold below which a split will be forced to break ties.

  • HoeffdingTree.binary_split (Boolean) – If True only allow binary splits.

  • HoeffdingTree.stop_mem_management (Boolean) – If True, stop growing as soon as memory limit is hit.

  • HoeffdingTree.remove_poor_atts (Boolean) – If True, disable poor attributes.

  • HoeffdingTree.nb_threshold (Int) – The number of instances a leaf should observe before permitting Naive Bayes.

  • HoeffdingTree.nominal_attributes (List) – List of Nominal attributes
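
A minimal construction sketch (the parameter values here are illustrative, not package defaults):

from pyoe.OEBench.arf import ARFHoeffdingTree

# A tree that considers 5 random attributes per split, uses ADWIN warning/drift
# thresholds of 0.01 / 0.001, and predicts with Naive Bayes at the leaves.
tree = ARFHoeffdingTree(
    m=5,
    delta_w=0.01,
    delta_d=0.001,
    grace_period=50,
    leaf_prediction="nb",
)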

static is_randomizable()
rf_tree_train(X, y)

This function draws a weight from a Poisson(6) distribution and assigns it to the instance. If the Poisson(6) draw is zero, the instance is not used for training.

Parameters:
  • X (Array) – Input vector

  • y (Array) – True value of the class for X
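
A hedged sketch of the weighting rule described above (not the package's actual code); the sample_weight handling is an assumption about the underlying HoeffdingTreeClassifier API:

import numpy as np

def poisson_weighted_train(tree, X, y, lam=6):
    # Draw a Poisson(6) weight for the instance; a weight of 0 means "skip".
    k = np.random.poisson(lam=lam)
    if k > 0:
        # Hypothetical call: the real rf_tree_train may apply the weight differently.
        tree.partial_fit(X, y, sample_weight=np.full(len(X), k))
    return k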

class pyoe.OEBench.arf.AdaptiveRandomForest(nb_features=5, nb_trees=100, predict_method='mc', pretrain_size=1000, delta_w=0.01, delta_d=0.001)

Bases: object

AdaptiveRandomForest (ARF)

An Adaptive Random Forest is a classification algorithm that adapts Random Forest, which is not a streaming algorithm, so that it is once again among the best classifiers in the streaming setting. This code implements the ARF described in:

Adaptive random forests for evolving data stream classification. Heitor M. Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem

Parameters:
  • nb_features (Int) – The number of random features each tree considers when attempting a split

  • nb_trees (Int) – The number of trees that the forest should contain

  • predict_method (String) – Prediction method: either Majority Classifier (“mc”) or Average (“avg”)

create_tree()

Create an ARF Hoeffding tree.

Returns:

a tree

Return type:

ARFHoeffdingTree

create_trees()

Create nb_trees trees.

Returns:

a dictionary of trees

Return type:

Dictionary

init_weights()

Initialize the weights of the trees. Each weight is 1 by default.

Returns:

a dictionary of weights, where each weight is associated with one ARF Hoeffding tree

Return type:

Dictionary

learning_performance(idx, y_predicted, y)

Compute the learning performance of the tree at index “idx”.

Parameters:
  • idx (Int) – index of the tree in the dictionary

  • y_predicted (Int) – prediction result

  • y (Int) – the real y, from the training data

partial_fit(X, y, classes=None)

Partial fit over the X and y arrays.

Parameters:
  • X (Numpy.ndarray of shape (n_samples, n_features)) – Features

  • y (Vector) – Classes

predict(X)

Predicts the label of the X instance(s).

Parameters:

X (Numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.

Returns:

A list containing the predicted labels for all instances in X.

Return type:

list
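
A hedged end-to-end sketch in the test-then-train (prequential) style, using a toy synthetic stream; the data, batch size, and hyper-parameter values are illustrative:

import numpy as np
from pyoe.OEBench.arf import AdaptiveRandomForest

forest = AdaptiveRandomForest(nb_features=5, nb_trees=10, predict_method="avg")

rng = np.random.default_rng(0)
X_all = rng.random((500, 5))
y_all = (X_all[:, 0] > 0.5).astype(int)        # a toy binary concept

for start in range(0, 500, 100):               # consume the stream in batches
    X_batch = X_all[start:start + 100]
    y_batch = y_all[start:start + 100]
    if start > 0:                              # predict only after some training
        y_pred = forest.predict(X_batch)
        print("batch accuracy:", np.mean(np.asarray(y_pred) == y_batch))
    forest.partial_fit(X_batch, y_batch, classes=[0, 1])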

pyoe.OEBench.armnet module

class pyoe.OEBench.armnet.ARMNetModel(nfield: int, nfeat: int, nemb: int, nhead: int, alpha: float, nhid: int, mlp_nlayer: int, mlp_nhid: int, dropout: float, ensemble: bool, deep_nlayer: int, deep_nhid: int, noutput: int = 1)

Bases: Module

Model: Adaptive Relation Modeling Network (Multi-Head). Important hyper-parameters: alpha (sparsity), nhead (attention heads), nhid (exponential neurons).

feature_extractor(x)
Parameters:

x – {‘id’: [bsz, nfield], LongTensor, ‘value’: [bsz, nfield], FloatTensor}

Returns:

hidden-layer feature

forward(x)
Parameters:

x – {‘id’: [bsz, nfield], LongTensor, ‘value’: [bsz, nfield], FloatTensor}

Returns:

y: [bsz], FloatTensor of size B, for Regression or Classification
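
A hedged construction and forward-pass sketch using the {‘id’, ‘value’} input format described above; the hyper-parameter values are illustrative only:

import torch
from pyoe.OEBench.armnet import ARMNetModel

nfield, nfeat, nemb = 10, 100, 16
model = ARMNetModel(
    nfield=nfield, nfeat=nfeat, nemb=nemb, nhead=4, alpha=1.7,
    nhid=32, mlp_nlayer=2, mlp_nhid=64, dropout=0.1,
    ensemble=False, deep_nlayer=2, deep_nhid=64,
)

bsz = 8
x = {
    "id": torch.randint(0, nfeat, (bsz, nfield)),  # LongTensor [bsz, nfield]
    "value": torch.rand(bsz, nfield),              # FloatTensor [bsz, nfield]
}
y = model(x)                                       # FloatTensor of shape [bsz]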

class pyoe.OEBench.armnet.SparseAttLayer(nhead: int, nfield: int, nemb: int, d_k: int, nhid: int, alpha: float = 1.5)

Bases: Module

forward(x)
Parameters:

x – [bsz, nfield, nemb], FloatTensor

Returns:

Att_weights [bsz, nhid, nfield], FloatTensor

reset_parameters() None

pyoe.OEBench.cluster module

pyoe.OEBench.dataset_selection module

pyoe.OEBench.entmax module

Bisection implementation of alpha-entmax (Peters et al., 2019). Backward pass w.r.t. alpha per (Correia et al., 2019). See https://arxiv.org/pdf/1905.05702 for a detailed description.

class pyoe.OEBench.entmax.EntmaxBisect(alpha=1.5, dim=-1, n_iter=50)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class pyoe.OEBench.entmax.EntmaxBisectFunction(*args, **kwargs)

Bases: Function

classmethod backward(ctx, dY)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

classmethod forward(ctx, X, alpha=1.5, dim=-1, n_iter=50, ensure_sum_one=True)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

class pyoe.OEBench.entmax.SparsemaxBisect(dim=-1, n_iter=None)

Bases: Module

forward(X)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class pyoe.OEBench.entmax.SparsemaxBisectFunction(*args, **kwargs)

Bases: EntmaxBisectFunction

classmethod backward(ctx, dY)

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

classmethod forward(ctx, X, dim=-1, n_iter=50, ensure_sum_one=True)

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.

pyoe.OEBench.entmax.entmax_bisect(X, alpha=1.5, dim=-1, n_iter=50, ensure_sum_one=True)

alpha-entmax: normalizing sparse transform (a la softmax).

Solves the optimization problem:

max_p <x, p> - H_a(p) s.t. p >= 0, sum(p) == 1.

where H_a(p) is the Tsallis alpha-entropy with custom alpha >= 1, using a bisection (root finding, binary search) algorithm.

This function is differentiable with respect to both X and alpha.

Parameters:
  • X (torch.Tensor) – The input tensor.

  • alpha (float or torch.Tensor) – Tensor of alpha parameters (> 1) to use. If scalar or python float, the same value is used for all rows; otherwise, it must have shape (or be expandable to) alpha.shape[j] == (X.shape[j] if j != dim else 1). A value of alpha=2 corresponds to sparsemax, and alpha=1 corresponds to softmax (but computing it this way is likely unstable).

  • dim (int) – The dimension along which to apply alpha-entmax.

  • n_iter (int) – Number of bisection iterations. For float32, 24 iterations should suffice for machine precision.

  • ensure_sum_one (bool) – Whether to divide the result by its sum. If False, the result might sum to a value close to, but not exactly, 1, which might cause downstream problems.

Returns:

P – The projection result, such that P.sum(dim=dim) == 1 elementwise.

Return type:

torch tensor, same shape as X
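
A short usage sketch; the tensor shapes are illustrative:

import torch
from pyoe.OEBench.entmax import entmax_bisect

scores = torch.randn(4, 10, requires_grad=True)
p = entmax_bisect(scores, alpha=1.5, dim=-1)      # rows sum to 1, may contain exact zeros

# alpha may also be a tensor (shape (4, 1) here, matching the rule above) and,
# as stated above, the transform is differentiable with respect to it.
alpha = torch.full((4, 1), 1.5, requires_grad=True)
p = entmax_bisect(scores, alpha=alpha, dim=-1)
p.sum().backward()                                # gradients reach both scores and alpha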

pyoe.OEBench.entmax.sparsemax_bisect(X, dim=-1, n_iter=50, ensure_sum_one=True)

sparsemax: normalizing sparse transform (a la softmax), via bisection.

Solves the projection:

min_p ||x - p||_2 s.t. p >= 0, sum(p) == 1.

Parameters:
  • X (torch.Tensor) – The input tensor.

  • dim (int) – The dimension along which to apply sparsemax.

  • n_iter (int) – Number of bisection iterations. For float32, 24 iterations should suffice for machine precision.

  • ensure_sum_one (bool) – Whether to divide the result by its sum. If False, the result might sum to a value close to, but not exactly, 1, which might cause downstream problems.

Note

This function does not yet support normalizing along anything except the last dimension. Please use transposing and views to achieve more general behavior.

Returns:

P – The projection result, such that P.sum(dim=dim) == 1 elementwise.

Return type:

torch tensor, same shape as X
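
A short usage sketch; per the note above, normalization is applied along the last dimension, so other axes are handled by transposing:

import torch
from pyoe.OEBench.entmax import sparsemax_bisect

scores = torch.randn(4, 10)
p_last = sparsemax_bisect(scores)                 # normalize the last dimension
p_first = sparsemax_bisect(scores.t()).t()        # normalize dimension 0 instead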

pyoe.OEBench.ewc module

class pyoe.OEBench.ewc.EWC(model, input, target, ids, task)

Bases: object

penalty(model: Module)
pyoe.OEBench.ewc.ewc_train(model: Module, optimizer: torch.optim, data_loader: DataLoader, ewc: EWC, importance: float)
pyoe.OEBench.ewc.normal_train(model: Module, optimizer: torch.optim, data_loader: DataLoader)
pyoe.OEBench.ewc.test(model: Module, data_loader: DataLoader)
pyoe.OEBench.ewc.variable(t: Tensor, use_cuda=False, **kwargs)
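
A hedged sketch of combining the EWC penalty with a task loss when training on a new task; the ewc object is assumed to be an already-constructed EWC instance for a previous task, and the MSE loss is an illustrative choice rather than the package's own (see ewc_train above for the provided routine):

import torch.nn.functional as F

def ewc_step(model, optimizer, ewc, X_batch, y_batch, importance=1000.0):
    # One optimization step: task loss plus the documented EWC penalty.
    optimizer.zero_grad()
    task_loss = F.mse_loss(model(X_batch), y_batch)
    loss = task_loss + importance * ewc.penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()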

pyoe.OEBench.experiments module

pyoe.OEBench.layers module

class pyoe.OEBench.layers.Embedding(nfeat, nemb)

Bases: Module

forward(x)
Parameters:

x – {‘id’: LongTensor B*F, ‘value’: FloatTensor B*F}

Returns:

embeddings B*F*E
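
A hedged sketch of the {‘id’, ‘value’} input convention shared by the layers in this module, using the B*F / B*F*E shape notation from the docstrings:

import torch
from pyoe.OEBench.layers import Embedding

B, F, nfeat, nemb = 8, 10, 100, 16          # batch, fields, features, embedding dim
emb = Embedding(nfeat, nemb)

x = {
    "id": torch.randint(0, nfeat, (B, F)),  # LongTensor B*F
    "value": torch.rand(B, F),              # FloatTensor B*F
}
out = emb(x)                                # FloatTensor B*F*E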

class pyoe.OEBench.layers.FactorizationMachine(reduce_dim=True)

Bases: Module

forward(x)
Parameters:

x – FloatTensor B*F*E

class pyoe.OEBench.layers.Linear(nfeat)

Bases: Module

forward(x)
Parameters:

x – {‘id’: LongTensor B*F, ‘value’: FloatTensor B*F}

Returns:

linear transform of x

class pyoe.OEBench.layers.MLP(ninput, nlayers, nhid, dropout, noutput=1)

Bases: Module

forward(x)
Parameters:

x – FloatTensor B*ninput

Returns:

FloatTensor B*noutput

class pyoe.OEBench.layers.MultiHeadAttention(nhead, ninput, n_k, n_v, dropout=0.0)

Bases: Module

Multi-head Attention Module

forward(x, mask=None)
Parameters:
  • x – B*F*E

  • mask – B*F*F

Returns:

B*F*E
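
A hedged shape sketch: the layer consumes and produces B*F*E tensors, so it can be stacked directly on an Embedding output; the head and key/value sizes are illustrative:

import torch
from pyoe.OEBench.layers import MultiHeadAttention

B, F, E = 8, 10, 16
attn = MultiHeadAttention(nhead=4, ninput=E, n_k=8, n_v=8, dropout=0.1)

x = torch.rand(B, F, E)                     # B*F*E, e.g. the Embedding output above
out = attn(x)                               # B*F*E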

class pyoe.OEBench.layers.SelfAttnLayer(nemb)

Bases: Module

forward(x)
Parameters:

x – B*F*E

Returns:

B*F*E

pyoe.OEBench.layers.get_all_indices(n)

get all the row, col indices for an (n, n) array

pyoe.OEBench.layers.get_triu_indices(n, diag_offset=1)

get the row, col indices for the upper-triangle of an (n, n) array

pyoe.OEBench.layers.normalize_adj(adj)

normalize and return an adjacency matrix (numpy array)

class pyoe.OEBench.layers.scaled_dot_prodct_attention_(temperature, attn_dropout=0.0)

Bases: Module

Scaled Dot-Product Attention

forward(q, k, v, mask=None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

pyoe.OEBench.model module

class pyoe.OEBench.model.FcNet(input_dim, hidden_dims, output_dim, dropout_p=0.0)

Bases: Module

Fully connected network for MNIST classification

feature_extractor(x)
forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
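
A hedged construction sketch for the MNIST use case mentioned above; passing hidden_dims as a list of layer widths is an assumption about the expected argument type:

import torch
from pyoe.OEBench.model import FcNet

net = FcNet(input_dim=784, hidden_dims=[256, 128], output_dim=10, dropout_p=0.2)

x = torch.rand(32, 784)                  # a batch of 32 flattened 28x28 images
logits = net(x)                          # class scores, one row per image
features = net.feature_extractor(x)      # hidden-layer representation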

pyoe.OEBench.outliers module

pyoe.OEBench.pipeline module

pyoe.OEBench.stream_cluster module

Module contents