All Models

realseries.models.AR module

Created on Sun Apr 19 22:34:04 2020

@author: zhihengzhang

class realseries.models.AR.AR(lag)

Bases: realseries.models.base.BaseModel

Parameters

lag – the time lag in AR model

X_train

Training data of the AR model.

Y_train

Training labels of the AR model.

detect(X)
Parameters
  • X – time series (1-D or 2-D).

  • y – The default is None.

Returns

the fitting result of training data.

Return type

detection

fit(X, y=None)
Parameters
  • X – time series (1-D or 2-D).

  • y – The default is None.

Returns

None.
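An AR model with time lag p regresses each value on its p predecessors. The sketch below shows one common way to fit such a model, ordinary least squares on a lagged design matrix with an intercept. The function fit_ar is hypothetical and is not the RealSeries implementation; we assume only that AR(lag) follows this textbook definition.

```python
import numpy as np

def fit_ar(x, lag):
    """Hypothetical least-squares AR(lag) fit: x[t] ~ c + a_1 x[t-1] + ... + a_lag x[t-lag]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n <= lag:
        raise ValueError("series must be longer than lag")
    # Column k holds the series shifted by k steps, aligned with the target x[lag:].
    lagged = [x[lag - k: n - k] for k in range(1, lag + 1)]
    design = np.column_stack([np.ones(n - lag)] + lagged)
    target = x[lag:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    fitted = design @ coef  # in-sample fitting result
    return coef, fitted

# Simulate x[t] = 0.5 + 0.7 x[t-1] + noise and recover the coefficients.
rng = np.random.default_rng(0)
series = np.zeros(5000)
for t in range(1, len(series)):
    series[t] = 0.5 + 0.7 * series[t - 1] + rng.normal()
coef, fitted = fit_ar(series, lag=1)  # estimates close to the true values 0.5 and 0.7
```

The first lag values have no complete history, so the fitted series is lag samples shorter than the input.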

realseries.models.DWGC module

Created on Fri Apr 17 11:57:41 2020

@author: zhihengzhang

class realseries.models.DWGC.DWGC(win_len, model, index_lr, method, count, train_rate)

Bases: realseries.models.base.BaseModel

Dynamic-window-level Granger causality method, which tries to find the window-level causality for each channel pair.

Parameters
  • win_len – window length.

  • model – AR or NAR.

  • index_lr – learning rate of the causal-indexing coefficients.

  • method – option of fitting method, ‘NAR’/’AR’.

Attributes

  • causal_index – causal-indexing coefficients.

  • single_error1/single_error2 – the fitting error without the other channel’s dimension.

  • double_error – the fitting error with the other channel’s dimension.

detect(X)
Parameters

X – the pair of time series

Returns

the window-level causality

Return type

Ftest_win

fit(X, y=None)

Fit the model

Parameters
  • X (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.

realseries.models.GC module

Created on Thu Apr 16 11:07:05 2020

@author: zhihengzhang

class realseries.models.GC.GC(win_len, model, method, train_rate)

Bases: realseries.models.base.BaseModel

Parameters
  • win_len – window length

  • model – ‘AR’ or ‘NAR-network’

  • method – option of fitting method, ‘NAR’/’AR’

single_error

the fitting error without other channel’s dimension

double_error

the fitting error with other channel’s dimension
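The single_error/double_error pair is what a Granger-causality F-test compares: the residual sum of squares of x predicted from its own lags, versus from its own lags plus the other channel's lags. The sketch below computes one such F statistic per non-overlapping window of length win_len. We assume the Ftest_win output is in this spirit, but the helper names, the linear (AR) fit, the intercept, and the non-overlapping windowing are our own choices, not the RealSeries code.

```python
import numpy as np

def _lags(s, p):
    # Rows are [s[t-1], ..., s[t-p]] for t = p .. len(s)-1.
    n = len(s)
    return np.column_stack([s[p - k: n - k] for k in range(1, p + 1)])

def _rss(design, target):
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, p=2):
    """F statistic for 'lags of y help predict x' (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[p:]
    restricted = np.column_stack([np.ones(len(target)), _lags(x, p)])
    full = np.column_stack([restricted, _lags(y, p)])
    single_error = _rss(restricted, target)  # without the other channel
    double_error = _rss(full, target)        # with the other channel
    dof = len(target) - full.shape[1]
    return ((single_error - double_error) / p) / (double_error / dof)

def windowed_granger(x, y, win_len, p=2):
    """One F statistic per non-overlapping window of length win_len."""
    return np.array([granger_f(x[s:s + win_len], y[s:s + win_len], p)
                     for s in range(0, len(x) - win_len + 1, win_len)])

# x is driven by the previous value of y, but not the other way round.
rng = np.random.default_rng(0)
y = rng.normal(size=600)
x = np.zeros(600)
x[1:] = 0.8 * y[:-1] + 0.1 * rng.normal(size=599)
f_y_to_x = windowed_granger(x, y, win_len=200)  # large values
f_x_to_y = windowed_granger(y, x, win_len=200)  # small values
```

Large F values indicate that adding the other channel substantially reduces the fitting error within that window.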

detect(X)
Parameters

X – time series pair

Returns

window-level causality

Return type

Ftest_win

fit(X, y=None)

Fit the model

Parameters
  • X (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.

realseries.models.NAR module

Created on Tue Apr 14 16:36:45 2020

@author: studyzzh

class realseries.models.NAR.NAR_Network(inputnodes, hiddennodes, outputnodes, learningrate)

Bases: realseries.models.base.BaseModel

Parameters
  • inputnodes – the number of nodes in the input layer.

  • hiddennodes – the number of nodes in the hidden layer.

  • outputnodes – the number of nodes in the output layer.

  • learningrate – the learning rate of the NAR model.

fit_X

the fitting results on training data.

detect(X)
Parameters
  • X – the input time series, of shape (n_length, 1) or (n_length, 2).

  • y – The default is None.

Returns

the fitting errors on training data

Return type

output_errors

fit(X, y=None)
Parameters
  • X – the input time series, of shape (n_length, 1) or (n_length, 2).

  • y – The default is None.
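A NAR network predicts the next value of a series from its previous inputnodes values through a hidden layer. The sketch below is a hypothetical minimal version (the class TinyNAR, its sigmoid hidden layer, linear output, missing bias terms, and plain SGD are all our assumptions) that reuses the constructor arguments documented above to show how inputnodes, hiddennodes, outputnodes, and learningrate fit together.

```python
import numpy as np

class TinyNAR:
    """Hypothetical one-hidden-layer NAR network (not the RealSeries class)."""

    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate, seed=0):
        rng = np.random.default_rng(seed)
        # Weights drawn with std 1/sqrt(fan_in); bias terms omitted for brevity.
        self.w_ih = rng.normal(0.0, inputnodes ** -0.5, (hiddennodes, inputnodes))
        self.w_ho = rng.normal(0.0, hiddennodes ** -0.5, (outputnodes, hiddennodes))
        self.lr = learningrate

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, inputs):
        x = np.asarray(inputs, dtype=float).reshape(-1, 1)
        return (self.w_ho @ self._sigmoid(self.w_ih @ x)).ravel()

    def train_step(self, inputs, targets):
        """One SGD step on squared error; returns the error before the update."""
        x = np.asarray(inputs, dtype=float).reshape(-1, 1)
        t = np.asarray(targets, dtype=float).reshape(-1, 1)
        hidden = self._sigmoid(self.w_ih @ x)
        err = t - self.w_ho @ hidden        # linear output layer
        hidden_err = self.w_ho.T @ err      # error back-propagated to the hidden layer
        self.w_ho += self.lr * err @ hidden.T
        self.w_ih += self.lr * (hidden_err * hidden * (1.0 - hidden)) @ x.T
        return float(np.mean(err ** 2))

def train_epoch(net, series, lags):
    """Slide over the series: the previous `lags` values predict the next one."""
    losses = [net.train_step(series[i:i + lags], series[i + lags])
              for i in range(len(series) - lags)]
    return float(np.mean(losses))

series = np.sin(np.linspace(0, 20 * np.pi, 400))
net = TinyNAR(inputnodes=4, hiddennodes=8, outputnodes=1, learningrate=0.05)
first = train_epoch(net, series, lags=4)
for _ in range(30):
    last = train_epoch(net, series, lags=4)
```

The per-sample errors returned by train_step play the role of the fitting errors described under detect.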

realseries.models.base module

Base class for all time series analysis methods. It defines common methods such as fit, detect, and forecast.

class realseries.models.base.BaseModel(contamination=0.1)

Bases: object

BaseModel class for all RealSeries predict/detect algorithms.

Parameters

contamination (float, optional) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Defaults to 0.1.

Raises

ValueError – Contamination must be in (0, 0.5].

abstract detect(x)

Predict using the trained detector.

Parameters

x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

Returns

Outlier labels of shape (n_length,), indicating for each sample of the time series whether it is an outlier: 0 for inliers and 1 for outliers.

Return type

ndarray

abstract fit(x, y=None)

Fit the model

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.
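To illustrate the fit/detect contract and how contamination can define a threshold on the decision function, here is a self-contained sketch. SketchBase and ZScoreDetector are hypothetical stand-ins, not RealSeries classes; the threshold rule (the (1 - contamination) quantile of the training scores) is a common convention, similar to how PyOD sets its threshold, and is an assumption here.

```python
from abc import ABC, abstractmethod
import numpy as np

class SketchBase(ABC):
    """Hypothetical mirror of the documented BaseModel interface."""

    def __init__(self, contamination=0.1):
        if not 0 < contamination <= 0.5:
            raise ValueError("Contamination must be in (0, 0.5].")
        self.contamination = contamination

    @abstractmethod
    def fit(self, x, y=None): ...

    @abstractmethod
    def detect(self, x): ...

class ZScoreDetector(SketchBase):
    """Toy detector: score = absolute z-score (max over features)."""

    def _scores(self, x):
        z = np.abs((np.asarray(x, dtype=float) - self.mean_) / self.std_)
        return z if z.ndim == 1 else z.max(axis=1)

    def fit(self, x, y=None):  # y is ignored, as documented
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)
        self.std_ = x.std(axis=0) + 1e-12
        # Roughly a `contamination` fraction of training points exceed this.
        self.threshold_ = np.quantile(self._scores(x), 1 - self.contamination)
        return self

    def detect(self, x):
        # 0 for inliers, 1 for outliers, shape (n_length,)
        return (self._scores(x) > self.threshold_).astype(int)

x = np.arange(100, dtype=float)
labels = ZScoreDetector(contamination=0.1).fit(x).detect(x)
```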

forecast(x, t)

Forecast the input.

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • t (int) – time index of to-be-forecast samples.

Returns

Forecast samples of shape (n_length, n_features)

Return type

X_1 (ndarray)

impute(x, t)

Impute the input data X at time index t.

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • t (int) – time index of to-be-imputed samples.

Returns

Imputed samples of shape (n_length, n_features)

Return type

X_1 (ndarray)

load(path)

Load the model from path

Parameters

path (string) – model load path

save(path)

Save the model to path

Parameters

path (string) – model save path

realseries.models.iforest module

The implementation of isolation forest method based on sklearn.

class realseries.models.iforest.IForest(n_estimators=100, max_samples='auto', contamination='auto', max_features=1.0, bootstrap=False, n_jobs=1, random_state=None, verbose=0)

Bases: realseries.models.base.BaseModel

Isolation forest algorithm.

The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

Parameters
  • n_estimators (int, optional) – The number of base estimators in the ensemble. Defaults to 100.

  • max_samples (int or float, optional) –

    The number of samples to draw from X to train each base estimator. Defaults to “auto”.

    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

    • If “auto”, then max_samples=min(256, n_samples).

    If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).

  • contamination ('auto' or float, optional) – The amount of contamination of the data set. Defaults to ‘auto’.

  • max_features (int or float, optional) – The number of features to draw from X to train each base estimator. Defaults to 1.

  • bootstrap (bool, optional) – If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed. Defaults to False.

  • n_jobs (int, optional) – The number of jobs to run in parallel. Defaults to 1.

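  • random_state (int, RandomState instance or None, optional) – If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random. Defaults to None.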

  • verbose (int, optional) – Controls the verbosity of the tree building process. Defaults to 0.

anomaly_score

Array of anomaly score.

IF

The isolation model.

estimators_

List of ExtraTreeRegressor. The collection of fitted sub-estimators.

estimators_samples_

List of arrays. The subset of drawn samples (i.e., the in-bag samples) for each base estimator.

max_samples_

The actual number of samples

detect(X)

Detect the test data by trained model.

Parameters

X (array_like) – The input sequence with shape (n_sample, n_features).

Returns

The predicted anomaly score.

Return type

ndarray

property estimators_

The collection of fitted sub-estimators. Decorator for scikit-learn Isolation Forest attributes.

property estimators_samples_

The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Decorator for scikit-learn Isolation Forest attributes.

fit(X, y=None)

Train the model.

Parameters
  • X (array_like) – The input sequence with shape (n_sample, n_features).

  • y (ndarray, optional) – The label. Defaults to None.

property max_samples_

The actual number of samples. Decorator for scikit-learn Isolation Forest attributes.
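The isolation idea described above can be sketched in pure Python: random trees split on a random feature at a random value between that feature's minimum and maximum, and points that are isolated after few splits get higher scores, s(x) = 2^(-E[h(x)] / c(psi)), where psi is the subsample size. This follows the original isolation forest formulation; scikit-learn, which IForest wraps, implements an optimized version, and all function names below are our own.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    # Stop at the depth limit or when the node cannot be split further.
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)  # random split between min and max
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node, depth=0):
    if node[0] == "leaf":
        # Unresolved leaves get the expected remaining depth c(size).
        return depth + c(node[1])
    _, feature, split, left, right = node
    return path_length(x, left if x[feature] < split else right, depth + 1)

def isolation_scores(data, n_trees=100, max_samples=256, seed=0):
    rng = random.Random(seed)
    psi = min(max_samples, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm) if norm > 0 else 0.5)
    return scores

gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
scores = isolation_scores(data)  # the far-away point (8, 8) scores highest
```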

realseries.models.lstm_dynamic module

The LSTM dynamic threshold method is an implementation of the paper ‘Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding’.

class realseries.models.lstm_dynamic.LSTM_dynamic(hidden_size=128, model_path='./model', dropout=0.3, lr=0.001, lstm_batch_size=100, epochs=50, num_layers=2, l_s=120, n_predictions=10, batch_size=32, window_size=50, smoothing_perc=0.2, error_buffer=50, p=0.1)

Bases: realseries.models.base.BaseModel

LSTM Dynamic method.

Parameters
  • hidden_size (int, optional) – Hidden size of LSTM. Defaults to 128.

  • model_path (str, optional) – Path for saving and loading model. Defaults to ‘./model’.

  • dropout (float, optional) – Dropout rate. Defaults to 0.3.

  • lr (float, optional) – Learning rate. Defaults to 1e-3.

  • lstm_batch_size (int, optional) – Batch size of training LSTM. Defaults to 100.

  • epochs (int, optional) – Epochs of training. Defaults to 50.

  • num_layers (int, optional) – Number of LSTM layers. Defaults to 2.

  • l_s (int, optional) – Length of the input sequence for LSTM. Defaults to 120.

  • n_predictions (int, optional) – Number of values to predict by input sequence. Defaults to 10.

  • batch_size (int, optional) – Number of values to evaluate in each batch in the prediction stage. Defaults to 32.

  • window_size (int, optional) – Window size used in error calculation. Defaults to 50.

  • smoothing_perc (float, optional) – Percentage of total values used in EWMA smoothing. Defaults to 0.2.

  • error_buffer (int, optional) – Number of values surrounding an error that are brought into the sequence. Defaults to 50.

  • p (float, optional) – Minimum percent decrease between max errors in anomalous sequences (used for pruning). Defaults to 0.1.

model

The LSTM model.

y_test

The original data used to calculate the error.

y_hat

The predicted data.

detect(X, smoothed=True)

Get anomaly score of input sequence.

Parameters
  • X (array_like) – Input sequence.

  • smoothed (bool, optional) – Whether to smooth the errors by EWMA. Defaults to True.

Returns

(error_seq, error_seq_scores). error_seq is a list describing the duration of each anomaly, and error_seq_scores holds the corresponding anomaly scores.

Return type

tuple

fit(X, split=0.25, monitor='val_loss', patience=10, delta=0, verbose=True)

Train the LSTM model.

Parameters
  • X (array_like) – The 2-D input sequence with shape (n_samples, n_features).

  • split (float, optional) – Fraction of the data held out as the validation set. Defaults to 0.25.

  • monitor (str, optional) – Quantity to monitor; set it to ‘val_loss’ to monitor the validation loss. Defaults to ‘val_loss’.

  • patience (int, optional) – Number of epochs to wait after the loss stops improving before training is stopped. Defaults to 10.

  • delta (int, optional) – Minimum decrease in the loss that counts as an improvement; a smaller change is treated as no improvement. Leaving it at 0 is recommended, since the goal is to detect when the loss gets worse. Defaults to 0.

  • verbose (bool, optional) – Whether to print training progress. Defaults to True.
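The patience and delta arguments describe a standard early-stopping rule. The following is a minimal sketch of that rule, assuming it is applied once per epoch to the monitored validation loss; the EarlyStopping class is hypothetical and not part of RealSeries.

```python
class EarlyStopping:
    """Hypothetical early-stopping helper driven by patience and delta."""

    def __init__(self, patience=10, delta=0.0):
        self.patience = patience
        self.delta = delta
        self.best = None
        self.counter = 0
        self.stop = False

    def step(self, val_loss):
        # An improvement must beat the best loss by more than delta.
        if self.best is None or val_loss < self.best - self.delta:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.stop = True
        return self.stop

es = EarlyStopping(patience=3, delta=0.0)
stopped_at = None
for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.96, 0.97, 0.5]):
    if es.step(loss):
        stopped_at = epoch  # three epochs without improvement after 0.9
        break
```

Training stops at epoch 4, so the later loss of 0.5 is never reached; a larger patience trades extra epochs for robustness against noisy validation curves.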

static obtain_anomaly(y_test, y_hat, batch_size, window_size, smoothing_perc, p, l_s, error_buffer, smoothed=True)

Obtain anomalies from the original sequence y_test and the reconstructed sequence y_hat.

Parameters
  • y_test (ndarray) – The original 1-D array of test targets, i.e. the true values to be predicted at the end of each window.

  • y_hat (ndarray) – The predicted 1-D sequence y_hat for each timestep in y_test

  • batch_size (int, optional) – Number of values to evaluate in each batch in the prediction stage. Defaults to 32.

  • window_size (int, optional) – Window_size to use in error calculation. Defaults to 50.

  • smoothing_perc (float, optional) – Percentage of total values used in EWMA smoothing. Defaults to 0.2.

  • error_buffer (int, optional) – Number of values surrounding an error that are brought into the sequence. Defaults to 50.

  • p (float, optional) – Minimum percent decrease between max errors in anomalous sequences (used for pruning). Defaults to 0.1.

  • l_s (int, optional) – Length of the input sequence for LSTM. Defaults to 120.

  • smoothed (bool, optional) – Whether to smooth the errors by EWMA. Defaults to True.

Returns

(error_seq, error_seq_scores)

Return type

tuple
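The referenced paper smooths the prediction errors with an EWMA and then picks a threshold epsilon = mu + z * sigma that maximizes (delta_mu / mu + delta_sigma / sigma) / (|e_a| + |E_seq|^2), where e_a are the errors above the threshold and E_seq their contiguous runs. The sketch below implements that selection rule. The span of the EWMA is left as a free parameter (how LSTM_dynamic derives it from smoothing_perc is not stated here), the pruning step controlled by p is omitted, and all function names are hypothetical.

```python
import math
import statistics

def ewma(errors, span):
    """EWMA with alpha = 2 / (span + 1), recursive (pandas adjust=False style)."""
    alpha = 2.0 / (span + 1.0)
    smoothed, s = [], errors[0]
    for e in errors:
        s = alpha * e + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed

def count_sequences(indices):
    """Number of runs of consecutive indices."""
    if not indices:
        return 0
    return 1 + sum(1 for a, b in zip(indices, indices[1:]) if b != a + 1)

def dynamic_threshold(errors, z_values=range(2, 11)):
    mu = statistics.fmean(errors)
    sd = statistics.pstdev(errors)
    best_eps = mu + max(z_values) * sd
    if mu == 0 or sd == 0:
        return best_eps
    best_score = -math.inf
    for z in z_values:
        eps = mu + z * sd
        below = [e for e in errors if e < eps]
        above = [i for i, e in enumerate(errors) if e >= eps]
        if not above or len(below) < 2:
            continue
        # Relative drop in mean and std once the anomalous errors are removed.
        d_mu = mu - statistics.fmean(below)
        d_sd = sd - statistics.pstdev(below)
        score = (d_mu / mu + d_sd / sd) / (len(above) + count_sequences(above) ** 2)
        if score > best_score:
            best_score, best_eps = score, eps
    return best_eps

errors = [0.1 + 0.01 * (i % 3) for i in range(100)]
errors[50:53] = [5.0, 5.0, 5.0]
eps = dynamic_threshold(errors)
anomalies = [i for i, e in enumerate(errors) if e >= eps]
```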

predict(X)

Predict the reconstructed output array y_hat.

Parameters

X (array_like) – The input 2-D array.

Raises

ValueError – If the number of batches is less than 0.

Returns

The predicted array of lstm_encoder_decoder.

Return type

ndarray

realseries.models.lumino module

The implementation of luminol method. Reference: https://github.com/linkedin/luminol

class realseries.models.lumino.Lumino

Bases: realseries.models.base.BaseModel

detect(X, algorithm_name=None, algorithm_params=None)

Detect anomalies in the input sequence and return the anomaly score.

Parameters
  • X (array_like) – 1-D time series with shape (n_samples,)

  • algorithm_name (str, optional) – Name of the detection algorithm. Defaults to None.

  • algorithm_params (dict, optional) –

    Parameters of the chosen algorithm. Defaults to None. The supported algorithm_name values and their algorithm_params (defaults in parentheses) are:

        1. 'bitmap_detector': # behaves well for huge data sets, and it is the default detector.
           {
           'precision'(4): # how many sections to categorize values,
           'lag_window_size'(2% of the series length): # lagging window size,
           'future_window_size'(2% of the series length): # future window size,
           'chunk_size'(2): # chunk size.
           }
        2. 'default_detector': # used when other algorithms fail, not meant to be explicitly used.
        3. 'derivative_detector': # meant to be used when abrupt changes of value are of main interest.
           {
           'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages
                                    # of derivatives.
           }
        4. 'exp_avg_detector': # meant to be used when values are in a roughly stationary range,
                               # and it is the default refine algorithm.
           {
           'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages.
           'lag_window_size'(20% of the series length): # lagging window size.
           'use_lag_window'(False): # if asserted, a lagging window of size lag_window_size will be used.
           }

Returns

Normalized anomaly score in [0,1].

Return type

ndarray
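For intuition about the 'exp_avg_detector' option and the normalized [0, 1] score, the sketch below scores each point by its deviation from the exponential moving average of the points before it, then divides by the maximum. This is a simplified, hypothetical stand-in written for illustration only; it is not luminol's code and omits the lag window.

```python
def exp_avg_scores(values, smoothing_factor=0.2):
    """Simplified EMA-deviation detector (hypothetical, not luminol)."""
    scores = []
    ema = values[0]
    for v in values:
        # Deviation of the current point from the EMA of the previous points.
        scores.append(abs(v - ema))
        ema = smoothing_factor * v + (1.0 - smoothing_factor) * ema
    top = max(scores)
    # Rescale to [0, 1], matching the documented return range.
    return [s / top if top > 0 else 0.0 for s in scores]

values = [1.0] * 20
values[10] = 10.0
scores = exp_avg_scores(values)  # the spike at index 10 gets score 1.0
```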

fit()

Fit the model

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.

realseries.models.rcforest module

The implementation of the random cut forest method. Reference: S. Guha, N. Mishra, G. Roy, & O. Schrijvers, Robust random cut forest based anomaly detection on streams, in Proceedings of the 33rd International conference on machine learning, New York, NY, 2016 (pp. 2712-2721). https://github.com/kLabUM/rrcf

class realseries.models.rcforest.RCForest(shingle_size=32, num_trees=100, tree_size=50, random_state=0)

Bases: realseries.models.base.BaseModel

Random cut forest. The Robust Random Cut Forest (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:

  • Is designed to handle streaming data.

  • Performs well on high-dimensional data.

  • Reduces the influence of irrelevant dimensions.

  • Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.

  • Features an anomaly-scoring algorithm with a clear underlying statistical meaning.

Parameters
  • shingle_size (int, optional) – Window size. Defaults to 32.

  • num_trees (int, optional) – Number of estimators. Defaults to 100.

  • tree_size (int, optional) – Maximum number of points (leaves) in each tree. Defaults to 50.

  • random_state (int, optional) – Random state seed. Defaults to 0.
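In the rrcf examples, a 1-D time series is turned into overlapping "shingles" so that each tree point is a window of consecutive values; we assume shingle_size plays this role in RCForest. The helper below is our own sketch of that conversion using NumPy, not a function from rrcf or RealSeries.

```python
import numpy as np

def shingle(series, shingle_size):
    """Overlapping windows of length shingle_size (hypothetical helper)."""
    series = np.asarray(series, dtype=float)
    if series.ndim != 1 or len(series) < shingle_size:
        raise ValueError("need a 1-D series at least shingle_size long")
    # Shape: (len(series) - shingle_size + 1, shingle_size), a read-only view.
    return np.lib.stride_tricks.sliding_window_view(series, shingle_size)

points = shingle(np.arange(10), 4)
```

Each row would then be inserted into the trees as one point, so a single anomalous value influences shingle_size consecutive points.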

detect(X)

Detect the input.

Parameters

X (array_like) – Input sequence.

Returns

Anomaly score.

Return type

ndarray

fit(X, y=None)

Fit the model

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.

realseries.models.rnn module

RNN encoder decoder model. Reference ‘LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection’

class realseries.models.rnn.LSTMED(rnn_type='LSTM', emsize=128, nhid=128, epochs=200, nlayers=2, batch_size=64, window_size=50, dropout=0.2, lr=0.0002, weight_decay=0.0001, clip=10, res_connection=False, prediction_window_size=10, model_path=None, seed=1111)

Bases: realseries.models.base.BaseModel

RNN(LSTM) encoder decoder model for anomaly detection.

Parameters
  • rnn_type (str, optional) – Type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU, SRU). Defaults to ‘LSTM’.

  • emsize (int, optional) – Size of rnn input features. Defaults to 128.

  • nhid (int, optional) – Number of hidden units per layer. Defaults to 128.

  • epochs (int, optional) – Upper epoch limit. Defaults to 200.

  • nlayers (int, optional) – Number of LSTM layers. Defaults to 2.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • window_size (int, optional) – LSTM input sequence length. Defaults to 50.

  • dropout (float, optional) – Dropout rate. Defaults to 0.2.

  • lr (float, optional) – Learning rate. Defaults to 0.0002.

  • weight_decay (float, optional) – Weight decay. Defaults to 1e-4.

  • clip (int, optional) – Gradient clipping. Defaults to 10.

  • res_connection (bool, optional) – Residual connection. This option has not been tested with True. Defaults to False.

  • prediction_window_size (int, optional) – Prediction window size. Defaults to 10.

  • model_path (str, optional) – The path to save or load model. Defaults to None.

  • seed (int, optional) – Seed. Defaults to 1111.
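Recurrent models like this one are usually trained on fixed-length windows. The sketch below shows one plausible way to cut a 1-D series into inputs of length window_size paired with the following prediction_window_size values; the pairing is our assumption about how these two parameters interact, and make_windows is a hypothetical helper.

```python
import numpy as np

def make_windows(x, window_size, prediction_window_size):
    """Pair each input window with the values that follow it (hypothetical)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - window_size - prediction_window_size + 1
    if n <= 0:
        raise ValueError("series too short for the requested windows")
    inputs = np.stack([x[i:i + window_size] for i in range(n)])
    targets = np.stack([x[i + window_size:i + window_size + prediction_window_size]
                        for i in range(n)])
    return inputs, targets

inputs, targets = make_windows(np.arange(10), window_size=3, prediction_window_size=2)
```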

model

LSTM model.

detect(X, channel_idx=0)
If X is an array of shape (n_samples, n_features), each channel must be detected separately.

Parameters
  • X (array_like) – Input sequence.

  • channel_idx (int, optional) – The index of the feature channel to detect. Defaults to 0.

Returns

Anomaly score

Return type

ndarray

fit(X, y=None, augment_length=None, split=0.25, monitor='val_loss', patience=10, delta=0, verbose=True)

Train the detector.

Parameters
  • X (array_like) – The input sequence of shape (n_length,).

  • y (array_like, optional) – Ignored. Defaults to None.

  • augment_length (int, optional) – The total number of samples after augmentation. Defaults to None.

  • split (float, optional) – Fraction of the data held out as the validation set. Defaults to 0.25.

  • monitor (str, optional) – Quantity to monitor; set it to ‘val_loss’ to monitor the validation loss. Defaults to ‘val_loss’.

  • patience (int, optional) – Number of epochs to wait after the loss stops improving before training is stopped. Defaults to 10.

  • delta (int, optional) – Minimum decrease in the loss that counts as an improvement; a smaller change is treated as no improvement. Leaving it at 0 is recommended, since the goal is to detect when the loss gets worse. Defaults to 0.

  • verbose (bool, optional) – Whether to print training progress. Defaults to True.

realseries.models.seqvl module

Introduction of seqvl.

class realseries.models.seqvl.SeqVL(contamination=0.1, name='SeqVL', num_epochs=250, batch_size=1, lr=0.001, lr_decay=0.8, lamb=10, clip_norm_value=12.0, data_split_rate=0.5, window_size=30, window_count=300, h_dim=24, z_dim=5, l_h_dim=24)

Bases: realseries.models.base.BaseModel

detect(X, thres)

Detect the data by trained model.

Parameters
  • X (array_like) – 1-D time series with length L.

  • thres (float) – Threshold.

Returns

Dict containing the results, including a 0-1 sequence of length L - window_size + 1 that indicates whether the last point of each window is an anomaly.

Return type

dict

fit(X)

Train the model.

Parameters

X (array_like) – Input sequence.

reshape_for_test(X)

Reshape the data for testing.

Parameters

X (array_like) – Input data.

Returns

Reshaped data.

Return type

ndarray

reshape_for_training(X)

Reshape the data for training.

Parameters

X (ndarray) – 1-D time series

Returns

(input, label): input with shape [-1, window_count, window_size] and label with shape [-1, window_count].

Return type

tuple
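The documented shapes suggest that the series is cut into stride-1 windows of window_size points (which also yields the L - window_size + 1 outputs of detect) and that consecutive windows are grouped into sequences of window_count. The sketch below reproduces only those shapes. Using the last point of each window as its label is purely an illustrative assumption, and windows_to_sequences is a hypothetical helper.

```python
import numpy as np

def windows_to_sequences(x, window_size, window_count):
    """Stride-1 windows grouped into sequences of window_count (hypothetical)."""
    x = np.asarray(x, dtype=float)
    # L - window_size + 1 overlapping windows, one per end point.
    windows = np.lib.stride_tricks.sliding_window_view(x, window_size)
    # ASSUMPTION: the label of a window is its last point.
    labels = windows[:, -1]
    n_seq = len(windows) // window_count
    usable = n_seq * window_count  # drop the incomplete tail sequence
    inputs = windows[:usable].reshape(-1, window_count, window_size)
    return inputs, labels[:usable].reshape(-1, window_count)

inputs, labels = windows_to_sequences(np.arange(100.0), window_size=5, window_count=10)
```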

realseries.models.sr module

class realseries.models.sr.SpectralResidual(series, threshold, mag_window, score_window)

Bases: realseries.models.base.BaseModel

SpectralResidual class.

Parameters
  • series – input time series with shape (n_sample,)

  • threshold – the threshold applied to the anomaly score

  • mag_window – the window of the average filter used when calculating the spectral magnitude

  • score_window – the window of average filter when calculating score

detect()

Predict using the trained detector.

Parameters

x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

Returns

Outlier labels of shape (n_length,), indicating for each sample of the time series whether it is an outlier: 0 for inliers and 1 for outliers.

Return type

ndarray

static extend_series(values, extend_num=5, look_ahead=5)

Extend the array by appending the predicted next value.

Parameters
  • values (ndarray) – array of float numbers.

  • extend_num (int, optional) – number of values added to the back of data. Defaults to 5.

  • look_ahead (int, optional) – number of previous values used in prediction. Defaults to 5.

Raises

ValueError – the parameter ‘look_ahead’ must be at least 1

Returns

The result array.

Return type

ndarray

fit()

Fit the model

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • y (ndarray, optional) – Ignored. Defaults to None.

generate_spectral_score(series)
static predict_next(values)

Predicts the next value by summing the slopes between the last value and each of the previous values.

Mathematically, \(\bar{g} = \frac{1}{m} \sum_{i=1}^{m} g(x_n, x_{n-i})\) and \(x_{n+1} = x_{n-m+1} + \bar{g} \cdot m\), where \(g(x_i, x_j) = \frac{x_i - x_j}{i - j}\).

Parameters

values (list) – a list of float numbers.

Raises

ValueError – The length of the list should be at least 2.

Returns

The predicted next value.

Return type

float
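The predict_next formula can be checked with a short sketch. With m = n - 1 previous values, \(x_{n-m+1}\) is the second element of the list (index 1 in Python) and \(\bar{g} \cdot m\) is simply the sum of the slopes. The function below is our own transcription of the formula above, not the library source.

```python
def predict_next(values):
    """Next value from averaged slopes to the last point (sketch of the formula)."""
    if len(values) < 2:
        raise ValueError("The length of the list should be at least 2.")
    n = len(values)
    last = values[-1]
    # g(x_n, x_{n-i}) for every earlier value; (n - 1 - i) is the index gap.
    slopes = [(last - v) / (n - 1 - i) for i, v in enumerate(values[:-1])]
    # x_{n+1} = x_{n-m+1} + m * mean(slopes), with m = n - 1
    return values[1] + sum(slopes)

linear = predict_next([1.0, 2.0, 3.0, 4.0])  # a straight line continues to 5.0
```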

spectral_residual_transform(values)

Transform a time series into spectral residual series by FFT.

Parameters

values (ndarray) – Array of values.

Returns

Spectral residual values.

Return type

ndarray
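The spectral residual transform takes the FFT, subtracts a locally averaged log-amplitude spectrum from the log amplitude, keeps the phase, and transforms back; the magnitude of the result is a saliency map in which sudden changes stand out. The sketch below follows that recipe with a trailing average filter of width mag_window. It is an illustration under those assumptions, not the RealSeries source, and it stops at the saliency map (the score_window step is omitted).

```python
import numpy as np

EPS = 1e-8

def average_filter(values, n=3):
    """Trailing moving average; the first n-1 points average what is available."""
    values = np.asarray(values, dtype=float)
    n = min(n, len(values))
    res = np.cumsum(values)
    res[n:] = res[n:] - res[:-n]
    res[n:] = res[n:] / n
    for i in range(1, n):
        res[i] /= i + 1
    return res

def spectral_residual_saliency(values, mag_window=3):
    trans = np.fft.fft(np.asarray(values, dtype=float))
    mag = np.abs(trans)
    tiny = mag <= EPS
    mag[tiny] = EPS
    log_mag = np.log(mag)
    log_mag[tiny] = 0.0
    # Spectral residual: log amplitude minus its local average, back in linear scale.
    spectral = np.exp(log_mag - average_filter(log_mag, mag_window))
    trans = trans * spectral / mag  # keep the phase, replace the amplitude
    trans[tiny] = 0.0
    return np.abs(np.fft.ifft(trans))

t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
series[120] += 5.0  # injected spike
saliency = spectral_residual_saliency(series)
```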

realseries.models.srcnn module

class realseries.models.srcnn.SR_CNN(model_path, window=128, lr=1e-06, seed=0, epochs=20, batch_size=64, dropout=0.2, num_worker=0)

Bases: realseries.models.base.BaseModel

The SR-CNN method for anomaly detection, which feeds the spectral residual saliency map (sali_map) into a CNN.

Parameters
  • model_path (str, optional) – Path for saving and loading model.

  • window (int, optional) – Length of each sample for input. Defaults to 128.

  • lr (float, optional) – Learning rate. Defaults to 1e-6.

  • seed (int, optional) – Random seed. Defaults to 0.

  • epochs (int, optional) – Number of training epochs. Defaults to 20.

  • batch_size (int, optional) – Batch size. Defaults to 64.

  • dropout (float, optional) – Dropout rate. Defaults to 0.2.

  • num_worker (int, optional) – Defaults to 0.

model

CNN model built by torch.

detect(X, y, back_k=0, backaddnum=5, step=1)

Get anomaly score of input sequence.

Parameters
  • X (array_like) – Input sequence.

  • y – Ignored.

  • back_k (int, optional) – Not tested. Defaults to 0.

  • backaddnum (int, optional) – Not tested. Defaults to 5.

  • step (int, optional) – Stride of the sliding window in the detection stage. Defaults to 1.

Returns

Anomaly score.

Return type

ndarray

fit(X, step=64, num=10, back_k=0)

Train the model

Parameters
  • X (array_like) – The input 1-D array.

  • step (int, optional) – Stride of sliding window. Defaults to 64.

  • num (int, optional) – Number of added anomaly points to each window. Defaults to 10.

  • back_k (int, optional) – Defaults to 0.
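The num argument of fit adds synthetic anomaly points to each training window, so the CNN can be trained without real labels. The sketch below shows a simplified, hypothetical version of such injection: it perturbs num distinct random positions by 3 to 4 window standard deviations and marks them with label 1. The SR-CNN paper uses its own injection formula; this is not that formula.

```python
import numpy as np

def inject_anomalies(window, num, rng):
    """Perturb `num` random points and return (values, 0/1 labels). Hypothetical."""
    x = np.array(window, dtype=float)  # copy, so the caller's data is untouched
    labels = np.zeros(len(x), dtype=int)
    idx = rng.choice(len(x), size=num, replace=False)
    scale = x.std() if x.std() > 0 else 1.0
    signs = rng.choice([-1.0, 1.0], size=num)
    x[idx] += signs * (3.0 + rng.random(num)) * scale
    labels[idx] = 1
    return x, labels

rng = np.random.default_rng(0)
window = np.sin(np.linspace(0, 6, 128))  # one window of length 128
x, y = inject_anomalies(window, num=10, rng=rng)
```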

realseries.models.stl module

class realseries.models.stl.STL

Bases: realseries.models.base.BaseModel

static calc_seasonal(detrended, period)

Calculate seasonal from detrended data.

Parameters
  • detrended (ndarray) – Input detrended data.

  • period (float or int) – The period of data.

Returns

The seasonal and the period_averages.

Return type

(ndarray, ndarray)
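A common way to obtain the seasonal component and the period averages from detrended data is to average every period-th sample for each phase, center those averages on zero, and tile them across the series. The sketch below does exactly that; we assume calc_seasonal works along these lines, but the function is our own.

```python
import numpy as np

def calc_seasonal_sketch(detrended, period):
    """Phase-wise averages, centred and tiled (hypothetical sketch)."""
    detrended = np.asarray(detrended, dtype=float)
    period = int(period)
    period_averages = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    period_averages -= period_averages.mean()  # seasonal part sums to zero over a period
    reps = len(detrended) // period + 1
    seasonal = np.tile(period_averages, reps)[:len(detrended)]
    return seasonal, period_averages

seasonal, averages = calc_seasonal_sketch([3.0, 1.0] * 4, period=2)
```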

static calc_trend(observed, lo_frac=0.6, lo_delta=0.01)

Calculate the trend from the observed data.

Parameters
  • observed (ndarray) – Input array.

  • lo_frac (float, optional) – Defaults to 0.6.

  • lo_delta (float, optional) – Defaults to 0.01.

Returns

The trend.

Return type

ndarray

detect()

Predict using the trained detector.

Parameters

x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

Returns

Outlier labels of shape (n_length,), indicating for each sample of the time series whether it is an outlier: 0 for inliers and 1 for outliers.

Return type

ndarray

static drift(data, n=3)
The drift forecast for the next point is a linear extrapolation from the previous n points in the series.

Parameters
  • data (ndarray) – Observed data, presumed to be ordered in time.

  • n (int) – period over which to calculate linear model for extrapolation.

Returns

a single-valued forecast for the next value in the series.

Return type

float
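The drift forecast can be written in two lines: the slope of the straight line through the first and last of the previous n points, added to the last value. This sketch is our own transcription of that definition and requires n >= 2.

```python
import numpy as np

def drift(data, n=3):
    """Linear extrapolation from the previous n points (sketch; n >= 2)."""
    data = np.asarray(data, dtype=float)
    slope = (data[-1] - data[-n]) / (n - 1)
    return float(data[-1] + slope)

next_value = drift([1.0, 2.0, 3.0])  # continues the line to 4.0
```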

fit(df, period=365, lo_frac=0.6, lo_delta=0.01)

Train the STL decomposition model, which splits the series as Y[t] = T[t] + S[t] + e[t] (trend, seasonal, and residual components).

Parameters
  • df (DataFrame) – Input data.

  • period (int, optional) – Defaults to 365.

  • lo_frac (float, optional) – Defaults to 0.6.

  • lo_delta (float, optional) – Defaults to 0.01.

Returns

Dict results.

Return type

dict

forecast(stl, forecast_func='drift', steps=10, seasonal=False)

Forecast the given STL decomposition forward by steps time steps.

Parameters
  • stl (object) – STL object.

  • forecast_func (str, optional) – Defaults to ‘drift’.

  • steps (int, optional) – Defaults to 10.

  • seasonal (bool, optional) – Defaults to False.

Returns

forecast dataframe

Return type

DataFrame

static mean(data, n=3)
static naive(data, n=7)

realseries.models.vae_ad module

class realseries.models.vae_ad.VAE_AD(name='VAE_AD', num_epochs=256, batch_size=256, lr=0.001, lr_decay=0.8, clip_norm_value=12.0, weight_decay=0.001, data_split_rate=0.5, window_size=120, window_step=1, h_dim=100, z_dim=5)

Bases: realseries.models.base.BaseModel

The Donut-VAE version for anomaly detection

Parameters
  • name (str, optional) – Model name. Defaults to ‘VAE_AD’.

  • num_epochs (int, optional) – Epochs for model training. Defaults to 256.

  • batch_size (int, optional) – Batch size for model training. Defaults to 256.

  • lr (float, optional) – Learning rate. Defaults to 1e-3.

  • lr_decay (float, optional) – Learning rate decay. Defaults to 0.8.

  • clip_norm_value (float, optional) – Gradient clip value. Defaults to 12.0.

  • weight_decay (float, optional) – L2 regularization. Defaults to 1e-3.

  • data_split_rate (float, optional) – Defaults to 0.5.

  • window_size (int, optional) – Defaults to 120.

  • window_step (int, optional) – Defaults to 1.

  • h_dim (int, optional) – Hidden dim between x and z for VAE’s encoder and decoder. Defaults to 100.

  • z_dim (int, optional) – Defaults to 5.
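VAE-based detectors such as Donut are trained on the evidence lower bound, whose regularization term is the KL divergence between the encoder's Gaussian N(mu, sigma^2) over the z_dim latent variables and a standard normal prior; latent samples are drawn with the reparameterization trick. The NumPy sketch below shows both pieces. It illustrates the standard formulas only and is not the VAE_AD torch model.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework such as torch, this form keeps z differentiable
    with respect to mu and logvar.
    """
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu, dtype=float) + np.exp(0.5 * np.asarray(logvar, dtype=float)) * eps

def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over the last axis."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)

z = reparameterize(np.zeros(5), np.zeros(5), np.random.default_rng(0))  # z_dim = 5
```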

model

VAE model built by torch.

detect(X)

Get anomaly score of input sequence.

Parameters

X (array_like) – Input sequence.

Returns

A dict with the following attributes:

  • origin_series (ndarray) – the original time series.

  • score (ndarray) – the corresponding anomaly score.

Return type

dict

fit(X)

Train the model

Parameters

X (array_like) – The input 1-D array.

forecast(X)

Forecast the input.

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • t (int) – time index of to-be-forecast samples.

Returns

Forecast samples of shape (n_length, n_features)

Return type

X_1 (ndarray)

load(path)

Load the model from path

Parameters

path (string) – model load path

predict(X)
save(path)

Save the model to path

Parameters

path (string) – model save path

realseries.models.vae_dense module

class realseries.models.vae_dense.VAE_Dense(window_size, channels, name='VAE_Dense', num_epochs=256, batch_size=64, lr=0.001, lr_decay=0.8, clip_norm_value=12.0, weight_decay=0.001, h_dim=200, z_dim=20)

Bases: realseries.models.base.BaseModel

The Donut-VAE version for anomaly detection

Parameters
  • window_size (int) –

  • channels (int) – Channel count of the input signals.

  • name (str, optional) – Model name. Defaults to ‘VAE_Dense’.

  • num_epochs (int, optional) – Epochs for model training. Defaults to 256.

  • batch_size (int, optional) – Batch size for model training. Defaults to 64.

  • lr (float, optional) – Learning rate. Defaults to 1e-3.

  • lr_decay (float, optional) – Learning rate decay. Defaults to 0.8.

  • clip_norm_value (float, optional) – Gradient clip value. Defaults to 12.0.

  • weight_decay (float, optional) – L2 regularization. Defaults to 1e-3.

  • h_dim (int, optional) – Hidden dim between x and z for VAE’s encoder and decoder. Defaults to 200.

  • z_dim (int, optional) – Defaults to 20.

model

VAE model built by torch.

detect(X)

Get anomaly score of input sequence.

Parameters

X (array_like) – Input sequence.

Returns

A dict with the following attributes:

  • origin_series (ndarray [timesteps, channels]) – the original time series.

  • recon_series (ndarray [timesteps, channels]) – the reconstructed time series.

  • score (ndarray [timesteps, channels]) – the corresponding anomaly score.

Return type

dict

fit(X)

Train the model

Parameters

X (array_like) – The input 2-D array. The first dimension denotes timesteps. The second dimension denotes the signal channels.
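A dense VAE needs fixed-size vectors, so a (timesteps, channels) array is typically cut into windows of window_size steps and each window is flattened to window_size * channels values. The sketch below shows that round trip; we assume the undocumented flatten/reform methods do something similar, but both helpers here are hypothetical.

```python
import numpy as np

def flatten_windows(x, window_size):
    """(timesteps, channels) -> (n_windows, window_size * channels). Hypothetical."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected shape (timesteps, channels)")
    n = x.shape[0] - window_size + 1
    return np.stack([x[i:i + window_size].reshape(-1) for i in range(n)])

def unflatten_windows(w, window_size, channels):
    """Inverse reshape back to (n_windows, window_size, channels)."""
    return np.asarray(w).reshape(-1, window_size, channels)

x = np.arange(30.0).reshape(10, 3)  # 10 timesteps, 3 channels
w = flatten_windows(x, window_size=4)
```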

flatten(x)
forecast(X)

Forecast the input.

Parameters
  • x (array_like) – The input sequence of shape (n_length, n_features) or (n_length,).

  • t (int) – time index of to-be-forecast samples.

Returns

Forecast samples of shape (n_length, n_features)

Return type

X_1 (ndarray)

load(path)

Load the model from path

Parameters

path (string) – model load path

predict(X)
reform(x)
save(path)

Save the model to path

Parameters

path (string) – model save path

realseries.models.crmmd module

The crmmd module is the implementation of the paper ‘Calibrated Reliable Regression using Maximum Mean Discrepancy’ https://arxiv.org/abs/2006.10255

class realseries.models.crmmd.CRMMD(kernel_type='LSTM', input_size=128, hidden_sizes=[128, 64], prediction_window_size=1, activation='tanh', dropout_rate=0.2, variance=True, lr=0.0002, weight_decay=0.001, grad_clip=10, epochs_hnn=400, epochs_mmd=100, batch_size=1024, window_size=15, model_path='./model', seed=1111)

Bases: realseries.models.base.BaseModel

HNN forecaster for uncertainty prediction.

Parameters
  • kernel_type (str, optional) – Type of recurrent net (RNN, LSTM, GRU). Defaults to ‘LSTM’.

  • input_size (int, optional) – Size of rnn input features. Defaults to 128.

  • hidden_sizes (list, optional) – Number of hidden units per layer. Defaults to [128,64].

  • prediction_window_size (int, optional) – Prediction window size. Defaults to 1.

  • activation (str, optional) – The activation function to use, either 'tanh' or 'relu'. Defaults to 'tanh'.

  • dropout_rate (float, optional) – Defaults to 0.2.

  • variance (bool, optional) – Whether to add a variance term at the last layer to indicate uncertainty. Defaults to True.

  • lr (float, optional) – Learning rate. Defaults to 0.0002.

  • weight_decay (float, optional) – Weight decay. Defaults to 1e-3.

  • grad_clip (int, optional) – Gradient clipping. Defaults to 10.

  • epochs_hnn (int, optional) – Upper epoch limit for the first training phase (HNN). Defaults to 400.

  • epochs_mmd (int, optional) – Upper epoch limit for the second training phase (MMD). Defaults to 100.

  • batch_size (int, optional) – Batch size. Defaults to 1024.

  • window_size (int, optional) – LSTM input sequence length. Defaults to 15.

  • model_path (str, optional) – The path to save or load model. Defaults to ‘./model’.

  • seed (int, optional) – Seed. Defaults to 1111.
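The `window_size` and `prediction_window_size` arguments fix the array shapes that `fit` and `evaluation_model` expect: inputs of shape (n_samples, window_size, n_features) and labels of shape (n_samples, prediction_window_size). The hypothetical helper `make_windows` below sketches one way to slice a univariate series into such arrays using plain Python lists. It assumes a stride-1 sliding window and labels taken as the next `prediction_window_size` values, which may differ from what the RealSeries data loaders do.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window_size: int = 15, prediction_window_size: int = 1
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Slice a 1-D series into (data, label) pairs with a stride-1 sliding window.

    data  has shape (n_samples, window_size, 1), i.e. one feature per time step.
    label has shape (n_samples, prediction_window_size), the values that follow.
    """
    data, label = [], []
    n_samples = len(series) - window_size - prediction_window_size + 1
    for start in range(max(n_samples, 0)):
        past = series[start : start + window_size]
        future = series[start + window_size : start + window_size + prediction_window_size]
        data.append([[value] for value in past])
        label.append(list(future))
    return data, label


series = [float(i) for i in range(20)]
data, label = make_windows(series, window_size=15, prediction_window_size=1)
print(len(data), len(data[0]), len(data[0][0]), len(label[0]))  # 5 15 1 1
```

In practice the nested lists would be converted to numpy arrays before being passed to the model.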

model

HNN model.

evaluation_model(scaler, test_data, test_label, t=1, confidence=95)

Get predictive intervals and evaluation scores.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • test_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

Returns
  • PIs (two numpy arrays) – The lower-bound and upper-bound arrays of the predictive intervals for the test data.

  • rmse (float) – The RMSE score.

  • calibration error (float) – The uncertainty evaluation score for the test data.
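When the network outputs a mean and a variance for each step (variance=True), a two-sided predictive interval at a given `confidence` can be formed from a Gaussian quantile. The sketch below assumes Gaussian predictive distributions and uses only the standard library (`statistics.NormalDist`, Python 3.8+). The helper `gaussian_pis` is hypothetical and is not the function `evaluation_model` calls internally.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def gaussian_pis(
    means: Sequence[float], variances: Sequence[float], confidence: int = 95
) -> Tuple[List[float], List[float]]:
    """Two-sided Gaussian predictive intervals at `confidence` percent."""
    # z such that P(|Z| <= z) = confidence / 100, about 1.96 for confidence=95.
    z = NormalDist().inv_cdf(0.5 + confidence / 200.0)
    lower = [m - z * math.sqrt(v) for m, v in zip(means, variances)]
    upper = [m + z * math.sqrt(v) for m, v in zip(means, variances)]
    return lower, upper


lower, upper = gaussian_pis([0.0, 10.0], [1.0, 4.0], confidence=95)
print([round(x, 2) for x in lower], [round(x, 2) for x in upper])
# [-1.96, 6.08] [1.96, 13.92]
```

If the data were scaled before training, the bounds would still need to be mapped back to the original units, which is presumably what the `scaler` argument is for.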

fit(train_data, train_label, val_data, val_label, patience=50, delta=0, verbose=True)

Train the recurrent forecasting model.

Parameters
  • train_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • train_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • val_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • val_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • patience (int, optional) – Number of epochs to wait after the loss stops improving before training is stopped early. Defaults to 50.

  • delta (int, optional) – Minimum decrease in the loss that counts as an improvement. If the loss changes by less than delta, the epoch counts as no improvement. Leaving it at 0 is recommended, since the goal is to detect when the loss gets worse. Defaults to 0.

  • verbose (bool, optional) – Whether to print training progress. Defaults to True.

model

The trained model, saved to model_path/checkpoint.pt.
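The `patience` and `delta` arguments describe a standard early-stopping rule. The class below is a minimal sketch of that rule as documented: a loss that does not drop by more than `delta` counts as no improvement, and training stops after `patience` such epochs in a row. The class name `EarlyStopping` and its `step` method are hypothetical, and the checkpoint saving that `fit` performs is omitted.

```python
class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 50, delta: float = 0.0) -> None:
        self.patience = patience
        self.delta = delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, loss: float) -> bool:
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best_loss - self.delta:
            # Improvement: remember the new best loss and reset the counter.
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            # No improvement, or an improvement smaller than delta.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2, delta=0.0)
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
print(stopped_at)  # 3
```

The loss of 0.7 at epoch 4 is never seen: two epochs without improvement after 0.8 are enough to stop when `patience=2`.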

forecast(scaler, test_data, t=1, confidence=95, is_uncertainty=True)

Get predictive intervals (or predicted means) for the test data.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

  • is_uncertainty (bool, optional) – Whether to estimate uncertainty. If True, output PIs; if False, output the predicted means. Defaults to True.

Returns

The lower-bound and upper-bound arrays of the predictive intervals for the test data (the predicted means if is_uncertainty is False).

Return type

PIs (two numpy arrays)

load_model(path=PosixPath('model/checkpoint_crmmd.pt'))

Load a PyTorch model.

Parameters

path (string or path) – Path for loading the model.

save_model(model_path=PosixPath('model/checkpoint_crmmd.pt'))

Save the PyTorch model.

Parameters

model_path (string or path) – Path for saving the model.

realseries.models.hnn module

The models (HNN, Deep-ensemble, MC-dropout, CRMMD…) for time series forecasting and uncertainty prediction.

class realseries.models.hnn.HNN(kernel_type='LSTM', input_size=128, hidden_sizes=[128, 64], prediction_window_size=1, activation='tanh', dropout_rate=0.2, variance=True, lr=0.0002, weight_decay=0.001, grad_clip=10, epochs=200, batch_size=1024, window_size=15, model_path='./model', seed=1111)

Bases: realseries.models.base.BaseModel

HNN forecaster for uncertainty prediction.

Parameters
  • kernel_type (str, optional) – Type of recurrent net (RNN, LSTM, GRU). Defaults to ‘LSTM’.

  • input_size (int, optional) – Size of rnn input features. Defaults to 128.

  • hidden_sizes (list, optional) – Number of hidden units per layer. Defaults to [128,64].

  • prediction_window_size (int, optional) – Prediction window size. Defaults to 1.

  • activation (str, optional) – The activation function to use. Can be either 'tanh' or 'relu'. Defaults to 'tanh'.

  • dropout_rate (float, optional) – Dropout rate. Defaults to 0.2.

  • variance (bool, optional) – Whether to add a variance output at the last layer to indicate uncertainty. Defaults to True.

  • lr (float, optional) – Learning rate. Defaults to 0.0002.

  • weight_decay (float, optional) – Weight decay. Defaults to 1e-3.

  • grad_clip (int, optional) – Gradient clipping. Defaults to 10.

  • epochs (int, optional) – Upper epoch limit. Defaults to 200.

  • batch_size (int, optional) – Batch size. Defaults to 1024.

  • window_size (int, optional) – Length of the input sequence (number of time steps) fed to the recurrent net. Defaults to 15.

  • model_path (str, optional) – The path to save or load model. Defaults to ‘./model’.

  • seed (int, optional) – Seed. Defaults to 1111.
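With `variance=True` the last layer predicts a variance alongside the mean. A common way to train this kind of mean-variance (heteroscedastic) output is the Gaussian negative log-likelihood. The sketch below computes that loss for one batch in plain Python. It assumes both this loss and a strictly positive variance output; this reference states neither the exact loss HNN uses nor how it keeps the variance positive.

```python
import math
from typing import Sequence


def gaussian_nll(
    targets: Sequence[float], means: Sequence[float], variances: Sequence[float]
) -> float:
    """Mean Gaussian negative log-likelihood over a batch.

    Per sample: 0.5 * (log(2*pi*var) + (y - mu)^2 / var).
    The log term penalizes a large predicted variance, while the variance also
    down-weights the squared error, so the model learns to report high
    uncertainty where its errors are large.
    """
    total = 0.0
    for y, mu, var in zip(targets, means, variances):
        total += 0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)
    return total / len(targets)


# A perfect mean with unit variance costs 0.5 * log(2*pi) per sample.
print(round(gaussian_nll([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]), 4))  # 0.9189
```

For a fixed large error, a larger predicted variance lowers this loss, which is what lets the variance output act as an uncertainty estimate.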

model

HNN model.

evaluation_model(scaler, test_data, test_label, t=1, confidence=95)

Get predictive intervals and evaluation scores.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • test_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

Returns
  • PIs (two numpy arrays) – The lower-bound and upper-bound arrays of the predictive intervals for the test data.

  • rmse (float) – The RMSE score.

  • calibration error (float) – The uncertainty evaluation score for the test data.
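The two scores returned with the PIs can be sketched as follows. RMSE is standard. For the calibration error, this sketch assumes one common definition: the absolute gap between the nominal coverage (`confidence` / 100) and the fraction of test labels that fall inside their intervals. The metric `evaluation_model` actually uses may differ (for example, it may average over several confidence levels), so `rmse` and `coverage_calibration_error` here are hypothetical helpers.

```python
import math
from typing import Sequence


def rmse(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Root-mean-square error between labels and point predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels))


def coverage_calibration_error(
    labels: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    confidence: int = 95,
) -> float:
    """Absolute gap between nominal and empirical coverage for one set of PIs."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(labels, lower, upper))
    empirical = inside / len(labels)
    return abs(confidence / 100.0 - empirical)


labels = [1.0, 2.0, 3.0, 4.0]
preds = [1.0, 2.0, 3.0, 6.0]
print(rmse(labels, preds))  # 1.0
# Three of the four labels fall inside their intervals: 75% coverage.
print(coverage_calibration_error(labels, [0, 1, 2, 5], [2, 3, 4, 7], confidence=75))  # 0.0
```

A calibration error of 0 means the intervals cover exactly the fraction of points they claim to cover. Note that this can happen even when the intervals are much wider than necessary.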

fit(train_data, train_label, val_data, val_label, patience=50, delta=0, verbose=True)

Train the recurrent forecasting model.

Parameters
  • train_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • train_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • val_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • val_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • patience (int, optional) – Number of epochs to wait after the loss stops improving before training is stopped early. Defaults to 50.

  • delta (int, optional) – Minimum decrease in the loss that counts as an improvement. If the loss changes by less than delta, the epoch counts as no improvement. Leaving it at 0 is recommended, since the goal is to detect when the loss gets worse. Defaults to 0.

  • verbose (bool, optional) – Whether to print training progress. Defaults to True.

model

The trained model, saved to model_path/checkpoint_hnn.pt.

forecast(scaler, test_data, t=1, confidence=95, is_uncertainty=True)

Get predictive intervals (or predicted means) for the test data.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

  • is_uncertainty (bool, optional) – Whether to estimate uncertainty. If True, output PIs; if False, output the predicted means. Defaults to True.

Returns

The lower-bound and upper-bound arrays of the predictive intervals for the test data (the predicted means if is_uncertainty is False).

Return type

PIs (two numpy arrays)

load_model(path=PosixPath('model/checkpoint_hnn.pt'))

Load a PyTorch model.

Parameters

path (string or path) – Path for loading the model.

save_model(model_path=PosixPath('model/checkpoint_hnn.pt'))

Save the PyTorch model.

Parameters

model_path (string or path) – Path for saving the model.

realseries.models.mc_dropout module

The models (HNN, Deep-ensemble, MC-dropout, CRMMD…) for time series forecasting and uncertainty prediction.

class realseries.models.mc_dropout.MC_dropout(kernel_type='LSTM', input_size=128, hidden_sizes=[128, 64], prediction_window_size=1, activation='tanh', dropout_rate=0.2, variance=True, lr=0.0002, weight_decay=0.001, grad_clip=10, epochs=200, batch_size=1024, window_size=15, model_path='./model', seed=1111)

Bases: realseries.models.base.BaseModel

MC-dropout forecaster for uncertainty prediction.

Parameters
  • kernel_type (str, optional) – Type of recurrent net (RNN, LSTM, GRU). Defaults to ‘LSTM’.

  • input_size (int, optional) – Size of rnn input features. Defaults to 128.

  • hidden_sizes (list, optional) – Number of hidden units per layer. Defaults to [128,64].

  • prediction_window_size (int, optional) – Prediction window size. Defaults to 1.

  • activation (str, optional) – The activation function to use. Can be either 'tanh' or 'relu'. Defaults to 'tanh'.

  • dropout_rate (float, optional) – Dropout rate. Defaults to 0.2.

  • variance (bool, optional) – Whether to add a variance output at the last layer to indicate uncertainty. Defaults to True.

  • lr (float, optional) – Learning rate. Defaults to 0.0002.

  • weight_decay (float, optional) – Weight decay. Defaults to 1e-3.

  • grad_clip (int, optional) – Gradient clipping. Defaults to 10.

  • epochs (int, optional) – Upper epoch limit. Defaults to 200.

  • batch_size (int, optional) – Batch size. Defaults to 1024.

  • window_size (int, optional) – Length of the input sequence (number of time steps) fed to the recurrent net. Defaults to 15.

  • model_path (str, optional) – The path to save or load model. Defaults to ‘./model’.

  • seed (int, optional) – Seed. Defaults to 1111.

model

regular MC-dropout model.

evaluation_model(scaler, test_data, test_label, t=1, confidence=95, mc_times=400)

Get predictive intervals and evaluation scores.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • test_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

  • mc_times (int, optional) – Number of MC-dropout samples (stochastic forward passes). Defaults to 400.

Returns
  • PIs (two numpy arrays) – The lower-bound and upper-bound arrays of the predictive intervals for the test data.

  • rmse (float) – The RMSE score.

  • calibration error (float) – The uncertainty evaluation score for the test data.
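MC dropout keeps dropout active at prediction time and repeats the stochastic forward pass `mc_times` times. The spread of the resulting samples provides the uncertainty estimate. The sketch below illustrates the idea on a toy linear predictor in plain Python and builds intervals from empirical percentiles. The toy model, the hypothetical helpers `dropout_forward` and `mc_dropout_interval`, and the choice of percentile intervals (rather than Gaussian ones) are assumptions for illustration. None of them is taken from the RealSeries implementation.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def dropout_forward(
    x: Sequence[float], weights: Sequence[float], rate: float, rng: random.Random
) -> float:
    """One stochastic pass of a toy linear model with inverted dropout on its inputs."""
    keep = 1.0 - rate
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:  # each input survives with probability 1 - rate
            total += xi * wi / keep  # rescale so the expected output is unchanged
    return total


def mc_dropout_interval(
    x: Sequence[float],
    weights: Sequence[float],
    rate: float = 0.2,
    mc_times: int = 400,
    confidence: int = 95,
    seed: int = 1111,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) computed from mc_times dropout samples."""
    rng = random.Random(seed)
    samples: List[float] = sorted(
        dropout_forward(x, weights, rate, rng) for _ in range(mc_times)
    )
    alpha = (100 - confidence) / 200.0  # tail mass on each side
    # Empirical percentiles (nearest rank, rounded down).
    lo_idx = int(alpha * (mc_times - 1))
    hi_idx = int((1.0 - alpha) * (mc_times - 1))
    return mean(samples), samples[lo_idx], samples[hi_idx]


m, lower, upper = mc_dropout_interval([1.0, 2.0, 3.0], [0.5, -0.25, 1.0])
print(lower <= m <= upper)  # True
```

Increasing `mc_times` makes the mean and the percentile bounds more stable, at the cost of proportionally more forward passes.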

fit(train_data, train_label, val_data, val_label, monitor='val_loss', patience=50, delta=0, verbose=True)

Train the recurrent forecasting model.

Parameters
  • train_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • train_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • val_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • val_label (numpy array) – The 2-D label array of shape (n_samples, prediction_window_size).

  • monitor (str, optional) – The loss monitored for early stopping. Defaults to 'val_loss'.

  • patience (int, optional) – Number of epochs to wait after the monitored loss stops improving before training is stopped early. Defaults to 50.

  • delta (int, optional) – Minimum decrease in the loss that counts as an improvement. If the loss changes by less than delta, the epoch counts as no improvement. Leaving it at 0 is recommended, since the goal is to detect when the loss gets worse. Defaults to 0.

  • verbose (bool, optional) – Whether to print training progress. Defaults to True.

model

The trained model, saved to model_path/checkpoint_mc.pt.

forecast(scaler, test_data, t=1, confidence=95, mc_times=400, is_uncertainty=True)

Get predictive intervals (or predicted means) for the test data.

Parameters
  • scaler – The scaler of the data loader.

  • test_data (numpy array) – The 3-D input sequence of shape (n_samples, window_size, n_features).

  • t (int, optional) – The forecasting horizon. Defaults to 1.

  • confidence (int, optional) – Confidence level of the predictive intervals, in percent. Defaults to 95, which outputs 95% predictive intervals.

  • mc_times (int, optional) – Number of MC-dropout samples (stochastic forward passes). Defaults to 400.

  • is_uncertainty (bool, optional) – Whether to estimate uncertainty. If True, output PIs; if False, output the predicted means. Defaults to True.

Returns

The lower-bound and upper-bound arrays of the predictive intervals for the test data (the predicted means if is_uncertainty is False).

Return type

PIs (two numpy arrays)

load_model(path=PosixPath('model/checkpoint_mc.pt'))

Load a PyTorch model.

Parameters

path (string or path) – Path for loading the model.

save_model(model_path=PosixPath('model/checkpoint_mc.pt'))

Save the PyTorch model.

Parameters

model_path (string or path) – Path for saving the model.