Time Series Forecasting


Automated Time Series Prediction

Training a model using TimeSequencePredictor

TimeSequencePredictor trains a model on historical time sequence data and predicts future sequences. Note that:
* The input time series must be uniformly sampled in time. Missing data points will lead to errors or unreliable predictions.
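
Because uniform sampling is a hard requirement, it can help to check the timestamps before training. The following is a minimal sketch using only the Python standard library; the helper name find_sampling_gaps is hypothetical and not part of the library, and the sketch assumes the timestamps are already sorted and strictly increasing.

```python
from datetime import datetime, timedelta


def find_sampling_gaps(timestamps):
    """Return (step, gaps) for a sorted list of datetimes.

    step is the smallest interval between consecutive timestamps and is
    taken as the expected sampling interval. gaps lists every
    (previous, current) pair whose spacing differs from step.
    """
    if len(timestamps) < 2:
        return None, []
    pairs = list(zip(timestamps, timestamps[1:]))
    deltas = [later - earlier for earlier, later in pairs]
    step = min(deltas)
    gaps = [pair for pair, delta in zip(pairs, deltas) if delta != step]
    return step, gaps


# Illustrative data: 2019-06-08 is missing from a daily series.
stamps = [datetime(2019, 6, 6), datetime(2019, 6, 7), datetime(2019, 6, 9)]
step, gaps = find_sampling_gaps(stamps)
```

An empty gaps list means the series is uniformly sampled at the interval step; otherwise each reported pair marks a place where data points need to be filled in or resampled before training.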

1. Before training, initialize RayOnSpark.

Local mode:

from zoo import init_spark_on_local
from zoo.ray import RayContext

sc = init_spark_on_local(cores=4)
ray_ctx = RayContext(sc=sc)
ray_ctx.init()

YARN cluster mode:

from zoo import init_spark_on_yarn
from zoo.ray import RayContext

slave_num = 2
# args.hadoop_conf is assumed to hold the path to the Hadoop configuration
# directory (for example, parsed from command-line arguments)
sc = init_spark_on_yarn(
    hadoop_conf=args.hadoop_conf,
    conda_name="ray36",
    num_executor=slave_num,
    executor_cores=4,
    executor_memory="8g",
    driver_memory="2g",
    driver_cores=4,
    extra_executor_memory_for_ray="10g")
ray_ctx = RayContext(sc=sc, object_store_memory="5g")
ray_ctx.init()

2. Create a TimeSequencePredictor

from zoo.automl.regression.time_sequence_predictor import TimeSequencePredictor
tsp = TimeSequencePredictor(dt_col="datetime", target_col="value", extra_features_col=None, future_seq_len=1)
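
To illustrate what future_seq_len controls, here is a minimal sketch of how a series is commonly split into lookback windows and prediction targets for sequence forecasting. This is an assumption about the general technique, not a description of the library's internal feature pipeline; make_windows and past_len are hypothetical names, and the series values are illustrative.

```python
def make_windows(values, past_len, future_seq_len):
    """Split a series into (lookback, target) samples.

    Each sample pairs past_len consecutive values with the
    future_seq_len values that immediately follow them.
    """
    samples = []
    last_start = len(values) - past_len - future_seq_len
    for i in range(last_start + 1):
        x = values[i:i + past_len]
        y = values[i + past_len:i + past_len + future_seq_len]
        samples.append((x, y))
    return samples


series = [1.2, 2.3, 3.1, 4.0, 5.2]
samples = make_windows(series, past_len=3, future_seq_len=1)
```

With future_seq_len=1, each target holds a single value; a larger value makes each target a longer sequence and reduces the number of samples that fit in the series.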

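3. Train on the historical time sequence. The training dataframe train_df must contain the datetime column and the target column, for example:

datetime    value
2019-06-06  1.2
2019-06-07  2.3

# RandomRecipe import path is assumed from the analytics-zoo source tree; adjust if your version differs
from zoo.automl.config.recipe import RandomRecipe

pipeline = tsp.fit(train_df, metric="mean_squared_error", recipe=RandomRecipe(num_samples=1), distributed=False)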

4. After training finishes, stop RayOnSpark.

ray_ctx.stop()

Saving and Loading a TimeSequencePipeline

# save the pipeline to a file
pipeline.save("/tmp/saved_pipeline/my.ppl")

# load the pipeline back from the file
from zoo.automl.pipeline.time_sequence import load_ts_pipeline
pipeline = load_ts_pipeline("/tmp/saved_pipeline/my.ppl")

Prediction and Evaluation using TimeSequencePipeline

A TimeSequencePipeline contains a chain of feature transformers and models that performs end-to-end time sequence prediction on input data. It can be saved and loaded for future deployment.

The output dataframe looks like the following (assuming n values are predicted forward). The datetime column is the starting timestamp.

datetime    value_0  value_1  ...  value_{n-1}
2019-06-06  1.2      2.8      ...  4.4
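
The wide layout above can be unrolled into one (timestamp, value) pair per predicted step. The sketch below assumes that value_i is the prediction for the starting timestamp plus i sampling intervals; the source does not state this offset explicitly, so verify it against your own results. The helper expand_prediction_row is hypothetical, and the three values are illustrative.

```python
from datetime import datetime, timedelta


def expand_prediction_row(start, values, step):
    """Pair each predicted value with a timestamp.

    Assumption: value_i is the prediction for start + i * step.
    """
    return [(start + i * step, value) for i, value in enumerate(values)]


row_start = datetime(2019, 6, 6)
row_values = [1.2, 2.8, 4.4]  # value_0, value_1, value_2
pairs = expand_prediction_row(row_start, row_values, timedelta(days=1))
```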

# predict on test data
result_df = pipeline.predict(test_df)

# evaluate with MSE and R2 metrics
mse, rs = pipeline.evaluate(test_df, metrics=["mse", "rs"])

# fit with new data and train for 5 epochs
pipeline.fit(new_train_df, epoch_num=5)
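
For reference, the two metrics named in the evaluation call can be written out in plain Python. This is a sketch of the standard textbook definitions of mean squared error and the R2 score for a single output; it is not the library's implementation, which may differ in details such as how multiple predicted steps are aggregated. The sample values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares.

    Undefined (division by zero) when all targets are equal.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
mse_value = mean_squared_error(y_true, y_pred)
r2_value = r_squared(y_true, y_pred)
```

A lower MSE is better, while an R2 score closer to 1 indicates that the predictions explain more of the variance in the targets.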