Tutorial


Install

Zouwu requires several Python dependencies to run.

You can install these dependencies manually, or install them all using the command below.

pip install analytics-zoo[automl]

Using built-in forecast models

The built-in forecast models are all derived from tfpark.KerasModel.

1. To start, create a forecast model. Specify target_dim and feature_dim in the constructor.

Refer to the API doc for a detailed explanation of all arguments for each forecast model.

Below is some example code that creates forecast models.

# import forecast models
from zoo.zouwu.model.forecast import LSTMForecaster
from zoo.zouwu.model.forecast import MTNetForecaster

# build an LSTM forecast model
lstm_forecaster = LSTMForecaster(target_dim=1,
                                 feature_dim=4)

# build an MTNet forecast model
mtnet_forecaster = MTNetForecaster(target_dim=1,
                                   feature_dim=4,
                                   long_series_num=1,
                                   series_length=3,
                                   ar_window_size=2,
                                   cnn_height=2)

2. Use forecaster.fit/evaluate/predict in the same way as tfpark.KerasModel (see the sketch after this list).

3. For univariate forecasting (i.e. to predict one series at a time), you can use either LSTMForecaster or MTNetForecaster. The input data shape for fit/evaluate/predict should match the arguments you used to create the forecaster. Specifically, X should have shape (num_samples, lookback, feature_dim) and Y should have shape (num_samples, target_dim), where feature_dim and target_dim are as specified in the constructor and lookback is the number of history time steps in each sample.

4. For multivariate forecasting (i.e. to predict several series at the same time), you have to use MTNetForecaster. The input data shape should meet the criteria below: X should have shape (num_samples, lookback, feature_dim) and Y should have shape (num_samples, target_dim), where lookback should equal (long_series_num + 1) * series_length and target_dim should equal the number of series to predict.
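As a minimal sketch of steps 2-4, the snippet below fits and runs the lstm_forecaster created earlier on randomly generated data. The numpy arrays, lookback value, and training parameters here are made-up placeholders for illustration, not part of the zouwu API.

import numpy as np

# made-up synthetic data matching the constructor arguments above
# (feature_dim=4, target_dim=1); the lookback of 3 is arbitrary
lookback = 3
x_train = np.random.randn(100, lookback, 4).astype(np.float32)  # (num_samples, lookback, feature_dim)
y_train = np.random.randn(100, 1).astype(np.float32)            # (num_samples, target_dim)

# fit/evaluate/predict work like tfpark.KerasModel
lstm_forecaster.fit(x=x_train, y=y_train, batch_size=16, epochs=2)
metrics = lstm_forecaster.evaluate(x=x_train, y=y_train)
y_pred = lstm_forecaster.predict(x_train)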


Using AutoTS

The automated training in zouwu is built upon the Analytics Zoo AutoML module (refer to the AutoML ProgrammingGuide and AutoML APIGuide for details), which uses Ray Tune for hyperparameter tuning and runs on Analytics Zoo RayOnSpark.

The general workflow of automated training consists of the two steps below.

  1. Create an AutoTSTrainer to train a TSPipeline. You can save the pipeline to a file to use it later or elsewhere if you wish.
  2. Use the TSPipeline to do prediction, evaluation, and incremental fitting.

You'll need RayOnSpark to train with AutoTSTrainer, so you have to initialize it before automated training and stop it after training is completed. Note that RayOnSpark is not needed if you just use TSPipeline for inference, evaluation, or incremental training.

# init RayOnSpark in local mode
from zoo import init_spark_on_local
from zoo.ray import RayContext
sc = init_spark_on_local(cores=4)
ray_ctx = RayContext(sc=sc)
ray_ctx.init()

# or init RayOnSpark on yarn
from zoo import init_spark_on_yarn
from zoo.ray import RayContext
slave_num = 2
sc = init_spark_on_yarn(
        hadoop_conf=args.hadoop_conf,
        conda_name="ray36",
        num_executor=slave_num,
        executor_cores=4,
        executor_memory="8g",
        driver_memory="2g",
        driver_cores=4,
        extra_executor_memory_for_ray="10g")
ray_ctx = RayContext(sc=sc, object_store_memory="5g")
ray_ctx.init()

# stop RayOnSpark after training is completed
ray_ctx.stop()

Both AutoTSTrainer and TSPipeline accept data frames as input. An example data frame looks like the one below.

datetime     value   extra_feature_1   extra_feature_2
2019-06-06   1.2     1                 2
2019-06-07   2.3     0                 2
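For illustration, the example data frame above can be constructed with plain pandas (the values are the same made-up samples):

 import pandas as pd

 # build the example data frame shown above
 df = pd.DataFrame({
     "datetime": pd.to_datetime(["2019-06-06", "2019-06-07"]),
     "value": [1.2, 2.3],
     "extra_feature_1": [1, 0],
     "extra_feature_2": [2, 2],
 })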

1. Create an AutoTSTrainer. Specify the arguments below in the constructor: dt_col (the column specifying datetime), target_col (the target column to predict), horizon (the number of steps to look forward), and extra_features_col (the columns to use as extra input features, if any). See the example below.

 from zoo.zouwu.autots.forecast import AutoTSTrainer

 trainer = AutoTSTrainer(dt_col="datetime",
                         target_col="value",
                         horizon=1,
                         extra_features_col=None)

2. Use AutoTSTrainer.fit on the train data and validation data. A TSPipeline will be returned.

 ts_pipeline = trainer.fit(train_df, validation_df)
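If you start from a single data frame, a simple time-ordered split can produce the two inputs. This is plain pandas, not a zouwu API, and the 80/20 ratio is an arbitrary choice:

 # made-up 80/20 time-ordered split into train/validation
 df = df.sort_values("datetime")
 split = int(len(df) * 0.8)
 train_df, validation_df = df.iloc[:split], df.iloc[split:]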

3. Use TSPipeline.fit/evaluate/predict to train the pipeline (incremental fitting), evaluate it, or make predictions.

 # incremental fitting
 ts_pipeline.fit(new_train_df, new_val_df, epochs=10)
 # evaluate
 ts_pipeline.evaluate(val_df)
 # predict
 ts_pipeline.predict(test_df)
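The result of predict is itself a data frame (the datetime column plus the predicted target values), so it can be inspected with ordinary pandas operations, e.g.:

 pred_df = ts_pipeline.predict(test_df)
 print(pred_df.head())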

4. Use TSPipeline.save/load to save the pipeline to a file or load it back.

 from zoo.zouwu.autots.forecast import TSPipeline
 loaded_ppl = TSPipeline.load(file)
 # ... do something, e.g. incremental fitting
 loaded_ppl.save(another_file)