22.10.2021

Motivation

  • Detecting contaminants in water distribution systems
    • Measuring specific contaminants: expensive
    • Measuring all contaminants: impossible
  • Also: detecting other irregularities
    • e.g., equipment/sensor failures

The Data

Source

Variables: Inputs

  • pH — pH value
  • Redox — Redox potential
  • Leit — Electric conductivity
  • Trueb — Turbidity
  • Cl — Chlorine dioxide 1
  • Cl_2 — Chlorine dioxide 2
  • Tp — Temperature
  • Fm — Flow rate 1
  • Fm_2 — Flow rate 2

The Task

  • Challenge: detect events (supervised)
  • Task now: detect anomalies (unsupervised)

Preprocessing

  • Data split: training, validation, test
  • Denoise: moving average
  • Detrend: convert to differences between time steps
  • Missing values: last observation carried forward
  • Scaling: zero mean, unit variance
    • Based on training data mean and variance
  • Reshape: sequence blocks (64 samples x 9 features)
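The preprocessing chain above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the moving-average window size (5) is an assumption, and the steps are applied in one plausible order (missing values must be filled before smoothing, or NaNs would propagate).

```python
import numpy as np

def preprocess(x, train_mean, train_std, window=5, block_len=64):
    """Sketch of the preprocessing chain; window size is an assumption."""
    x = x.copy()
    # Missing values: last observation carried forward (per feature column)
    for j in range(x.shape[1]):
        for i in range(1, x.shape[0]):
            if np.isnan(x[i, j]):
                x[i, j] = x[i - 1, j]
    # Denoise: moving average along the time axis
    kernel = np.ones(window) / window
    x = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, x)
    # Detrend: differences between consecutive time steps
    x = np.diff(x, axis=0)
    # Scale: zero mean, unit variance, using *training-data* statistics only
    x = (x - train_mean) / train_std
    # Reshape: non-overlapping sequence blocks (block_len samples x features)
    n_blocks = x.shape[0] // block_len
    return x[: n_blocks * block_len].reshape(n_blocks, block_len, x.shape[1])
```

With the nine input variables listed above, each block has shape (64, 9), matching the slide.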

The Model

Autoencoder (AE)

  • AE reconstructs its own input
  • Trained on ‘normal’ data
  • ‘Large’ reconstruction error \(\rightarrow\) anomaly
  • Here: 1D-convolutional layers
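The anomaly decision rule can be sketched independently of the network itself. Here `reconstruct` stands in for the trained 1D-convolutional AE (assumed to map an array of blocks to reconstructions of the same shape); the mean-squared-error score and the fixed threshold are illustrative choices.

```python
import numpy as np

def flag_anomalies(blocks, reconstruct, threshold):
    """Flag blocks whose mean squared reconstruction error exceeds threshold.
    `reconstruct` is a stand-in for the trained autoencoder's forward pass."""
    recon = reconstruct(blocks)
    errors = np.mean((blocks - recon) ** 2, axis=(1, 2))  # one score per block
    return errors > threshold, errors
```

Since the AE is trained only on 'normal' data, it reconstructs normal blocks well and anomalous blocks poorly, which is what the threshold on the error exploits.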

Handling the Data

  • Initially trained for 20 epochs
    • Using only non-event training samples

  • Validation data analyzed in batches
    • Flag anomalies in batch
    • Update AE with non-anomalies
    • Go to next batch
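The batch-wise loop above can be sketched as follows. `reconstruct` and `update` are stand-ins (assumptions) for the trained model's forward pass and a short fine-tuning step with the reduced update learning rate; only the control flow is taken from the slide.

```python
import numpy as np

def stream_detect(batches, reconstruct, update, threshold):
    """Analyze data batch by batch: flag anomalies in each batch, then
    update the AE only on the non-anomalous blocks before moving on."""
    all_flags = []
    for batch in batches:
        err = np.mean((batch - reconstruct(batch)) ** 2, axis=(1, 2))
        flags = err > threshold
        all_flags.append(flags)
        normal = batch[~flags]
        if len(normal) > 0:
            update(normal)  # fine-tune on non-anomalies only
    return np.concatenate(all_flags)
```

Updating only on non-anomalies keeps the AE's notion of 'normal' current without letting it learn to reconstruct the anomalies themselves.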

Parameters?

Name        Description                          Min          Max
batch_size  batch size                           16           1024
nfilter     # of filters in first/last layer     10           100
lr          learning rate (training)             \(10^{-5}\)  \(10^{-2}\)
lrup        learning rate (update, lr x lrup)    \(10^{-2}\)  \(10^{0}\)
activation  activation function                  {relu, swish, sigmoid}
drp         dropout rate                         \(10^{-3}\)  \(10^{-0.3}\)
num_layers  # of layers in encoder and decoder   1            4

Parameter Tuning

Method: Surrogate Model-Based Optimization

  • Learn relation between performance and parameters
  • Here:
    • 150 evaluations
      • Measure: Area under the ROC curve (AUC), on validation data
    • Surrogate model: Gaussian process regression
    • Search: evolutionary algorithm
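A heavily simplified sketch of this loop on a single parameter in [0, 1]: fit a Gaussian-process surrogate to the (parameter, performance) pairs seen so far, pick the candidate with the best predicted value, evaluate it, and repeat. Everything here is an illustrative assumption — a tiny hand-rolled RBF-kernel GP replaces a full GP library, random candidates stand in for the evolutionary search, and the budget is far smaller than the 150 evaluations on the slide.

```python
import numpy as np

def gp_mean(X, y, Xq, length=0.3, noise=1e-5):
    """Posterior mean of a Gaussian process with an RBF kernel (minimal sketch)."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    return k(Xq, X) @ np.linalg.solve(K, y)

def smbo(objective, n_init=5, n_iter=20, seed=0):
    """Surrogate model-based optimization sketch: the surrogate proposes
    the next parameter to evaluate with the (expensive) objective."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, n_init)               # initial design
    y = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(0, 1, 200)           # stand-in for the EA search
        x_next = cand[np.argmax(gp_mean(X, y, cand))]
        X = np.append(X, x_next)
        y = np.append(y, objective(x_next))
    return X[np.argmax(y)], y.max()
```

The point of the surrogate is that each real evaluation (training and validating the AE) is expensive, while querying the GP is cheap, so the search budget goes where the model predicts good performance.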

Tuning Result: Best Parameters

Name        Description                          Min          Max            Best
batch_size  batch size                           16           1024           82
nfilter     # of filters in first/last layer     10           100            66
lr          learning rate (training)             \(10^{-5}\)  \(10^{-2}\)    \(10^{-3.30}\)
lrup        learning rate (update, lr x lrup)    \(10^{-2}\)  \(10^{0}\)     \(10^{-1.87}\)
activation  activation function                  {relu, swish, sigmoid}      relu
drp         dropout rate                         \(10^{-3}\)  \(10^{-0.3}\)  \(10^{-2.56}\)
num_layers  # of layers in encoder and decoder   1            4              1

Tuning Result: Performance

Performance Comparison on Unseen Test Data

  • AE: Autoencoder
  • IF: Isolation Forest
  • LOCF: Last Observation Carried Forward (baseline)
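The LOCF baseline can be sketched as follows: each time step is 'predicted' by carrying the last observation forward, and the size of the resulting prediction error is used as the anomaly score. Aggregating over features with the maximum is an assumption for illustration.

```python
import numpy as np

def locf_scores(x):
    """LOCF baseline anomaly scores for a (time steps x features) array:
    score each step by its deviation from the previous observation."""
    err = np.abs(np.diff(x, axis=0))         # deviation from last observation
    scores = err.max(axis=1)                 # worst feature per time step
    return np.concatenate([[0.0], scores])   # first step has no prediction
```

Despite its simplicity, such a baseline is useful for judging whether the AE's performance gain justifies its cost.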

Example Outputs (pH)

Example Outputs (Redox)

Conclusions

  • Reasonable comparative performance on test data
  • Still: no ‘perfect’ fit
  • Some open issues:
    • Performance measures, benchmarking
    • Unknown ‘true events’ in data
    • Searching for better network architectures
    • Include parameters of preprocessing steps
    • Threshold update

Thanks for your attention. Questions?

Tuning Result: Progress

Tuning Result: Sensitivity

Tuning Result: F1