I'm working on an anomaly detection system for IoT devices using sensor data (temperature, relative humidity, and pressure). The devices are sealed but have some natural "breathability": they show normal pattern changes over a timescale of days that standard physics-based detection methods (e.g., Van der Waals-style equations of state) would mistake for leaks, so those methods won't work here. I have already tried a couple of approaches and am now trying an autoencoder, but I suspect my layers are not right or that I'm missing some steps. Feel free to point out any mistakes I might have made.
[Graph of the input data]
Current Setup
- Input data: sequences of 38 4D vectors (time, temperature, humidity, pressure, all normalized); see the shape sketch below
- Each device can have multiple windows (e.g., 10 windows = 380 data points per device)
- Initial approach: hybrid CNN-LSTM autoencoder
- Goal: learn the normal "breathing" patterns so that natural variation can be distinguished from actual leaks
- Improvements made: learning rate finder (cyclical test), cross-validation, adaptive learning rate with early stopping, multiple folds with model reinitialization
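To make the expected dimensions concrete, here is a minimal shape check with random placeholder data (not my real sensor data):

import numpy as np

# Placeholder array with the dimensions described above:
# 10 windows for one device, 38 time steps per window, 4 features per step.
windows = np.random.rand(10, 38, 4).astype(np.float32)
print(windows.shape)  # (10, 38, 4) -> (n_windows, sequence_length, input_dim)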
Layers:
from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, LSTM,
                                     Dropout, Dense, RepeatVector)
from tensorflow.keras.models import Model

def build_hybrid_ae(self):
    # ----- Encoder -----
    encoder_inputs = Input(shape=(self.sequence_length, self.input_dim))
    x = encoder_inputs
    if self.use_attention:
        x = self.attention_layer(x)
    # CNN feature extraction
    for n_filters in [32, 16]:
        conv1 = Conv1D(n_filters, kernel_size=3, padding='same', activation='relu')
        x = conv1(x)
        x = BatchNormalization()(x)
    # LSTM processing
    lstm1 = LSTM(16, return_sequences=False)
    x = lstm1(x)
    x = Dropout(self.dropout_rate)(x)
    encoded = Dense(self.encoding_dim)(x)
    self.encoder = Model(encoder_inputs, encoded, name='encoder')

    # ----- Decoder -----
    decoder_inputs = Input(shape=(self.encoding_dim,))
    x = Dense(16)(decoder_inputs)
    x = RepeatVector(self.sequence_length)(x)
    # LSTM decoder
    x = LSTM(16, return_sequences=True)(x)
    x = Dropout(self.dropout_rate)(x)
    # CNN decoder
    x = Conv1D(16, kernel_size=3, padding='same', activation='relu')(x)
    x = BatchNormalization()(x)
    decoded = Conv1D(self.input_dim, kernel_size=3, padding='same', activation='sigmoid')(x)
    self.decoder = Model(decoder_inputs, decoded, name='decoder')
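The method above only defines the two halves. A minimal sketch of how they would be chained into the end-to-end autoencoder and compiled (the Adam optimizer and MSE loss here are assumptions for the sketch, not a fixed choice):

    # Sketch (continuing the method): chain encoder and decoder into one model.
    autoencoder_outputs = self.decoder(self.encoder(encoder_inputs))
    self.autoencoder = Model(encoder_inputs, autoencoder_outputs, name='hybrid_ae')
    # Assumed training configuration: reconstruct the input window with MSE.
    self.autoencoder.compile(optimizer='adam', loss='mse')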
Loading the data
import numpy as np

class SlidingWindows:
    def __init__(self, windows, train_split=0.8, random_state=42):
        self.windows = windows
        self.train_split = train_split
        self.random_state = random_state
        # Physical value ranges used for min-max normalization
        self.RANGES = {
            'temp': (-100.0, 70.0),
            'humid': (0.0, 100.0),
            'press': (868.0, 1085.0)
        }

    def normalize_data(self, val, start, end):
        """Normalize to [0, 1] range"""
        return (val - start) / (end - start)

    def denormalize_data(self, val, start, end):
        """Convert back to original range"""
        return val * (end - start) + start

    def load_data(self, time_based_split=False):
        # Shape: (n_windows, sequence_length, 4) with features [timestamp, temp, humid, press]
        X = np.array([[
            [
                ts.timestamp(),
                float(self.normalize_data(temp, *self.RANGES['temp'])),
                float(self.normalize_data(humid, *self.RANGES['humid'])),
                float(self.normalize_data(press, *self.RANGES['press']))
            ] for ts, temp, humid, press in w['data']
        ] for w in self.windows], dtype=np.float32)

        if time_based_split:
            # Sort by mean timestamp of each window
            mean_times = np.mean(X[:, :, 0], axis=1)
            sorted_indices = np.argsort(mean_times)
            X = X[sorted_indices]
        else:
            # Random split
            rng = np.random.default_rng(self.random_state)
            indices = rng.permutation(len(X))
            X = X[indices]

        split_idx = int(len(X) * self.train_split)
        return (X[:split_idx],), (X[split_idx:],)
sliding_windows = create_sliding_windows(valid_recording_periods, df, threshold_value)
slidingwindows = SlidingWindows(sliding_windows)
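The loader output then feeds into training roughly like this (a sketch; autoencoder is the assembled model from the earlier sketch, and the epoch count and batch size are illustration values, not my exact settings):

# Unpack the (train,), (val,) tuples returned by load_data()
(X_train,), (X_val,) = slidingwindows.load_data(time_based_split=False)

# The autoencoder is trained to reconstruct its own input,
# so the windows serve as both inputs and targets.
autoencoder.fit(
    X_train, X_train,
    validation_data=(X_val, X_val),
    epochs=100,        # illustration value
    batch_size=32,     # illustration value
    shuffle=True
)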
Results
[Graph showing poor training results from the autoencoder]
Steps taken to improve the autoencoder
Because the results were poor, I tried the things mentioned above: a learning rate finder (cyclical test), cross-validation, an adaptive learning rate with early stopping, and multiple folds with model reinitialization.
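For the adaptive learning rate with early stopping, the callback setup is along these lines (a minimal sketch; the monitored metric, factor, and patience values are illustration values rather than my exact settings):

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # Halve the learning rate when the validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
    # Stop training once the validation loss stops improving
    EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True),
]
# Passed to training via: autoencoder.fit(..., callbacks=callbacks)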
But these results were not much better. The reconstruction seems to collapse to roughly 0.5 for every feature.
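One way to see which features collapse is to look at the per-feature reconstruction error on the validation windows (a sketch; autoencoder and X_val are the model and validation data from the sketches above):

# Per-feature mean absolute reconstruction error over the validation windows
X_pred = autoencoder.predict(X_val)
per_feature_mae = np.mean(np.abs(X_val - X_pred), axis=(0, 1))
for name, err in zip(['time', 'temp', 'humid', 'press'], per_feature_mae):
    print(f"{name}: MAE = {err:.4f}")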
Results p.2
[Graph of training results after the improvements]