Every bridge, building, and critical infrastructure around us is constantly under stress.
Wind loads, temperature fluctuations, traffic vibrations, and material aging all take their toll over time.
When disasters strike, such as a bridge collapsing, a massive sinkhole opening in a street, or a pipe bursting, it's rarely due to a single crack that appeared overnight. Instead, cracks and corrosion gradually worsen over time until the critical failure occurs.
For this reason, the question that keeps structural engineers awake at night is simple yet profound: How do we know when a structure is starting to fail before it becomes dangerous?
This is where Structural Health Monitoring (SHM) comes in. SHM is a field that combines civil engineering, sensor technology, and data science to continuously monitor the integrity of structures. By installing networks of sensors (accelerometers, strain gauges, temperature sensors) on bridges, buildings, tunnels, and other critical infrastructure, engineers can collect time series data that captures the structure's "heartbeat."
As one can easily imagine, people who work in SHM deal with a lot of time series. In this blog post, we'll explore the core challenge of detecting when something is genuinely wrong with a structure versus when the sensors are just picking up normal environmental variations. In particular, we'll build a complete damage detection pipeline that can identify structural damage even when temperature effects are present in the data.
Here's our roadmap:
- Setting up a virtual experiment: We'll create a realistic SHM scenario with simulated sensors on a structure
- Generating the data: We'll build synthetic time series that capture temperature-dependent structural behavior
- Adding some anomalies: We'll inject controlled "damage" signatures into our data to simulate cracks and structural issues
- Detecting them using Nixtla's pipeline: We'll use Nixtla's anomaly detection tools to automatically identify damage while filtering out temperature effects
Let's get started.
The Scenario: Virtual Experiment Setup
Consider an aircraft panel with sensors that continuously monitor its structural integrity. More specifically, these sensors generate time series signals that reflect the panel's condition. When a crack forms, it disrupts the signal pattern, and we should be able to classify that as an anomaly. The challenge? Temperature also changes the signals. Materials expand when warm and contract when cold, creating pattern shifts that can look like damage but aren't. This means that a change in the time series is not necessarily due to damage appearing in the structure.
In what follows, we'll simulate sensor data with varying temperature relationships, and we'll show how to detect actual damage despite the temperature effect.
Data Generation
Chirplet Wave
To simulate realistic SHM data, we need signals that look like what actual sensors measure. In the real world, when waves propagate through a structure, they create complex patterns: localized bursts of energy that decay over time. The mathematical name for these "wave packets" is chirplet. This is the function we will use:
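In symbols, with arrival time $\tau$, center frequency $f_c$, envelope width $\alpha_1$, chirp rate $\alpha_2$, amplitude $\beta$, and phase $\phi$, the chirplet implemented below is

$$
s(t) = \beta \, e^{-\alpha_1 (t-\tau)^2} \cos\!\left( 2\pi f_c (t-\tau) + \alpha_2 (t-\tau)^2 + \phi \right)
$$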
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def chirplet(t, tau, fc, alpha1, alpha2=0.0, beta=1.0, phi=0.0):
    """Gaussian-windowed chirp: a localized wave packet."""
    u = t - tau
    env = np.exp(-alpha1 * u**2)                # Gaussian envelope
    phase = 2*np.pi*(fc*u) + alpha2*u**2 + phi  # linear + quadratic phase
    return beta * env * np.cos(phase)
And this is how it looks:
# Generate simple chirplet
t = np.linspace(0, 1, 1000)
y = chirplet(t, tau=0.5, fc=10, alpha1=50)
plt.figure(figsize=(10, 6))
plt.plot(t, y, color='#98FE09', linewidth=2)
plt.title('Chirplet Signal', fontsize=16)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.grid(True, alpha=0.3)
plt.show()
Full Signal
A single chirplet is just the building block. In reality, when you send a wave through a structure, it doesn't just travel in a straight line. It bounces off edges, scatters at damage sites, and takes multiple paths to reach the sensor. The result is a complex signal made up of many overlapping chirplets arriving at different times with different amplitudes. Without boring you with the details, the main components are: the primary burst (direct signal), echoes and reflections from structural features, and multiple path arrivals, each one contributing to the overall time series we observe.
This is the function that we will use to model a signal:
def generate_random_signal():
    rng = np.random.default_rng()
    fs, T = 1000, 0.6
    t = np.arange(int(T * fs)) / fs
    y = np.zeros_like(t)

    # small global jitter
    s = 1.0 + rng.normal(0, 0.0005)          # ~0.05% time stretch
    amp_scale = np.exp(rng.normal(0, 0.01))  # ~1% amplitude drift
    tt = t * s

    # primary burst
    y += chirplet(tt, tau=0.08, fc=120, alpha1=900, beta=1.0)

    # echoes (very small variation)
    echo_times = np.array([0.12, 0.16, 0.20, 0.26, 0.33, 0.41, 0.50])
    echo_gains = np.exp(-4.0*(echo_times-0.10)) * 0.6
    for tau0, g0 in zip(echo_times, echo_gains):
        tau = tau0 + rng.normal(0, 4e-5)  # ~0.04 ms timing jitter
        fc = 120 + rng.normal(0, 0.1)     # small freq jitter
        phi = rng.uniform(-0.1, 0.1)      # small phase shift
        y += chirplet(tt, tau=tau, fc=fc, alpha1=800, beta=g0, phi=phi)

    # scattering coda (tiny jitter, smooth decay)
    for tau0 in np.linspace(0.22, 0.58, 100):
        tau = tau0 + rng.normal(0, 4e-5)
        fc = 120 + rng.normal(0, 0.1)     # small freq jitter
        g = 0.5 * np.exp(-6*(tau0-0.20))
        y += chirplet(tt, tau=tau, fc=fc, alpha1=1200, beta=g)

    y *= amp_scale
    idx = pd.date_range(start="2025-01-01", periods=len(t), freq=f"{1000/fs}ms")
    df = pd.DataFrame({"unique_id": "sensor_A", "ds": idx, "y": y})
    return t, y, df
Let's visualize what this looks like:
# Generate and plot a random signal
t, y, df = generate_random_signal()
plt.figure(figsize=(12, 4))
plt.plot(t, y, color='#98FE09', linewidth=1.5)
plt.title('Simulated Sensor Signal', fontsize=16)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.grid(True, alpha=0.3)
plt.savefig('images/random_signal.svg')
plt.show()
Full Dataset (Temperature Dependency)
The random signal we just generated is our starting point: it represents what a sensor might record at a single, fixed temperature. But in the real world, structures operate across a range of temperatures. An aircraft panel might see -20°C on the tarmac in winter and +40°C in summer. We need to understand how our sensor signal changes across this entire temperature range.
Here's what we'll do: imagine an engineer taking measurements at different temperatures in a climate-controlled lab. They set the temperature to 0°C, send a wave pulse, and record the signal. Then they increase it to 5°C, repeat the measurement, then 10°C, and so on up to 40°C. For each temperature, they get a slightly different signal amplitude and shape.
In our simulation, we'll do exactly this, but virtually. For every single time point in our 600-sample signal (that's 0.6 seconds of data at 1000 Hz), we'll create a mathematical relationship that describes how that particular moment's amplitude varies with temperature.
There are three dependency types, chosen at random, that we will simulate:
- Linear: amplitude increases proportionally with temperature
- Polynomial: a more complex, curved relationship between amplitude and temperature
- Sinusoidal: cyclical variations with temperature
These different patterns, which are randomly injected for every index (out of the 600), mimic the complex physics happening in real materials as temperature changes.
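In formulas: writing $b_i$ for the clean base amplitude at time index $i$, the value stored for temperature $T$ is one of

$$
y_i(T) = aT + b_i, \qquad
y_i(T) = aT^2 + bT + b_i, \qquad
y_i(T) = A \sin(\omega T + \phi) + b_i,
$$

with the coefficients $a$, $b$, $A$, $\omega$, $\phi$ drawn at random, exactly as in the code below.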
The result? Instead of one signal at one temperature, we'll have a complete dataset showing how our sensor readings evolve across the entire temperature range: exactly what we need to build a robust damage detection system.
This is the code to do so:
def generate_mixed_dependency_timeseries(n_temperatures=100, temperature_range=(0, 1),
                                         fs=1000, T=0.6, seed=42, change_probability=0.1):
    rng = np.random.default_rng(seed)

    # Generate time array
    t = np.arange(int(T * fs)) / fs
    k = len(t)  # number of time steps

    # Generate temperature values
    temperature_values = np.linspace(temperature_range[0], temperature_range[1], n_temperatures)

    # Initialize arrays
    timeseries_data = np.zeros((k, n_temperatures))
    dependency_types = np.zeros(k, dtype=int)  # 0=linear, 1=polynomial, 2=sinusoidal
    dependency_names = ['linear', 'polynomial', 'sinusoidal']

    # Generate column names
    column_names = [f"temp_{temp:.3f}" for temp in temperature_values]

    # Generate base signal for reference
    base_t, base_y, base_df = generate_random_signal()

    # Start with a random dependency type
    current_dependency = rng.choice([0, 1, 2])
    dependency_types[0] = current_dependency

    # For each time step, create a mathematical relationship across temperatures
    for time_idx in range(k):
        base_value = base_y[time_idx]

        # Check if we should change dependency type
        if time_idx > 0 and rng.random() < change_probability:
            current_dependency = rng.choice([0, 1, 2])
            print(f"Time step {time_idx}: {dependency_names[current_dependency]}")
        dependency_types[time_idx] = current_dependency

        if current_dependency == 0:  # Linear
            # Linear dependency: signal = a * temp + b
            a = rng.uniform(-0.05, 0.05)  # random slope between -0.05 and 0.05
            b = base_value                # intercept
            for temp_idx, temp in enumerate(temperature_values):
                timeseries_data[time_idx, temp_idx] = a * temp + b
        elif current_dependency == 1:  # Polynomial
            # Polynomial dependency: signal = a * temp^2 + b * temp + c
            a = rng.uniform(-0.005, 0.02)  # random quadratic coefficient
            b = rng.uniform(0.01, 0.03)    # random linear coefficient
            c = base_value                 # constant
            for temp_idx, temp in enumerate(temperature_values):
                timeseries_data[time_idx, temp_idx] = a * temp**2 + b * temp + c
        elif current_dependency == 2:  # Sinusoidal
            # Sinusoidal dependency: signal = amplitude * sin(freq * temp + phase) + offset
            amplitude = rng.uniform(-0.1, 0.1)  # random amplitude
            freq = rng.uniform(1, 4) * np.pi    # random frequency (1*pi to 4*pi)
            phase = rng.uniform(0, 2*np.pi)     # random phase
            offset = base_value                 # offset
            for temp_idx, temp in enumerate(temperature_values):
                timeseries_data[time_idx, temp_idx] = amplitude * np.sin(freq * temp + phase) + offset

    # Add a small amount of noise for realism
    noise_std = 0.0002
    for time_idx in range(k):
        noise = rng.normal(0, noise_std, n_temperatures)
        timeseries_data[time_idx, :] += noise

    return t, timeseries_data, temperature_values, column_names, dependency_types
This will create a timeseries_size x n_temperatures matrix. Since our time series has 600 points and we use n_temperatures = 100, by default we get a 600 x 100 matrix. Let's plot some columns.
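To make the shapes concrete, and because the plotting snippets below already use `timeseries_mixed`, here is a minimal call with the default arguments (the fully spelled-out call is repeated in the anomaly section):

# Generate the dataset with default arguments and confirm its shape
t_mixed, timeseries_mixed, temperature_values, column_names, dependency_types = \
    generate_mixed_dependency_timeseries()
print(timeseries_mixed.shape)  # (600, 100): 600 time steps x 100 temperatures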
def plot_random_temperatures(timeseries_data, temperature_values, n_temps=6, seed=42):
    """Plot n random temperature time series."""
    np.random.seed(seed)
    temp_indices = np.random.choice(len(temperature_values), n_temps, replace=False)
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()
    colors = ['#02FEFA'] * 6
    for i, temp_idx in enumerate(temp_indices):
        ax = axes[i]
        ax.plot(timeseries_data[:, temp_idx], color=colors[i], linewidth=2)
        ax.set_title(f'Temperature = {temperature_values[temp_idx]:.3f}', fontsize=14)
        ax.set_xlabel('Time Steps', fontsize=15)
        ax.set_ylabel('Signal Value', fontsize=15)
        ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
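Once `timeseries_mixed` and `temperature_values` exist (see the call above), plotting six random temperature columns is a one-liner:

plot_random_temperatures(timeseries_mixed, temperature_values, n_temps=6)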
Now the question to answer is the following:
At what index, and what temperature, does the signal matrix present an anomaly?
From the time-domain plots above, it is extremely hard to answer this question. But if we transpose the signal matrix and, for a fixed index, plot the values across temperatures, we get the following:
time_steps_to_show = [0, 10, 22]
dependency_names = ['linear', 'polynomial', 'sinusoidal']
dep_colors = {'linear': '#98FE09', 'polynomial': '#02FEFA', 'sinusoidal': '#0E00F8'}

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes = axes.flatten()
for i, time_step in enumerate(time_steps_to_show):
    ax = axes[i]
    dep_type = dependency_types[time_step]
    dep_name = dependency_names[dep_type]
    color = dep_colors[dep_name]
    ax.plot(temperature_values, timeseries_mixed[time_step, :], 'o-',
            markersize=2, color=color, alpha=0.8)
    ax.tick_params(axis='x', labelsize=14)
    ax.tick_params(axis='y', labelsize=14)
    ax.set_xlabel('Temperature', fontsize=14)
    ax.set_ylabel('Time Series Value', fontsize=14)
    ax.set_title(f'Time Step {time_step} ({dep_name})', fontsize=20)
    ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Dependency Types Visualization
Now, if you see -0.06 at temperature = 1 and time step = 0, it is not an anomaly: it follows the linear trend. But the same value at time step = 22 clearly signals a problem.
This approach is crucial because in real-world monitoring, we can't control the temperature. A crack might form when the aircraft is at 15°C, but we're trying to detect it using measurements taken at 25°C. Without understanding the temperature dependency, we'd have no way to distinguish "the signal changed because of temperature" from "the signal changed because of damage." Our synthetic dataset, with its known temperature relationships, gives us the perfect test set to develop and validate detection algorithms that can make this distinction reliably.
Perfect, so let's see what these anomalies will look like.
Anomaly Creation
Now comes the interesting part: injecting damage into our synthetic data. We have a clean dataset showing normal structural behavior across different temperatures. To test our detection algorithms, we need to add controlled "anomalies".
In reality, damage/anomalies can manifest in many ways: gradual signal drift over time, sudden level shifts, or complex pattern changes. However, one of the most common and detectable signatures is a spike anomaly, a sudden, localized jump in amplitude at a specific time and temperature. This mimics what happens when a crack causes an unexpected wave reflection or scattering event.
For simplicity, we'll focus on spike anomalies in this tutorial. The concept is straightforward: we pick a specific time step (say, index 30 out of our 600 samples) and a specific temperature (say, 0.5), and we add a sudden amplitude increase at that exact point in our data matrix. This creates a clear, detectable deviation from the expected temperature-dependent pattern.
This is the function we will use:
def add_anomaly(timeseries_data, time_step, temperature_value, spike_magnitude=0.1, temperature_values=None):
    data = timeseries_data.copy()
    if temperature_values is not None:
        # Find the closest temperature index
        temp_idx = np.argmin(np.abs(temperature_values - temperature_value))
        actual_temp = temperature_values[temp_idx]
    else:
        # Treat temperature_value as a fraction of the temperature axis
        temp_idx = int(temperature_value * len(timeseries_data[0]))
        actual_temp = temperature_value

    # Add spike anomaly
    data[time_step, temp_idx] += spike_magnitude

    print(f"Added spike anomaly:")
    print(f"  Time step: {time_step}")
    print(f"  Temperature: {actual_temp:.3f} (index {temp_idx})")
    print(f"  Spike magnitude: {spike_magnitude}")
    return data
Let's see how this looks:
- Creation of the dataset:
t_mixed, timeseries_mixed, temperature_values, column_names, dependency_types = generate_mixed_dependency_timeseries(
    n_temperatures=100,
    temperature_range=(0, 1),
    fs=1000,
    T=0.6,
    seed=42,
    change_probability=0.1  # 10% chance of changing dependency type at each time step
)
- Adding the anomaly:
# Example: Add anomaly at time step 30, temperature 0.5
timeseries_with_anomaly = add_anomaly(
    timeseries_mixed,
    time_step=30,
    temperature_value=0.5,
    spike_magnitude=0.01,
    temperature_values=temperature_values
)
- Displaying the anomaly:
plt.figure(figsize=(12, 6))
time_step = 30
plt.plot(temperature_values, timeseries_mixed[time_step, :], 'o-', color='#98FE09', linewidth=2, label='Original')
plt.plot(temperature_values, timeseries_with_anomaly[time_step, :], 'o-', color='cyan', linewidth=2, label='With Anomaly')
plt.axvline(x=0.5, color='lime', linestyle='--', alpha=0.7, label='Anomaly Location')
plt.title(f'Time Step {time_step} - Signal vs Temperature (Before/After Anomaly)', fontsize=16)
plt.xlabel('Temperature')
plt.ylabel('Signal Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('images/anomaly_detection.svg')
plt.show()
Time step 30 - Signal vs Temperature (Before/After Anomaly)
Now that we have injected the anomaly, let's see if we are going to be able to detect it.
Anomaly Detection through StatsForecast
Anomaly Detection Algorithm
The detection approach leverages Nixtla's StatsForecast library with an AutoETS (Exponential Smoothing State Space) model. The core idea is elegant: instead of manually defining what "normal" looks like at each temperature, we let a probabilistic forecasting model learn the expected pattern from the data itself. The model analyzes the series across temperatures, fits the underlying trends and relationships, and produces prediction intervals: confidence bands that represent where we expect normal values to fall.
Here's the process for a given time step:
- Feed the signal values across all temperatures into the AutoETS model
- The model generates fitted values along with upper and lower prediction bounds (typically at 90% confidence)
- Any data point falling outside these prediction intervals is flagged as an anomaly
This approach automatically accounts for complex temperature dependencies, and only raises "alarms" when values truly deviate from expected behavior.
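To make the flagging rule concrete, here is a toy, self-contained sketch of the interval check (the numbers are made up for illustration; the real bounds come from AutoETS):

import numpy as np

y  = np.array([0.10, 0.12, 0.55, 0.11])  # observed values
lo = np.array([0.05, 0.06, 0.05, 0.06])  # lower prediction bounds
hi = np.array([0.20, 0.21, 0.20, 0.21])  # upper prediction bounds
is_anomaly = (y < lo) | (y > hi)
print(is_anomaly)  # [False False  True False] -> only the 0.55 point is flagged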
Algorithm Implementation
It might sound complicated, but the implementation is extremely simple.
from statsforecast import StatsForecast
from statsforecast.models import AutoETS

def detect_anomalies(
    timeseries_data: np.ndarray,
    temperature_values: np.ndarray,
    level: float = 0.90
) -> pd.DataFrame:
    y = np.asarray(timeseries_data, dtype=float)
    n = len(y)
    if len(temperature_values) != n:
        raise ValueError("temperature_values length must match timeseries_data length.")

    # 1. Build the long-format df expected by StatsForecast
    start = pd.Timestamp("2024-01-01")
    ds = pd.date_range(start, periods=n, freq="D")
    df = pd.DataFrame({"unique_id": 0, "ds": ds, "y": y})

    # 2. Initialize the model
    levels = [int(round(level * 100))]
    sf = StatsForecast(models=[AutoETS(season_length=1)], freq="D", n_jobs=-1)

    # 3. Fit & get forecasts with fitted=True
    _ = sf.forecast(df=df, h=1, level=levels, fitted=True)

    # 4. Retrieve in-sample fitted values & intervals
    insample = sf.forecast_fitted_values()
    model_name = [c for c in insample.columns
                  if c not in ("unique_id", "ds", "y") and "-lo-" not in c and "-hi-" not in c][0]
    lo_col = f"{model_name}-lo-{levels[0]}"
    hi_col = f"{model_name}-hi-{levels[0]}"

    # 5. Flag anomalies: outside the interval
    is_anom = ~insample["y"].between(insample[lo_col], insample[hi_col])
    is_anom = is_anom.astype(int).to_numpy()

    # 6. Assemble the output
    out = pd.DataFrame({
        "ds": insample["ds"].values,
        "y": insample["y"].values,
        "unique_id": 0,
        "is_anomaly": is_anom,
        "temperature_values": temperature_values,
    })
    out["_fitted_mean"] = insample[model_name].values
    out["_lo"] = insample[lo_col].values
    out["_hi"] = insample[hi_col].values
    out.attrs["model_name"] = model_name
    out.attrs["level"] = levels[0]
    return out
df_out = detect_anomalies(
    timeseries_data=timeseries_with_anomaly[time_step, :],
    temperature_values=temperature_values
)
Let's break down what this function does:
Step 1: Build the DataFrame. StatsForecast expects data in a specific format with unique_id, ds (datetime), and y (values). We transform our numpy array into this format.
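For intuition, a toy version of that long format (values illustrative, not from our dataset) looks roughly like this:

import pandas as pd

toy = pd.DataFrame({
    "unique_id": 0,
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "y": [0.012, -0.003, 0.008],  # illustrative signal values
})
print(toy)
#    unique_id         ds      y
# 0          0 2024-01-01  0.012
# 1          0 2024-01-02 -0.003
# 2          0 2024-01-03  0.008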
Step 2: Initialize the Model. We use AutoETS, a probabilistic model that provides prediction intervals. Any point outside the expected range will be flagged as unusual. You can read more about AutoETS in the StatsForecast documentation.
Step 3: Fit the Model. We fit the model with fitted=True to retrieve in-sample prediction intervals, showing what the model considers "normal" for each historical point.
Step 4: Detect Anomalies. Any data point falling outside the prediction interval is flagged as an anomaly, and we assemble the final output with all relevant information.
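Beyond the plot, the flagged rows can be pulled straight out of df_out using the columns the function returns, e.g.:

anomalies = df_out[df_out["is_anomaly"] == 1]
print(anomalies[["temperature_values", "y", "_lo", "_hi"]])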
And we can see from the following plot that we are able to retrieve the anomaly very well!
plt.plot(df_out['temperature_values'].values, df_out['y'].values,
         color='#98FE09', linewidth=2, label='Original')
anoms = df_out[df_out['is_anomaly'] == 1]
plt.plot(anoms['temperature_values'].values, anoms['y'].values, '.',
         color='#02FEFA', markersize=10, label='Anomaly')
plt.xlabel('Temperature')
plt.ylabel('Signal Value')
plt.title(f'Anomaly Detection at time step: {time_step}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Anomaly Detection at time step 30
There we go: anomaly detected!
From a time series perspective, we can see the following plot.
temperature_idx = np.argmin(np.abs(temperature_values - 0.5))
plt.plot(t_mixed, timeseries_with_anomaly[:, temperature_idx], color='#98FE09', linewidth=2, label='Signal at temp 0.5')
plt.axvline(x=t_mixed[30], color='#02FEFA', linestyle='--', alpha=0.7, label='Anomaly Time Step')
plt.xlabel('Time')
plt.ylabel('Signal Value')
plt.title(f'Anomaly Detection at time step: {time_step}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Anomaly Detection at time step 30 (time series view)
As we can see, that point doesn't look like an anomaly per se, but it is clearly one once we take the temperature perspective into account: well done!
Conclusion
Let's recap what we covered in this post:
We introduced Structural Health Monitoring (SHM) and explained why continuous sensor monitoring is critical for detecting cracks in structures like aircraft panels.
We built a synthetic structural monitoring dataset that mimics real-world dynamics: chirplet-based sensor signals with complex temperature dependencies (linear, polynomial, and sinusoidal relationships). This approach captures the physics of how structures respond to environmental changes.
We explicitly injected spike anomalies to simulate structural damage like cracks. Since we know exactly where and when we added these anomalies, we can verify that our detection algorithm correctly identifies them at the right time steps and temperatures.
We demonstrated temperature-compensated anomaly detection. By using Nixtla's StatsForecast with AutoETS, we built a system that learns normal temperature-dependent behavior and only flags true deviations (not just temperature effects).
We validated the approach. The algorithm successfully captured the complex temperature patterns across our dataset while reliably detecting the injected spike anomaly at time step 30, temperature 0.5, which is exactly where we placed it.
Overall, this workflow shows how synthetic data, combined with Nixtla's forecasting models, provides a robust foundation for damage detection in engineering structures.