Tinkwell: Running With AI
Adriano Repetti


In a previous series, IoT Architectures Under Pressure, we explored a cost-effective concept for a variety of IoT devices, which we called firmware-less. The idea was based on the assumption that there's a Hub available to run the "firmware" outside of the devices themselves.

We then introduced the Tinkwell project to build that hub. However, although it was designed with IoT in mind, Tinkwell is flexible enough to be useful in other domains as well. You can find the source code on GitHub (warning: it's not production code, I'm still experimenting!).

The goal is to build an easy-to-use framework that includes all the basic components for a robust system, one that can be extended and adapted to the scenarios introduced in the first post. Each service is designed to be modular and replaceable, allowing you to substitute it with your own project-specific implementation or an existing commercial/open-source one. For example, you might:

  • Replace the default implementation of the Events Gateway with one that uses MQTT or even RabbitMQ.
  • Extend the Events Gateway to forward events to Kafka.
  • Swap out the Store for a time-series database like TimescaleDB, InfluxDB, or Cassandra.
  • Use the default implementations when running locally and swap to the others when deployed, all from a single configuration file (see the sketch after this list).
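Purely as an illustration of that last point (this is not actual Tinkwell syntax, just a hypothetical sketch of the idea), a single configuration file could pick the implementation of each service per environment:

// Hypothetical sketch, not actual Tinkwell syntax: choose the
// Events Gateway implementation depending on the environment.
service events_gateway {
  when "env == 'local'"      { use: "default" }
  when "env == 'production'" { use: "mqtt" }
}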

Leaving aside the lower-level services (Supervisor, Orchestrator, and Discovery), the core components are:

  • Store: Tracks all measured values in the system and allows other firmlets to subscribe to changes.
  • Events Gateway: Accepts published events and broadcasts them to all subscribed clients.
  • Reducer: Calculates derived measures, updates them when dependencies change, and publishes the new values to the Store.
  • Reactor: Evaluates a set of rules and emits signals (essentially events) when conditions are met, based on current Store values. Think of it as a rule engine for triggering alarms (or other system states).
  • Executor: Listens for events and performs configured actions when a specific event is received and some conditions are met.

An Example

Let's see what a tiny configuration file looks like. We define a derived measure and a signal sent when a specific alert condition is met (note that the event also contains subject, verb and object, but the Reactor can deduce them from the context). We also show a signal not associated with a single measure:

import "constants.twm"

measure power {
    type: "Power"
    unit: "Watt"
    expression: "voltage * current"

    signal high_load {
      when: "power > 80"
      then {
        severity: "critical"
      }
    }

    signal low_load {
      when: "power < 10"
    }
}

signal low_battery {
  when: "voltage < 24 and current < 10"
  then {
    subject: "battery"
    severity: "warning"
  }
}

Now that we have signals, we can act on them:

// No filters, in this example we forward ALL the alarms!
when "Alarm" {
  then {
    http {
      method: "POST"
      url:    "https://intranet.mycompany.com/alerts"
      body:   {
        sensor: "subject"
        alarm: "object"
        level:   "[payload.severity]"
        message: "[payload.message]"
      }
    }
  }
}

For now these are two separate configuration files, but we could just as easily declare everything in one big file (and use import to split the source code into manageable chunks).
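Just as a sketch (it merely merges the two snippets above into a single file), that could look like:

import "constants.twm"

measure power {
    type: "Power"
    unit: "Watt"
    expression: "voltage * current"

    signal high_load {
      when: "power > 80"
      then {
        severity: "critical"
      }
    }
}

// The Executor rules can live in the same file too.
when "Alarm" {
  then {
    // ...the same http action shown above
  }
}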

ML to the Rescue

There are plenty of ML models (or classical statistical approaches) you can choose from in real production environments, for example:

  • Random Forest: general anomaly detection and classification. Robust to noise and easy to interpret. A good baseline model, but it does not handle temporal sequences (see the sketch after this list).
  • LSTM (RNN): time-series forecasting and early failure trends. It captures temporal dependencies, making it ideal for sequential data.
  • Autoencoder (NN): unsupervised anomaly detection. It learns normal patterns and flags deviations.
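As a concrete baseline, here is a minimal Random Forest sketch using scikit-learn. Everything in it is illustrative: random arrays stand in for historical samples labeled as normal (0) or anomalous (1).

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: in a real system these would be historical measures,
# each labeled as normal (0) or anomalous (1).
data = np.random.rand(1000, 4)
labels = np.random.randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on the held-out samples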

In a system already fully configured with all the appropriate rules, though, I'd start by adding an unsupervised autoencoder model. Let's look at some code (untested and adapted to make it publishable) where our measures are the features of the data and we use historical values for training.

Prepare the data (assuming it's been aggregated in a variable named data, with the same shape you'd obtain by, for example, generating random samples with data = np.random.rand(NUMBER_OF_SAMPLES, NUMBER_OF_FEATURES)):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Scale every feature to [0, 1]: the decoder below ends with a sigmoid,
# so inputs (and targets) must live in that range.
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
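Keep the fitted scaler around: at inference time, new samples must go through the same transformation (scaler.transform(new_data), not fit_transform), otherwise their reconstruction errors won't be comparable with the ones seen during training.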

Define a very basic autoencoder model:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

input_dim = data_scaled.shape[1]

# The encoder compresses the features down to six dimensions...
input_layer = Input(shape=(input_dim,))
encoded = Dense(12, activation='relu')(input_layer)
encoded = Dense(6, activation='relu')(encoded)

# ...and the decoder tries to reconstruct the original input from them.
decoded = Dense(12, activation='relu')(encoded)
decoded = Dense(input_dim, activation='sigmoid')(decoded)

autoencoder = Model(inputs=input_layer, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mse')

Train the model (note how the same data is both the input and the target!):

autoencoder.fit(data_scaled, data_scaled,
                epochs=50,
                batch_size=32,
                shuffle=True,
                validation_split=0.2)

Now you can feed new data to the model (scaled with the same scaler) and detect anomalies using the reconstruction error:

import numpy as np

reconstructed = autoencoder.predict(data_scaled)
errors = np.mean((data_scaled - reconstructed)**2, axis=1)
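To turn those errors into actual anomaly flags you need a threshold. A common heuristic (my assumption here, not something the model dictates) is to take a high percentile of the errors observed on the mostly-normal training data:

# Flag samples whose reconstruction error exceeds the 99th percentile
# of the errors seen on (presumably mostly normal) training data.
threshold = np.percentile(errors, 99)
anomalies = errors > threshold  # boolean mask over the samples
print(f"{anomalies.sum()} anomalous samples out of {len(errors)}")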

That's just a proof of concept; the aim is to show how easy it is. To do it right, there are plenty of people out there with the required ML expertise.

I'd also like to mention Federated Learning, where a model is trained locally and its parameters or gradients are sent to a server to build a global model (which is then fed back to the clients); see the sketch after the list below for the core idea. This is extremely useful in IoT because:

  • The edge device might not have the resources to calculate a better model.
  • You are concerned about data privacy and you do not want your sensor data to leave the premises.
  • The volume of data is too big.
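The core of the simplest scheme, federated averaging, fits in a few lines. This is only a sketch with NumPy (in practice you would weight each client by how many samples it trained on, and more likely use a framework such as Flower or TensorFlow Federated):

import numpy as np

def federated_average(client_weights):
    # Average the corresponding weight tensors received from each client.
    return [np.mean(layers, axis=0) for layers in zip(*client_weights)]

# Each client trains the same architecture on its local data and sends its
# weights back; here three copies of our autoencoder stand in for real clients.
client_weights = [autoencoder.get_weights() for _ in range(3)]
autoencoder.set_weights(federated_average(client_weights))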

Conversational AI Because...Why Not?

I'm not advocating feeding data into a conversational AI; what I want to show is that it's extremely simple to feed the data (even in near real-time) to an external system.

Let's write a tiny program to feed our favorite AI model with data from the Store and ask if everything is running OK:

import grpc
import openai  # or your preferred API client
import tinkwell_pb2 as pb
import tinkwell_pb2_grpc as pb_grpc

# Replace with your actual API credentials
openai.api_key = "YOUR_API_KEY"

# Replace with the real address of the Store service
STORE_ADDRESS = "localhost:5000"

# Replace with the real description of what your system is
CONTEXT = "Testing some measures"

def ask_model(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": f"Context: {CONTEXT}" }
            {"role": "user", "content": prompt},
            {"role": "user", "content": "Knowing only this, answer only YES (there might be a problem), NO (everything is OK). If yes add a possible explanation of the problem" }
        ]
    )
    return response.choices[0].message.content.strip()

def main():
    options = (("grpc.ssl_target_name_override", "localhost"),)
    with grpc.secure_channel(STORE_ADDRESS, grpc.local_channel_credentials(), options) as channel:
        stub = pb_grpc.StoreStub(channel)
        reply = stub.List(pb.StoreListRequest(include_values=True))

        for item in reply.items:
            name = item.name
            description = item.description or "(no description)"
            unit = item.unit
            value = item.value or "Unknown"
            historical = item.historical_value or "Unknown"

            prompt = (
                f"Measure \"{name}\" is \"{description}\"\n"
                f"Value: {value}\n"
                f"Historical average: {historical}\n\n"
            )

            verdict = ask_model(prompt)
            print(f"\n{prompt}\nResult: {verdict}\n{'-'*50}")

if __name__ == "__main__":
    main()
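Note that grpc.local_channel_credentials() only works for connections to the local machine; to reach a Store running elsewhere you would swap in proper TLS credentials (for example via grpc.ssl_channel_credentials()).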

I tried it with some fictional data; the simplified prompt, stripped of all the irrelevant measures, looks like this:

context: turbine to generate electricity

Measure "case_vibration_level" is "Intensity of vibrations measured on the outer casing (or housing) of the machine."
Value: 5 mm/s
Historical average: 3 mm/s

Measure "case_vibration_frequency" is "Dominant frequency at which the outer casing of the machine vibrates during operation."
Value: 23 Hz
Historical average: 50 Hz

Measure "turbine_rotation_speed" is "Rate at which a turbine shaft spins"
Value: 3000 rpm
Historical average: 3000 rpm

Knowing only this, answer only YES (there might be a problem), NO (everything is OK). If yes add a possible explanation of the problem

The response was (in my case, using Copilot):

Yes — that could very well be a red flag.
Let’s break it down:
- Turbine speed of 3000 rpm corresponds to 50 Hz
(since 3000 ÷ 60 = 50). That’s your 1× running speed.
- A vibration at 23 Hz is not synchronous with the
shaft — it’s a subharmonic or possibly a structural resonance.
- 5 mm/s RMS is quite elevated. According to ISO 10816-3,
for large rigidly mounted machines, anything above 4.5 mm/s RMS
is considered the “damage likely” zone.
So yes — this could be a problem. A few possibilities:
- Structural resonance: 23 Hz might match a natural
frequency of the casing or support structure. If excited by
harmonics or external forces, it can amplify vibration.
- Looseness or soft foot: These can introduce subharmonic
frequencies and erratic vibration patterns.
- Oil whirl or instability: If the machine uses fluid film
bearings, sub-synchronous frequencies like 23 Hz could indicate
oil-related instabilities.
- Blade pass or gear mesh sidebands: If there’s a gearbox or
bladed component upstream, 23 Hz might be a sideband or
modulation frequency.

Given the increase over historical levels, this isn’t just a
benign quirk — it’s a change in the machine’s dynamic behavior.
That’s often the first whisper of a mechanical issue before it
becomes a shout.

Spot on: I had actually fed it a relatively large set of measures for a machine displaying the early signs of an oil whirl problem.

If fed a vibration level of 2 mm/s (well within acceptable limits) and no historical data, we still get this response:

The presence of a non-synchronous vibration frequency like 23 Hz
is a whisper worth listening to. It might be:
- A natural mode being lightly excited.
- A low-level fluid dynamic effect.
- Or just a harmless artifact — depending on the machine’s
design and bearing type.

Recommendation: keep monitoring. If the 23 Hz component
grows or shifts, or if the RMS creeps toward 2.8–4.5 mm/s,
that’s your cue to investigate further. But for now, you’re
in the green — just with one eyebrow slightly raised.

When asked to generate a rule to check for its own recommendation (after being shown the syntax through a few examples), it produced this (perfectly valid) configuration:

  signal possible_subsync_instability {
    when: "case_vibration_frequency >= 2.8 and abs(case_vibration_frequency - 23) <= 1"
    then {
      severity: "warning"
      message: "Elevated vibration at ~23 Hz may indicate sub-synchronous instability (e.g. oil whirl)."
    }
  }

Of course, it doesn't always work. It's not even always right, and you wouldn't ever bet your life on it. But setting up a periodic script to detect anomalies takes no more than 30 minutes (you could even do it from Bash using grpcurl!).

The example above is intentionally simple, but you could easily integrate it into a proper runner that publishes events to the Events Gateway. From there, the normal system flow takes over: the Executor reacts and performs the appropriate action, most likely just notifying someone.
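As a purely hypothetical sketch (I'm guessing at the stub and message names here, so check the actual .proto definitions), the runner would publish an event instead of printing:

# Hypothetical: assumes an EventsGateway service exposing a Publish RPC
# and a PublishEventsRequest message; the real names may differ.
def publish_anomaly(channel, measure_name, explanation):
    gateway = pb_grpc.EventsGatewayStub(channel)
    gateway.Publish(pb.PublishEventsRequest(
        subject=measure_name,  # events carry subject, verb and object
        verb="triggered",
        object="anomaly",
        payload=explanation,
    ))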

Do I still need to write my rules for alerts?

Yes. Needless to say, the above example is something a human being would also have caught immediately and, in a real system, all the appropriate rules would already have been in place.
