Strong background in scalable systems, full-stack development, and large data pipelines.
On the AI side, delivers end-to-end ML and LLM systems, from business problem framing to production deployment and monitoring. Hands-on across agentic LLM applications (Claude, LangGraph, RAG), inference infrastructure (vLLM, distributed routing), and classical ML for forecasting, classification, and anomaly detection.
Focused on the operational discipline that turns prototypes into reliable products: evaluation, observability, cost control, and shipping data and AI systems that move business metrics through automation, decision support, and measurable impact.
Spotting Insider Trading on Polymarket Without a Surveillance Team
Every trade on Polymarket leaves a trace. A wallet address, a timestamp, a position size, a resolution outcome. On its own, a single winning bet on a geopolitical event means nothing. Someone has to win. But when the same handful of wallets shows up again and again, buying early and heavy on events that turned on information most of the world did not have, the noise starts to look like a signal.
Polymarket is pseudonymous, the data is public, and there is no regulator running a surveillance system over it. That is bad for enforcement and good for analysis. Everything you need to flag suspicious trading is already on-chain. The work is in pulling it out, structuring it, and asking the right questions.
This is a business-friendly walk-through of what that looks like in practice. The full version is in the technical course, but you do not need to read code to understand the pattern.
What "Insider Trading" Means Here
In a regulated market, insider trading has a legal definition tied to material non-public information and a fiduciary duty. None of that applies on a decentralized prediction market. There is no formal regulatory weight to the phrase here.
What we mean is narrower and more behavioral: an account using information that was not publicly available at the time of trade entry to take positions whose profitability depends on that informational advantage. The practical question is not legal guilt but statistical anomaly. Does this account's record have a plausible innocent explanation?
Geopolitical markets are the high-signal domain. They resolve on discrete real-world events such as elections, military actions, diplomatic decisions, and leadership changes. Skill in these markets is not about synthesizing public data faster than the competition. The informational edge is binary. You either have access to the right conversation, the right document, or the right source, or you do not. And the number of people who do is usually small.
That concentration is what makes the pattern visible. When an informational edge exists in these markets, it tends to be traceable to a small number of actors, who tend to behave in similar ways.
The Behavioral Fingerprint
A trader with a genuine informational advantage does not behave like a lucky guesser, and they do not behave like a skilled analyst. They behave like someone who already knows the answer.
They enter positions early, before the market has had a chance to reprice around emerging information. They size those positions heavily, because the risk they perceive is lower than the risk the market is pricing. They size them consistently from event to event, because conviction does not fluctuate when you already know the outcome. And they win at a rate that has no comfortable explanation in terms of skill or chance.
Three populations sit in the data, and the goal of any detection effort is to pull them apart:
Noise traders, who churn through small bets with no particular timing signal and mediocre win rates
Skilled analysts, who win at an elevated rate but spread their activity across many event types and enter early when uncertainty is real
Suspected insiders, whose wins concentrate on geopolitical events, whose entries cluster shortly before the moment information becomes public, and whose position sizes barely vary
None of these features alone is a conviction. Together, they form a profile that warrants scrutiny.
Noise traders, skilled analysts, and suspected insiders look different on win rate, entry timing, and sizing consistency — the three signals separate the populations
One Wallet Is Suspicious. A Cluster Is Harder to Dismiss.
A single wallet with a 95% win rate across ten geopolitical markets is suspicious. A network of fifteen wallets, each with a 70% win rate across the same ten markets, entering positions within a narrow time window of each other and sizing them consistently, is more suspicious and harder to see without the right tools.
The coordinated case fragments the statistical signal across multiple identities, whether by design or not. Each individual account looks less anomalous. The network does not.
This is why detection has to operate at two levels simultaneously. At the account level, you score individual wallets on dimensions like win rate, trade frequency on geopolitical events, entry timing relative to resolution, and consistency of position size. At the network level, you look for co-trading patterns: wallets that show up on the same events, at similar times, on the same side, with comparable sizes.
The two levels reinforce each other. An account that is borderline suspicious on its own becomes much more suspicious when it sits inside a cluster of similarly profiled wallets that consistently traded the same events. A cluster of unremarkable wallets that always trade together starts to mean something when you notice they always win.
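For readers who want to see the mechanics, the network-level co-trading signal can be sketched in a few lines of Python. This is a minimal sketch, not the course implementation. It assumes each trade has already been reduced to a hypothetical `(wallet, market_id, side, unix_ts)` tuple, and both the one-hour window and the Jaccard-style score are illustrative choices.

```python
from collections import defaultdict
from itertools import combinations


def co_trading_similarity(trades, window_s=3600):
    """Score wallet pairs by shared entries: same market, same side, first entries
    within `window_s` seconds. Score = shared entries / all distinct entries of the pair.

    trades: iterable of hypothetical (wallet, market_id, side, unix_ts) tuples.
    """
    # Keep only each wallet's first entry per (market, side); later top-ups are ignored here.
    entries = defaultdict(dict)
    for wallet, market, side, ts in trades:
        key = (market, side)
        if key not in entries[wallet] or ts < entries[wallet][key]:
            entries[wallet][key] = ts

    scores = {}
    for a, b in combinations(sorted(entries), 2):
        ea, eb = entries[a], entries[b]
        # Same market and side is not enough: the entries must also be close in time.
        shared = [k for k in ea.keys() & eb.keys() if abs(ea[k] - eb[k]) <= window_s]
        if shared:
            scores[(a, b)] = len(shared) / len(ea.keys() | eb.keys())
    return scores
```

All-pairs comparison grows quadratically with the number of wallets. On a large corpus you would first restrict the comparison to wallets that share at least one market, then cluster the resulting similarity matrix.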
Each level on its own flags only the obvious cases — together they catch coordinated networks designed to hide in the statistical noise
The Detection Pipeline
The full pipeline is not exotic. It has three stages, and most of the work is in the first one.
**Extract and clean.** Pull trade data from Polymarket's public APIs and the underlying on-chain records. Resolve wallet addresses, compute per-trade profit and loss against the actual resolution outcome, filter out noise trades and dust positions, and structure everything into three clean tables: one trade per row, one position per wallet-market pair, and one aggregate row per wallet. This is unglamorous and where most of the value is won or lost. A model trained on data with hidden gaps, mismatched timestamps, or mis-scaled prices will quietly learn the wrong thing.
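A minimal sketch of the three-table structuring step follows. It assumes a hypothetical, already-decoded list of buy fills (sells and redemptions are left out for brevity), prices expressed in USDC per share between 0 and 1, and a winning share paying out 1 USDC at resolution. The 5 USDC `min_notional` dust cutoff is an arbitrary placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fill:
    """Hypothetical decoded buy fill: `shares` of `outcome` bought at `price` USDC per share."""
    wallet: str
    market: str
    outcome: str
    shares: float
    price: float
    ts: int


def build_tables(fills, resolutions, min_notional=5.0):
    """Return (trades, positions, wallets): one row per trade, per wallet-market pair, per wallet."""
    trades = []
    for f in fills:
        if not 0.0 <= f.price <= 1.0:
            # Fail loudly: prices in cents would silently corrupt every PnL figure downstream.
            raise ValueError(f"price out of range for {f.wallet}/{f.market}: {f.price}")
        if f.market not in resolutions or f.shares * f.price < min_notional:
            continue  # unresolved market or dust position
        payout = 1.0 if resolutions[f.market] == f.outcome else 0.0
        trades.append({"wallet": f.wallet, "market": f.market, "ts": f.ts,
                       "cost": f.shares * f.price, "pnl": f.shares * (payout - f.price)})

    positions = {}
    for t in trades:
        p = positions.setdefault((t["wallet"], t["market"]),
                                 {"cost": 0.0, "pnl": 0.0, "first_ts": t["ts"]})
        p["cost"] += t["cost"]
        p["pnl"] += t["pnl"]
        p["first_ts"] = min(p["first_ts"], t["ts"])

    wallets = defaultdict(lambda: {"markets": 0, "wins": 0, "cost": 0.0, "pnl": 0.0})
    for (wallet, _market), p in positions.items():
        w = wallets[wallet]
        w["markets"] += 1
        w["wins"] += int(p["pnl"] > 0)
        w["cost"] += p["cost"]
        w["pnl"] += p["pnl"]
    for w in wallets.values():
        w["win_rate"] = w["wins"] / w["markets"]
        w["return_on_capital"] = w["pnl"] / w["cost"] if w["cost"] else 0.0
    return trades, positions, dict(wallets)
```

Raising on an out-of-range price is a deliberate choice. A pipeline that quietly rescales or skips bad prices is exactly how mis-scaled data reaches the model unnoticed.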
**Detect.** Build a behavioral feature vector per wallet — geopolitical win rate, entry timing, timing consistency, position size consistency, win rate divergence between geopolitical and other markets, market concentration, wallet age, average return on capital. Cluster those vectors to find behaviorally similar wallets. Separately, build a co-trading similarity matrix from the trade data and cluster that to find wallets that actually traded together. Combine the two into a score per wallet, then sort.
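The scoring step can be sketched as follows: standardize each behavioral feature across wallets, flip the sign where a lower value is the suspicious one, average the results, and add the wallet's network score. This sketch uses only three of the features listed above and skips the clustering step. The equal feature weights and the `w_network` parameter are assumptions for illustration.

```python
from statistics import mean, pstdev

# Sign convention: +1 if a higher value is more suspicious, -1 if a lower value is.
DIRECTIONS = {
    "geo_win_rate": +1,         # win rate on geopolitical markets
    "win_rate_divergence": +1,  # geopolitical win rate minus win rate elsewhere
    "size_cv": -1,              # coefficient of variation of position size (low = consistent)
}


def size_cv(sizes):
    """Coefficient of variation of position sizes."""
    m = mean(sizes)
    return pstdev(sizes) / m if m else 0.0


def zscores(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def rank_wallets(features, network, w_network=1.0):
    """features: wallet -> {feature: value}; network: wallet -> co-trading score (0..1).
    Returns (wallet, score) pairs, most suspicious first."""
    wallets = sorted(features)
    z = {name: zscores([features[w][name] for w in wallets]) for name in DIRECTIONS}
    scores = {}
    for i, w in enumerate(wallets):
        behavioral = mean(DIRECTIONS[name] * z[name][i] for name in DIRECTIONS)
        scores[w] = behavioral + w_network * network.get(w, 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Standardizing first keeps a feature measured on a large scale from drowning out the others. The sign table makes the direction of each feature explicit and reviewable.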
**Visualize.** This part is not decoration. It is a second analysis pass that applies four lenses to the same signal: a wallet graph where coordinated clusters appear spatially, a win-rate distribution where the flagged accounts should sit in the far right tail, an event-level heatmap where high-risk accounts should win on the same markets, and a timeline of entries relative to resolution. When all four lenses agree, you have a finding. When they disagree, you have something to investigate.
Most of the work is in the first stage — clean data and a small set of carefully chosen features beat any clever model on noisy inputs
What This System Is Not
The output of this kind of pipeline is a ranked list of suspicious accounts and clusters, scored by the strength of the anomaly signal. It is not a legal case. It is not a verdict. There is no ground truth label set you can validate against, because no one publishes a list of confirmed insiders. You are working with signals, not proof.
That shapes how the results should be read. The right way to use the output is as an investigative tool. The top of the list points you at the wallets where a human analyst should look harder: at the specific events, the specific entry timestamps, the specific news cycles that surrounded them. The model does the prioritization. The interpretation is still human work.
A few practical limits are worth keeping in mind:
Win rates on small numbers of trades are unstable. A wallet with six trades and a 100% win rate is mostly noise. Detection should require a minimum trade count and apply shrinkage toward the population mean for low-volume accounts.
Skilled analysts can look superficially similar to insiders on a single dimension. The defense against that is multi-dimensional scoring. Insiders tend to win specifically on geopolitical events while looking mediocre elsewhere. Analysts tend to win at a moderate rate across many domains.
Coordinated activity is not always insider activity. Market-making bots, copy-trading services, and informal social trading groups all produce co-trading signals. Pairing the network signal with behavioral suspicion is what separates them.
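Shrinkage has a simple closed form. If you treat the population win rate p0 as a prior worth k pseudo-trades, which is the posterior mean under a Beta prior, a wallet with w wins in n trades gets a shrunk rate of (w + k × p0) / (n + k). The sketch below uses a hypothetical k of 20 and a minimum of 10 trades. Both values are placeholders that would need tuning on real data.

```python
def shrunk_win_rate(wins, trades, prior_mean, prior_strength=20.0):
    """Posterior-mean win rate under a Beta prior centred on the population mean.

    prior_strength acts like that many pseudo-trades at prior_mean,
    so low-volume wallets are pulled hardest toward the population.
    """
    return (wins + prior_strength * prior_mean) / (trades + prior_strength)


def scoreable(trades, min_trades=10):
    """Minimum-volume gate applied before any ranking."""
    return trades >= min_trades


for wins, trades in [(6, 6), (95, 100)]:
    print(wins, trades, round(shrunk_win_rate(wins, trades, prior_mean=0.5), 3), scoreable(trades))
# 6 6 0.615 False
# 95 100 0.875 True
```

The six-for-six wallet drops from a raw 100% to about 62% and fails the volume gate. The wallet with 95 wins out of 100 keeps most of its edge, because 100 real trades outweigh 20 pseudo-trades.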
The Pattern Is Already in the Data
The point of the exercise is not that any of this is hidden. It is that the pattern has always been visible on-chain. What is missing on prediction markets is the surveillance infrastructure that exists, by default, in regulated finance.
A small, focused pipeline can replicate the core of that infrastructure on a defined corpus of geopolitical markets. The features are not complex. The models are not exotic. The discipline is in pulling clean data, choosing the right behavioral dimensions, scoring at both the account and network level, and rendering the findings in a form that someone other than the engineer who built it can read.
When that is in place, what was a vague claim that "insiders trade on Polymarket" becomes a ranked list of wallets with explainable features and convergent evidence. The list does not prove anything. It tells you where to look.
Notes · May 18, 2026
AI and Machine Learning for Predictive Maintenance at Industrial Scale
Predictive maintenance and downtime detection are among the most discussed use cases for AI in industry. The promise is appealing: stop a machine from failing before it fails, save maintenance costs, avoid unplanned downtime, and squeeze more uptime out of expensive equipment.
In practice, getting there is much harder than the slide decks suggest. The model is almost never the hard part. The hard part is everything around it: connecting to the machines, moving the data, keeping the pipeline alive, labelling enough events to train on, deploying inference where latency actually matters, and then making the predictions visible to the people who can act on them.
I want to walk through the layers that actually matter when building this kind of system in a real plant, on a real production line, with real machines that were not designed with AI in mind.
The Data Is Trapped Inside the Machines
Before any model can be trained, the data needs to leave the machine. This sounds obvious, but it is where most projects stall.
Industrial equipment rarely speaks a single common language. PLCs may expose data over OPC UA, Modbus, Profinet, EtherNet/IP, or proprietary protocols. CNC machines may have their own controller interfaces. Older equipment may only expose dry contacts, analog signals, or serial output. Sensors retrofitted onto legacy assets often live on a completely separate network, sometimes wireless, sometimes wired into a small gateway sitting in an electrical cabinet.
Getting a clean, time-aligned, high-frequency stream out of all of this requires a real data connector layer. That layer needs to:
Handle multiple protocols simultaneously
Buffer locally when the network drops
Time-stamp events at the source rather than at ingest
Normalize tag names and units into a coherent model
Survive PLC reboots, controller updates, and maintenance shifts without losing data silently
The lesson here is that the connector layer is infrastructure, not a script. It must be monitored, versioned, and treated as a first-class part of the system. A model trained on data with hidden gaps or shifted timestamps will quietly learn the wrong thing.
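Several of the connector requirements above can be sketched as a small store-and-forward buffer. This is a minimal sketch that assumes a hypothetical tag map and an uplink `send` callable that returns True on success. Protocol handling is out of scope here, and a real connector would persist its queue to disk rather than hold it in memory.

```python
import time
from collections import deque
from itertools import islice

# Hypothetical mapping: vendor tag -> (plant-wide tag name, multiplier into consistent units).
TAG_MAP = {
    "PLC1.MTR_CUR_A": ("line1.motor1.current_A", 1.0),
    "PLC2.MotorCurrent_mA": ("line1.motor2.current_A", 0.001),
}


class StoreAndForward:
    """Normalize readings at the source and hold them locally until the uplink accepts them."""

    def __init__(self, send, max_items=100_000):
        self.send = send            # callable(list_of_readings) -> bool, True on success
        self.queue = deque()
        self.max_items = max_items
        self.dropped = 0            # exported as a metric so data loss is never silent

    def record(self, raw_tag, value, source_ts=None):
        name, scale = TAG_MAP[raw_tag]  # an unmapped tag raises instead of flowing through unnamed
        ts = source_ts if source_ts is not None else time.time()  # stamped at the edge, not at ingest
        if len(self.queue) >= self.max_items:
            self.queue.popleft()
            self.dropped += 1
        self.queue.append({"tag": name, "value": value * scale, "source_ts": ts})

    def flush(self, batch_size=500):
        """Send queued readings oldest first; on failure keep everything not yet acknowledged."""
        while self.queue:
            batch = list(islice(self.queue, batch_size))
            if not self.send(batch):
                return False
            for _ in batch:
                self.queue.popleft()
        return True
```

The `dropped` counter is the important detail. Under a long outage, some data loss may be unavoidable, but it should show up on a dashboard rather than as an unexplained gap in the training data.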
Edge VMs and Why the Cloud Is Not Always the Answer
Once the data is flowing, the next question is where it should be processed.
Sending everything to the cloud sounds clean, but for high-frequency signals on production lines, it often does not work. Vibration data, motor current, acoustic signals, or process variables sampled at hundreds or thousands of hertz add up quickly. Bandwidth becomes a cost. Latency becomes a constraint. And if the cloud link goes down, the line should not lose its predictive layer.
This is where edge infrastructure matters. A small VM or container running on a ruggedized industrial PC near the asset can:
Aggregate and downsample high-frequency signals before shipping them upstream
Run lightweight inference locally with predictable latency
Buffer data during connectivity outages and replay it later
Apply first-pass anomaly filters so the cloud only sees what matters
The architecture usually ends up being hybrid. The edge handles the fast loop, where milliseconds matter for stopping a machine or flagging an anomaly. The cloud handles the slow loop, where heavier models, retraining, and long-term storage live. Drawing the line between the two is one of the most important design decisions in the project.
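The edge-side reduction can be sketched as collapsing a high-frequency signal into one summary row per window and forwarding only the windows that look unusual. The choice of mean, RMS, and peak, the one-second window, and the 1.5× RMS filter are all illustrative assumptions. Vibration pipelines often add spectral features on top.

```python
import math


def downsample(samples, sample_rate_hz, window_s=1.0):
    """Collapse a high-frequency signal into one summary row per full window."""
    n = int(sample_rate_hz * window_s)
    rows = []
    for start in range(0, len(samples) - n + 1, n):  # a trailing partial window waits for more data
        w = samples[start:start + n]
        rows.append({
            "t_start_s": start / sample_rate_hz,
            "mean": sum(w) / n,
            "rms": math.sqrt(sum(x * x for x in w) / n),
            "peak": max(abs(x) for x in w),
        })
    return rows


def worth_shipping(row, baseline_rms, ratio=1.5):
    """First-pass filter: forward only windows whose energy is well above the learned baseline."""
    return row["rms"] > ratio * baseline_rms
```

At 10 kHz, each second of raw signal (10,000 samples) becomes one four-value row, and the filter can cut the upstream volume further. The raw window can still be kept locally for a while in case the cloud side asks for it.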
Building the Ingestion Pipeline
Whether the upstream sits in AWS, Azure, GCP, or on-prem, the ingestion pipeline has to handle a few realities at once.
Industrial data is bursty. A line may be idle for hours and then produce millions of points in a few minutes during a production run. The pipeline must absorb this without dropping events. It must also keep data ordered, because for predictive maintenance the sequence of events is often more informative than any single value.
A typical setup involves a streaming layer such as Kafka, Kinesis, or MQTT brokers feeding into a stream processor, then landing the data in a time-series store for raw signals and an object store or warehouse for aggregated and labelled data. On top of that sits a feature pipeline that turns raw streams into windowed statistics, spectral features, rolling baselines, or whatever the model expects.
A few things tend to bite teams that have not built this kind of pipeline before:
Schema drift, where a new sensor or firmware update changes a payload silently
Clock skew between edge nodes, which destroys any cross-machine analysis
Backfills, where missing data is replayed and accidentally counted twice
Feature pipelines that work in batch for training but cannot be reproduced exactly in streaming for inference
The lesson is that the same features must be computable in both modes. If training and inference disagree on what a feature means, the model will degrade in production for reasons nobody can explain.
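One way to keep the batch and streaming paths in agreement is to write each feature once, as a streaming computation, and have the training path replay history through that same code instead of re-implementing it. Here is a minimal sketch with a hypothetical rolling-mean feature:

```python
from collections import deque


class RollingMean:
    """One feature definition, shared by the training (batch) and inference (streaming) paths."""

    def __init__(self, window):
        self.window = window
        self.buf = deque(maxlen=window)
        self.total = 0.0

    def update(self, x):
        if len(self.buf) == self.window:
            self.total -= self.buf[0]  # value about to be evicted by the append below
        self.buf.append(x)
        self.total += x
        # Emit only once the window is full, in both modes, so warm-up behaviour also agrees.
        return self.total / self.window if len(self.buf) == self.window else None


def batch_feature(values, window):
    """Training path: replay history through the streaming object rather than a separate implementation."""
    f = RollingMean(window)
    return [f.update(x) for x in values]
```

Edge cases such as warm-up windows, missing values, and floating-point accumulation are then handled identically in both modes, because there is only one implementation to get right.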
Fast loop at the edge, slow loop in the cloud, predictions back on the line — each column is a separate engineering responsibility
Choosing the Right Models for This Context
Once data flows reliably, the model conversation can finally start. And it is rarely about picking the fanciest architecture.
For predictive maintenance, the useful question is not "which model is best" but "what kind of problem am I actually solving." A few common framings:
Anomaly detection on a single asset with very little labelled failure data
Remaining useful life estimation, where the target is a continuous time-to-failure
Fault classification when enough labelled failure modes exist
Process drift detection, looking for slow shifts rather than sudden faults
Quality prediction, where the target is a downstream defect rather than a machine failure
Each of these calls for a different approach. Unsupervised or self-supervised methods often dominate early in a project, when failure labels are scarce. Autoencoders, isolation forests, and one-class models can detect deviations from a learned baseline of normal behavior. They are imperfect, but they give you something useful on day one.
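The baseline-then-deviation pattern can be shown with far less machinery than the models named above. The sketch below deliberately swaps in a simpler technique, a robust z-score built on the median and the median absolute deviation (MAD), learned from a period known to be healthy. The threshold of 4 is an assumption.

```python
from statistics import median


class RobustBaseline:
    """Learn 'normal' from a healthy period, then score how far new readings deviate from it."""

    def __init__(self, normal_values):
        self.med = median(normal_values)
        mad = median(abs(x - self.med) for x in normal_values)
        # 1.4826 scales the MAD to match a standard deviation for Gaussian data;
        # the tiny fallback avoids division by zero on a perfectly flat baseline.
        self.scale = 1.4826 * mad or 1e-9

    def score(self, x):
        return abs(x - self.med) / self.scale

    def is_anomaly(self, x, threshold=4.0):
        return self.score(x) > threshold
```

Median and MAD are used instead of mean and standard deviation so that a few unlabelled faults hiding in the "healthy" period do not inflate the baseline.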
As the project matures and real failures get recorded, supervised learning becomes possible. Gradient boosted trees on engineered features remain very competitive in this space. Deep models, including 1D CNNs, temporal convolutions, and transformers, can outperform them when there is enough labelled data and the signals are rich, such as vibration or acoustic streams.
There is also a growing role for pretraining and post-training in this domain. A model pretrained on large amounts of unlabelled signal data from many assets can capture general patterns of normal behavior and then be fine-tuned with a small set of labelled events from a specific machine or line. This is similar in spirit to how foundation models are used elsewhere, and it works well precisely because labelled failures are rare and expensive to obtain.
Data Labelling Is the Real Bottleneck
Supervised learning sounds straightforward until you try to collect labels.
In an industrial setting, a "failure" is rarely a single clean event. It may be a slow degradation that ended in a stoppage, a near-miss caught by an operator, a quality defect traced back to a specific machine, or a maintenance intervention that may or may not have been necessary. Labels live in maintenance logs, in operator notebooks, in CMMS tickets, in shift handover notes, and sometimes only in the memory of the technician who fixed the problem.
A serious labelling effort usually requires:
Aligning maintenance records with sensor data on a common time axis
Working with operators and maintenance teams to confirm what really happened
Distinguishing between root cause events and downstream symptoms
Capturing the period leading up to a failure, not just the failure itself
Recording confirmed normal periods, which are just as important as failure windows
This is slow, manual, and unglamorous, and it is where most of the real model performance is won or lost. A modest model on well-labelled data will usually beat a sophisticated model on noisy or inconsistent labels.
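Once maintenance records sit on the same time axis as the sensor data, turning them into training labels is mostly a matter of windows. This minimal sketch assumes that confirmed failure times and confirmed-normal periods have already been agreed with the maintenance team. The 48-hour lead window and the `cmms_offset` clock correction are hypothetical parameters.

```python
from datetime import datetime, timedelta


def label_windows(window_starts, failures, confirmed_normal,
                  lead=timedelta(hours=48), cmms_offset=timedelta(0)):
    """Label each feature window as 'pre_failure', 'normal', or None (unknown).

    failures: confirmed failure times as recorded in the CMMS.
    confirmed_normal: (start, end) periods signed off as healthy, already on the sensor clock.
    cmms_offset: correction added to CMMS times to put them on the sensor clock
                 (for example local time versus UTC).
    """
    aligned = [f + cmms_offset for f in failures]
    labels = []
    for t in window_starts:
        if any(f - lead <= t < f for f in aligned):
            labels.append("pre_failure")  # the run-up, not just the failure itself
        elif any(start <= t < end for start, end in confirmed_normal):
            labels.append("normal")       # confirmed healthy, as valuable as a failure window
        else:
            labels.append(None)           # unknown: keep out of supervised training
    return labels


base = datetime(2026, 1, 1)
print(label_windows(
    [base + timedelta(hours=h) for h in (1, 30, 60, 99)],
    failures=[base + timedelta(hours=100)],
    confirmed_normal=[(base, base + timedelta(hours=24))],
))
# ['normal', None, 'pre_failure', 'pre_failure']
```

Leaving unconfirmed windows as None, instead of defaulting them to "normal", is the key design choice. A quiet period nobody checked is not the same as a confirmed healthy one.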
Pretraining, Fine-Tuning, and the Long Loop
A useful pattern that has emerged is to separate two timescales of learning.
The first is a long, offline loop where models are pretrained or retrained on large historical datasets, possibly across multiple sites. This is where heavy compute, careful validation, and broad pattern learning live. It is where pretraining on unlabelled signals and post-training on labelled events both happen.
The second is a short, online loop where models are adapted to the current state of a specific asset. This may take the form of recalibrating thresholds, updating baselines, or fine-tuning a head on top of frozen representations. It is what keeps the system honest as wear, seasons, raw materials, and operating conditions shift over time.
Without the short loop, even good models drift. Without the long loop, the system never improves from accumulated experience. Both are needed.
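The short loop can be as modest as a threshold that follows slow drift. This minimal sketch assumes a single scalar health indicator per asset. It tracks an exponentially weighted mean and variance and flags readings more than `k` standard deviations above the baseline. The `alpha` and `k` values are placeholders to be tuned per asset.

```python
import math


class AdaptiveThreshold:
    """Short-loop recalibration: an exponentially weighted baseline that follows slow drift."""

    def __init__(self, initial_mean, initial_std, alpha=0.01, k=4.0):
        self.mean, self.var = initial_mean, initial_std ** 2
        self.alpha, self.k = alpha, k

    def threshold(self):
        return self.mean + self.k * math.sqrt(self.var)

    def update(self, x):
        alert = x > self.threshold()
        if not alert:
            # Only readings that pass move the baseline, so a developing fault is not absorbed as normal.
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return alert
```

Freezing the baseline on alerts is a trade-off. It keeps a fault from becoming the new normal, but a genuine step change, such as a replaced component, needs a deliberate reset, which is exactly the kind of event the long loop should record.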
The slow loop pretrains and fine-tunes on historical data; the fast loop keeps thresholds and predictions honest against live signals
Observability Is Not Optional
A predictive maintenance system that cannot be observed will not be trusted, and a system that is not trusted will not be used.
Observability here has two faces. The first is the classic one: live dashboards showing raw signals, derived features, process state, and machine status. Operators and maintenance teams need to see the actual data, not just an alert. When a model flags an anomaly, the first thing anyone will ask is "what does the signal look like right now, and what did it look like before." If that question cannot be answered in a few seconds on a screen, the alert will be ignored.
The second face is model observability. Predictions, confidence scores, predicted labels, and anomaly indicators need to be shown alongside the live data, ideally on the same dashboard. Whether the view covers a single machine, a production line, or a cell, the relevant predictions should be visible in context. Beyond that, the system should track:
Prediction distributions and how they shift over time
Input feature distributions compared to training data
Alert rates per asset, per shift, per product
True positives and false positives once labels become available
Latency from event to prediction to display
Without this, model degradation goes unnoticed until something breaks. With it, the team can intervene early, retrain on the right data, and build the institutional trust that makes the system actually useful.
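Comparing live input distributions with the training data can be done with the Population Stability Index (PSI), one common drift statistic among several. This minimal sketch assumes shared bin edges chosen from the training data. The usual reading (below about 0.1 is stable, above about 0.25 is worth investigating) is an industry rule of thumb, not a hard rule.

```python
import bisect
import math


def psi(expected, actual, edges):
    """Population Stability Index of one feature: training sample vs. live sample.
    edges: sorted interior bin edges, usually derived from the training data."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Computed per feature and per asset on a schedule, this gives a number to chart next to the alert rates, so input drift is visible before prediction quality visibly drops.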
Raw vibration, expected range, predicted state, and recent alerts on a single screen — the layout operators actually trust
The Human Layer
It is tempting to treat predictive maintenance as a pure technical problem, but the people in the plant are part of the system.
Operators, maintenance technicians, line supervisors, and reliability engineers all interact with the predictions in different ways. An alert that is meaningful to a reliability engineer may be noise to an operator who needs to keep the line running. A model that asks for an intervention every shift will be silenced within a week. A model that only fires once a quarter will be forgotten.
Designing the alerting layer, the thresholds, the escalation paths, and the user interface is as important as designing the model. Predictions should land in the existing workflow, whether that is the CMMS, an HMI screen, a mobile alert, or a shift report. The goal is not to replace decisions but to inform them.
Conclusion
Predictive maintenance at industrial scale is not really about machine learning. It is about building a reliable path from a sensor on a motor to a decision made by a human, with a model somewhere in the middle.
The infrastructure has to be solid: data connectors that survive plant conditions, edge nodes that handle the fast loop, pipelines that move data without losing it, and feature definitions that mean the same thing in training and inference. The modelling has to be honest: unsupervised baselines while labels are scarce, supervised models as labels accumulate, pretraining and fine-tuning to make the most of both. The observability layer has to make all of it visible, in real time, in context, on screens that people actually look at.
When all of these layers work together, predictive maintenance stops being a demo and starts being part of how the plant runs. When any one of them is weak, the whole system quietly drifts into being ignored, regardless of how good the model is on paper.
The lesson is the same one that shows up in every applied ML project: the model is a small part of the work. Building the rest of the system well is what makes the model matter.
Notes · May 11, 2026
When Food LCA Data Gets Messy: Lessons From Working With Agriculture Datasets
Agriculture and food data look deceptively simple from the outside. A kilogram of wheat, a liter of milk, a ton of tomatoes: these sound like concrete things. But when you start working with life cycle assessment data, especially when comparing or computing food impacts across datasets, you quickly discover that the numbers are not as stable as they appear.
I have spent a lot of time struggling with agricultural and food datasets in LCA. Not because the data is useless, but because it is complex, layered, and easy to misunderstand. The same product can have different impact values depending on geography, modeling choices, elementary flow mapping, precision, allocation rules, and impact assessment method versions.
Over time, I have learned that variance in LCA results is not always an error. Sometimes it is a signal that the underlying assumptions are different.
Regionalization Matters
One of the biggest sources of variation is regionalization.
Agriculture is deeply local. Crop yields, irrigation needs, fertilizer practices, electricity mixes, soil emissions, land use, climate, and supply chains vary significantly from one region to another. A tomato grown in a heated greenhouse in northern Europe is not the same environmental system as a tomato grown in open fields in southern Europe. Beef, rice, coffee, soy, milk, and wheat can all change substantially depending on where and how they are produced.
This creates a challenge when datasets contain regional, national, continental, or global-average processes. If one dataset uses a global average and another uses a country-specific process, the results may differ even if both are "correct" within their own modeling frame.
The lesson: before comparing values, check the geography. A mismatch between GLO, RoW, Europe, and a specific country can explain a lot.
1 kg of tomato modelled in four production contexts — all values are correct within their own frame
Small Values Are Surprisingly Fragile
Another issue I have run into is precision, especially for very small values.
In food LCA, many flows have tiny values: trace emissions, pesticide residues, micronutrient-related flows, land transformation fractions, or small upstream contributions. These values may look irrelevant individually, but when they are rounded, truncated, converted, or aggregated, they can behave strangely.
For example, a value stored with high precision in one system may become 0.0000 in another export. A small methane or nitrous oxide flow may be rounded differently. A tiny elementary flow can become important if its characterization factor is large.
This is especially painful when computing impacts against methods like Environmental Footprint 3.1, where small mapped flows can still influence categories such as toxicity, eutrophication, climate change, or resource use.
The lesson: zeros are not always true zeros. Sometimes they are just lost precision.
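A cheap guard is to compare the same inventory across two systems and rank every suspicious zero by how much impact it could be hiding. This minimal sketch assumes flows are plain dictionaries keyed by name. The flow names and characterization factors in any test data are placeholders, not EF 3.1 values.

```python
def precision_losses(reference, exported, cfs, decimals=4):
    """Compare one inventory across two systems and rank suspicious zeros by impact at stake.

    reference, exported: flow -> amount; cfs: flow -> characterization factor.
    """
    findings = []
    for flow, amount in reference.items():
        if amount == 0:
            continue
        at_stake = abs(amount * cfs.get(flow, 0.0))
        if exported.get(flow, 0.0) == 0:
            findings.append((flow, "zero or missing in export", at_stake))
        elif round(amount, decimals) == 0:
            findings.append((flow, f"would round to zero at {decimals} decimals", at_stake))
    return sorted(findings, key=lambda f: f[2], reverse=True)
```

Ranking by amount times characterization factor puts a tiny flow with a large factor at the top of the review list, which is exactly the case described above.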
Mapping Against EF 3.1 Is Harder Than It Looks
A major source of frustration has been mapping inventory data against EF 3.1.
At first glance, applying an impact method sounds mechanical: take inventory flows, match them to characterization factors, multiply, aggregate. In practice, the difficult part is often the matching.
Flow names may differ. Compartments may differ. Subcompartments may differ. CAS numbers may be missing or inconsistent. One dataset may use an older nomenclature, while EF 3.1 expects another. Some flows map cleanly; others require interpretation. Some should not be mapped at all unless the compartment and context are right.
This can create large differences between computed impact results and reference results. The issue may not be the arithmetic. It may be that "ammonia, air" was mapped correctly, while another flow with a similar name was mapped incorrectly, duplicated, ignored, or assigned to the wrong compartment.
The lesson: impact assessment is only as reliable as the flow mapping. Always audit unmatched, ambiguously matched, and multiply matched flows.
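The mechanical part (match, multiply, aggregate) takes only a few lines. The audit around it is what matters. This minimal sketch assumes flows are keyed by name, compartment, and subcompartment, with case-insensitive names. It treats a name that exists only under a different compartment as ambiguous rather than silently matching it. The CF values in any test data are placeholders, not EF 3.1 factors.

```python
from collections import defaultdict


def norm_name(name):
    return name.strip().lower()


def characterize(inventory, cf_rows):
    """inventory: (name, compartment, subcompartment, amount) rows.
    cf_rows: (name, compartment, subcompartment, cf) rows from the impact method.
    Returns (score from clean matches only, audit of everything else)."""
    exact = defaultdict(list)   # (name, comp, sub) -> CFs
    by_name = defaultdict(set)  # name -> (comp, sub) pairs it exists under
    for name, comp, sub, cf in cf_rows:
        exact[(norm_name(name), comp, sub)].append(cf)
        by_name[norm_name(name)].add((comp, sub))

    score = 0.0
    audit = {"matched": [], "multiply_matched": [], "ambiguous": [], "unmatched": []}
    for name, comp, sub, amount in inventory:
        cfs = exact.get((norm_name(name), comp, sub), [])
        label = f"{name} ({comp}/{sub})"
        if len(cfs) == 1:
            score += amount * cfs[0]
            audit["matched"].append(label)
        elif len(cfs) > 1:
            audit["multiply_matched"].append(label)  # conflicting CFs: needs a decision
        elif norm_name(name) in by_name:
            audit["ambiguous"].append(label)         # name exists, but under another compartment
        else:
            audit["unmatched"].append(label)         # would silently contribute zero if not reviewed
    return score, audit
```

Only clean matches contribute to the score. Everything else lands in a list that someone has to read, which is the point.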
Clean matches, ambiguous matches, and missing flows all flow through the mapping layer — each one shifts the result
Agriculture Has Modeling Choices Everywhere
Beyond geography and mapping, agricultural LCA contains many methodological choices that can shift results:
Allocation between co-products, such as milk and meat, oil and meal, grain and straw
Treatment of biogenic carbon
Land use and land use change assumptions
Fertilizer emission models
Manure management assumptions
Irrigation and water scarcity regionalization
Organic versus conventional production systems
Yield assumptions
Farm-gate versus retail or consumption boundaries
Inclusion or exclusion of packaging, processing, storage, transport, cooking, and waste
These are not small details. They define the system being measured.
Two datasets may both describe "1 kg of food product," but one may stop at farm gate while another includes processing and packaging. One may allocate burdens economically, another physically. One may include land use change, another may not. The numbers can diverge before anything is technically wrong.
The lesson: the product name is not enough. You need the system boundary and modeling assumptions.
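Allocation alone is enough to move the numbers. This sketch splits the same farm-level burden between milk and meat on a mass basis and on an economic basis. All quantities and prices are illustrative, not real farm data.

```python
def allocate(total_burden, coproducts, basis):
    """Split a shared burden across co-products; return burden per kg of each.

    coproducts: name -> {"kg": output mass, "price": price per kg}
    basis: "mass" (physical) or "economic" (mass x price)
    """
    if basis not in ("mass", "economic"):
        raise ValueError("basis must be 'mass' or 'economic'")
    weights = {n: p["kg"] * (p["price"] if basis == "economic" else 1.0) for n, p in coproducts.items()}
    total = sum(weights.values())
    return {n: total_burden * w / total / coproducts[n]["kg"] for n, w in weights.items()}


farm = {"milk": {"kg": 8000, "price": 0.40}, "meat": {"kg": 400, "price": 2.50}}
for basis in ("mass", "economic"):
    print(basis, {n: round(v, 2) for n, v in allocate(10_000, farm, basis).items()})
# mass {'milk': 1.19, 'meat': 1.19}
# economic {'milk': 0.95, 'meat': 5.95}
```

Same farm, same total burden: the milk figure moves by about 20% and the meat figure by a factor of five, purely because of the allocation rule.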
Three datasets, same '1 kg of product', three different system boundaries — the numbers diverge before anything is technically wrong
Reference Units Can Be Tricky
Food data often moves between units: kilograms of fresh product, dry matter, protein content, edible portion, cooked weight, raw weight, market weight, or economic value.
This creates subtle but serious comparability problems. A dataset for "1 kg maize grain" is not necessarily comparable to "1 kg maize at farm," "1 kg dry maize," or "1 kg maize meal." Moisture content alone can change the interpretation. For animal products, edible yield and carcass allocation can complicate things further.
The lesson: always check what the reference flow actually represents.
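Moisture content is the easiest of these conversions to get right once it is explicit. Dry mass equals fresh mass × (1 − moisture), so an impact per kg of fresh product divided by (1 − moisture) gives the impact per kg of dry matter. This minimal sketch assumes moisture is given as a mass fraction. The values in the usage note are illustrative.

```python
def per_kg_dry_matter(impact_per_kg_fresh, moisture):
    """Re-express an impact per kg of fresh product as per kg of dry matter.
    moisture: water mass fraction of the fresh product (0 <= moisture < 1)."""
    if not 0.0 <= moisture < 1.0:
        raise ValueError("moisture must be a fraction in [0, 1)")
    return impact_per_kg_fresh / (1.0 - moisture)


def convert_moisture(impact, moisture_from, moisture_to):
    """Move a per-kg impact from one moisture basis to another, via dry matter."""
    return per_kg_dry_matter(impact, moisture_from) * (1.0 - moisture_to)
```

For example, 0.5 kg CO2e per kg of grain at 25% harvest moisture is about 0.67 per kg of dry matter and about 0.57 per kg at 14% storage moisture. That gap exceeds 10%, even though both figures describe the same grain.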
The Data Is Not Broken. It Is Contextual.
The biggest lesson I have learned is that agricultural LCA data should not be treated as a single universal truth. It is contextual data produced through methodological choices.
Variance does not automatically mean one dataset is wrong. It can mean the datasets are answering slightly different questions.
That said, this does not mean "anything goes." Good LCA work requires transparency, traceability, and careful interpretation. When results differ, the task is to understand why:
Is it geography?
Is it precision?
Is it flow mapping?
Is it allocation?
Is it system boundary?
Is it impact method version?
Is it unit conversion?
Is it a missing or unmatched flow?
Once you start asking those questions, the variance becomes less mysterious.
Conclusion
Working with agriculture and food datasets in LCA has taught me that visibility is essential before transformation.
Before mapping flows, applying EF 3.1 characterization factors, normalizing units, regionalizing processes, or aggregating results, it is important to first understand what is actually in the dataset. Simple first-stage data analysis can reveal many of the issues that later become difficult to debug: missing flows, unexpected zeros, different levels of precision, regional inconsistencies, unit mismatches, duplicate mappings, and unusual outliers.
In other words, the first step should not be transformation. It should be observation.
For food and agriculture data, this is especially important because variance is not always a mistake. It can come from real differences in geography, farming systems, modeling assumptions, or methodological choices. Without enough visibility into the raw data, it becomes very easy to "fix" something that was not broken, or to hide an important signal through aggregation.
A good LCA data workflow should therefore start with transparency: inspect the dataset, profile it, compare distributions, identify gaps, and understand the assumptions before applying complex transformations. Only then can mapping, computation, and interpretation be done with confidence.
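A first profiling pass does not need much code. This minimal sketch assumes rows are dictionaries with hypothetical `flow`, `geography`, `unit`, and `amount` fields. It reports missing values, zeros, the geography and unit mix, the stored precision, duplicate flows, and a crude outlier screen. The 1000× outlier threshold is arbitrary.

```python
from collections import Counter
from statistics import median


def decimals_of(x):
    """Decimal places in a value's shortest repr, used as a proxy for stored precision."""
    s = repr(float(x))
    if "e" in s:  # scientific notation, e.g. '2e-07' or '1.5e-05'
        mantissa, exp = s.split("e")
        frac = mantissa.split(".")[1] if "." in mantissa else ""
        return max(0, len(frac.rstrip("0")) - int(exp))
    return len(s.split(".")[1].rstrip("0"))


def profile(rows):
    """Summarize a dataset before any mapping, regionalization, or aggregation."""
    amounts = [r["amount"] for r in rows if r["amount"] is not None]
    nonzero = [abs(a) for a in amounts if a != 0]
    med = median(nonzero) if nonzero else 0.0
    return {
        "rows": len(rows),
        "missing": sum(r["amount"] is None for r in rows),
        "zeros": sum(a == 0 for a in amounts),
        "geographies": Counter(r["geography"] for r in rows),  # GLO vs RoW vs country mix
        "units": Counter(r["unit"] for r in rows),
        "precision": Counter(decimals_of(a) for a in amounts),
        "duplicate_flows": [f for f, n in Counter(r["flow"] for r in rows).items() if n > 1],
        # Crude screen: magnitude more than 1000x the median non-zero magnitude.
        "outliers": [r["flow"] for r in rows if r["amount"] and abs(r["amount"]) > 1000 * med],
    }
```

On a small test set, a flow reported in grams rather than kilograms surfaces twice, as a unit mismatch and as an outlier. Spotting it at this stage is far cheaper than debugging it after aggregation.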
The lesson is simple: before trying to make agricultural LCA data consistent, make it visible.