Fuelling Intelligence: How Mid-IR Spectroscopy Builds the Ultimate Ethanol Process Model

In the race for higher yields and lower carbon intensity, data is the new feedstock. While traditional HPLC provides a snapshot of the past, Keit’s IRmadillo spectrometer offers a continuous stream of real-time chemical data. But the true power of this technology lies in how that data can be used to build machine learning (ML) models that don't just monitor the process, but actively optimize it.

Here is how you can use Mid-IR spectroscopy data to build a smarter, predictive production model.

  1. The Foundation: High-Density Data Collection

Any robust machine learning model requires high-quality training data. The IRmadillo installs directly into the process lines—whether in liquefaction, propagation, fermentation, or distillation—and simultaneously measures multiple chemical species, including Ethanol, Sugars (DP1-DP4+), Lactic Acid, Glycerol, and Nitrogen (PAN/FAN).

Unlike daily lab samples, this generates a dense, continuous dataset. This "live feed" allows you to capture the subtle dynamics of process changes that infrequent sampling misses, creating a rich historical database essential for training predictive algorithms.
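
To make the sampling argument concrete, here is a minimal sketch comparing what a dense feed and a once-a-day lab sample each see of the same batch. Everything in it is an assumption for illustration: the shape of the DP1 trace, the 5-minute reading interval, and the 48-hour batch length are made up and do not describe any particular instrument or plant.

```python
import math

def dp1_profile(t_h):
    """Hypothetical DP1 (glucose) trace in % w/v: slow drift plus a brief spike near hour 31."""
    drift = 0.05 + 0.001 * t_h
    spike = 0.2 * math.exp(-0.5 * ((t_h - 31.0) / 1.5) ** 2)
    return drift + spike

# Assumed sampling schedules over a 48-hour batch.
dense_times = [i * 5 / 60 for i in range(48 * 12 + 1)]  # one reading every 5 minutes
daily_times = [0.0, 24.0, 48.0]                          # one lab sample per day

dense_peak = max(dp1_profile(t) for t in dense_times)
daily_peak = max(dp1_profile(t) for t in daily_times)

print(f"peak seen by dense feed:    {dense_peak:.3f}")
print(f"peak seen by daily samples: {daily_peak:.3f}")
```

In this toy trace the spike falls entirely between two daily samples, so only the dense series records it.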

  2. Training the Model: Calibrated Concentration Data or Raw Spectra?

Machine learning models can be trained either from the calculated concentration values that the IRmadillo is usually calibrated to produce or directly from the raw spectra. Changes in the spectra reflect the concentrations of DP4+, ethanol, lactic acid and so on, but the spectra also contain far more information than that. Building the model directly on the spectra lets it use all of that relevant information, which makes for a more robust model.
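
As a rough illustration of training directly on spectra, the sketch below fits a ridge regression that maps whole synthetic spectra to concentrations. It is not the IRmadillo's calibration method: the band positions and widths, the noise level, the concentration ranges, and the choice of ridge regression are all assumptions, and the helper names (band, fit_ridge, predict) are hypothetical. It relies only on NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(900.0, 1500.0, 80)  # cm^-1 grid, illustrative only

def band(center, width):
    """Gaussian absorption band on the wavenumber grid."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

# Hypothetical pure-component spectra (rows: ethanol, sugar, lactic acid).
pure = np.stack([band(1045, 20), band(1150, 30), band(1125, 15)])

# Synthetic data: each spectrum is a linear mix of the pure spectra plus noise.
conc = rng.uniform(0.0, 10.0, size=(250, 3))
spectra = conc @ pure + rng.normal(0.0, 0.01, size=(250, 80))

def fit_ridge(X, Y, alpha=1e-2):
    """Closed-form ridge regression on mean-centred data."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, x_mean, y_mean

def predict(X, model):
    """Apply a fitted ridge model to new spectra."""
    W, x_mean, y_mean = model
    return (X - x_mean) @ W + y_mean

# Train on the first 200 spectra, check on the 50 held out.
model = fit_ridge(spectra[:200], conc[:200])
pred = predict(spectra[200:], model)
rmse = np.sqrt(np.mean((pred - conc[200:]) ** 2, axis=0))
print("held-out RMSE per species:", rmse)
```

The model sees every point of the spectrum rather than a handful of pre-computed values, which is the sense in which spectral training uses more of the available information. Real spectra would need baseline handling and validation against lab references that this sketch leaves out.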

The calibrated concentration data lets the user see which failure mode is in play, which helps when choosing a remedial action. Once the model has been trained, however, the concentration data adds less value, because the model can correlate spectral variation directly with specific failure modes. In this way, the model could identify a “Lactobacillus infection” without directly measuring the rise in lactic acid.
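
One simple way to picture a model that links spectral variation to a failure mode, without going through concentrations, is a nearest-centroid classifier. The sketch below is an assumption-laden toy: the spectra are synthetic, the "infection" signature is a made-up extra band, the labels are hypothetical batch annotations, and a production model would likely be more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(1)
wavenumbers = np.linspace(900.0, 1500.0, 80)  # cm^-1 grid, illustrative only

def band(center, width):
    """Gaussian absorption band on the wavenumber grid."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

def simulate(label):
    """Synthetic spectrum; 'infection' adds a hypothetical extra band."""
    spectrum = rng.uniform(0.9, 1.1) * band(1045, 20)
    if label == "infection":
        spectrum = spectrum + rng.uniform(0.3, 0.6) * band(1125, 15)
    return spectrum + rng.normal(0.0, 0.01, size=wavenumbers.size)

# Labelled history: spectra tagged by what happened to the batch.
train_labels = ["normal"] * 50 + ["infection"] * 50
train_spectra = np.array([simulate(lab) for lab in train_labels])

def fit_centroids(spectra, labels):
    """Mean spectrum for each batch label."""
    labels = np.asarray(labels)
    return {str(lab): spectra[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def classify(spectrum, centroids):
    """Return the label whose mean spectrum is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: np.linalg.norm(spectrum - centroids[lab]))

centroids = fit_centroids(train_spectra, train_labels)
print(classify(simulate("infection"), centroids))
print(classify(simulate("normal"), centroids))
```

No lactic acid concentration is ever computed here; the classifier works from the shape of the spectrum alone.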

  3. Closing the Loop: Real-Time Optimization

The ultimate goal of this model is actionable control. With real-time insights, the model can drive decisions that directly impact the bottom line:

  • Infection Correction: A good statistical or machine learning model can detect the onset of an infection hours earlier than an HPLC reading can. This enables automatic antibiotic dosing that uses only the amount required, based on the rate of lactic acid increase.
  • Smart Dosing: Detect nitrogen shortages (PAN/FAN) early and automate urea or ammonia dosing to rescue a batch.
  • Optimized Inputs: Use real-time feedback to reduce enzyme usage during liquefaction without sacrificing hydrolysis.
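
The rate-based dosing idea in the first bullet can be sketched as a trend estimate feeding a simple rule. The code below is only an illustration: the readings are made-up numbers, and the threshold, gain, and dose cap in dose_for_rate are hypothetical placeholders rather than process guidance.

```python
def slope_per_hour(times_h, values):
    """Least-squares slope of values against time (units per hour)."""
    n = len(times_h)
    t_mean = sum(times_h) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_h, values))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return num / den

def dose_for_rate(rate, threshold=0.01, gain=50.0, max_dose=2.0):
    """Hypothetical proportional rule: no dose below the threshold, capped at max_dose."""
    if rate <= threshold:
        return 0.0
    return min(max_dose, gain * (rate - threshold))

# Made-up lactic acid readings (% w/v), one every 10 minutes over the last hour.
times_h = [i / 6 for i in range(7)]
lactic = [0.100, 0.104, 0.109, 0.115, 0.120, 0.126, 0.131]

rate = slope_per_hour(times_h, lactic)
dose = dose_for_rate(rate)
print(f"lactic acid rate: {rate:.4f} % w/v per hour, suggested dose: {dose:.2f} (arbitrary units)")
```

Fitting a slope over a window of readings, rather than differencing two points, keeps a single noisy reading from triggering a dose. The same pattern could apply to a falling nitrogen trend and a urea or ammonia feed.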

By transforming raw spectral data into a predictive machine learning model, producers can move from reactive troubleshooting to proactive optimization—securing higher yields and a lower Carbon Intensity (CI) score.