By Dave DeFusco
When it comes to building effective machine learning models, one of the most critical steps is feature engineering. This process involves selecting and transforming raw data into meaningful “features” that the model can use to make predictions. Done well, feature engineering can significantly enhance a model’s accuracy and reliability.
In a recent study, “Mutual Information Reduction Techniques and its Applications in Feature Engineering,” researchers in the Katz School’s Graduate Department of Computer Science and Engineering explore which features matter most and how to combine them most effectively. They presented their findings at the 2025 IEEE International Conference on Consumer Electronics in January.
The traditional approach relies heavily on mutual information (MI), a statistical measure that tells us how much one piece of information reveals about another. For example, in a machine learning model predicting loan defaults, mutual information could identify features like credit score or income as highly relevant to the prediction.
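To make the idea concrete, here is a minimal sketch of how MI scores might be computed with scikit-learn; the loan columns and values below are invented for illustration and are not the study’s data.

```python
# Illustrative sketch (not the authors' code): ranking features by mutual
# information with a binary loan-default target using scikit-learn.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical loan data; column names and values are made up.
df = pd.DataFrame({
    "credit_score":  [720, 580, 650, 700, 540, 610, 680, 590],
    "annual_income": [85000, 32000, 54000, 76000, 28000, 41000, 67000, 35000],
    "loan_amount":   [20000, 15000, 12000, 25000, 18000, 10000, 22000, 16000],
    "default":       [0, 1, 0, 0, 1, 1, 0, 1],
})
X, y = df.drop(columns="default"), df["default"]

# mutual_info_classif estimates how much each feature reveals about the
# target; higher scores mean more relevant features.
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(X.columns, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```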
While MI is powerful, most methods focus solely on maximizing the MI between features and the target outcome. This ensures the model gets the most relevant data, but it overlooks an important problem: redundancy among features. For instance, if two features—like monthly income and annual income—are highly correlated, they add repetitive information that can clutter the model and slow it down.
“We introduce a new way of thinking: instead of only looking for features with the highest MI scores, our study also focuses on reducing mutual information between the features themselves,” said David Li, senior author of the paper and program director of the M.S. in Data Analytics and Visualization. “This is important because by minimizing redundancy, we create a set of features, each adding unique, valuable information to the model. This approach ensures the model becomes more efficient, accurate and better at handling complex tasks like classification.”
The method starts with an MI matrix, which shows how much information each feature shares with others. By applying mutual information reduction techniques, the process identifies and removes overlapping information. This results in a refined dataset where each feature stands out for its unique contribution.
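The exact reduction procedure is detailed in the paper; as a rough sketch of the general idea, one could estimate an MI matrix over the features and greedily keep only those that share little information with the features already kept. The helper names and the 0.5 threshold below are arbitrary choices for illustration, not the authors’ implementation.

```python
# Rough sketch of the general idea (not the paper's exact algorithm): build a
# feature-to-feature MI matrix, then keep only features that add information
# beyond what the already-kept features provide.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def mi_matrix(X: pd.DataFrame, random_state: int = 0) -> pd.DataFrame:
    """Pairwise mutual information between numeric features."""
    cols = X.columns
    m = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)
    for col in cols:
        # MI of every feature against `col`, estimated nonparametrically.
        m[col] = mutual_info_regression(X, X[col], random_state=random_state)
    return m

def drop_redundant(X: pd.DataFrame, threshold: float = 0.5) -> list[str]:
    """Greedily keep features whose MI with every already-kept feature stays below threshold."""
    m = mi_matrix(X)
    kept: list[str] = []
    for col in X.columns:
        if all(m.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept
```

With a pair of correlated fields like monthly and annual income, only one of the two would survive such a pass.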
The researchers also incorporated Weight of Evidence (WOE), a transformation technique that boosts a feature’s predictive power, particularly in binary classification tasks like “yes/no” or “approve/reject” decisions. WOE captures subtle nuances in the data, ensuring that even after redundancy is reduced, the features remain highly informative.
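WOE is a long-standing credit-scoring transformation: a common form replaces each binned value of a feature with the log ratio of the share of “good” outcomes to the share of “bad” outcomes in that bin. Below is a minimal sketch of that textbook version (the paper’s exact formulation may differ), assuming a 0/1 target where both classes appear.

```python
# Minimal sketch of a textbook Weight of Evidence (WOE) encoding for a numeric
# feature and a 0/1 target (here: ln(% non-defaults / % defaults) per bin;
# some references flip the ratio, which only changes the sign).
import numpy as np
import pandas as pd

def woe_encode(feature: pd.Series, target: pd.Series, bins: int = 5) -> pd.Series:
    """Replace each raw value with the WOE of the quantile bin it falls into."""
    binned = pd.qcut(feature, q=bins, duplicates="drop")
    counts = pd.crosstab(binned, target)          # rows: bins; columns: 0 (non-default), 1 (default)
    pct_good = counts[0] / counts[0].sum()        # share of all non-defaults landing in each bin
    pct_bad = counts[1] / counts[1].sum()         # share of all defaults landing in each bin
    woe = np.log((pct_good + 1e-6) / (pct_bad + 1e-6))   # epsilon guards empty cells
    return binned.astype(object).map(woe.to_dict()).astype(float)
```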
To test their method, the researchers applied it to a loan default dataset. Using a “brute-force” method to fine-tune parameters, they successfully minimized mutual information between features. The result? A leaner, smarter model with significantly reduced redundancy.
They then layered on the WOE transformation, which further enhanced the model’s performance—especially for logistic regression models commonly used in risk management. This dual approach not only improved accuracy but also offered better insights into the factors driving loan defaults.
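Continuing the toy example above, the two steps might be chained ahead of a logistic regression roughly as follows; `drop_redundant` and `woe_encode` are the hypothetical helpers sketched earlier, not the authors’ code.

```python
# Illustrative pipeline: prune redundant features, WOE-encode what remains,
# then fit the kind of logistic regression common in credit-risk modeling.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

kept = drop_redundant(X)                                   # features that add unique information
X_woe = X[kept].apply(lambda col: woe_encode(col, y))      # WOE-encode each surviving feature

X_train, X_test, y_train, y_test = train_test_split(
    X_woe, y, test_size=0.3, random_state=0, stratify=y)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```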
This breakthrough offers a smarter way to build machine learning models that aren’t bogged down by irrelevant or repetitive data. The implications are vast:
- Efficiency: Less redundant data means faster models.
- Accuracy: A cleaner feature set improves predictive performance.
- Interpretability: Models built on unique, non-redundant features are easier to understand, making them invaluable in fields like finance, healthcare and customer analytics.
“The study opens the door to new possibilities,” said Ruixin Chen, lead author of the study and a student in the M.S. in Artificial Intelligence. “Future research could explore automated ways to optimize mutual information reduction, apply the technique to more complex datasets or expand its use in unsupervised learning tasks.”