Professional networking firm LinkedIn today announced a new open-source toolkit named FeatureFu, which helps developers build machine learning models for statistical modeling and decision engines. The toolkit uses a small Java library called Expr that developers can use to edit and build on an existing set of features.
The company is aiming to unify the feature engineering process, thereby removing one of the major drawbacks many large-scale recommendation systems face.
The problem, according to LinkedIn, is that most of today’s systems are split between two major teams: one that handles offline modeling and one that takes care of the online feature-serving and model-scoring part of the system. This division leads to many problems that LinkedIn believes FeatureFu can solve.
[…] This system is brittle and vulnerable to online/offline parity issues because features generated can be different due to subtle implementation discrepancies and dependencies. Additionally, a small change in feature generation (e.g. binning a continuous numeric feature into a few discrete bucketized features) requires a significant amount of work – likely all that is needed for an online code change – with a long turnaround period. This is typically a blockade in experimenting feature/model techniques.
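To make the binning example concrete, here is a minimal sketch in Python of the underlying idea: define the feature transformation once and call the same definition from both the offline and the online path, so the two cannot drift apart. This is not FeatureFu's or Expr's API. The boundary values and the function names (`bucketize`, `offline_features`, `online_features`) are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for a continuous numeric feature.
# Changing this one list changes the feature everywhere it is used.
BOUNDARIES = [10.0, 100.0, 1000.0]


def bucketize(value, boundaries=BOUNDARIES):
    """Turn a continuous value into a one-hot list of bucket indicators.

    With n boundaries there are n + 1 buckets. A value equal to a
    boundary falls into the bucket that starts at that boundary.
    """
    index = bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


def offline_features(values):
    """Offline path (assumed): build training features for a batch of values."""
    return [bucketize(v) for v in values]


def online_features(value):
    """Online path (assumed): build the feature for a single scoring request."""
    return bucketize(value)
```

Because both paths go through the same `bucketize` definition, a change to the bucket boundaries reaches training and serving together, which is the kind of parity the quoted passage says is hard to keep when the two sides are implemented separately.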
The main aim of the new open-source toolkit is to encourage feature engineering. LinkedIn wants the toolkit to make feature engineering more powerful.
LinkedIn says that when its business needs a piece of software, it first looks for existing open-source projects. Only if it can’t find one does it build the software itself. And if that software isn’t a business differentiator, the company often open-sources it so that others can use it too.
In the blog post announcing the toolkit, Bing Zhao, Senior Software Engineer at LinkedIn, writes about the future of FeatureFu:
In future versions, we will introduce more feature generation and analysis tools. We also welcome contributions of all kinds including pull requests, code contribution, bug reports, documentation enhancements and new ideas or feedback!