What is Data Pitch?
The Data Pitch Innovation programme is EU-funded and designed to use shared data to build sustainable businesses that generate economic and social growth. During the six-month programme, we received mentoring, training, introductions to investors and equity-free funding, among many other benefits.
On the Data Pitch Accelerator, delivered by the Open Data Institute (ODI) and the University of Southampton, Luna Connect applied machine learning models to automate lending for SMEs across the UK and Ireland, enabling quicker lending decisions.
Luna Connect is a SaaS platform for digital lending that reduces costs and risk using alternative data and machine learning, taking the hassle out of online lending. Our platform provides banks, credit unions and alternative lenders with the tools they need to process loan applications online, gathering supporting data digitally and using artificial intelligence to automatically process the data and make better, quicker decisions.
Lending to SMEs has traditionally been high-risk and high-cost, taking up to six weeks to get from application to decision. Through the work we did on the Data Pitch programme, our AI model can make a prediction in less than a second.
5 things we have learnt from the Data Pitch Programme
1. Availability of financial and banking data
Finding an openly accessible data source for financial transactions and historical lending data is not easy due to the private nature of banking data.
Action: Publicly available datasets proved inadequate for building an accurate credit decision model, so we are working alongside customers who provide private datasets of banking transactions and historical loan applications. Training our model on this private data has shown us the value of using real banking datasets.
2. Data pre-processing
The number one problem facing machine learning is the lack of good data. Good-quality data is essential for algorithms to function as intended and to achieve high predictive accuracy. We discovered many data quality issues caused by manually entered loan application data.
Action: To overcome this, we spent time on data exploration and analysis to fully understand our datasets. We implemented steps for data cleaning, dealing with missing data and selecting features for our credit decision model.
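As an illustration of those three steps, here is a minimal standard-library sketch. The field names (`annual_turnover`, `years_trading`, `sector`, `notes`), the 50% missing-value cutoff used for feature selection and median imputation are all assumptions chosen for the example, not a description of our actual pipeline. It drops fields that are mostly blank, normalises messy manually entered values and fills the remaining gaps.

```python
import statistics
from typing import Dict, List, Optional

# Hypothetical raw applications showing typical manual-entry problems:
# stray whitespace, inconsistent case, currency symbols, thousands separators, blanks.
RAW = [
    {"annual_turnover": "£120,000", "years_trading": "5", "sector": " Retail ", "notes": ""},
    {"annual_turnover": "95000", "years_trading": "", "sector": "retail", "notes": ""},
    {"annual_turnover": "", "years_trading": "12", "sector": "Construction", "notes": "referral"},
    {"annual_turnover": "250,000", "years_trading": "3", "sector": "construction ", "notes": ""},
]


def missing_rate(records: List[Dict[str, str]], field: str) -> float:
    """Fraction of records where the field is absent or blank."""
    return sum(1 for r in records if not r.get(field, "").strip()) / len(records)


def select_features(records, fields, max_missing=0.5):
    """Keep only fields whose missing rate is at or below the (assumed) cutoff."""
    return [f for f in fields if missing_rate(records, f) <= max_missing]


def parse_number(value: str) -> Optional[float]:
    """Strip currency symbols and separators; return None for blank or unparseable input."""
    cleaned = value.replace("£", "").replace("€", "").replace(",", "").strip()
    try:
        return float(cleaned) if cleaned else None
    except ValueError:
        return None


def clean(records, numeric_fields, categorical_fields):
    rows = []
    for r in records:
        row = {f: parse_number(r.get(f, "")) for f in numeric_fields}
        # Normalise free-text categories; a blank value becomes None.
        row.update({f: r.get(f, "").strip().lower() or None for f in categorical_fields})
        rows.append(row)
    # Impute missing numeric values with the median of the observed values.
    for f in numeric_fields:
        median = statistics.median(row[f] for row in rows if row[f] is not None)
        for row in rows:
            if row[f] is None:
                row[f] = median
    return rows


kept = select_features(RAW, ["annual_turnover", "years_trading", "sector", "notes"])
numeric = [f for f in kept if f in ("annual_turnover", "years_trading")]
categorical = [f for f in kept if f not in numeric]
cleaned = clean(RAW, numeric, categorical)
```

Here `notes` is dropped because three of four records leave it blank, and the blank turnover and years-trading values are filled with the medians of the observed values.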
3. Imbalanced datasets
Our historical loans dataset was imbalanced in the ratio of repaid loans to loan defaults, giving an unequal representation of the classes for our machine learning model. This is expected, as it reflects the real-life situation; however, we need a model that is also very accurate at predicting the minority class, loan defaults.
Action: We took a data-level approach to this challenge, oversampling the minority class when training the model. Oversampling the imbalanced datasets improved our accuracy measures.
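The oversampling technique used is not specified above. The simplest one, random oversampling, duplicates randomly chosen minority-class rows in the training data until the classes are the same size. Below is a minimal standard-library sketch assuming a two-class problem; the function name `random_oversample` and the toy data are hypothetical. Libraries such as imbalanced-learn provide the same idea as `RandomOverSampler`, along with synthetic variants such as SMOTE.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence, minority_label, seed: int = 0) -> Tuple[List, List]:
    """Duplicate random minority-class rows until the two classes are the same size.

    Assumes a binary problem. Apply to the training split only, after the
    train/test split, so duplicated rows never leak into evaluation data.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    if not minority_idx:
        raise ValueError("no minority-class rows to oversample")
    majority_count = len(y) - len(minority_idx)
    shortfall = majority_count - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(0, shortfall))]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


# Toy training data: 9 repaid loans (label 0) and 1 default (label 1).
X_train = [[i] for i in range(10)]
y_train = [0] * 9 + [1]
X_bal, y_bal = random_oversample(X_train, y_train, minority_label=1)
print(Counter(y_bal))
```

Oversampling only the training split matters: if duplicated defaults also appear in the test set, the evaluation scores are inflated.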
4. Accuracy metrics for loan credit decision
Predictive accuracy is not necessarily the best measure for credit decision models. With an imbalanced dataset, high accuracy can often be achieved simply by predicting the majority class every time. The model therefore also needs to be good at predicting the minority class.
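To make the point concrete, the sketch below scores a baseline that always predicts the majority class on a hypothetical test set of 95 repaid loans and 5 defaults. Accuracy comes out at 0.95 while recall on defaults is 0; balanced accuracy (the mean of per-class recall) drops to 0.5 and exposes the problem. The labels and counts are illustrative only.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label) -> float:
    """Share of true `label` cases that the model also predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall, so each class counts equally regardless of its size."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical test set: 95 repaid loans, 5 defaults.
y_true = ["repaid"] * 95 + ["default"] * 5
y_pred = ["repaid"] * 100  # majority-class baseline: predict every loan is repaid

print(f"accuracy          = {accuracy(y_true, y_pred):.2f}")
print(f"recall (default)  = {recall(y_true, y_pred, 'default'):.2f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2f}")
```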
Action: In our credit decision model, treating a good credit risk as the positive class, a low false positive rate matters more than a low false negative rate. In a real-world lending situation, a potential borrower who is classified as good when they are actually a bad credit risk is a financial cost to the lender. This is far worse than classifying a good customer as bad.
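One common way to act on unequal error costs is to choose the approval threshold that minimises total misclassification cost on validation data. The sketch below assumes the model outputs a score for how likely an applicant is to be a good risk, treats "good" as the positive class, and uses hypothetical costs of 5 per false positive and 1 per false negative; real costs would come from the lender's own loss figures. The function names and data are illustrative.

```python
from typing import List, Sequence


def total_cost(scores: Sequence[float], is_good: Sequence[bool], threshold: float,
               cost_fp: float, cost_fn: float) -> float:
    """Misclassification cost when applicants scoring >= threshold are approved."""
    cost = 0.0
    for score, good in zip(scores, is_good):
        approved = score >= threshold
        if approved and not good:
            cost += cost_fp  # false positive: bad credit risk approved
        elif not approved and good:
            cost += cost_fn  # false negative: good customer declined
    return cost


def best_threshold(scores, is_good, cost_fp, cost_fn) -> float:
    """Try every observed score plus 'decline everyone' (inf); keep the cheapest.

    min() returns the first minimum, so ties go to the lowest threshold.
    """
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(scores, is_good, t, cost_fp, cost_fn))


# Hypothetical validation scores and actual outcomes.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
is_good = [True, True, True, False, True, False]

print(best_threshold(scores, is_good, cost_fp=1, cost_fn=1))
print(best_threshold(scores, is_good, cost_fp=5, cost_fn=1))
```

With equal costs the cheapest threshold is 0.6; making a false positive five times as expensive pushes it up to 0.8, so the bad risk scoring 0.7 is no longer approved, at the price of declining the good customer scoring 0.6.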
5. Deploying machine learning model to production
Business value comes from deploying machine learning models into production: only in production can a model actually serve the business, and unfortunately the path to production remains difficult for many companies. Model serving is an essential component of a production machine learning infrastructure stack. It is how a trained model is exposed as an API for real-time predictions, which we need in order to score a loan application as it is being submitted.
Action: Moving machine learning models from local training and testing to serving in production at scale is a challenge. We created a custom hosting solution for our models and learnt to watch for common problems, including model versioning, retraining, preprocessing input before prediction and scaling infrastructure on demand.
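Two of those problems, model versioning and keeping preprocessing identical between training and prediction, can be sketched with the standard library alone. The registry below stores each model version together with its own preprocessing function, serves the live version by default, lets a caller pin an older one and returns the version used alongside every score. The class and function names, the JSON request shape and the toy models are all hypothetical; this is not our hosting solution, and retraining and on-demand scaling are out of scope.

```python
import json
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: str
    preprocess: Callable[[dict], List[float]]  # same feature pipeline used in training
    predict: Callable[[List[float]], float]    # score: likelihood of being a good risk


class ModelRegistry:
    """Hypothetical in-memory registry; a real one would load artefacts from storage."""

    def __init__(self) -> None:
        self._versions: Dict[str, ModelVersion] = {}
        self.live: Optional[str] = None

    def register(self, model: ModelVersion, make_live: bool = False) -> None:
        self._versions[model.version] = model
        if make_live or self.live is None:
            self.live = model.version

    def get(self, version: Optional[str] = None) -> ModelVersion:
        return self._versions[version or self.live]


def handle_request(registry: ModelRegistry, body: str) -> str:
    """Preprocess a JSON loan application, score it and report the model version used."""
    payload = json.loads(body)
    model = registry.get(payload.get("model_version"))
    score = model.predict(model.preprocess(payload["application"]))
    return json.dumps({"model_version": model.version, "score": score})


def make_handler(registry: ModelRegistry):
    """Wrap handle_request in a stdlib HTTP handler (no validation or error handling)."""
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            body = handle_request(registry, self.rfile.read(length).decode("utf-8"))
            data = body.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return PredictHandler


# Toy model versions sharing one hypothetical feature: turnover in millions.
def to_features(application: dict) -> List[float]:
    return [float(application["annual_turnover"]) / 1_000_000]


registry = ModelRegistry()
registry.register(ModelVersion("v1", to_features, lambda f: min(1.0, 0.5 + f[0])))
registry.register(ModelVersion("v2", to_features, lambda f: min(1.0, 0.4 + 2 * f[0])), make_live=True)
```

A request body of `{"application": {"annual_turnover": "250000"}}` is scored by the live `v2`, while adding `"model_version": "v1"` pins the older model. Returning the version with every score makes it possible to trace a lending decision back to the model that made it. To serve over HTTP, pass `make_handler(registry)` to `http.server.HTTPServer` and call `serve_forever()`.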