Predicting the Level of Damage Caused by an Earthquake: The Protogen Team Joins a Competition

“Richter’s Predictor: Modeling Earthquake Damage” is a competition hosted by DrivenData with more than 2,400 competitors. The goal is to predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal, based on aspects of building location and construction.

To tackle this challenge, we built and used our Automated Machine Learning system (Protogen AutoML), which is designed for tabular data and combines ensembles of different state-of-the-art models. The competition is not yet finished, but our team is already in 29th place!

With every project, we have tried to build effective methods with high prediction accuracy and low risk. However, research and hyperparameter tuning are time-consuming, so a trade-off between time and model performance was inevitable. This motivated us to build an automated mechanism intended to outperform existing AutoML systems. A big part of the challenge was building a complete system: the pipeline consists of many steps, and we tried to make each of them as close to optimal as possible. Our approach addresses the problem of simultaneously selecting a learning algorithm and setting its hyperparameters. The main difference from other ensemble models is that this AutoML trains a meta-learner at each epoch and tunes the hyperparameters with respect to that result.
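To make "simultaneously selecting a learning algorithm and setting its hyperparameters" more concrete, here is a minimal sketch of the idea using plain random search over a joint search space. It is not the Protogen AutoML implementation and it leaves out the meta-learner step entirely. The toy data, the two learners, and all names (`SEARCH_SPACE`, `random_search`, and so on) are assumptions made up for illustration.

```python
import random

def make_data(n, rng):
    """Toy 1-D binary data (hypothetical): label is 1 when x > 5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 1 if x > 5.0 else 0
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def fit_quantile_threshold(train, params):
    """Predict 1 when x exceeds the q-th quantile of the training inputs."""
    xs = sorted(x for x, _ in train)
    threshold = xs[int(params["q"] * (len(xs) - 1))]
    return lambda x: 1 if x > threshold else 0

def fit_knn(train, params):
    """k-nearest neighbours with a majority vote."""
    k = params["k"]
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0
    return predict

# Joint search space: each algorithm is paired with a sampler for its own hyperparameters.
SEARCH_SPACE = {
    "quantile_threshold": (fit_quantile_threshold, lambda rng: {"q": rng.uniform(0.0, 1.0)}),
    "knn": (fit_knn, lambda rng: {"k": rng.choice([1, 3, 5, 7, 9])}),
}

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def random_search(train, valid, n_trials, seed=0):
    """Sample (algorithm, hyperparameters) pairs and keep the best validation score."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        fit, sample_params = SEARCH_SPACE[name]
        params = sample_params(rng)
        score = accuracy(fit(train, params), valid)
        if best is None or score > best[0]:
            best = (score, name, params)
    return best

data_rng = random.Random(42)
train, valid = make_data(200, data_rng), make_data(100, data_rng)
best_score, best_name, best_params = random_search(train, valid, n_trials=30)
print(best_name, best_params, round(best_score, 3))
```

The point of the sketch is only that the algorithm choice is treated as one more thing to search over, alongside each algorithm's own settings. A real system would replace random search with a smarter strategy and use proper cross-validation.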

The “Richter’s Predictor: Modeling Earthquake Damage” competition was a good fit for analyzing the performance of our system. Without any external preprocessing, and using only the original features provided by the competition, we reached a 73.4% F1-score, whereas the current leader has a 75.58% F1-score. Later, using the feature importances from our model, we generated new features and aggregations and boosted the model's performance to a 75.08% F1-score. Our result outperformed Microsoft’s Azure Automated Machine Learning system by ~1.2%.
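As an illustration of what an "aggregation" feature can look like, the sketch below attaches a per-group mean of a numeric column to every row. The column names (`region_id`, `age`, `region_mean_age`) and the helper `add_group_mean` are hypothetical, and this is not necessarily one of the features we actually generated.

```python
from collections import defaultdict

def add_group_mean(rows, group_key, value_key, new_key):
    """Return copies of rows with the group-wise mean of value_key added as new_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return [
        {**row, new_key: totals[row[group_key]] / counts[row[group_key]]}
        for row in rows
    ]

# Hypothetical building records: two in region 1, one in region 2.
buildings = [
    {"region_id": 1, "age": 10},
    {"region_id": 1, "age": 30},
    {"region_id": 2, "age": 5},
]
enriched = add_group_mean(buildings, "region_id", "age", "region_mean_age")
print(enriched)
```

Each building now carries a summary of its neighbourhood in addition to its own attributes, which is the kind of signal a tree-based model cannot easily derive from a raw identifier on its own.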

Protogen is a financial credit risk scoring system built on state-of-the-art machine learning techniques. The system aims to predict the probability of customers defaulting with high accuracy and speed, and it provides post-prediction analysis tools for interpreting how the system works. It takes into account social and open banking data to build credit history patterns, including a sentiment model for evaluating real-time customer relationship status. It also includes a recommender system that offers relevant cross-sell and upsell suggestions, and it can be extended with behavioral models based on customers' actions in real time.

Protogen uses modern machine learning algorithms that sequentially learn from their mistakes. For financial data, which generally contains many missing values and outliers, we consider this the best approach: on the one hand, advanced data imputation techniques are used; on the other hand, the model can handle unbalanced datasets, which are common in credit default risk data.
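"Sequentially learning from mistakes" is the core idea of boosting: each new weak model is fitted to the errors left by the models before it. The sketch below shows that idea with one-split regression stumps and squared error on toy data. It is an assumption-laden illustration, not Protogen's actual model, and it does not cover imputation or class imbalance. The function names and the data are made up for the example.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for split in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]  # the current mistakes
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (hypothetical) with distinct x values.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 1, 3, 3, 3]

baseline_value = sum(ys) / len(ys)
baseline_mse = mean_squared_error(lambda x: baseline_value, xs, ys)
boosted_mse = mean_squared_error(boost(xs, ys), xs, ys)
print(baseline_mse, boosted_mse)
```

On the training data, the boosted model's error ends up below that of the constant baseline, because every round corrects part of what the previous rounds got wrong.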


