The Team


In our previous posts, we outlined our EDA, baseline models, and data loading procedures. We also introduced the models we were using to classify cassava leaf diseases and provided checkpoints on our procedures, as well as issues we came across. We hope to address these issues in this post. Links to the first two posts are below!

Part 1

Part 2

Loading with Raw Data & Applying VGG/ResNet

Because of the class imbalance we found in the EDA, we changed the splitting method from a simple train/test split to stratified k-fold, with n_splits equal to the number of labels. This keeps the ratio…
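As a rough illustration of the split described above, here is a minimal sketch using scikit-learn's StratifiedKFold. The labels are synthetic and generated only for this example (they are not the cassava data), the class proportions are made up, and the feature array is a placeholder, so treat this as one way the idea could look rather than the exact code from the post.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels for illustration only: 5 classes with one dominant class.
# These proportions are assumptions, not figures from the real dataset.
rng = np.random.default_rng(0)
y = rng.choice(5, size=1000, p=[0.05, 0.10, 0.11, 0.62, 0.12])
X = np.zeros((len(y), 1))  # placeholder features; only y drives the stratification

# One fold per label, as described in the text.
n_labels = len(np.unique(y))
skf = StratifiedKFold(n_splits=n_labels, shuffle=True, random_state=0)

overall = Counter(y.tolist())
fold_counts = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Count how often each class appears in this validation fold.
    counts = Counter(y[val_idx].tolist())
    fold_counts.append(counts)
    shares = {label: round(counts[label] / len(val_idx), 3) for label in sorted(overall)}
    print(f"fold {fold}: {shares}")
```

Each validation fold should show roughly the same class shares as the full label array, which is the property a plain random split does not guarantee for the smaller classes.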

Recap from Part 1

In our previous post, we introduced our problem and dataset: a multi-class classification problem in which each cassava leaf image must be assigned to one of five classes (four diseases plus healthy). We conducted EDA on our dataset and discovered that the majority class (Cassava Mosaic Disease) made up over 60% of the dataset. Although this disease makes up a significant portion of our dataset, one might argue that the dataset can’t truly be considered “imbalanced” in the way anomaly detection datasets are, where a 90-to-10 split is common.

After EDA, we…

The Problem + Dataset

This is the first of three blog posts that document our group’s experience solving an image recognition problem using ML/DL methods.

Our dataset, drawn from the Cassava Leaf Disease Classification Kaggle competition, consists of over 20,000 images of cassava leaves taken with relatively inexpensive cameras. The dataset also provides a mapping of each image to a disease/health status. These mappings include:

  • Cassava Bacterial Blight (CBB)
  • Cassava Brown Streak Disease (CBSD)
  • Cassava Green Mottle (CGM)
  • Cassava Mosaic Disease (CMD)
  • Healthy
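For readers who want to work with these classes in code, the list above could be held in a small lookup table. This is only a sketch: the class names come from the list, but the integer ids and the helper name label_name are assumptions made for illustration and should be checked against the competition's own label file.

```python
# Class names are taken from the list above.
# The integer ids are an assumption for illustration; verify them against
# the label mapping file shipped with the competition data.
CLASS_NAMES = {
    0: "Cassava Bacterial Blight (CBB)",
    1: "Cassava Brown Streak Disease (CBSD)",
    2: "Cassava Green Mottle (CGM)",
    3: "Cassava Mosaic Disease (CMD)",
    4: "Healthy",
}


def label_name(label_id: int) -> str:
    """Return the human-readable class name for an integer label (hypothetical helper)."""
    return CLASS_NAMES[label_id]


print(label_name(3))
```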

Naturally, the goal of this project is to successfully distinguish multiple diseases…

At first glance, support vector machines sound like something you’d find in Tesla’s gigafactory, but they’re anything but that. In this article, we’ll run through the (mathematical) mechanics of support vector machines, survey their many applications, and examine their use in facial recognition technology, along with the ethical considerations it raises.

What are support vector machines?

This section will give you everything you need to know about SVMs in a quick but comprehensive manner before we explore how they’re applied in different contexts, especially in facial recognition.

The support vector machine (SVM) is a supervised machine learning algorithm that can perform classification…

Kevin Le

