Project information

  • Client: Ceramaret
  • Categories: Unsupervised classification, Multi-label classification, Natural Language Processing
  • Main technologies: Python, scikit-learn, spaCy

Summary

K-Défauts was a project commissioned by Ceramaret, a manufacturer of ceramic components. Because ceramic is a fragile material, Ceramaret was experiencing a loss rate of around 10% that it wished to reduce. Whenever a piece was incorrectly manufactured, a quality engineer wrote a notice of non-compliance detailing the issue or issues at hand. These ranged from incorrect dimensions and functional defects to appearance (aspect) and pollution-related problems.

The primary objective of the project was to classify the textual descriptions of these issues into one or more classes. As a first step, an unsupervised classification was run to check whether the data could be divided into distinct classes. Since it could, a labeled dataset was then provided to enable classification into a single class. Textual data augmentation was applied to improve the results, which produced a significant increase in the weighted-average F1-score, from 0.64 to 0.73.
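To make the reported metric concrete, the following is a minimal sketch of how a weighted-average F1-score is computed: each class gets its own F1, and the classes are then averaged in proportion to how often they occur in the true labels. It is written in plain Python so the arithmetic is visible; scikit-learn's `f1_score` with `average="weighted"` computes the same quantity. The function name `weighted_f1` and the sample labels are hypothetical and are not taken from the project data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Return the support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each class by its share of the true labels.
        score += f1 * n_true / total
    return score


# Hypothetical single-class labels, for illustration only.
y_true = ["dimension", "dimension", "dimension", "aspect", "pollution"]
y_pred = ["dimension", "dimension", "aspect", "aspect", "dimension"]

print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a frequent class that is classified well can mask a rare class that is classified poorly, which is worth keeping in mind when reading a single aggregate score.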

A larger, multi-labeled dataset was later provided, which enabled the team to reach a final accuracy of 85%, where a prediction was counted as wrong when at least one of its seven labels was wrong.
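One way to read this figure, offered here as an assumption, is as an exact-match (subset) accuracy: a sample only counts as correct when all seven labels are predicted correctly. The sketch below contrasts that strict measure with a more lenient per-label accuracy. It uses plain Python and hypothetical data; scikit-learn's `accuracy_score` returns the exact-match value when given multi-label indicator arrays.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    correct = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            correct += int(t == p)
            total += 1
    return correct / total


# Hypothetical indicator vectors over seven labels (1 = label applies).
y_true = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],  # one of the seven labels is missed
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
]

print(exact_match_accuracy(y_true, y_pred))
print(per_label_accuracy(y_true, y_pred))
```

A single missed label makes the whole sample wrong under the exact-match measure, so it is always less than or equal to the per-label accuracy on the same predictions.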

The project was implemented in Python, using scikit-learn for the machine learning aspects and spaCy for natural language processing.