Project information

  • Client: Ceramaret
  • Categories: Unsupervised classification, Multi-label classification, Natural Language Processing
  • Main technologies: Python, Scikit-Learn, spaCy

Summary

K-Défauts was developed in collaboration with Ceramaret, a leading manufacturer of advanced ceramic components. Because ceramics are highly brittle, the company was losing roughly 10% of its material during production. My objective was to automate the categorization of the underlying production defects by analyzing the textual "notices of non-compliance" written by quality engineers.

These quality reports detailed complex, multi-faceted issues, ranging from functional and dimensional defects to aesthetic and pollution-related problems. Using Natural Language Processing (NLP), I built an automated text classification pipeline that maps these raw engineer notes directly to actionable defect categories.
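The internals of this pipeline are not described here. As a minimal sketch, assuming a bag-of-words design, the mapping from a free-text note to a defect category could be a scikit-learn `Pipeline` that chains `TfidfVectorizer` and `LinearSVC`. The notes, labels, and the helper `build_classifier` are invented for illustration. A spaCy tokenizer or lemmatizer could be passed to the vectorizer's `tokenizer` argument, but that step is omitted to keep the example self-contained.

```python
# Sketch (assumed design, toy data): map a free-text note to one defect category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical, invented examples; not the project's notices or label set.
TRAIN_NOTES = [
    "diameter out of tolerance",
    "bore diameter too large",
    "thickness below tolerance",
    "visible scratch on surface",
    "surface stain and discoloration",
    "scratch and chip on polished face",
    "black particles in the material",
    "contamination by metal particles",
    "dust inclusion and contamination",
]
TRAIN_LABELS = [
    "dimensional", "dimensional", "dimensional",
    "aesthetic", "aesthetic", "aesthetic",
    "pollution", "pollution", "pollution",
]


def build_classifier() -> Pipeline:
    """Hypothetical helper: TF-IDF on unigrams and bigrams, then a linear SVM."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("clf", LinearSVC()),
    ])


model = build_classifier().fit(TRAIN_NOTES, TRAIN_LABELS)
print(model.predict(["outer diameter out of tolerance", "metal particles found"]))
```

A linear model on sparse TF-IDF features is a common baseline for short technical texts: it trains in seconds and its per-word weights can be inspected to see which terms drive each category.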

Technical Approach & Results

The initial data was completely unlabeled. I therefore first applied unsupervised techniques to cluster the textual descriptions and show that distinct defect categories could be separated quantitatively.
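The clustering method is not specified. One common way to run this kind of check, assumed here rather than taken from the project, is to vectorize the notes with TF-IDF, cluster them with k-means, and score the partition with the silhouette coefficient, where values close to 1 indicate tight, well-separated clusters. The toy notes and the helpers `cluster_notes` and `scan_k` are hypothetical.

```python
# Sketch (assumed method): do TF-IDF vectors of defect notes form separable clusters?
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy notes with three obvious themes; not project data.
NOTES = [
    "diameter tolerance",
    "diameter tolerance exceeded",
    "bore diameter tolerance",
    "surface scratch",
    "visible surface scratch",
    "polished surface scratch",
    "particles contamination",
    "metal particles contamination",
    "dust particles contamination",
]


def cluster_notes(notes, n_clusters, seed=0):
    """Return k-means cluster labels and the cosine silhouette score."""
    # TfidfVectorizer L2-normalizes each row by default, so Euclidean k-means
    # on these vectors behaves much like grouping notes by cosine similarity.
    X = TfidfVectorizer().fit_transform(notes)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    return labels, silhouette_score(X, labels, metric="cosine")


def scan_k(notes, k_values):
    """Silhouette score for each candidate number of clusters."""
    return {k: cluster_notes(notes, k)[1] for k in k_values}


labels, score = cluster_notes(NOTES, n_clusters=3)
print(labels, round(score, 2))
print(scan_k(NOTES, range(2, 6)))
```

Scanning several values of k and comparing silhouette scores gives a rough, quantitative argument for how many categories the text naturally supports; it does not by itself name those categories.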

Once a labeled dataset had been built, I developed a supervised NLP model using Python, Scikit-Learn, and spaCy. To compensate for the small size of this initial dataset, I applied textual data augmentation, which raised the weighted-average F1-score from 0.64 to 0.73.
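The augmentation technique is not named. The sketch below assumes two simple operations in the style of EDA (Easy Data Augmentation): random word deletion and random word swap, with each synthetic variant inheriting the label of its source note. `augment`, `augment_dataset`, and the default probabilities are hypothetical, illustrative choices rather than tuned values.

```python
# Sketch (assumed EDA-style operations): expand a small labeled training set.
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p; never return an empty note."""
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(text, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_aug variants, alternating deletion and swap (hypothetical helper)."""
    rng = random.Random(seed)
    tokens = text.split()
    if not tokens:
        return []
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_deletion(tokens, p_delete, rng)
        else:
            new_tokens = random_swap(tokens, n_swaps, rng)
        variants.append(" ".join(new_tokens))
    return variants


def augment_dataset(texts, labels, n_aug=4, seed=0):
    """Originals first, then variants; each variant keeps its source's label."""
    out_x, out_y = list(texts), list(labels)
    for idx, (text, label) in enumerate(zip(texts, labels)):
        for variant in augment(text, n_aug=n_aug, seed=seed + idx):
            out_x.append(variant)
            out_y.append(label)
    return out_x, out_y


print(augment("bore diameter out of tolerance after grinding", n_aug=4, seed=0))
```

Augmentation should touch only the training split: augmenting before the train/test split lets near-duplicates of test notes leak into training and inflates the score. The weighted-average F1 can then be computed on the untouched test split with `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")`.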

The Result: As the project scaled, I adapted the architecture to a larger multi-label dataset, in which a single part can have several types of defect. The final model reached 85% accuracy across seven defect categories, giving Ceramaret an automated tool to track and reduce production losses.
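In a multi-label setting each note maps to a set of labels, so the targets become a binary indicator matrix with one column per category. A minimal sketch, assuming a one-vs-rest logistic regression (the project's actual model is not described), uses scikit-learn's `MultiLabelBinarizer` and `OneVsRestClassifier`. The four category names are the defect types mentioned in the summary; the notes and label sets are invented, and the real seven categories are not reproduced. Because "accuracy" is ambiguous for multi-label output, the hypothetical helper `report` returns subset accuracy (all labels of a note must match), Hamming loss, and weighted F1.

```python
# Sketch (assumed model, toy data): multi-label defect tagging, one classifier per category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical notes; a single note may carry several defect types.
NOTES = [
    "diameter out of tolerance",
    "scratch on polished surface",
    "metal particles in the material",
    "crack causes leak under pressure",
    "diameter out of tolerance and scratch on surface",
    "contamination stain on surface",
    "crack and bore diameter too large",
    "dust inclusion weakens the part",
]
LABEL_SETS = [
    ["dimensional"],
    ["aesthetic"],
    ["pollution"],
    ["functional"],
    ["dimensional", "aesthetic"],
    ["pollution", "aesthetic"],
    ["functional", "dimensional"],
    ["pollution", "functional"],
]

# Label sets -> binary indicator matrix (columns sorted alphabetically).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(LABEL_SETS)

# One logistic regression per column; a label is predicted when its
# decision function is positive, i.e. estimated probability above 0.5.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(NOTES, Y)


def report(Y_true, Y_pred):
    """Three common multi-label metrics; 'accuracy' here means exact match."""
    return {
        "subset_accuracy": accuracy_score(Y_true, Y_pred),
        "hamming_loss": hamming_loss(Y_true, Y_pred),
        "weighted_f1": f1_score(Y_true, Y_pred, average="weighted", zero_division=0),
    }


Y_pred = model.predict(NOTES)  # training notes reused only to show the metric API
print(report(Y, Y_pred))
print(mlb.inverse_transform(model.predict(["crack and scratch on the surface"])))
```

The metrics above are computed on the training notes purely to show the calls; a real evaluation would use a held-out split. Subset accuracy is the strictest of the three, so stating which metric an accuracy figure refers to matters when comparing multi-label results.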