Project information
- Client: Ceramaret
- Categories: Unsupervised classification, Multi-label classification, Natural Language Processing
- Main technologies: Python, Scikit-Learn, spaCy
Summary
K-Défauts was developed in collaboration with Ceramaret, a leading manufacturer of advanced ceramic components. Because ceramics are highly fragile, the company was losing roughly 10% of its material during production. My objective was to automate the categorization of these production errors by analyzing the textual "notices of non-compliance" written by quality engineers.
These quality reports described complex, multi-faceted issues, ranging from functional and dimensional defects to aesthetic and pollution-related problems. Using Natural Language Processing (NLP), I built an automated text classification pipeline that maps the engineers' raw notes directly to actionable defect categories.
Technical Approach & Results
The initial data was completely unlabeled. To address this, I first applied unsupervised clustering to the textual descriptions to show that they separate into distinct defect categories.
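A minimal sketch of this kind of clustering step is shown here. It assumes TF-IDF features and k-means, which are illustrative choices only, since the algorithm actually used in the project is not named above. The notices, the function name `cluster_notices`, and the cluster count are all hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical notices, invented for illustration (not Ceramaret data).
notices = [
    "diameter out of tolerance on bore",
    "bore diameter too large, out of tolerance",
    "stain and discoloration on polished surface",
    "surface stain visible after polishing",
]


def cluster_notices(texts, n_clusters=2, seed=0):
    """Vectorize the texts with TF-IDF, then group them with k-means."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_ids = model.fit_predict(features)
    return cluster_ids, vectorizer, model


cluster_ids, vectorizer, model = cluster_notices(notices)
for text, cluster_id in zip(notices, cluster_ids):
    print(cluster_id, text)
```

Reading the notices that land in each cluster is one way to judge whether the groups correspond to meaningful defect categories before assigning labels by hand.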
Once a labeled dataset was established, I developed a supervised NLP model using Python, Scikit-Learn, and spaCy. To compensate for the limited initial data, I implemented textual data augmentation, which raised the weighted average F1-score from 0.64 to 0.73.
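The sketch below illustrates the general pattern of augmenting a small training set and scoring with a weighted F1. The augmentation technique used in the project is not specified, so random word dropout stands in as an assumption. The classifier (logistic regression over TF-IDF), the toy data, and the helper names `augment_by_word_dropout` and `augment_dataset` are likewise hypothetical, and spaCy preprocessing is left out to keep the example self-contained.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def augment_by_word_dropout(text, rng, drop_prob=0.2):
    """Return a copy of `text` with some words randomly removed.

    Word dropout is an assumed stand-in for the project's augmentation.
    """
    kept = [word for word in text.split() if rng.random() >= drop_prob]
    # Never return an empty string: fall back to the original text.
    return " ".join(kept) if kept else text


def augment_dataset(texts, labels, copies=2, seed=0):
    """Append `copies` augmented variants of every training example."""
    rng = random.Random(seed)
    out_texts, out_labels = list(texts), list(labels)
    for text, label in zip(texts, labels):
        for _ in range(copies):
            out_texts.append(augment_by_word_dropout(text, rng))
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data, invented for illustration.
train_texts = [
    "bore diameter out of tolerance",
    "thickness below minimum tolerance",
    "scratch visible on polished surface",
    "discoloration on the visible surface",
]
train_labels = ["dimensional", "dimensional", "aesthetic", "aesthetic"]
test_texts = ["diameter above tolerance", "scratch on surface"]
test_labels = ["dimensional", "aesthetic"]

# Augment the training split only, so the test set stays untouched.
aug_texts, aug_labels = augment_dataset(train_texts, train_labels)

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(aug_texts, aug_labels)

predictions = classifier.predict(test_texts)
weighted_f1 = f1_score(
    test_labels, predictions, average="weighted", zero_division=0
)
print(f"weighted F1: {weighted_f1:.2f}")
```

With `average="weighted"`, each class's F1 is weighted by its number of true instances, which keeps rare defect categories from being ignored while still reflecting the class balance.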
The Result: As the project scaled, I adapted the architecture to a larger, multi-label dataset, in which a single part can have several types of defects. The final model reached 85% accuracy across seven distinct defect categories, giving Ceramaret an automated tool to track and mitigate production losses.
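A multi-label setup of this kind can be sketched as follows. It assumes a one-vs-rest logistic regression over TF-IDF features, which is not confirmed as the project's final model. The notices are invented, and only the four defect families named in the summary are used as labels rather than the full set of seven.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical notices, each tagged with one or more defect types.
texts = [
    "bore diameter out of tolerance",
    "scratch on polished surface",
    "oil residue on part",
    "diameter out of tolerance and scratch on surface",
    "oil residue and scratch on surface",
    "part does not fit in assembly",
]
tags = [
    {"dimensional"},
    {"aesthetic"},
    {"pollution"},
    {"dimensional", "aesthetic"},
    {"pollution", "aesthetic"},
    {"functional"},
]

# Turn the tag sets into a binary indicator matrix (one column per label).
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)

# One binary classifier per label, all sharing the same TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)

# Scored on the training texts purely to show the metric calls.
predicted = model.predict(texts)
subset_accuracy = accuracy_score(y, predicted)  # every label must match
per_label_accuracy = 1.0 - hamming_loss(y, predicted)  # share of correct labels
print(binarizer.classes_, subset_accuracy, per_label_accuracy)
```

In a multi-label setting, "accuracy" can mean exact-match (subset) accuracy or per-label accuracy, and the two can differ considerably, so it helps to state which one a reported figure refers to.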