Non-invasive classification of cultured cells is an important task for in vitro drug panel screening, for phenotypic profiling in the manufacture of therapeutic cell lines and proteins, and for evaluating culture health and cross-contamination. Holographic microscopy and machine learning (ML) are attractive tools for these applications, but they must address two challenges: achieving high performance in multi-class classification problems (more than two classes), and identifying the image-derived features that matter most for a given classification problem. This paper aims to achieve high-performance classification of four cell lines with varied epithelial and mesenchymal morphologies: two breast cancer lines (MDA-MB-231 and MCF-7) and two non-cancer lines (human gingival fibroblasts, HGF; and human gingival keratinocytes, GIE no3B11). It further aims to determine the best ML model for this classification and to identify the features that most strongly influence model performance. We trained and evaluated three ML algorithms, namely Support Vector Machines (SVMs) with various kernels, Random Forest, and AdaBoost, on three feature sets: a previously defined set of 17 features derived from holographic microscopy images; components selected after Principal Component Analysis (PCA); and a subset of the original features chosen by feature selection. Before training, a grid search was performed to determine the optimal hyperparameters for each algorithm. The multi-class Random Forest model was reliable, with average F1 scores of 0.89, 0.86, 0.84, and 0.84 for the GIE, HGF, MCF-7, and MDA-MB-231 cell lines, respectively. Applying feature selection can further improve model performance for each cell line. The most important features were geometric (area, perimeter, eccentricity), histogram-based (skew), and textural (contrast, correlation, homogeneity).
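The per-cell-line F1 scores reported above are one-vs-rest metrics: for each class c, F1 = 2·TP / (2·TP + FP + FN), where a prediction of another line for a true c sample counts as a false negative for c and a false positive for the predicted line. The minimal, standard-library sketch below illustrates that computation. The labels and predictions are made-up toy values for illustration only, not data from this study, and the function name `per_class_f1` is hypothetical (scikit-learn's `f1_score` with `average=None` computes the same quantity).

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} computed one-vs-rest for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted class p, but sample belongs elsewhere
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); define as 0.0 when the class never appears
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Hypothetical toy labels, not results from the paper.
labels = ["GIE", "HGF", "MCF7", "MDA"]
y_true = ["GIE", "GIE", "HGF", "HGF", "MCF7", "MCF7", "MDA", "MDA"]
y_pred = ["GIE", "GIE", "HGF", "MCF7", "MCF7", "MDA", "MDA", "MDA"]
scores = per_class_f1(y_true, y_pred, labels)
macro_f1 = sum(scores.values()) / len(scores)  # unweighted mean over classes
```

With these toy labels, GIE scores 1.0, HGF 2/3, MCF7 0.5, and MDA 0.8. Reporting each class separately, as the abstract does, shows which cell lines are confused with one another, which a single macro-averaged score would hide.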