Bone classification is fundamental in paleontology, evolutionary biology, and museum collection management, where natural history museums house billions of specimens documenting biodiversity. Traditional bone identification has relied on specialist expertise and manual methods, but with the rapid digitization of collections and advances in 3D scanning, computer-assisted approaches are increasingly adopted. Among digital formats, 3D point clouds (Qi et al. 2017a), produced by techniques such as surface scanning, photogrammetry, and computed tomography (CT), have become a preferred representation because they preserve fine structural details while remaining computationally efficient. However, the absence of any inherent ordering among the points of a cloud poses fundamental challenges for conventional deep learning models, and scaling manual classification to vast digital collections remains problematic.<br> This study evaluates and compares state-of-the-art deep learning architectures, specifically designed for 3D point cloud processing, on mammalian bone classification. We assembled a dataset of 1,641 3D bone specimens representing 14 bone-type classes, 302 species, 203 genera, 40 families, and 28 orders, obtained through digitization as part of the e-COL+ project, which aims to digitize a wide range of zoological specimens in 3D and is led by the MNHN, Paris. This taxonomic breadth increases morphological variability and provides a valuable benchmark for evaluating generalization capacity. All 3D meshes were standardized by converting them to point clouds, downsampling to 1,024 points with the Furthest Point Sampling (FPS) algorithm, and normalizing each cloud to the unit sphere (Fig. 1).<br> The data were split 80–20 into training and validation sets, and the training data were augmented with random rotations, scaling, and Gaussian jittering. We evaluated seven architectures: PointNet (Qi et al. 2017a), PointNet++ (Qi et al. 2017b), Dynamic Graph CNN (DGCNN) (Wang et al. 2019), Point Transformer (Zhao et al. 
2021), Point Cloud Transformer (PCT) (Guo et al. 2021), PointMLP (Ma et al. 2022), and PointNeXt (Qian et al. 2022). Models were implemented in PyTorch and trained for 150 epochs using the Adam optimizer with step-based learning rate scheduling; the loss was cross-entropy with label smoothing and class weighting to address class imbalance.<br> Table 1 shows clear differences in overall accuracy, mean accuracy, and F1-score across model families. PointMLP achieved the highest performance with 98.29% ± 0.2% overall accuracy, showing excellent discriminative capacity and stable generalization. PointNeXt and Point Transformer followed closely with 98.19% ± 0.5% and 97.89% ± 2.4% overall accuracy, respectively, highlighting effective feature learning capabilities. Earlier architectures such as PointNet performed substantially worse at 82.53% ± 2.4% overall accuracy, reflecting limited capacity for capturing local geometric dependencies. The progression from PointNet to PointNet++ (87.95% ± 0.5%) demonstrates the importance of hierarchical feature learning, while DGCNN (93.27% ± 1.7%) and PCT (95.88% ± 0.8%) show intermediate performance levels. Notably, PointMLP exhibited the lowest variance across all metrics, indicating superior training stability and generalization consistency.<br> The findings reveal that architectures emphasizing sophisticated feature aggregation mechanisms (PointMLP, PointNeXt, Point Transformer) currently provide the most accurate and generalizable solutions for bone point cloud classification. Modern MLP-based approaches such as PointMLP are particularly effective, combining computational efficiency with superior performance (98.01% ± 0.3% mean accuracy, 97.89% ± 0.3% mean F1-score). Advanced architectures including Point Transformer and PointNeXt achieve comparable accuracy levels (&gt;97% overall accuracy) while offering different trade-offs between model complexity and geometric reasoning capabilities. 
Earlier-generation models (PointNet, PointNet++) remain viable for baseline comparisons but exhibit significant performance gaps relative to state-of-the-art architectures.<br> This research demonstrates the potential of point cloud deep learning for advancing automated morphological analysis in biodiversity research. Despite limited training data, these models accurately classify complex biological structures, with top-performing architectures exceeding 98% classification accuracy, underscoring the effectiveness of deep learning for such applications. The success of modern deep learning architectures opens new possibilities for large-scale automated classification of museum collections, supporting the digitization efforts of natural history institutions worldwide. Future work should investigate the scalability of high-performing architectures to larger taxonomic datasets and explore multi-modal approaches combining geometric and taxonomic information.
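
The standardization step described above (downsampling each cloud to 1,024 points with FPS, then normalizing to the unit sphere) can be illustrated with a minimal NumPy sketch. This is not the study's implementation, which the text does not specify. The function names, the `seed` parameter, and the assumption that each specimen is already available as an `(N, 3)` array of surface points are all hypothetical choices made for illustration.

```python
import numpy as np


def furthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point furthest from those already chosen.

    `points` is assumed to be an (N, 3) float array with N >= n_samples.
    The random starting point (controlled by `seed`) is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    n_points = points.shape[0]
    chosen = np.empty(n_samples, dtype=np.int64)
    chosen[0] = rng.integers(n_points)
    # Squared distance from every point to its nearest already-chosen point.
    min_sq_dist = np.full(n_points, np.inf)
    for i in range(1, n_samples):
        diff = points - points[chosen[i - 1]]
        sq_dist = np.einsum("ij,ij->i", diff, diff)
        min_sq_dist = np.minimum(min_sq_dist, sq_dist)
        chosen[i] = np.argmax(min_sq_dist)
    return points[chosen]


def normalize_to_unit_sphere(points):
    """Center the cloud on its centroid and scale so the furthest point has norm 1."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    if radius == 0.0:
        # Degenerate cloud (all points identical): nothing to scale.
        return centered
    return centered / radius


def preprocess(points, n_samples=1024):
    """Downsample with FPS, then normalize, in the order the text describes."""
    sampled = furthest_point_sampling(points, n_samples)
    return normalize_to_unit_sphere(sampled)
```

For example, calling `preprocess(cloud)` on an array of shape `(5000, 3)` returns an array of shape `(1024, 3)` centered at the origin. Because FPS keeps the points that are mutually furthest apart, it tends to preserve thin or protruding structures better than uniform random subsampling, which is one plausible reason for preferring it on bone geometry.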