Authors - Md Mahmudul Hoque, Md Kawser Islam, Md. Mamunur Rahman Moon, Abdullah Rakib Akand, Md. Hadi Al-amin, H.M. Azrof

Abstract - The automatic recognition of virus particles in transmission electron microscopy (TEM) images remains a demanding task, primarily owing to strong inter-class similarity, scale variability, and pronounced class imbalance. In this study, several convolutional neural networks and transformer-based architectures were comparatively evaluated for the classification of 22 virus categories using the TEM virus dataset. All models were trained under identical preprocessing and optimization conditions, and imbalance effects were mitigated through a weighted cross-entropy formulation. Performance was quantified using overall accuracy together with macro-averaged precision, recall, and F1 score. Among standalone models, the Swin Transformer achieved the highest accuracy (0.8831) and macro-F1 score (0.8444), followed by DeiT (accuracy 0.8669). Convolutional architectures exhibited comparatively lower balanced performance, with ResNet50 demonstrating substantial degradation (accuracy 0.5887) under imbalanced conditions. To exploit complementary representational properties, decision-level hybrid strategies were implemented. The performance-weighted hybrid attained an accuracy of 0.8831 and the highest macro-F1 score (0.8528), slightly surpassing the equal-weight hybrid configuration. These observations indicate that architectural heterogeneity contributes to improved inter-class balance without sacrificing overall predictive accuracy. Future work may explore scale-aware representations, feature-level fusion mechanisms, and expanded TEM datasets to further enhance robustness and generalization in virus identification tasks.
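
The decision-level hybrid strategies and the macro-F1 metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each model outputs per-class probabilities and that the performance-weighted variant weights each model by a validation score normalized to sum to one. The abstract does not state the exact weighting rule, so the function names `fuse_probabilities` and `macro_f1`, the weighting scheme, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fuse_probabilities(prob_list, model_scores=None):
    """Decision-level fusion: weighted average of per-model class probabilities.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    model_scores: optional per-model scores (assumed here to be validation
        scores); None gives the equal-weight hybrid.
    """
    probs = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    if model_scores is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    else:
        scores = np.asarray(model_scores, dtype=float)
        weights = scores / scores.sum()  # normalize so the weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(weights, probs, axes=1)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_per_class = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_per_class))


# Toy example (made-up numbers): two models, three samples, two classes.
p_a = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p_b = np.array([[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]])
y_true = np.array([0, 1, 1])

equal_pred = fuse_probabilities([p_a, p_b]).argmax(axis=1)
weighted_pred = fuse_probabilities([p_a, p_b], model_scores=[0.9, 0.1]).argmax(axis=1)

print("equal-weight macro-F1:", macro_f1(y_true, equal_pred, n_classes=2))
print("performance-weighted macro-F1:", macro_f1(y_true, weighted_pred, n_classes=2))
```

In this toy case the two models disagree on the second sample; giving more weight to the model assumed to be stronger changes the fused decision, which is the mechanism by which a weighted hybrid can differ from an equal-weight one.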