Authors - Sona Ravindran, K Nattar Kannan

Abstract - This research examines transfer learning with deep learning models for multimodal human activity recognition from wearable sensor data. Raw IMU signals are converted to Gramian Angular Field (GAF) images to improve feature representation and are evaluated on the WISDM and PAMAP2 datasets, covering 18 activity classes. Five CNN models, namely VGG16, MobileNetV2, ResNet50, DenseNet121, and EfficientNetB0, are trained and evaluated under identical conditions and compared on classification accuracy, statistical significance, and computational efficiency. GAF representations consistently outperform raw signals: DenseNet121 and ResNet50 reach 99% accuracy, VGG16 and MobileNetV2 perform competitively, and EfficientNetB0 performs worse. Most performance differences are statistically significant (p < 0.05).
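The abstract does not specify which GAF variant the authors use; as a minimal sketch, assuming the common summation form (GASF), the signal-to-image conversion can be illustrated as follows. The function name and the synthetic sine window are illustrative, not from the paper.

```python
import numpy as np

def gramian_angular_field(signal):
    """Convert a 1-D window of sensor samples into a GAF image (summation form)."""
    x = np.asarray(signal, dtype=float)
    # Min-max rescale into [-1, 1] so the arccos polar encoding is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Polar encoding: each sample becomes an angle phi = arccos(x).
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # Summation GAF: G[i, j] = cos(phi_i + phi_j), an N x N image.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 64-sample synthetic window (stand-in for one IMU axis)
# becomes a 64 x 64 image suitable as CNN input.
window = np.sin(np.linspace(0, 4 * np.pi, 64))
image = gramian_angular_field(window)
print(image.shape)  # (64, 64)
```

In practice each sliding window per sensor channel would be converted this way and the resulting images fed to the pretrained CNNs; the difference variant (GADF, using `sin(phi_i - phi_j)`) is an equally plausible choice.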