Authors - Seungmin Lee, Ju-Won Park

Abstract - The module-based static operating environment, widely used in domestic and international supercomputing centers, encounters numerous problems in supporting artificial intelligence / machine learning (AI/ML) parallel workloads because the variety of platforms and packages involved makes it difficult to build every required execution environment. To address these issues and dynamically provide diverse execution environments, container-based cloud technologies are being widely adopted in high-performance computing (HPC) cluster systems. However, container runtime toolkits commonly used in the HPC field, such as Shifter and Singularity, impose burdens of their own: image format conversion, writing scheduler job scripts, environment setup, and manual management of the container lifecycle. This study addresses these problems by using Kubernetes, the de facto standard for container orchestration, to support AI/ML parallel workloads in HPC environments. Supporting Kubernetes-native parallel workload execution offers several advantages. First, image conversion is unnecessary because Docker images are used directly. Second, human error is minimized because the operator automatically handles the environment setup required for parallel execution. Third, in the event of a failure, automatic recovery and re-execution are possible by leveraging Kubernetes' powerful container lifecycle management capabilities. In addition, this study introduces the distributed learning function of the KISTI Supercomputer web portal (MyKSC), which has been implemented using the proposed method.