Thursday April 9, 2026 12:15pm - 2:15pm GMT+07

Authors - Karuppasamy E, Krithika V, Harish P, Pravinbaalaa V, Satheeskumar
Abstract - Large volumes of online data contain duplicated and plagiarized content. Artificial Intelligence has made data generation very easy, but the generation process may lack ethical safeguards. Hence, data needs to be validated as plagiarism-free before it is used as authentic content. In this research work, the authors focus on word-level plagiarism detection methods in Natural Language Processing. The proposed method compares cosine similarity, Euclidean distance, and Manhattan distance for word-level plagiarism detection across different n-gram sizes. Incorporating larger n-gram sizes improved accuracy compared with unigram-based methods. In the experiments, cosine similarity outperforms Euclidean and Manhattan distance, achieving average accuracy ranges of 88% to 92% for direct plagiarism and 75% to 80% for lightly paraphrased text. Future work will focus on identifying reused images and visual content.
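The three measures compared in the abstract can be illustrated with a minimal sketch. It assumes lowercasing, whitespace tokenization, and raw word n-gram counts as the document vectors. The abstract does not state the authors' preprocessing, weighting scheme, or decision thresholds, so this is not their implementation. The function names below are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    # ASSUMPTION: lowercase + whitespace split; the paper's preprocessing is not specified.
    tokens = text.lower().split()
    # Count each contiguous run of n words as one feature.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    # Dot product over shared n-grams, divided by the product of vector lengths.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)


def euclidean_distance(a, b):
    # Missing n-grams count as 0 because Counter returns 0 for absent keys.
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def manhattan_distance(a, b):
    # Sum of absolute count differences over the union of n-grams.
    keys = a.keys() | b.keys()
    return sum(abs(a[k] - b[k]) for k in keys)


source = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox leaps over the lazy dog"
a, b = word_ngrams(source, 2), word_ngrams(suspect, 2)
print(round(cosine_similarity(a, b), 3))  # 0.75 (6 of 8 bigrams shared)
print(euclidean_distance(a, b))           # 2.0
print(manhattan_distance(a, b))           # 4
```

Higher cosine similarity indicates more overlap, while lower Euclidean and Manhattan distances indicate more overlap. Unlike the two distances, cosine similarity normalizes for vector length, so it is less sensitive to differences in document length. This may be one reason it compares favorably, though the abstract does not give that explanation.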
Paper Presenter
Virtual Room F Bangkok, Thailand
