Saturday April 11, 2026 9:30am - 11:30am GMT+07

Authors - Kalidasu Lochani Krishna Priya, Nupur Ajit Kale, Apeksha Pandurang Mujumale, Anagha Vijaysinha Rajput
Abstract - Large volumes of online data contain duplicated and plagiarized content. Artificial Intelligence has made data generation very easy, but the generation process may not be ethical. Hence, data needs to be validated as plagiarism-free before it is used as authentic. In this research work, the authors focus on word-level plagiarism detection methods in Natural Language Processing. The proposed method is a comparative analysis of cosine similarity, Euclidean distance, and Manhattan distance for word-level plagiarism detection across different n-gram sizes. Using larger n-gram sizes improved accuracy compared to unigram-based methods. In the experiments, cosine similarity outperforms the Euclidean and Manhattan distance methods, achieving average accuracy ranges of 88% to 92% for direct plagiarism and 75% to 80% for lightly paraphrased text. Future work is to identify reused images and visual content.
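The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the raw count vectors, and the function names (`word_ngrams`, `cosine_similarity`, `euclidean_distance`, `manhattan_distance`) are assumptions made for illustration only, and the abstract does not specify preprocessing, weighting, or decision thresholds.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Count word-level n-grams (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (1.0 = same direction)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one text is too short to yield any n-gram
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between count vectors (0.0 = identical counts)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def manhattan_distance(a, b):
    """Sum of absolute count differences (0 = identical counts)."""
    keys = a.keys() | b.keys()
    return sum(abs(a[k] - b[k]) for k in keys)


def compare(source, suspect, n):
    """Return all three measures for one n-gram size."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    return {
        "cosine": cosine_similarity(a, b),
        "euclidean": euclidean_distance(a, b),
        "manhattan": manhattan_distance(a, b),
    }
```

Calling `compare(source, suspect, n)` for several values of `n` gives one set of scores per n-gram size. Note that cosine is a similarity (higher means more alike) while the other two are distances (lower means more alike), so any threshold for flagging plagiarism would have to be chosen separately for each measure.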
Paper Presenter
Virtual Room F Bangkok, Thailand
