Overview
Preventing plagiarism has always been important in online assessments, and the integrity of an assessment depends on reliable plagiarism detection.
Note: Plagiarism detection is available only for the Coding question type in HackerRank.
To support a diverse customer base with different needs and assessment purposes, HackerRank offers a code similarity model of plagiarism detection and a new AI-powered plagiarism detection system. To learn about AI-powered plagiarism detection, see Plagiarism Detection Using AI.
Plagiarism Flag
Our plagiarism flag is an indicator that someone might have plagiarized the code. Although we use code similarity and behavioral signals, we cannot determine the source of plagiarism. We recommend that a developer review the code in the playback to decide whether this is an actual case of plagiarism. We do not recommend auto-rejecting a candidate based on the plagiarism flag.
- The Codeplayer will only show the most recent language used by the candidate for submitting a solution to a question.
- If questions do not have the playback option available, HackerRank recommends you refer to other signals available on the Candidate report and additional signals in the CSV report to decide on the candidate attempt.
Detecting Plagiarism
HackerRank uses a primitive Moss (Measure of Software Similarity) approach to detect plagiarism, in addition to a new approach of Plagiarism Detection Using AI.
HackerRank introduces the AI-powered Plagiarism Detection solution with a 93% accuracy rate, reducing false positives dramatically. The AI-powered solution flags cases of cheating reliably, even when tools such as ChatGPT are used. We recommend that you use AI-powered plagiarism detection to detect suspicious activity. However, if you prefer the Moss approach, see the section Using the Moss Approach.
Using the Moss Approach
Moss is an improved algorithm that tokenizes the code. The tokenized versions of all candidates' source code are compared to identify pairs of documents with substantial overlap. Some candidates try to change variable names or introduce white space to deceive plagiarism detection. This typically does not work in their favor because the program's structure is unchanged, and the number of tokens and line matches between the documents remains the same. Additionally, we look at the candidate's suspicious activities (such as copy-paste) and other behavioral signals during the attempt to further ascertain whether the candidate plagiarized.
The plagiarism check is conducted for the same question used within and across HackerRank customers. We also look for cases of plagiarism across different versions of a language. While flagging for plagiarism, a candidate's most recent submission is compared against all previous submissions. We also flag the previous submission for plagiarism only if there is some overlap in duration between the two attempts.
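The idea that renaming variables or adding white space does not change the token structure can be illustrated with a minimal sketch. This is not HackerRank's implementation and not the full Moss algorithm (which fingerprints hashed k-grams); it is a simplified stand-in that replaces identifiers and numbers with placeholders and compares overlapping token sequences. The function names, the small keyword set, and the k-gram size are all assumptions made for illustration.

```python
import re

# Small keyword set for illustration only; a real tokenizer is language-aware.
KEYWORDS = {"def", "return", "for", "while", "if", "else", "in", "range"}
# Identifiers/keywords, integer literals, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def normalize(source):
    """Tokenize source, replacing identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # variable and function names collapse to ID
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)    # operators and punctuation are kept as-is
    return tokens


def kgrams(tokens, k=4):
    """Set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of token k-grams, as a percentage."""
    ga, gb = kgrams(normalize(a), k), kgrams(normalize(b), k)
    if not ga or not gb:
        return 0.0
    return 100.0 * len(ga & gb) / len(ga | gb)


original = (
    "def total(values):\n"
    "    s = 0\n"
    "    for v in values:\n"
    "        s = s + v\n"
    "    return s\n"
)
# Same program with renamed variables and extra white space.
renamed = (
    "def add_all(items):\n"
    "    acc   =   0\n"
    "    for x in items:\n"
    "        acc = acc + x\n"
    "    return acc\n"
)

print(similarity(original, renamed))
```

Because both programs normalize to the same token sequence, the sketch reports a full match even though no variable name is shared.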
Verifying Plagiarism Activity on the Candidate Report
- Navigate to the Tests page and select the required Test.
- Click the Candidates tab, and select a Candidate who has been flagged for suspicious activity. Note that a red triangular icon next to the Candidate's name indicates that the candidate has been flagged for plagiarism.
- The Candidate’s Test Summary page will display the Suspicious Activity tile, including indications of plagiarism.
- Click the View option in the Suspicious Activity tile or click View report for a question. The report of the candidate is displayed.
- The Attempt Activity Tab on the report will show if the candidate has submitted code that has been flagged as plagiarized.
- If plagiarism is detected, review the flagged behaviors to confirm or override the plagiarism flag.
- Click Yes or No based on the suspicious activity of the candidate. If you click No, the plagiarism flag for that candidate will be removed. Clicking Yes retains the plagiarism flag for that question. Note that you can select the Yes or No option only after reviewing the plagiarism detection from the Attempt Activity tab.
- Click View Matches. The plagiarism matches are displayed for the candidates.
- Click View in the Output Diff column to check the candidate's suspicious activity.
- The candidate’s suspicious activity is displayed.
Understanding Plagiarism Activity on the Candidate Report
HackerRank's plagiarism detection tool also shows matches with other candidates across HackerRank for Work who might have submitted the same or similar code for the question in any test. We look for similar code among the following:
- All candidates who attempted any test at any company on the HackerRank for Work platform.
The code matching happens in the following way:
- If a matched candidate is from the same test as the current one, you can see the code match percentage, the date of the identical or similar code submission, and the candidate's email.
- If a matched candidate is from a different test, but you have access to the test, you will still be able to see the code match percentage, the date of the identical or similar code submission, the name of the test, and the candidate's email.
Note: If a matched candidate is from a different test that you do not have access to, you will still be able to see the code match percentage and the date of the identical or similar code submission, but not the test name or candidate email.
Code Match Percentage and View Output Diff
The Match Percentage shows the percentage of matches between the current candidate's code and the matching code. To the far right of the match percentage is the 'view diff' option; clicking it displays the matching code with the differences highlighted. Use the match percentage together with the diff to judge whether this is a possible, partial, or no-match scenario.
Based on the result, the user can decide how to proceed with the candidate.
The following screenshot shows how Moss can detect candidates whose code has the same structure and logic but who changed the variable names or used a for loop instead of a while loop.
The tags HIGH, MEDIUM, and LOW are calculated on a match percentage as shown below:
- HIGH: >= 90%
- MEDIUM: >= 80% AND < 90%
- LOW: < 80%
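The tag boundaries above can be expressed as a small function. This is a sketch of the stated thresholds only; the function name `match_tag` is hypothetical and not part of any HackerRank API.

```python
def match_tag(match_percentage):
    """Map a code match percentage to the HIGH / MEDIUM / LOW tag."""
    if match_percentage >= 90:
        return "HIGH"
    if match_percentage >= 80:
        return "MEDIUM"
    return "LOW"


for pct in (95, 90, 85, 80, 79.9):
    print(pct, match_tag(pct))
```

Note that the boundaries are inclusive at the lower end: exactly 90% is HIGH and exactly 80% is MEDIUM.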
Override the Suspicious Activity Detection
The system is not fully automated and is a decision support system for Hiring Managers or TAs. They can override the decision by providing their feedback at a candidate level after they have reviewed the code playback.
- Click the Detailed Report view.
- In the Attempt Activity tab, select Yes or No based on the plagiarism detected.
- Clicking No will remove the plagiarism flag for the question for which the candidate has been flagged.
Additional Plagiarism Indicators with Generated CSV Reports
The plagiarism information can also be verified in the CSV report that you can download from the platform. The report includes the performance data of all candidates in a single file. Through the Report Preferences settings in HackerRank, you can choose the candidate performance data you want to include in the file.
For information on the additional indicators of plagiarism detection and how to use them, see the Plagiarism Best Practices guide.
Once an assessment is completed, go to the Candidates tab of the Test and select the status Completed. You will get an option to download the report.
After you download the CSV report, you can see the following columns in the report:
- Attempt plagiarism: The value can be YES or NO
- Yes or No includes “Inside the Test & Outside the Test” plagiarism results.
For more information on downloading Test Reports, access the Downloading PDF and Test Reports article.
- Plagiarism Percentage
- By default, the system flags a candidate for plagiarism when the match score is greater than 75%, provided the candidate writes at least 10 lines of code. If there are fewer than 10 lines of code, HackerRank flags the candidate for plagiarism only if the match score is greater than 90%. This ensures that we don’t flag candidates for questions that require writing simple code snippets (of fewer than 10 lines).
- We recommend a sliding scale based on the difficulty level of the questions in your assessment. (This keeps the number of candidates flagged for potential plagiarism within reason for easier questions.)
- Easy questions = 90%
- Medium questions = (80-90%)
- Hard Questions = 75%
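The default flagging rule described above (before any sliding scale is applied) can be sketched as follows. The function name and parameter names are hypothetical; only the 75%, 90%, and 10-line values come from the text.

```python
def is_flagged_by_default(match_score, lines_of_code):
    """Apply the default rule: >75% for 10+ lines, >90% for shorter code."""
    threshold = 75 if lines_of_code >= 10 else 90
    return match_score > threshold


print(is_flagged_by_default(80, 25))  # longer solution, above 75%
print(is_flagged_by_default(80, 6))   # short snippet, not above 90%
print(is_flagged_by_default(92, 6))   # short snippet, above 90%
```

The comparison is strict, matching the "greater than" wording: a score of exactly 75% on a longer solution would not be flagged under this reading.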
For questions that do not have plagiarism detection capability, users can leverage proctoring capabilities such as Tab Proctoring and Copy-paste proctoring.
This can be done by referring to the following columns in the CSV report:
- For Tab proctoring:
- Out-of-window duration: Shows the duration (in seconds) for which the candidate was out of the Test tab.
- Number of window exits: Shows the number of times a candidate moved out of the tab.
- For Copy-paste proctoring:
- Copy-paste frequency: Shows the number of times a candidate tried to copy-paste their code into the coding editor.
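As a rough illustration of how these columns might be used, the following sketch reads a CSV report and lists candidates whose attempts may deserve a manual review. It assumes the CSV headers match the column names given in this article exactly and that an `Email` column exists; real exports may differ. The sample rows and the review thresholds (120 seconds, 2 pastes) are hypothetical and are not HackerRank recommendations.

```python
import csv
import io

# Hypothetical sample; real exports may use different headers and extra columns.
SAMPLE_CSV = """Email,Attempt plagiarism,Out-of-window duration,Number of window exits,Copy-paste frequency
a@example.com,NO,0,0,0
b@example.com,YES,145,6,3
c@example.com,NO,310,12,0
"""


def candidates_to_review(csv_file, max_out_of_window_seconds=120, max_pastes=2):
    """Return emails of rows worth a manual review (illustrative thresholds)."""
    review = []
    for row in csv.DictReader(csv_file):
        flagged = row["Attempt plagiarism"].strip().upper() == "YES"
        away = float(row["Out-of-window duration"]) > max_out_of_window_seconds
        pasted = int(row["Copy-paste frequency"]) > max_pastes
        if flagged or away or pasted:
            review.append(row["Email"])
    return review


print(candidates_to_review(io.StringIO(SAMPLE_CSV)))
```

To run this against a downloaded report, pass an open file object instead of the in-memory sample. The output is a shortlist for human review, in keeping with the recommendation not to auto-reject candidates based on these signals.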