HackerRank Projects for Data Science allows you to configure automatic scoring for your custom project-based questions, enabling you to assess Data Scientists with real-world scenarios.
This is an optional step in the Setup Project phase of the Data Science question creation process. In this step, you can upload evaluation scripts, submission files, solution notebooks, and more in the Add Evaluation Files section. These files remain hidden from candidates and will be downloaded into the Jupyter session for scoring and evaluation. The scoring command, specified in the `hackerrank.yml` configuration file, will automatically execute once the candidate submits their solution.
This article outlines the process of setting up automatic scoring for your Data Science questions.
Setting Up Automatic Scoring for Data Science Questions
There are two methods for setting up automatic scoring: Standard Metrics and Custom Scoring.
1. Standard Metrics
- Select a Metric: Choose a metric for evaluating the candidate's submission.
- Define Submission File: Specify the file name that candidates should use for their final submission (a minimal example follows this list).
- Upload Output File: Upload the actual output file that contains the expected results for generating the score. Map the required fields for scoring.
- Upload Expected Submission File: Provide the expected submission file to validate the scoring setup. Map the required fields accordingly.
- Validate and Save: Click the Validate and Save buttons to apply changes. In case of errors, refer to the error logs to debug the issue.
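For reference, the submission file and the expected output file are plain CSVs whose columns you map against each other during setup. The sketch below shows how a candidate might produce such a submission file; the file name `submission.csv` and the column names `id` and `popularity` are illustrative assumptions, not fixed by the platform.

```python
import pandas as pd

# Hypothetical predictions; in a real attempt these come from the candidate's model.
predictions = pd.DataFrame({
    "id": [1, 2, 3],          # assumed identifier column, matching the expected output file
    "popularity": [0, 1, 1],  # assumed target column that the selected metric is computed on
})

# Write the file under the exact name configured as the submission file.
predictions.to_csv("submission.csv", index=False)
```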
2. Custom Scoring
- Create a Scoring Program: Write a program to evaluate the candidate's submission. For example, a Python script that compares the candidate's `submission.csv` with the `actual_output.csv`, calculating metrics such as accuracy, recall, precision, F-score, etc. The score should be returned in the format `FS_SCORE: X%`.
```python
#!/usr/bin/env python
# coding: utf-8
import sys

import pandas as pd
from sklearn.metrics import accuracy_score


def score(actual_data, sub_data):
    # The submission must have the same shape and columns as the expected output.
    if actual_data.shape != sub_data.shape:
        print('Shape Mismatch')
        return 0
    if actual_data.columns.tolist() != sub_data.columns.tolist():
        print('Columns Mismatch')
        return 0
    actuals = actual_data['popularity'].tolist()
    preds = sub_data['popularity'].tolist()
    try:
        return accuracy_score(actuals, preds)
    except Exception:
        print('Error in Evaluation')
        return 0


def read_data(actual_file, submission_file):
    try:
        actual_data = pd.read_csv(actual_file)
        sub_data = pd.read_csv(submission_file)
        return actual_data, sub_data
    except Exception:
        # Still emit an FS_SCORE line so the platform can record a score.
        print('File Not Found')
        print("FS_SCORE:0 %")
        sys.exit(0)


try:
    actual_data, submission_data = read_data(sys.argv[1], sys.argv[2])
    result = score(actual_data, submission_data)
    print("FS_SCORE:" + str(result * 100) + " %")
except Exception as e:
    print(e)
    print('Score could not be calculated')
    print("FS_SCORE:0 %")
```
- Upload Files: Upload all required files, including the scoring script (e.g., `score.py`, `score.sh`) and the actual output file (`actual_output.csv`), in the Evaluation section.
- Update `hackerrank.yml`: Edit the `hackerrank.yml` file to include the scoring command. Example:

```bash
Scoring command: python evaluation_files/score.py evaluation_files/actual_output.csv
```
- Edge Case Handling: Ensure your evaluation script handles edge cases and returns a score in every scenario (see the sanity-check sketch below this list). An error in the script results in no score, and the submission will require manual review.
- Validate and Save: Click the Validate and Save buttons to apply changes. In case of errors, refer to the error logs to debug.
Note: To update evaluation files, re-upload the new files and delete the previous version.
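Before uploading, you can sanity-check the scoring script locally to confirm it prints an FS_SCORE line on every path, including failure cases. This is a minimal sketch, assuming `score.py`, `actual_output.csv`, and the sample submission files sit in your working directory; the file names are placeholders.

```python
import subprocess
import sys

# Placeholder test cases: a well-formed submission, a missing file, and a
# submission with mismatched columns.
CASES = [
    ["actual_output.csv", "submission.csv"],
    ["actual_output.csv", "missing_file.csv"],
    ["actual_output.csv", "wrong_columns.csv"],
]

for args in CASES:
    result = subprocess.run(
        [sys.executable, "score.py", *args],
        capture_output=True,
        text=True,
    )
    print("score.py " + " ".join(args))
    print(result.stdout.strip())
    # The platform parses the FS_SCORE line, so every run must emit one.
    if "FS_SCORE:" not in result.stdout:
        print("WARNING: no FS_SCORE line printed for this case")
```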
Manual Scoring Option
HackerRank also offers manual scoring for Data Science questions. Learn more about manual scoring here.