Performance evaluation of plagiarism detection method based on the intermediate language
Vedran Juričić
Tereza Jurić
Marija Tkalec
Plagiarism detection method
Method for detecting plagiarism in source code for .NET languages
C#
Visual Basic .NET
C++
…
Identify similar code fragments
Determine similarity between source files
Based on intermediate language
Plagiarism detection
First
1. using System.Text;
2. namespace Test
   {
3.     class Math
       {
4.         public double GetMaximum(double[] Input)
           {
5.             double result = Input[0];
6.             foreach (double temp in Input)
               {
7.                 if (temp>result)
8.                     result = temp;
               }
9.             return result;
           }
       }
   }
Second
1. using System.Text;
2. namespace Test
   {
3.     class Math
       {
4.         public double GetMaximum(double[] Input)
           {
5.             double result = Input[0];
6.             for (int i=0;i<Input.Length;i++)
               {
7.                 if (Input[i]>result)
8.                     result = Input[i];
               }
9.             return result;
           }
       }
   }
Similarity = Number of overlapping lines / Total number of lines
= 6 / 9 = 66.67%
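The line-overlap measure above can be sketched in a few lines of Python. This is only an illustration of the naive metric, not the algorithm of any of the evaluated tools. The function name `line_similarity` and the choice to skip lone braces (so that the count matches the nine numbered lines) are assumptions made for this sketch.

```python
def line_similarity(first: str, second: str) -> float:
    """Share of significant lines in `first` that occur verbatim in `second`."""
    def significant(source):
        # Trim whitespace; skip empty lines and lone braces, which the
        # slide does not number and therefore does not count.
        lines = [line.strip() for line in source.splitlines()]
        return [line for line in lines if line and line not in ("{", "}")]

    first_lines = significant(first)
    second_lines = set(significant(second))
    if not first_lines:
        return 0.0
    overlapping = sum(1 for line in first_lines if line in second_lines)
    return overlapping / len(first_lines)


FIRST = """
using System.Text;
namespace Test
{
    class Math
    {
        public double GetMaximum(double[] Input)
        {
            double result = Input[0];
            foreach (double temp in Input)
            {
                if (temp>result)
                    result = temp;
            }
            return result;
        }
    }
}
"""

SECOND = """
using System.Text;
namespace Test
{
    class Math
    {
        public double GetMaximum(double[] Input)
        {
            double result = Input[0];
            for (int i=0;i<Input.Length;i++)
            {
                if (Input[i]>result)
                    result = Input[i];
            }
            return result;
        }
    }
}
"""

print(f"{line_similarity(FIRST, SECOND):.2%}")  # 6 of 9 lines match
```

Because the comparison is purely textual, any renaming of an identifier on a line removes that line from the overlap.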
But…
First
1. using System.Text;
2. namespace Test
   {
3.     class Math
       {
4.         public double GetMaximum(double[] Input)
           {
5.             double result = Input[0];
6.             foreach (double temp in Input)
               {
7.                 if (temp>result)
8.                     result = temp;
               }
9.             return result;
           }
       }
   }
Second
1. using System;
2. namespace OtherTest
   {
3.     class MyClass
       {
4.         public double ReturnMaximum(double[] Array)
           {
5.             double current = Array[0];
6.             for (int j=0;j<Array.Length;j++)
               {
7.                 if (Array[j]>current)
8.                     current = Array[j];
               }
9.             return current;
           }
       }
   }
Similarity = Number of overlapping lines / Total number of lines
= 0 / 9 = 0.00%
Problems
Modification of variable names, types, constants
Modification of class member definitions
Line and command reordering
…
Solution
Detailed analysis
Complex preprocessing
For each supported language
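To give a feel for what source-level preprocessing involves, here is a minimal Python sketch of one possible step for C#: replacing every non-keyword identifier with a placeholder so that renamed variables compare equal. The keyword list, the token pattern and the name `normalize_line` are assumptions made for this illustration. The slides do not describe how any of the tools preprocess code.

```python
import re

# Deliberately small keyword list (assumption for this sketch only).
CSHARP_KEYWORDS = {
    "using", "namespace", "class", "public", "double", "int",
    "for", "foreach", "in", "if", "return",
}

# An identifier or keyword, an integer literal, or any single other symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_line(line: str) -> str:
    """Replace every non-keyword identifier with the placeholder ID."""
    tokens = []
    for token in TOKEN.findall(line):
        is_word = token[0].isalpha() or token[0] == "_"
        if is_word and token not in CSHARP_KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(token)
    return " ".join(tokens)


print(normalize_line("double result = Input[0];"))
print(normalize_line("double current = Array[0];"))
```

Both calls print the same normalized line, so renaming alone no longer hides the match. A `foreach` loop rewritten as a `for` loop still normalizes differently, and a real implementation would need a full keyword list and tokenizer for each supported language.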
Our solution
Convert from source language to low-level language (Common Intermediate Language)
By using existing tools
Compiler
Disassembler
Tools exist for all .NET languages
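The conversion step can be sketched as two external tool invocations. The example assumes the C# compiler `csc` and the IL disassembler `ildasm` are installed and on the PATH. The slides do not name the tools actually used, and the helper names `build_commands` and `to_intermediate_language` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(source: Path, work_dir: Path) -> list:
    """Return the compiler and disassembler command lines for one C# file."""
    assembly = work_dir / (source.stem + ".dll")
    listing = work_dir / (source.stem + ".il")
    return [
        # Compile to a library, since a submitted file may have no Main method.
        ["csc", "/nologo", "/target:library", f"/out:{assembly}", str(source)],
        # Write the disassembly to a text file instead of opening the GUI.
        ["ildasm", str(assembly), f"/out={listing}"],
    ]


def to_intermediate_language(source: Path, work_dir: Path) -> Path:
    """Run both tools and return the path of the generated .il listing."""
    work_dir.mkdir(parents=True, exist_ok=True)
    for command in build_commands(source, work_dir):
        subprocess.run(command, check=True)
    return work_dir / (source.stem + ".il")


# Only prints the command lines; nothing is executed here.
for command in build_commands(Path("Test.cs"), Path("build")):
    print(" ".join(command))
```

For another .NET language only the compiler command would change; the disassembly step stays the same.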
Our solution
C# language
    using System.Text;
    namespace Test
    {
        class Math
        {
            public double GetMaximum(double[] Input)
            {
                double result = Input[0];
                foreach (double temp in Input)
                {
                    if (temp>result)
                        result = temp;
                }
                return result;
            }
        }
    }
C# compiler
Common Intermediate Language
    .method public hidebysig instance float64
            GetMaximum(float64[] Input) cil managed
    {
      // Code size 61 (0x3d)
      .maxstack 2
      .locals init (float64 V_0, float64 V_1, float64 V_2,
                    float64[] V_3, int32 V_4, bool V_5)
      IL_0000: nop
      IL_0001: ldarg.1
      IL_0002: ldc.i4.0
      IL_0003: ldelem.r8
      IL_0004: stloc.0
      IL_0005: nop
      IL_0006: ldarg.1
      IL_0007: stloc.3
      …
      IL_0037: ldloc.0
      IL_0038: stloc.2
      IL_0039: br.s IL_003b
      IL_003b: ldloc.2
      IL_003c: ret
    } // end of method C::GetMaximum
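One way such a listing could be prepared for comparison is to keep only the opcode of each instruction and drop the labels and operands. The slides do not say how ILMatch processes the listing, so the following is an illustrative assumption, and the names `INSTRUCTION` and `opcodes` are hypothetical.

```python
import re

# Matches an instruction line such as "IL_0039: br.s IL_003b"
# and captures the opcode that follows the label.
INSTRUCTION = re.compile(r"^\s*IL_[0-9a-fA-F]{4}:\s+(\S+)")


def opcodes(listing: str) -> list:
    """Return the opcode of every instruction line, ignoring labels and operands."""
    result = []
    for line in listing.splitlines():
        match = INSTRUCTION.match(line)
        if match:
            result.append(match.group(1))
    return result


# Excerpt of the instruction lines shown on the slide.
LISTING = """
.maxstack 2
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i4.0
IL_0003: ldelem.r8
IL_0004: stloc.0
IL_0039: br.s IL_003b
IL_003b: ldloc.2
IL_003c: ret
"""

print(opcodes(LISTING))
```

Class, method and variable names chosen by the author do not appear in the resulting opcode sequence, which is consistent with renaming having no impact on detection.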
Plagiarism detection system
Evaluate the performance
Analyze and compare its behavior with the most commonly used plagiarism detection systems:
MOSS
JPlag
CodeMatch
Tested systems
MOSS
Developed in 1994.
Commonly used in computer science faculties
Supports 26 programming languages
JPlag
Developed in 1996.
Commonly used in education
Supports C, C++, C# and Java
Tested Systems
CodeMatch
Developed in 2003.
Commercial software
Supports 26 languages
ILMatch (our system)
Developed in 2010.
Supports all .NET languages (currently 59 languages)
Testing
6 test categories
50 test cases covering common code modification techniques
Evaluation methods
Precision, recall
F-measure
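The evaluation measures can be computed from the set of file pairs a system reports and the set of pairs that are actually plagiarized. The sketch below assumes the balanced F-measure (F1); the slides only say "F-measure". The file pairs are hypothetical data for illustration, not results from the evaluation.

```python
def precision_recall_f(reported: set, actual: set):
    """Return (precision, recall, balanced F-measure) for reported pairs."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical data: pairs flagged by a tool and pairs that are truly plagiarized.
reported = {("a.cs", "b.cs"), ("a.cs", "c.cs"), ("d.cs", "e.cs")}
actual = {("a.cs", "b.cs"), ("d.cs", "e.cs"), ("f.cs", "g.cs"), ("h.cs", "i.cs")}

p, r, f = precision_recall_f(reported, actual)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Here two of the three reported pairs are correct (precision 0.67) and two of the four plagiarized pairs are found (recall 0.50), which gives an F-measure of about 0.57.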
Results
[Chart: results for MOSS, JPlag, CodeMatch and ILMatch, with a callout marking the highest F-measures]
Positive
No impact
User comments
Code formatting
Modification of variable and class names
Modification of class members
Changing data types
Some impact
Replacing expressions and loops
Rewriting code in a different language
Further work
Significant impact
Reordering operands
Reordering class members
Adding redundant statements and variables
Improvements in comparison algorithm