Legal Education Digest

Todd, P --- "Plagiarism detection software: legal and pedagogical issues" [2010] LegEdDig 44; (2010) 18(3) Legal Education Digest 46


Plagiarism detection software: legal and pedagogical issues

P Todd

The Law Teacher, Vol. 44, No. 2, July 2010, pp137–148

It is necessary to dispel the myth that plagiarism detection software is effective only at combating cutting and pasting from Internet sources, and not, for example, the more traditional types of plagiarism, such as copying passages from books. Even if it were true, the software could, at least, put us back into the position we were in before the Internet was in widespread use. But it is not true. It may not happen for other reasons, but from a purely technical standpoint it should be possible to counter all types of plagiarism, at any rate in essay-based coursework, and put ourselves into a better position than we were in during the pre-Internet era.

Nor are the uses of plagiarism detection software limited to the matter of guarding against the known and obvious pitfalls of plagiarism. It can assist examiners by showing how essays are constructed, whether or not they are technically plagiarised. It can be useful in supervision and examination of theses. We should expect journal editors increasingly to use plagiarism detection software for articles submitted. Plagiarism detection software can be useful for the students themselves, before finally submitting their work (arguably they should already know where their essays are sourced from, but poor note-taking may lead to mistakes and failures of attribution). In any case, their ability to see the reports their markers will see is relevant to some of the conclusions drawn at the end of the article.

One way of avoiding plagiarism is to set tasks which make plagiarism more difficult. Some of these assessment techniques have pedagogical merit in their own right, in which case we should consider adopting them, whether or not plagiarism is an issue. The fear of plagiarism may therefore perhaps trigger a review of our assessment which we should have made in any event. We may nonetheless legitimately conclude that there are skills which can be best assessed using the traditional long essay, written over an extended period – it is, after all, essentially what we are doing when we write our own academic pieces. It would be totally unacceptable to have to abandon a good assessment method just because we are not prepared to use the tools which are available to us to tackle the plagiarism menace.

In order to use plagiarism detection software, it is necessary for the marker to have the essays in a digital form. The simplest way to do this is to require submission in digital form, possibly alongside paper submission for those of us who, even today, object to reading material on screen. Alternatively, scanning and optical character recognition (OCR) software can be used, but the OCR software needs to be almost 100 per cent accurate to be of value. With the continuing improvement of screens, and the increasing familiarity of academics with on-screen reading and editing of email, it seems difficult to believe that objections to online submission are sustainable, other than in the very short term.

There are three main techniques used by the software packages currently available. First (fairly obviously), there are those which employ search engine techniques, to find matches on the Internet. Secondly, there are those which find similarities between files on a single computer; these are intended primarily to detect collusion. Thirdly, there are those, of which Turnitin is the best-known example, which build up their own archive databases from past essay submissions, and agreements with publishers. It is this third type which provides us with the tools to defeat plagiarism from any source, whether or not that source is Internet-derived.

Many packages use only the first or only the second technique. For example, EVE2 only finds matches on the Internet, whereas CopyCatch Gold and WCopyfind are collusion detectors. Turnitin uses the first and third techniques (and it is also possible to use its archive to check for collusion). Viper uses all three. In 2001, a number of packages were reviewed in a Technical Review of Plagiarism Detection Software Report, prepared for the Joint Information Systems Committee (JISC). EVE2, Turnitin and CopyCatch were included in the review. The review of WordCHECK links to a website which now links to Viper. Though the report is now quite old, and the software itself has moved on, the general principles identified in the report remain valid.

Plagiarism from an Internet source can sometimes be detected simply using a search engine, such as Google, especially if the plagiarised source uses unusual words or language. A marker restricted to using Google will be working at a handicap, however, quite apart from the sheer hassle of making what could be many searches, for each essay submitted. Google searches are restricted to 32 words, whereas plagiarised passages (in law at least) are often considerably longer than that. Nonetheless, software that automatically tests every sentence of a file using a search engine, such as PlagiarismDetect.com, can be surprisingly informative.
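
As an illustration of the sentence-by-sentence approach, the following is a minimal sketch of how an essay might be broken into search queries that respect the 32-word limit mentioned above. It does not contact any search engine, and it is not a description of how PlagiarismDetect.com works; the function name, the sentence-splitting rule and the minimum sentence length are all assumptions made for the example.

```python
import re

MAX_QUERY_WORDS = 32  # the Google query limit cited in the text


def sentence_queries(essay: str, max_words: int = MAX_QUERY_WORDS,
                     min_words: int = 5) -> list[str]:
    """Turn each sentence of an essay into a quoted phrase query.

    Sentences longer than max_words are truncated; sentences shorter
    than min_words are skipped (an assumption: they are too generic
    to be useful as evidence of copying).
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", essay.strip())
    queries = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < min_words:
            continue
        queries.append('"' + " ".join(words[:max_words]) + '"')
    return queries
```

Each returned string could then be submitted to a search engine by hand or by a script. The truncation step shows why long plagiarised passages are awkward to test this way: only the first 32 words of any sentence are ever searched.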

Running the same essay through PlagiarismDetect.com, Viper and Turnitin will produce different results. Because it is not possible to search the whole of the Internet in an acceptable period of time, various devices are used to cut down the search. Search engines use indexes, and order their results on a probabilistic basis. But because most people who use Google are not looking for plagiarised sources, one cannot expect the search to be optimised for that activity. Software written explicitly for plagiarism detection should be able to index more effectively (after all, only a very small part of the Internet is likely to be useful as a plagiarism source), and other techniques are also used to reduce search times. Probably the most effective tools will be subject-specific, and it is notable that Viper asks, for each document tested, for a subject category. Search times can also be very much longer (and hence find more) than would be appropriate for a Google search, especially if the marker is organised and performs the search while doing something else, or batch-processes the essays. Nonetheless, because only a small part of the Internet is actually searched, even the most blatant verbatim copying can sometimes go undetected.

There is also the issue of what is searched for. Restricting the search to an exact string will not catch the student who makes minor changes to a plagiarised passage, whereas not so restricting it can result in many false leads. A software package needs to be able to compare passages of realistic lengths, and not be fooled by minor differences between the target and the checked passage. If the report allows the marker to easily compare suspicious parts of the essay with original sources, the match need not be particularly exact, especially if the software is to be of value in discovering how students construct essays, as well as in detecting plagiarism strictly so defined.
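
The difference between exact and approximate matching can be sketched in a few lines. The example below uses overlapping word sequences ("shingles"), one common technique for tolerating small edits; it is offered as an assumption about how such matching might be done, not as the method used by any package named in this article. The function names and the default sequence length are hypothetical.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(essay: str, source: str, n: int = 5) -> float:
    """Fraction of the essay's n-word sequences that also occur in the source.

    1.0 means every sequence in the essay appears in the source;
    0.0 means none does (or the essay is shorter than n words).
    """
    essay_shingles = shingles(essay, n)
    if not essay_shingles:
        return 0.0
    return len(essay_shingles & shingles(source, n)) / len(essay_shingles)
```

Changing a single word in a copied sentence defeats an exact string search, but it disturbs only the few sequences that include the changed word, so the containment score stays high. A smaller `n` tolerates more editing at the cost of more false leads, which is the trade-off described above.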

Even packages which only find material sourced from the Internet should become more effective as more material is made available online, unless digital rights management techniques are used to protect such material.

There are packages which compare documents on a single computer, whose main use is for detecting collusion between students, if, for example, all submitted essays are stored in a directory on the marker’s computer. Essays can also be compared against anything else on the marker’s computer, enabling (subject to copyright) private databases of likely sources to be held locally; indeed, Viper’s instructions positively encourage checking material held locally. In a specialist area, even quite a small local archive is likely to be a formidable tool.
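
A collusion check of this kind amounts to comparing every essay in a directory with every other. The sketch below uses Python's standard `difflib.SequenceMatcher` for the comparison; the choice of matcher, the `.txt` file pattern and the 0.6 threshold are assumptions for illustration, and none of the packages mentioned is claimed to work this way.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def load_essays(directory: str) -> dict[str, str]:
    """Read every .txt file in a directory into a name -> text mapping."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(directory).glob("*.txt"))}


def collusion_pairs(essays: dict[str, str],
                    threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return pairs of essays whose word-level similarity meets the threshold."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(essays.items(), 2):
        matcher = SequenceMatcher(None, text_a.lower().split(),
                                  text_b.lower().split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 2)))
    # Most similar pairs first, so the marker reads the likeliest cases first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Adding likely source documents to the same mapping would extend the check from collusion to a private local database of the kind described above. The number of comparisons grows with the square of the number of files, which is tolerable for a single class but not for a national archive.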

In the JISC report referred to earlier, academics ‘reported that the primary source of plagiarised material was work copied from textbooks and theses. The second most common source was material cut and pasted from the Internet’. The proportion of Internet-sourced material has almost certainly increased since 2001, when the JISC report was published, and the reports are necessarily only of instances where plagiarism is found, not of those where the perpetrator has escaped detection. Nonetheless, quite a lot of plagiarised material is not accessible on the Internet. Some packages, for example Turnitin and Viper, archive submitted essays, so that future submissions can be compared, not just against Internet sources, but also against these archives. A passage copied from a textbook or other source, which is accessible in printed form only, will not be picked up in the first essay submitted. All future essays using the same source will appear similar, however, not to the original source, but to the first essay submitted. As the archive grows, this technique can be expected to capture most sources, including not only books, but also essay banks. Turnitin also has arrangements with publishers, enabling it to further increase the comprehensiveness of its archive databases.
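
The way an archive catches print-only sources at second hand can be shown with a toy model. The class below is a hypothetical sketch, not a description of Turnitin's or Viper's archive: each submission is first compared with everything already stored and is then stored itself. The class name, the four-word sequence length and the 0.3 threshold are assumptions.

```python
import re


def _shingles(text: str, n: int) -> set[tuple[str, ...]]:
    """Overlapping n-word sequences, lower-cased and stripped of punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class EssayArchive:
    """Toy archive: check a submission against past essays, then keep it."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self._store: dict[str, set[tuple[str, ...]]] = {}

    def submit(self, essay_id: str, text: str,
               threshold: float = 0.3) -> list[tuple[str, float]]:
        """Return (earlier essay id, overlap) for each archived essay that
        shares at least `threshold` of this essay's word sequences."""
        current = _shingles(text, self.n)
        matches = []
        if current:
            for other_id, other in self._store.items():
                overlap = len(current & other) / len(current)
                if overlap >= threshold:
                    matches.append((other_id, round(overlap, 2)))
        self._store[essay_id] = current  # archive after checking
        return matches
```

If two students copy the same textbook passage in successive years, the first call to `submit` returns nothing, because the archive has never seen the passage. The second call returns a match against the first student's essay rather than against the textbook, exactly as described above.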

Computers can greatly assist us, therefore, in combating a problem that they themselves have created. But we cannot abdicate our judgement to the machine. Care and professional judgement are needed to interpret the results. For example, Turnitin’s ‘Overall Similarity Index’, which records the percentage of the essay which matches an Internet or archived source, means very little, and a marker has to read the report very carefully to evaluate it. Many legal phrases, statutory provisions, etc., will naturally be on the Internet, and an ‘Overall Similarity Index’ of zero (even assuming the option has been taken to exclude direct quotes) would be neither expected nor indeed desirable. To some extent, perfectly legitimate paraphrases might also be caught, or conceivably the adoption of a writer’s views, but in an original context. But after all, careful evaluation is what markers do. It properly remains the role of the academic, and not the machine, to make a final judgement.
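
To see why a single percentage says so little, it helps to consider how such a figure might be computed. The sketch below is an assumption for illustration only and is not Turnitin's actual calculation: it reports the percentage of an essay's words that fall inside any four-word sequence also found in a source, with an option to strip text inside double quotation marks first. All names and parameters are hypothetical.

```python
import re

WORD = r"[a-z0-9']+"


def similarity_index(essay: str, sources: list[str], n: int = 4,
                     exclude_quotes: bool = True) -> float:
    """Percentage of essay words covered by n-word sequences found in sources."""
    if exclude_quotes:
        # Drop anything inside straight or curly double quotation marks.
        essay = re.sub(r'"[^"]*"|“[^”]*”', " ", essay)
    words = re.findall(WORD, essay.lower())
    if not words:
        return 0.0
    source_sequences = set()
    for source in sources:
        sw = re.findall(WORD, source.lower())
        source_sequences.update(tuple(sw[i:i + n])
                                for i in range(len(sw) - n + 1))
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in source_sequences:
            for j in range(i, i + n):
                covered[j] = True
    return 100.0 * sum(covered) / len(words)
```

An essay that properly quotes a twelve-word statutory provision inside a twenty-one-word sentence scores about 57 per cent when quotes are counted and zero when they are excluded, although nothing about the student's conduct has changed. The figure measures overlap, not dishonesty, which is why the report behind it has to be read.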

It is difficult to see that a student can object to the use of plagiarism detection software as such.

Any marker will check for plagiarism, as thoroughly as time and other resources allow.

There has been litigation in the United States, however, about the archiving of essays. In A.V. v iParadigms LLC, high school students sued iParadigms, the producers of Turnitin, claiming that the archiving of their essays amounted to a breach of their copyright in them. iParadigms claimed that they were entitled to the defence of fair use, and also that the students, by clicking on an ‘I Agree’ button when they created their user profiles to submit essays to Turnitin, had consented to the use. In the US Court of Appeals, iParadigms succeeded on the fair use issue, and the court did not need to consider the issue of consent.

Space does not permit a detailed consideration of intellectual property law, but we should certainly not assume that fair dealing is defined in the same way in the UK as fair use in the US, nor that archiving essays would be regarded as a permitted act.

It is conceivable that a plagiarism archive would be protected by the notice and take-down provisions of the Electronic Commerce (EC Directive) Regulations 2002, at any rate until it had notice of a copyright infringement, but this cannot be certain, since the user of the service (probably the university) might well be regarded as ‘acting under the authority or the control of the service provider’, in which case the immunity conferred by the regulations would not apply.

Given that in the UK, a fair dealing defence would almost certainly fail, and given also the fragility of a public interest defence, it would be wise for universities also to obtain the consent of students, before submitting essays to an archive.

That is not the end of the problem. A student whose essay contains an appropriate proportion of quotes from elsewhere, properly acknowledged, will not infringe the copyright of the author quoted, but a plagiarised essay will, and the archiving might therefore infringe third party rights. Again, it would be wise, if possible, to deal with this through consent, and it is probable that many publishers would indeed consent to allowing their material to be used in the fight against plagiarism. Indeed, it might be possible to set up a licensing scheme, similar to that operated by the Copyright Licensing Agency (CLA) in respect of photocopies, etc.

Consent might not always be obtainable, however, an obvious category of objectors being writers of essays intended for sale in paper mills, and the owners of such sites. The application of the public interest defence would be the same for paper mills as already observed for essay banks, and, given its fragility, it would be wise for software designers to design so as to be able to exclude essays where plagiarism is identified, as well as material where stringent objection is taken, by the copyright owners, to its use.

One of the objections taken by the students, in A.V. v iParadigms LLC, to the archiving of their essays, was that if they later submitted the same work to a literary journal it would appear to be plagiarised, though their own work. The District Court, whose view was upheld in the US Court of Appeals, had said:

Anyone who is reasonably familiar with Turnitin’s operation will be able to recognize that the identical match is not the result of plagiarism, but simply the result of Plaintiff’s earlier submission. Individuals familiar with Turnitin, such as those in the field of education, would be expecting the works submitted to have been previously submitted.

If this reasoning is convincing, it is another example of the care that needs to be taken when considering a Turnitin (or similar) report.

It is possible to counter plagiarism by setting tasks which make plagiarism more difficult. In law, from my own experience I know that we can set problem questions, changing them each time the assignment is set, and use very short deadlines, making plagiarism more difficult. We can reduce the proportion of coursework assessment and increase the role of the traditional examination. We can increase use of oral presentations and (at least to assess basic knowledge) multiple choice questions, which are impossible to plagiarise.

Many of these suggestions make sense whether or not there is a plagiarism issue, and of course, we should in any case regularly review our assessment objectives and practices.

If our arsenal includes routine use of plagiarism detection software, that will force us to think more carefully about how we teach and assess. At the very least, information on how students construct essays will inform us how to set future work. We may find, for example, heavy reliance on a particular source, and set future assignments to discourage its use. The information may assist our guidance of students from cultures whose engrained views on use of sources differ from our own. A new problem will also emerge. If students know in advance that their essays will be tested, and in particular if they can see the reports, it is unlikely that they will be tempted to cut and paste substantial amounts of material, but they might instead be tempted to try to reduce their ‘Overall Similarity Index’ (or equivalent) as far as possible. To some extent this activity might be a useful exercise, but students should be warned against taking it too far. Unless the software becomes sufficiently sophisticated to ignore standard legal phrases, students should be advised not to aim for a zero score. This will involve skilful instruction on our part.

In the longer term, it is possible that plagiarism detection techniques will fail to live up to their potential. A darker possibility is for new software to emerge, written to frustrate plagiarism detection, thereby creating an additional challenge for plagiarism detection software.

But it is also possible that plagiarism detection software will become a seriously effective tool. Plagiarism requires at a minimum the copying of a text document from another source without acknowledgement.

Whatever the motivation, plagiarism constitutes bad work, but if it results from a failure of understanding, or from time pressure or incompetence, it may not be appropriate to penalise it further. What causes our concern is the student who deliberately passes off another’s work as his or her own, pretending to a merit that he or she does not possess. We take plagiarism so seriously, and punish it, because the motivation might be deliberate dishonesty of this kind, rather than misunderstanding, time pressure or incompetence.

But suppose we lived in a world where the student knew the essay would be tested, and all sources discovered; indeed, he or she could even see the report, before the essay was submitted. Copying will be evidence of incompetence, rather than dishonesty. Indeed, once the taint of dishonesty is removed from the equation, we might even place a value on the ability simply to find material effectively on the Internet (while the value we place on this skill may not be high, we cannot entirely deny its utility in the modern world).

Plagiarism detection software will never be able to defeat the determined and well-funded cheat. Students will still be able to buy bespoke essays, written by others on their behalf, and if they are never re-used they will continue to go undetected. Since these essays will lose their value after just one use, and since the ghost-writers will know that their own work will be tested, it may be supposed that this form of cheating will become more expensive, and therefore rare. We may catch only 95 per cent or 99 per cent of cheats rather than 100 per cent, but that is a lot better than nothing. That is, after all, the basis of much crime prevention.

I suggest, however, that the battle is one we have no choice but to join.


URL: http://www.austlii.edu.au/au/journals/LegEdDig/2010/44.html