It is not only the setting of assessments which affects their validity and reliability, but also how they are marked (or graded). The options which are available to you for marking need to be taken into account at the point of design.
In some cases, such as problem sheets, the design of the assessment takes longer than the marking, and usually the scheme is fairly self-evident. The learning being tested is usually convergent, which means that correct answers are clear, and the only real problems concern half-correct answers: if someone has got the answer to a maths problem wrong, do you give credit for the fact that they only went wrong in the latter stages of the working?
Whatever the decision, it is fairly easy to be consistent and hence reliable in its application.
This is less true in the case of essay-type questions. In fact, one of their problems is that they are so easy to set—most experienced teachers can think of an essay question off-the-cuff in fifteen seconds—that we often have little clear idea of what we will get back.
In the case of basic-level work, it is possible to determine a marking scheme which gives a set number of marks for mentioning particular issues:
Outline the longer-term consequences of the Schleswig-Holstein question. (5 marks)
gives a marking scheme of:
Lord Palmerston commented (1 mark)
that only three people ever understood the question (1 mark)
and of them:
- one was dead (1 mark)
- one was mad (1 mark)
- and he himself had forgotten (1 mark)
This approach is used to maintain consistency in the marking of large-scale examinations where a number of markers are used (e.g. GCSEs and AS and "A" levels in the UK), but even there it may be supplemented by marks awarded for more global factors, such as clarity of expression.
The temptation when marking substantial numbers of essays is to rush to a global mark, which takes into account a large number of factors, and facilitates comparison among members of the student group, but is probably highly unreliable, even when accompanied by a few remarks scrawled in the margin and at the end of the submission. Such a mark is often based on the teacher's conviction that, "I may not be able to describe a 57% (or a C+) essay, but I know one when I see one". Unfortunately (?), this is not good enough. For one thing, consider how many times you have marked a run of half-a-dozen disappointing essays, and then come across a moderately good one, to which you have given a higher mark than it deserves, out of sheer relief!
One way forward
As ever, the alternative is to ask—when setting the assignment in the first place—"Just what do I want the students to demonstrate?"
(Note that when you set several alternative titles, they all need to be assessable against broadly the same criteria.)
You may decide that there are five major factors, such as (just for the sake of this argument—they will not apply to every subject):
Knowledge
Demonstration of knowledge of the content of the module.
Critical evaluation
Ability to bring critical understanding to bear on the material: not accepting everything at face value, and exercising reasonable judgement about what is important and what is not.
Use of Sources
Evidence of reading, both from the set texts and beyond them, and appropriate appeal to authorities to support and refute arguments.
Argument
The overall construction of the argument of the essay, including the drawing of relevant conclusions.
Structure and expression
The essay as a piece of writing: its flow, style, and grammatical construction.
Try to make these factors as independent of one another as possible (which is more difficult than it seems, as this less-than-perfect example shows).
- It is good practice to have a mark-sheet which uses such standard headings, and can then be used for feedback to the students. With large numbers, this may have to be automated (see below): with smaller numbers, you can comment individually.
Next, think about the various levels at which each of these may be demonstrated on, say, a five-point scale, where "1" is low and "5" is high.
- You could of course go from "0" to "4", which is probably more "accurate", but this could result in an overall mark of "0", and although this may reflect your feelings about a particular piece of work, the convention is in practice that students get at least 15% for simply trying!
This yields a table similar to the one below:
| Score | Knowledge | Critical evaluation | Use of sources | Argument | Structure and expression |
|---|---|---|---|---|---|
| 1 | Little or no evidence of familiarity with content of module | Little or no evidence of critical evaluation of material | Sources not used to support substantive assertions or argument | Either no discernible, or seriously flawed academic argument | The assignment has unacceptable failings in structuring and/or clarity of written expression |
| 2 | Evidence that relevant module content is broadly understood, but with significant gaps or misapprehensions | Evidence of limited critical evaluation in some areas, with some lost opportunities or misunderstandings | Limited and uncritical use of a restricted range of sources | Argument is sometimes trivial, confused or flawed | The assignment has failings in structuring and/or clarity of written expression, which impair its capacity to communicate |
| 3 | Evidence that relevant module content is adequately understood, but with some gaps or misapprehensions | Evidence of a general critical stance, although some material not evaluated | Use of a range of appropriate sources, but without critical evaluation, or missing some significant items | Argument is let down by occasional confusion or flaws | While the assignment has some failings in structuring and/or clarity of written expression, these do not impair its capacity to communicate |
| 4 | Evidence of extensive knowledge of the relevant module content, without major misapprehensions | Evidence of good critical appreciation and evaluation of relevant theory and research and a systematic attempt to relate it to the topic | Use of a wide range of appropriate sources with some critical awareness of their status and relevance | Argument is sound and substantial, although not original | A generally well-structured and expressed assignment, which communicates clearly |
| 5 | Evidence of superior, comprehensive and deep knowledge of the relevant module content | Evidence of thorough critical appreciation and evaluation of relevant theory and research and a systematic and creative attempt to relate it to the topic | Use of a wide range of appropriate sources, indicating personal research, and with full critical awareness of their status and relevance | Argument is sound and substantial, with significant elements of originality | An assignment whose clear structure and expression significantly enhances its argument |
(There is a lot to be said, in the interests of transparency, for publishing the matrix in the module handbook.)
Now you can do the following steps in either order. The order given here is the one most people will adopt, because it is closer to the intuitive approach, but there are grounds for arguing that you should do it the other way round. I follow this present order, I admit, because my institution has global criteria for the determination of the level/grade of all assignments, and I have to make sure that my marks fit those.
- Mark the essays, giving them an appropriate mark (1-5) on each factor.
- Construct the overall numerical mark, probably out of 100, by weighting each of the factors according to its perceived importance in relation to the task as a whole. Thus "Structure" may be much less relevant as a criterion of assessment than "Knowledge": so Structure is incorporated x2 (a "4" for Structure = 8% of the final mark), but Knowledge is weighted x5 (a "4" for Knowledge = 20% of the final mark). (For the total to come out of 100 on a five-point scale, the weights across all the factors need to add up to 20.) Once the final mark has been calculated, this can if necessary be translated back into a nominal grade (A, B, C, etc.).
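The weighting arithmetic can be sketched in a few lines of Python. This is only an illustration, not a recommended scheme: the x5 for Knowledge and x2 for Structure come from the example above, while the other three weights are assumptions chosen so that the weights add up to 20. The grade boundaries are also hypothetical, since your institution will have its own.

```python
# Weight for each factor. Only "knowledge" (x5) and "structure" (x2) come
# from the text; the other three are assumed so that the weights sum to 20.
WEIGHTS = {
    "knowledge": 5,
    "critical_evaluation": 5,  # assumed weight
    "sources": 4,              # assumed weight
    "argument": 4,             # assumed weight
    "structure": 2,
}

# Hypothetical grade boundaries (lower bound, grade), highest first.
GRADE_BOUNDARIES = [(70, "A"), (60, "B"), (50, "C"), (40, "D")]


def overall_mark(scores):
    """Combine per-factor scores (1 to 5) into a mark out of 100."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the factors in WEIGHTS")
    for factor, score in scores.items():
        if score not in range(1, 6):
            raise ValueError(f"{factor}: score must be a whole number from 1 to 5")
    # Scale by the maximum achievable total, so the result is out of 100
    # even if the weights are later changed and no longer sum to 20.
    max_total = 5 * sum(WEIGHTS.values())
    total = sum(WEIGHTS[factor] * score for factor, score in scores.items())
    return 100 * total / max_total


def nominal_grade(mark):
    """Translate a mark out of 100 into a letter, using the assumed boundaries."""
    for lower_bound, grade in GRADE_BOUNDARIES:
        if mark >= lower_bound:
            return grade
    return "F"


example = {
    "knowledge": 4,            # contributes 4 x 5 = 20
    "critical_evaluation": 3,  # contributes 3 x 5 = 15
    "sources": 3,              # contributes 3 x 4 = 12
    "argument": 4,             # contributes 4 x 4 = 16
    "structure": 4,            # contributes 4 x 2 = 8
}
print(overall_mark(example))                 # 71.0
print(nominal_grade(overall_mark(example)))  # A
```

Note that with a scale starting at "1", the lowest possible mark under these weights is 20 rather than 0.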
It is of course possible to develop a little mail-merge macro to do this for you, and to generate a useful feedback sheet by automatically inserting the content of each of the above cells according to the score on the five-point scale.
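As a rough sketch of what such a macro does, here is the same idea in Python rather than a word-processor mail merge. The descriptors are copied from the matrix for two of the factors only (the others would be filled in the same way), and the function name, the student label, and the sheet layout are all hypothetical.

```python
# Descriptors copied from the matrix, keyed by factor and then by score.
# Only two factors are shown to keep the sketch short.
DESCRIPTORS = {
    "Knowledge": {
        1: "Little or no evidence of familiarity with content of module",
        2: "Evidence that relevant module content is broadly understood, "
           "but with significant gaps or misapprehensions",
        3: "Evidence that relevant module content is adequately understood, "
           "but with some gaps or misapprehensions",
        4: "Evidence of extensive knowledge of the relevant module content, "
           "without major misapprehensions",
        5: "Evidence of superior, comprehensive and deep knowledge of the "
           "relevant module content",
    },
    "Structure and expression": {
        1: "The assignment has unacceptable failings in structuring and/or "
           "clarity of written expression",
        2: "The assignment has failings in structuring and/or clarity of "
           "written expression, which impair its capacity to communicate",
        3: "While the assignment has some failings in structuring and/or "
           "clarity of written expression, these do not impair its capacity "
           "to communicate",
        4: "A generally well-structured and expressed assignment, which "
           "communicates clearly",
        5: "An assignment whose clear structure and expression significantly "
           "enhances its argument",
    },
}


def feedback_sheet(student, scores):
    """Build a plain-text feedback sheet from per-factor scores (1 to 5)."""
    lines = [f"Feedback for {student}"]
    for factor, score in scores.items():
        # Look up the matrix cell matching this factor and score.
        lines.append(f"{factor} ({score}/5): {DESCRIPTORS[factor][score]}")
    return "\n".join(lines)


sheet = feedback_sheet(
    "Student 042",  # hypothetical identifier
    {"Knowledge": 4, "Structure and expression": 3},
)
print(sheet)
```

A sheet generated this way gives every student the standard wording for their score on each factor, and still leaves room for an individual comment underneath.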
The net result is that if you are ever asked to justify your marking, you will be able to do so without trouble and, much more important, the students are getting feedback and grading which is as reliable as you can make it given the inherent subjectivity of the task. More to the point, if there are several of you assessing a large cohort, agreeing such a scheme in advance adds immeasurably to the consistency of the marking.