Qualitative evaluation for program improvement
This is a resource file which supports the regular public program "areol" (action research and evaluation on line) offered twice a year beginning in mid-February and mid-July. For details email Bob Dick bdick@scu.edu.au or bd@uq.net.au
... in which an evaluation process in the general style of action research illustrates some points about qualitative research and evaluation
(An earlier version of this paper was prepared for and distributed at the IIR conference on evaluation, Brisbane, 7-8 September 1992. It was revised in 1994 and again, a minor revision, in 1997.)
Contents
- Abstract
- Action research
- The Snyder model
- Systems models
- Prior activities
- Participative strategies
- Process evaluation
- Outcome evaluation
- Short cycle evaluation
- Rigour vs relevance
- Mechanisms for increasing rigour
- Advantages and disadvantages of qualitative research
- Qualitative and quantitative
- Other considerations
- Notes
Abstract
The centrepiece of this paper is a description of a mostly-qualitative research process known as the Snyder evaluation model. An action research approach to evaluation, it enables me to address a number of issues about qualitative research, and qualitative evaluation in particular.
Some of the issues are to do with the nature of evaluation. The Snyder model incorporates two forms of long-cycle evaluation: process, and outcome. It also provides for short cycle process evaluation to yield a qualitative alternative to the measurement portion of total quality management.
Most of the other issues relate to the difference between quantitative and qualitative research, and quantitative and qualitative evaluation. In particular, some of the advantages and disadvantages of qualitative evaluation are identified. It is argued that what is usually compared is not just qualitative and quantitative approaches; rather, a package that is quantitative and predetermined and independent is being compared to a package that is qualitative and responsive and participative. The supposed trade-off between rigour and relevance is shown to be a partial trade-off only. A further trade-off between local relevance and global relevance (generalisability) is discussed.
Qualitative and quantitative methods are shown to be complementary, at least in part. Methods of increasing the rigour of qualitative evaluation are briefly described.
Action research
Action research is a research method which is usually mostly qualitative and usually participative. Its very name describes its aims: to achieve both action and research outcomes within a single study.
From this starting point I can identify a number of components of any action research methodology. At the most broad-brush level of description it requires...
- strategies for intervention: the strategic action component; and
- strategies for understanding: the strategic research component.
To achieve these in practice requires other finer-grained processes, for example for involving people or collecting data or the like...
- processes for intervention: the tactical action component; and
- processes for understanding: the tactical research component.
Both action and research are data-based. Some means of collecting and interpreting data are therefore necessary. These may include some or all of...
- a content model: some model or theory or taxonomy or the like which enables the collected data to be categorised; or
- data strategies: the strategic component for collecting and interpreting data; or
- data processes: the tactical component for collecting and interpreting data.
These components (Figure 1) can serve as a checklist for assessing the adequacy of any action research process, including evaluation processes.
In actuality there are two distinct families of action research. One has research as its main emphasis, but tries to do this in ways which provide action outcomes too. The other has action as its focus, and the research mostly takes the form of understanding on the part of those involved. If it adds to published knowledge, that is a rare bonus.
To reflect this, the research components can be further subdivided. Research as scientific knowledge may be the most important focus of action research which emphasises the research component. In contrast, the form that research may take in action-oriented action research may be understanding on the part of the participants. We might call these two varieties "action *research*" and "*action* research".
Fig. 1 The components of an action research process. (The subdivision of the research component is explained in the text)
My own interests as a practitioner and as a "teacher" [1] of practitioners allow you to predict my preferred approach: *action* research rather than action *research*. As more of our postgraduates expressed a wish to do field research related to action for their theses, I found myself wondering if the research component could be strengthened without sacrificing the action component. The Snyder evaluation model [2] was one of the results.
The Snyder model
I have chosen the Snyder model to demonstrate qualitative evaluation because it illustrates three different varieties of evaluation, each serving a different purpose. It has separate components for each. Two of the components are widely discussed in the literature...
- process evaluation, which seeks to understand the functioning of a program or unit or organisation (henceforth just "program"); [3]
- outcome evaluation, which seeks to determine how effective a program is. [4]
The third component, which I will argue is eventually the most valuable, is...
- short cycle evaluation, which sets up the feedback loops which can be used to enable a program to become self-improving.
The Snyder model can be used participatively, or by an independent evaluator. My emphasis here is on participative uses, as most consistent with the action research approach I favour.
Underlying each of these components is the content model used to interpret the data. Like many evaluation models, it is goal-oriented and derived from general systems theory.
Systems models
General systems theory can be applied to anything which can be viewed as resource-consuming and goal-oriented. The system is whatever is being studied: in the present instance, whatever is being evaluated. It draws inputs from its environment. It delivers outputs to its environment. Internal processes transform the inputs into outputs. Sometimes, outputs trigger the environment into changing the inputs; this external link between output and input is feedback. Figure 2 summarises.
Fig. 2 The main elements of a systems model
The terms I will use for these components are resources (for inputs) and activities (for processes). Outputs are subdivided into three categories, depending on their time span. Immediate effects (or just "effects") are outputs which are achieved at the same time as the activities are carried out. Targets are the goals or objectives which are not immediate but are expected to be achieved some time in the future. Their time span is typically that of the decision-cycle or budgeting or funding cycle of the program or unit: often annual. Ideals are ultimate goals. They are distant and unrealisable, but define the better world which the program actors wish to achieve in their more optimistic dreams.
These five elements (resources -> activities -> effects -> targets -> ideals) are the focus of data collection in the various components of the Snyder model.
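By way of illustration only, the five elements can be recorded as a simple data structure. The sketch below is my own construction in Python, not part of the Snyder model itself; all the example entries are hypothetical, and the model prescribes only the five categories and their ordering.

```python
from dataclasses import dataclass, field

@dataclass
class ProgramModel:
    """The five systems elements of the Snyder content model."""
    resources: list = field(default_factory=list)   # inputs consumed
    activities: list = field(default_factory=list)  # internal processes
    effects: list = field(default_factory=list)     # immediate outputs
    targets: list = field(default_factory=list)     # decision-cycle goals, often annual
    ideals: list = field(default_factory=list)      # distant, ultimately unrealisable goals

# A hypothetical community education program
program = ProgramModel(
    resources=["staff time", "annual budget"],
    activities=["run workshops", "visit clients"],
    effects=["workshops completed", "clients contacted"],
    targets=["serve 80% of eligible clients this year"],
    ideals=["a self-reliant client community"],
)

# The ordered chain that the evaluation components work along
chain = [program.resources, program.activities, program.effects,
         program.targets, program.ideals]
```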
I can now redefine the three phases of the Snyder model (process, outcome, and short-cycle evaluation) with reference to these elements...
- The goal of process evaluation is to understand how the program functions. It does this by studying the way in which the elements are linked together. That is, it focuses on the way in which resources are consumed by activities and produce immediate effects as targets are pursued in the course of aspiring to the ideals.
- The goal of outcome evaluation is to decide how well the program is achieving its goals. It does this by building on the process evaluation. The understanding derived from the process evaluation can be used to devise measures which can act as a proxy for the less-measurable targets and ideals. These measures are often called performance indicators.
- The goal of short cycle evaluation is to transform the program into a self-improving program. It does this by building on the process and outcome evaluations. Feedback loops are set up to allow program members to monitor and improve their performance on an ongoing basis.
I'll consider these in turn shortly, after a brief account of some of the strategic intervention issues.
Prior activities
A participative intervention process is most likely to work well if certain prior activities are carried out. Their purpose is to help the client group to develop realistic expectations about the intervention to follow. Three prior activities, in particular, seem to be important: negotiating appropriate roles with the client, building relationships between the stakeholders, and reaching agreement on the process to be used. The first of these is in a sense a combination of the second and third, applied to the relationship between evaluator and client.
In short, before you get down to business, deal first with relationships and then with the processes to be used. Do this first for your own relationship with the people you are to work with; then do it for their relationships with each other.
Participative strategies
My preferred approach to action research is participative. No doubt this reflects to some extent my experience as a consultant. Some, too, is based on a desire to extend to others the autonomy which is important to me. By and large, however, it has been my experience that people are more strongly committed to the decisions they make themselves than they are to the decisions that are made for them. I assume therefore that participation increases the likelihood of action.
In evaluation processes the results of participation are immediately apparent. Most people, it seems, would rather do a job well than poorly. If they obtain a better understanding of how their activities are linked to resources, targets and ideals, their behaviour changes. From the point of view of evaluation as research this contaminates the conclusions which can be drawn. However, for the most part the whole point of the intervention is program improvement. Participation usually enhances this.
A further issue in participation is the information it provides. When people are in command of the information collected they have less need to fear the use that will be made of it. They can afford to be more frank; and for the most part this yields better data.
This can be increased further by extending the list of participants beyond the project team. If clients, suppliers, representatives from funding agencies and the like are also included, the likelihood of better information and action is improved.
The evaluator, in effect, becomes a facilitator. The stakeholders -- those people who have a stake in the project -- are guided by the evaluator through the three stages of process evaluation, outcome evaluation and short cycle evaluation.
Process evaluation
The first component of a Snyder evaluation is a process evaluation. Instead of focussing on the elements of the content model (resources, activities, effects, targets, ideals) it studies the links between them. The purpose is to understand how activities consume resources, and how they produce immediate effects in the pursuit of targets and ideals.
In general, the way this is done is to define two adjacent elements and compare them. So ideals are compared to targets, targets to effects, and activities to resources. [5] A mismatch is a sign that the actors don't understand the system, and that at least one of the two elements needs adjustment (Figure 3).
Step by step, the process evaluation takes this form...
1. Semi-independently define the ideals and targets
2. Compare ideals and targets, and adjust them as necessary
3. Semi-independently define the activities, and from them deduce the effects, both intended and unintended, desirable and undesirable
4. Compare the effects and targets, and adjust them as necessary
5. From the activities deduce the resources
6. Compare activities and resources, checking that the most resource-expensive activities are also those that contribute most to the important targets and ideals.
Fig. 3 In a Snyder process evaluation the elements are defined, and the links between the elements are then compared and adjusted
I'll describe this in a little more detail for one pair of elements. This may convey more of the flavour of the process.
1.1 Define the ideals
My preferred way of doing this is to use a miniature version of search, which asks people to define a future and distant vision. The instructions I use go something like this:
"Imagine that it is 2002, and in the last ten years your project has been spectacularly successful. So now, in 2002, you have more than achieved all you could have wished. Imagine further that you are about to go out among the project staff, and its clientèle, and the world at large. What would you expect to see and hear and experience that would be evidence to you of its success."
The result is a list of items spanning the project and its immediate and more distant environment. These can then be arranged in order of priority, perhaps using some form of multiple voting method.
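As a toy illustration of one such method (multi-voting), with ballots I have invented for the purpose: each participant distributes a few votes among the listed items, and the tally gives a rough priority order.

```python
from collections import Counter

# Hypothetical multiple-voting round: each person casts three votes
# among the listed ideals (repeats allowed).
votes = [
    ["self-reliant clients", "respected program", "secure funding"],  # person A
    ["self-reliant clients", "secure funding", "secure funding"],     # person B
    ["respected program", "self-reliant clients", "secure funding"],  # person C
]

tally = Counter(item for ballot in votes for item in ballot)
for ideal, n in tally.most_common():
    print(f"{n} votes: {ideal}")
```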
1.2 Define the targets
Ask people to forget the ideals they have just defined, and define the targets they are presently working towards. I prefer to move to a different (part of the) room and remove the list of ideals from view, to signal the break. At the very least, I ask people to try to put the ideals out of mind, so that the targets are defined without recourse to the ideals.
A target is a goal that is more tangible than an ideal, and probably has definite standards of achievement and a definite time frame. Sometimes the targets can be defined by recourse to documentation, especially if the project has been through some form of recent strategic planning. If so, this can provide a more meaningful comparison between targets and ideals.
2.1 Compare the ideals and targets
This is most easily done by taking each ideal in turn and identifying which targets contribute strongly to it.
2.2 Note any targets which contribute to few, if any, ideals; and note any ideals to which few, if any, targets contribute. These "orphans" or near-orphans suggest that there are targets or ideals which are superfluous or missing.
Targets are compared to immediate effects in a similar way, except that special attention is also given to unintended immediate effects. For the third comparison (between resources and activities) the check is to ensure that the activities which are most important are also those for which the most resources are made available.
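To make the comparison step concrete, here is a minimal sketch of orphan-spotting, again my own construction with entirely hypothetical names and judgments: the workshop's judgments about which targets contribute strongly to each ideal are recorded, and the orphans on both sides are flagged for discussion.

```python
# For each ideal, the targets judged (in the workshop) to contribute
# strongly to it. All entries here are hypothetical.
contributions = {
    "a self-reliant client community": {"serve 80% of eligible clients"},
    "a respected, well-networked program": set(),  # no supporting targets
}
all_targets = {"serve 80% of eligible clients", "halve the waiting list"}

# Ideals to which few, if any, targets contribute
orphan_ideals = [ideal for ideal, ts in contributions.items() if not ts]

# Targets which contribute to few, if any, ideals
supported = set().union(*contributions.values())
orphan_targets = sorted(all_targets - supported)

# Orphans suggest an element which is superfluous, or a partner
# element which is missing
print("orphan ideals: ", orphan_ideals)
print("orphan targets:", orphan_targets)
```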
The process evaluation component can be used as a free-standing evaluation as it is. It also prepares for outcome evaluation and short cycle evaluation; both of them depend upon the understanding of the project activities which the process evaluation provides.
Outcome evaluation
With the knowledge gained from the process evaluation, the outcome evaluation can now be done. The goals of this segment may be to develop performance indicators for use in the short cycle evaluation. Alternatively (or as well) its purpose may be to assess or demonstrate the effectiveness of the program.
The general procedure is to consider each of the ideals in turn. Trace each ideal back through targets, effects, activities and resources, identifying its precursors at each of these levels. Figure 4 represents this graphically. What you typically find is that qualitative and quantitative precursors can be found within resources, activities and effects. These are potential performance indicators. Often, too, the qualitative indicators are closer to the vision; the quantitative indicators are commonly easier to use, but less directly linked to the vision.
Fig. 4 In outcome evaluation, performance indicators are developed by tracing the ideals back to the point where they are evaluable

Then, from these potential indicators, choose a "package" of indicators for each ideal. The end result to aspire to is one which observes the following conditions...
- All else being equal, quantitative indicators are to be preferred. They are more easily recorded, more easily compared, and allow emergent trends to be identified. Of course, all else is seldom equal, and they are usually also less direct indicators of the ideals. Therefore choose a mix of qualitative and quantitative: quantitative for ease and comparability, qualitative for safety.
You can overcome some of the effects of the fuzziness of qualitative indicators by focussing on change. For example, if you are devising a measure of morale, there are difficulties in asking informants for a direct judgment of the level of morale. If they reply "reasonable", it's hard to know if that is better or worse than the previous month when they said "fair to middling". It is generally easier to ask directly if morale is level, rising or falling.
- Indicators located at the resources and effects levels are to be preferred over those at the activities level. If people achieve outcomes (that is, effects) within resource constraints, how they do so need not be an issue. To locate indicators at the activities level risks constraining people unnecessarily.
- Indicators which offer immediate and frequent feedback are to be preferred over those which are less frequent and immediate. Behaviour is shaped more effectively by regular feedback which closely follows the behaviour. [6] For example, it is useful to track performance in terms of targets, but the feedback provided does little to shape day-by-day behaviour.
- Multiple indicators are better than single indicators. Any single indicator is contaminated by other influences. With multiple indicators, some of the contaminants cancel out. For example if absence increases it may just mean that an influenza epidemic has hit town. If absence and labour turnover and variability of production and grievances also increase at the same time, it is more reasonable to assume that something has threatened morale.
- Indicators which combine desirable and undesirable resource use and outcomes are more sensitive than indicators limited to one or the other. It is otherwise possible for gains in the positive indicators to be achieved at the cost of increases in the negative indicators.
The task is essentially one of choosing a combination of indicators which adequately sample the ideals, and which do so frequently enough to guide behaviour on a daily or more frequent basis.
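Read as a checklist, the conditions above lend themselves to a simple representation. The sketch below is entirely hypothetical (the names, levels and frequencies are my own inventions); it shows one way a package for a morale-related ideal might be recorded, mixing frequent quantitative indicators with a change-focused qualitative one.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    level: str           # "resources", "activities" or "effects"
    quantitative: bool
    frequency_days: int  # how often a reading becomes available
    desirable: bool      # True if increases are good, False if bad

# A hypothetical package sampling a morale-related ideal. Multiple
# indicators are used so that contaminants (an influenza epidemic
# inflating absence, say) tend to cancel out; desirable and
# undesirable indicators are combined for sensitivity.
morale_package = [
    Indicator("absence rate", "effects", True, 7, desirable=False),
    Indicator("labour turnover", "effects", True, 30, desirable=False),
    Indicator("grievances lodged", "effects", True, 7, desirable=False),
    Indicator("production consistency", "effects", True, 7, desirable=True),
    # Change-focused qualitative item: "is morale level, rising or falling?"
    Indicator("reported morale direction", "effects", False, 7, desirable=True),
]

# Screen against two of the conditions: avoid the activities level,
# and prefer frequent feedback.
screened = [i for i in morale_package
            if i.level != "activities" and i.frequency_days <= 30]
```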
With the indicators chosen, you are then able to use them to reach a conclusion about the overall effectiveness of the project. You are in a position to comment on the achievement of the ideals, backing up your conclusions with evidence from the indicators and their links to the ideals.
Sometimes you are required to do this for some external body such as a funding agency which requires evidence of effectiveness. Even without this requirement it is worth doing. It gives the project team evidence on achievement which allows them to check the accuracy of their process evaluation.
Alternatively, or in addition, you can use the indicators as preparation for the third phase of the evaluation. This is the creation of a short cycle evaluation system for continuous project improvement.
Short cycle evaluation
According to general systems models, one expects some feedback to the project from its immediate environment. However, this feedback may be long term, infrequent, and possibly selective. The purpose of short cycle evaluation is to select appropriate performance indicators. This by itself will not necessarily shape behaviour, so mechanisms are then set in place to provide information on the indicators regularly to those who can make the most use of them.
The steps for doing this are, briefly, as follows...
1. The outcome evaluation has provided performance indicators which are an adequate sample of each of the ideals. From these, without destroying the adequacy of the sample, select those which are easily and regularly provided.
2. Identify the source of each of these indicators.
3. Create a mechanism whereby that source provides each of the indicators to those who can use it at frequent and regular intervals.
4. Schedule regular reviews of the ideals, the indicators and the mechanisms.
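As a sketch of step 3 only, under assumed names of my own: each retained indicator is given a source and one or more recipients, and each new reading is routed to the people who can act on it.

```python
# Hypothetical routing table: indicator -> (source, recipients, interval in days)
mechanisms = {
    "absence rate": ("payroll records", ["weekly team meeting"], 7),
    "reported morale direction": ("brief team poll", ["weekly team meeting"], 7),
    "client satisfaction trend": ("reception log",
                                  ["weekly team meeting", "coordinator"], 14),
}

def deliver(indicator, reading):
    """Route a new reading to everyone who can use it."""
    source, recipients, interval = mechanisms[indicator]
    for recipient in recipients:
        print(f"{recipient}: {indicator} = {reading!r} "
              f"(from {source}, every {interval} days)")

deliver("reported morale direction", "falling")
```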
This concludes a very brief discussion of a Snyder evaluation. It is not enough, perhaps, for you to be able to facilitate an evaluation using it. However, it will serve my purpose of illustrating some key points about qualitative action research in general and qualitative evaluation in particular.
After a discussion of the relative merits of quantitative and qualitative evaluation, I turn to a discussion of the ways in which a Snyder evaluation builds in action and research through the processes it uses.
Rigour vs relevance
Discussions of the relative merits of quantitative and qualitative data are not uncommon. A frequent position taken in some qualitative literature is that in general quantitative data may be more reliable (that is, rigorous), but qualitative data can be more valid (or relevant).
This, I think, is a convenient simplification if you bear in mind three issues...
- That it isn't absolute. There is a partial trade-off between them. It's a trade-off because under some circumstances you can surrender one of them to gain more of the other. It's partial because you can sometimes gain more of one without having to give up the other to achieve it.
- That the trade-off isn't fixed. Some trades are less demanding than others, depending on the processes used.
- That paying attention to strategy and tactics, intervention and research, often allows some of the best of both worlds.
Further, it isn't entirely a quantitative vs qualitative issue. Quantitative research (and evaluation) tends also to be done by an evaluator as independent, and to be designed before being conducted. Qualitative research (and evaluation) lends itself to being used participatively, and in a way that is responsive to local developments. In other words, the comparison is often between a quantitative/independent/predetermined package and a qualitative/participative/responsive package. The components of these two packages tend to reinforce each other, and the rigour or relevance they allow (Figure 5).
Fig. 5 The comparison is often less between qualitative and quantitative than between two different packages

The Snyder model clearly lies towards the right-hand end of this continuum. This is true, at least, for the form I have described here. Because its apparent relevance is what has more effect on practical outcomes, perhaps that is fair enough. The approach does contain some features, however, which improve the rigour without undermining the relevance. (There is also a hidden problem about relevance that I'll pick up again later.)
First, the relevance. Qualitative methods are more easily used with unskilled participants or informants. They can be more responsive, too, because there isn't the overhead of having to develop a new metric every time you change your mind about what you are doing. However, this needs to be qualified a little. You will have noted, I imagine, that the apparent relevance may be less apparent to someone not involved in it. This is an important argument for involving representatives of all the stakeholders.
Note, too, that the quality of information influences more than just research outcomes. The research outcomes are two-fold: understanding on the part of the participants, and on the part of the evaluator. The first of these is what is likely to produce action. The second may feed into publishable research.
Even fuzzy and inaccurate information may be persuasive to the participants if they have compiled and interpreted it themselves. So it may well motivate them to action. It is going to be informed action, however, only if the information is accurate. In the interests of both research and action it is worthwhile to improve the rigour.
Mechanisms for increasing rigour
I haven't really discussed the tactical level of the processes. Much depends upon the micro-processes used, the general style of the evaluator, and the adequacy of the relationship formed with the participants. However, that is beyond the scope of this paper. [7]
At the strategic level, at least four mechanisms for increasing rigour can be identified...
- The presence of all stakeholders increases the likelihood that different perspectives on the project will be available to participants. Groupthink is less likely when differing interests and views are present and expressed.
- During the process evaluation the comparison between adjacent elements helps to identify inconsistencies in the information. This is the reason for asking participants to put the ideals out of their mind before identifying the targets.
- The outcome evaluation, by providing some form of assessment of the effectiveness of different parts of the project, provides an additional check on the perceptions which underpin the process evaluation.
- However effective the project or the evaluation, the short cycle evaluation sets up feedback which can be used to improve the system. Further, the ideals and the indicators are reviewed from time to time to allow the feedback mechanisms themselves to be improved over time.
Several of these embody an important way of increasing the rigour of qualitative evaluation. At each step of the process, an attempt is made to draw on multiple sources of information. Sometimes this comes from having different informants, and sometimes from encouraging different perspectives which can be compared. On other occasions it depends upon the use of different methods, for instance process and outcome evaluation. In the latter case it is usually called "triangulation". The more general principle may be called "dialectic".
To build dialectic into a process requires the use of more structure than some qualitative researchers are accustomed to. It also depends upon the use of appropriate tactical processes. The aim is to create a climate of informed and cooperative debate between different perspectives, however these perspectives arise.
I am suggesting that widespread use of dialectic can increase the rigour of qualitative processes without harming the relevance.
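In miniature, and with invented data of my own, the dialectic principle might look like this: the same question is put to different stakeholder groups, and any disagreement is surfaced for informed debate rather than averaged away.

```python
# Hypothetical judgments from different stakeholder groups on one
# question from the process evaluation:
# "does activity X contribute strongly to target Y?"
judgments = {
    "project staff": "yes",
    "clients": "no",
    "funding agency": "yes",
}

# Dialectic: a disagreement is an occasion for debate, not an error
# to be smoothed over.
if len(set(judgments.values())) > 1:
    print("disagreement to explore:", judgments)
else:
    print("consensus:", judgments)
```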
Advantages and disadvantages of qualitative research

A number of points can be made about qualitative research in partial summary. The core of its advantage is that it can be done in the local dialect rather than in an arcane language. This enables it to be used in partnership with the project staff and other stakeholders. By not requiring the development of a new metric every time there is a change of direction, it can be more responsive. Most of the advantages are indirect, arising from its participative and responsive use. The advantages don't occur automatically: to access them you have to plan and act accordingly.
(On the other hand, you are not limited to using responsive or participative methods. Qualitative information migrates more easily along the rigour-relevance continuum than does quantitative.)
To continue the summary... To some extent, these advantages risk undermining the rigour that quantitative research offers. Some of that rigour, however, can be recaptured by using a structured process that embodies dialectic.
It appears, then, that the trade-off between rigour and relevance is by no means absolute. There are other trades, however, both involving relevance.
The first of them is to do with structure. I have argued that structured dialectic can provide rigour which may otherwise be lacking. The cost, if you are not careful, is a loss of responsiveness. To achieve structure requires planning; but plans set in concrete destroy responsiveness. To escape from this, you can include lots of contingencies in your plans. More importantly, you can include replanning sessions as regular planned events.
The other trade-off is more problematic, and less easily evaded. It is between two varieties of relevance, which might be called local and global. Local relevance is relevance to the stakeholders. Global relevance, akin to generalisability, is relevance to the wider research community. For the most part, responsiveness and participation increase local relevance at the cost of global relevance.
The same trade-off exists between local and global credibility. With reasonable processes, negotiated with the stakeholders, the findings have high credibility for those who shared the responsibility for the evaluation. But sometimes the data or outcomes mean little to those who didn't take part: an example of the "you had to be there" phenomenon.
If your main concern is action then global relevance may not be an issue. The research component is directed at the understanding of the participants. This is achieved with relative ease, and can actually enhance the action. It is when the research outcomes are your major goal that global relevance becomes important.
It is easy to overstate the issue and assume that quantitative research has few problems with generalisability. However, you cannot assume that research which is qualitative and participative and responsive safely generalises to other settings. This may hinder its publishability, and for some people this has to be a concern.
There are two escape strategies. Publish methodological papers (the methodology is probably generalisable even when the specific content of the study is not). Alternatively, seek the cooperation of your participants in including some marker variables in the study so that you can compare it to other studies. I might mention in passing that the use of a structured approach may improve global relevance at some risk to local relevance.
Qualitative and quantitative
For the process evaluation component usually only qualitative information is gathered. The outcome component, however, makes use of quantitative and qualitative information in conjunction. The convenience, repeatability and comparability of the quantitative information usually also carry costs in the form of its indirect relationship to the targets and ideals. Adding qualitative data compensates for this.
Similar comments may be made about the short cycle evaluation. In fact, the short cycle component can be used as a partly-qualitative substitute for other continuous improvement methods such as total quality management.
In brief, you don't have to choose between qualitative and quantitative. You can combine them. And it is often advantageous to do so.
Other considerations
In what I have discussed, I have concentrated on the strategic research aspects of qualitative evaluation. I have made light of the intervention component despite its importance. I provided little information about the tactical level, or about the evaluator's relationships and style, although they are crucial. So far I have completely overlooked the motivations of the evaluator, which are fundamental.
On the one hand, your motivation can be to control the project staff -- to ensure that they are doing what you or some other person wants them to do, rather than what they want to do. When this is so, you can assume that their motivation will be to subvert your attempts. Humans are innovative and enterprising creatures, and they will find ways to defeat your evaluation.
On the other hand you can be motivated by a desire to improve their own control of their task and their performance. Most of them will then share this motivation. You can use the responsiveness and flexibility of qualitative evaluation to enable people to do what they, as mature adults, wish to do. To my mind, that is when qualitative evaluation is most valuable.
- "Teacher" is a misnomer, in my view. I don't think effective "teachers" teach. I think that learners learn. To my mind, that's an important distinction. [ back ]
- I learned the evaluation process described here from Wes Snyder, hence the name. I have since modified it somewhat, so Wes is not responsible for its present form. However, I think the major features of it I learned from him, and I think he would still approve of the version described here. [ back ]
- This is similar though not identical to what is usually called "formative evaluation". [ back ]
- This is similar though not identical to what is usually called "summative evaluation". [ back ]
- Activities and effects are defined together, and so there is little point in comparing them. [ back ]
- The conditions are those which cover instrumental conditioning as explored by Skinner and his followers. Yes, I'm enough of a behaviourist to believe that behaviour is shaped by rewards and penalties. However, for people I assume the important rewards and penalties are more often in the form of feelings than of material rewards. [ back ]
- For those interested, some of the micro-processes for collecting and interpreting information in a group setting are described in Bob Dick (1991), Helping groups to be effective, second edition. Brisbane: Interchange. [ back ]
_____
Copyright (c) Bob Dick 1995-2000. This document may be copied if it is not included in documents sold at a profit, and this and the following notice are included.
This document can be cited as follows:
Dick, B. (1997) Qualitative evaluation for program improvement [On line]. Available at
http://www.uq.net.au/action_research/arp/qualeval.html
Maintained by Bob Dick; this version 1.05w last revised 20000105
A text version is available at URL ftp://ftp.scu.edu.au/www/arr/qualeval.txt