Choosing a statistical test: A cheat sheet

Students who are new to statistics tend to find it tricky to remember which test to use under which circumstances. The following diagram is intended as a decision aid only. There are many more statistical tests that are not shown here, but these are the basic ones most commonly taught on psychology courses.

Even if you end up finding out that you need a more complex or niche statistical test, an aide memoire like this can still be useful if it reminds you of the relevant keywords to search.

Remember, always check the assumptions of the test you choose.

Alternatives to pro-rating for missing data

Behavioural scientists often collect data using multi-item measures of latent constructs. For instance, clinical psychologists measure anxiety and depression using self-report questionnaires composed of multiple psychometric items, whilst child psychologists measure developmental progress by asking parents batteries of questions. Missing data are extremely common on such questionnaires, and one usually finds that data are missing at the item level. In other words, participants miss out items, either by accident, or because they don’t want to answer certain questions. Thus, researchers are left with some data relating to a given construct for a particular participant, but not complete data.

The usual solution is pro-rating, where the mean of the completed items for a given scale (or sub-scale) is taken as the imputed value for any missing items on that scale. If you’re not familiar with the practice, I’ve written a post about it. As Mazza, Enders, and Ruehlman (2015) point out, this pro-rating practice is common across most of the behavioural sciences, and especially in psychology. But, as they and others also discuss, there are potentially serious issues with the practice.

Pro-rating makes a number of assumptions about the structure of the data. To my mind, for pro-rating to be an ideal solution, one needs to make the following assumptions:

  1. The domain coverage of the completed items overlaps sufficiently with the domain coverage of the missing items.
    Some scales explicitly sample behaviours from a variety of related domains and therefore breach this assumption. The most obvious examples are scales for mental health conditions which map items to diagnostic criteria. Answering a question about whether you struggle to sleep is a relatively poor stand-in for an item about whether you have lost the ability to enjoy things you once did.
  2. The scale has good internal consistency.
    The basic idea behind pro-rating is that within a scale, items are so highly correlated with each other that they can to some degree stand in for one another. The basic idea therefore falls apart if the scale has poor internal consistency (i.e. poor inter-item correlations).
  3. A high proportion of the items are complete and can be used to calculate the scale score.
    Estimating someone’s IQ from 90% of the items in an intelligence test is one thing, but giving them only 5% of the items and then pro-rating the rest would clearly be inferior. Nobody seems to agree on what the cut-off should be, but pro-rating 20% missing data seems to be routine (Mazza, Enders, and Ruehlman, 2015). Graham (2009) suggests proration might be considered more reasonable the more items used, and that we should never use fewer than half (!) the scale items.
  4. The mean scores on the missing and completed items are similar.
    Put another way, average item difficulty is reasonably well matched across the missing and completed items. See Enders (2010).
  5. Factor loadings or item-total correlations are similar across missing and non-missing items.
    One way to think about this is that we would be introducing more error into the measurement if the missing items were those with the best loadings on the factor. Again, see Graham (2009).

Reading the above list, anyone familiar with a handful of psychometric scales will readily see that these are not always reasonable assumptions. But you will also see that it’s fairly straightforward to check these assumptions.
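To give a flavour of how simple such checks can be, here is a minimal Python sketch (not SPSS) covering assumptions 3 and 4 for a single participant. The function names and the tiny dataset are invented for illustration, and missing answers are represented as None. It reports what proportion of the scale the participant completed, and how far the average item mean of the items they skipped sits from that of the items they answered.

```python
from statistics import mean

def item_means(rows):
    """Mean of each item (column), using every participant who answered it."""
    n_items = len(rows[0])
    return [mean(r[j] for r in rows if r[j] is not None) for j in range(n_items)]

def check_participant(row, means):
    """Return (proportion of items answered, difficulty gap) for one participant.

    The gap is the average item mean of the skipped items minus the average
    item mean of the answered items. A gap near zero is what assumption 4 hopes for.
    """
    answered = [j for j, v in enumerate(row) if v is not None]
    skipped = [j for j, v in enumerate(row) if v is None]
    proportion = len(answered) / len(row)
    if not answered or not skipped:
        # Nothing to compare: either no missing data or no data at all.
        return proportion, 0.0
    gap = mean(means[j] for j in skipped) - mean(means[j] for j in answered)
    return proportion, gap

# Hypothetical four-item scale; the third participant skipped items 3 and 4.
data = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, None, None],
]
means = item_means(data)
proportion, gap = check_participant(data[2], means)
print(proportion, gap)
```

Here the third participant completed only half the scale, and the items they skipped tend to attract higher scores than the ones they answered, so pro-rating would probably underestimate their score.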

Pro-rating also causes two notable side-effects, of which researchers should be mindful:

  1. The definition of your scale now varies across participants. It no longer has k items. It has a different number of items relating to the amount of missing data for each participant. If your data are not missing at random (because, say, someone who scores high on a certain construct is unwilling to answer some of the questions relating to that construct) you may have extra problems to worry about.
  2. Pro-rated data will artificially inflate estimates of internal consistency. This is because you have just created values (for participants with missing data) as a linear composite of other values for that participant. For this reason, any estimate of internal consistency should be calculated without pro-rating.
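As an illustration of the second point, here is a minimal Python sketch (not SPSS) of Cronbach's alpha computed only on participants with complete data, so that no pro-rated values feed into the estimate. The function name and the small dataset are made up for the example, missing answers are represented as None, and it uses the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha_complete_cases(rows):
    """Cronbach's alpha using only participants with no missing items."""
    complete = [r for r in rows if None not in r]
    k = len(complete[0])  # number of items in the scale
    # Sample variance of each item across the complete cases.
    item_vars = [variance([r[j] for r in complete]) for j in range(k)]
    # Sample variance of the total scores.
    total_var = variance([sum(r) for r in complete])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical three-item scale; the last participant skipped item 1
# and is therefore left out of the calculation.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [None, 5, 5],
]
alpha = cronbach_alpha_complete_cases(responses)
print(round(alpha, 3))
```

Dropping incomplete cases costs some sample size, but it keeps the internal consistency estimate free of values that were manufactured from the other items.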

A good many methodologists have therefore suggested that researchers should prefer other methods of dealing with missing data. These other methods are now generally possible in a range of statistical packages, but they are more computationally complex.

More importantly, at the time of going to press, they’re generally not available in the software psychologists use most. SPSS and Jamovi are the two most used statistical apps in psychology. SPSS does include multiple imputation, but Jamovi does not. Neither of them (at the time of going to press) includes Full Information Maximum Likelihood options.

I am not a statistician, and nor do I play one on the internet. But I am a working psychological scientist, and in my humble opinion, it is acceptable to make the commonplace and pragmatic decision to use prorating, so long as the above assumptions are checked, and the above warnings are taken into account. It goes without saying that these things should also be addressed in our reporting.

References

Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press.

Mazza, G. L., Enders, C. K., & Ruehlman, L. S. (2015). Addressing Item-Level Missing Data: A Comparison of Proration and Full Information Maximum Likelihood Estimation. Multivariate Behavioral Research, 50(5), 504-519. doi:10.1080/00273171.2015.1068157

Computing variables and pro-rating in SPSS

The basics

SPSS allows you to compute new variables, based on existing ones. This is really useful if, for instance, you want to create a total score for a psychometric scale or other questionnaire.

⚠️ Be sure to read this post on assumption checks you should perform.

You access the Compute Variable dialogue box from the Transform menu…

Here’s what each part of the window is for…

Now let’s say that you want to simply add up three variables to make a total score. You just need to type a sensible name in the left-hand column, and then place each of the variables in the right-hand ‘Numeric Expression’ box, with a plus between them. (Just like in normal maths, the + tells SPSS to add them up.)

You might find that the SPSS Output window pops up, showing you the code it ran in the background when you clicked ‘OK’ but you can ignore that. Go back to your dataset. At the end, on the right, you’ll find a new column, with the name you just created, and your calculation carried out.

Pro-rating for missing data

This works great for many things, but it does have one drawback. What if you have missing data? For instance, often when we run a survey study, some participants don’t want to answer certain questions, or they are having a careless moment and just skip over one or two questions.

If you have a missing value amongst the values you’re trying to add up, SPSS will refuse to add them up, because, after all, it wouldn’t be a true answer for any participant who has a missing value.

Why does SPSS do this? Well, imagine that we’re adding up 10 questions on a depression scale and that each question might get an answer from 0 (happy) to 10 (depressed). What if Jo Bloggs answers a 10 for half the questions, but then fails to answer the rest? As a human being looking at these scores we might think that Jo is very depressed (maximum score on the ones they did answer) and perhaps even so distressed they couldn’t bring themselves to answer the rest of the questions. If you just added up a total score, though, Jo would get 50 out of a hundred, because of those five questions they left blank, and we might wrongly think they were middling on our depression scale. To prevent us from making this rookie mistake, SPSS refuses to do the calculation when there are missing values.
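If it helps to see Jo's numbers worked through, here is a minimal Python sketch (not SPSS) of the arithmetic. The variable names are just for illustration, and missing answers are represented as None.

```python
from statistics import mean

# Jo answered 10 on the first five questions and skipped the last five.
jo = [10, 10, 10, 10, 10, None, None, None, None, None]

answered = [score for score in jo if score is not None]

# Adding up only what is there treats every blank as if it were a zero.
naive_total = sum(answered)
max_possible = 10 * len(jo)

# The mean of the answered items is not dragged down by the blanks.
mean_of_answered = mean(answered)

print(naive_total, "out of", max_possible)  # 50 out of 100
print(mean_of_answered)                     # 10
```

The naive total makes Jo look middling, whereas the mean of the items Jo actually answered sits at the top of the scale.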

One way around this is to calculate means instead of totals. SPSS understands that means are still meaningful (forgive the pun) even when we have some missing data. For instance, if we had taken a mean of Jo’s five answers, we would have got 10 (depressed) overall, despite the five missing values. Thus, SPSS ignores missing values and just storms ahead and calculates means whenever we ask it to. Here’s how we ask it to calculate a mean using syntax…

COMPUTE OurNewVariable = MEAN(var1, var2, var3).
EXECUTE.

Notice that we use the function MEAN() and we put our variables inside the parentheses, separated by commas. (The spaces after the commas are optional, but they do make things look nicer.) You can put as many variables inside the parentheses as you need to, so long as you put a comma between them.

If you want to do this with the graphical user interface, you just put the MEAN(var1, var2, var3) bit in the ‘Numeric Expression’ box.

You can see from the screenshot just above, SPSS has calculated a mean, even though it wouldn’t calculate a total for the second participant in my fake dataset.

Even though it’s useful to know that SPSS does still calculate a mean, this can cause the opposite problem: it calculates a mean even when you have so much missing data for a participant that you really shouldn’t!

In psychology, when we have missing values on a single psychometric scale (a measure of a single construct), it’s fairly usual to prorate (or pro-rate). The usual way to do prorating, before the complete adoption of statistical software, was to take the mean of all the items on a scale (or sub-scale) that the participant did answer, and put that mean in place of each missing value. E.g. if Alex answered 1, 5, __, 3, then you’d take a mean of 1, 5, and 3. You’d calculate that it’s 3, and so you’d enter ‘3’ into the blank cell. That method works, but it’s very time consuming and it also has some drawbacks. For instance, there are some analyses (like Cronbach’s alpha) where we shouldn’t ever use prorated scores.

There’s no need to use the manual time-consuming method in SPSS because you can simply calculate the mean of a group of items even when there are missing values, and the outcome is mathematically equivalent to manual pro-rating. It’s also a heck of a lot easier and quicker and less prone to human error.
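To see why the two routes agree, here is a minimal Python sketch (not SPSS) using Alex's answers from the example. The prorate function is a made-up helper that does the manual method: it fills each blank with the mean of the answered items. Missing answers are represented as None.

```python
from statistics import mean

def prorate(row):
    """Manual pro-rating: replace each missing item with the mean of the answered items."""
    answered = [v for v in row if v is not None]
    fill = mean(answered)
    return [fill if v is None else v for v in row]

alex = [1, 5, None, 3]

# Route 1: fill in the blank by hand, then average all four items.
filled = prorate(alex)
mean_after_filling = mean(filled)

# Route 2: simply average the items Alex did answer.
mean_of_answered = mean(v for v in alex if v is not None)

print(filled)                                # [1, 5, 3, 3]
print(mean_after_filling, mean_of_answered)  # 3 3
```

Filling a blank with the mean of the other items cannot move the mean, so averaging the answered items gives the same scale score with less typing.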

However, SPSS will still do this even if most of the data for a participant are missing, and in that case, obviously it would make no sense. Most psychologists set a limit of 20% for missing data that can be ‘replaced’ or ‘pro-rated’. If you have more missing data, the results you’ll get from that participant are much less likely to be reliable. Pro-rating is still somewhat controversial amongst psychologists, even though many have considered it usual practice for decades. My take is that the evidence is good enough to suggest that pro-rating for even 20% missing data gives sufficiently accurate estimates of most constructs that we need not worry about it too much. See here and here.

Remember though that logically, a scale with lower internal consistency (i.e. its items don’t correlate highly amongst themselves) will produce less reliable estimates after pro-rating for missing data because items are less perfect stand-ins for each other. And indeed there is evidence from simulation studies which suggests that pro-rating, as a procedure, makes unrealistic assumptions about the structure of psychological data and may therefore produce biased estimations. If you’re completing an undergraduate dissertation, pro-rating is almost certainly enough.

If you’re long beyond undergrad level, don’t forget to read this post on assumption checks you should perform.

Pro-rating for only a certain percentage of missing data

Even if we choose the simple ‘calculate the mean for me’ approach in SPSS, a problem arises in that SPSS still calculates the mean, even when there are so many missing values for a participant that it would be silly to do so. However, you can prevent SPSS from doing this. In the Compute Variable dialogue (or in Syntax, if you’re geeky) you can add a number to the MEAN() function to tell SPSS how many valid entries there must be for a participant before it should calculate a mean.

Let’s say I’m working with a scale that has ten items, and I have decided that it’s OK to pro-rate up to 20% of my data. That’s equivalent to saying I don’t mind if participants have two answers missing from this scale, and so, I need to tell SPSS to only calculate the mean where there are 8 or more valid answers. Instead of typing MEAN() I therefore type MEAN.8().

MEAN.8(Vbl1, Vbl2, Vbl3, Vbl4, Vbl5, Vbl6, Vbl7, Vbl8, Vbl9, Vbl10)

Last but not least, if for some reason you really wanted total scores, instead of means, you can still use the MEAN() function. You simply need to multiply by k (the number of items in the scale or sub-scale), like this…

MEAN.8(Vbl1, Vbl2, Vbl3, Vbl4, Vbl5, Vbl6, Vbl7, Vbl8, Vbl9, Vbl10) * 10
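For anyone who finds it easier to read the logic spelled out, here is a minimal Python sketch of the rule described above. It is not SPSS code and not how SPSS is implemented. The function name mean_min_valid is invented, missing answers are represented as None, and returning None stands in for SPSS leaving the new variable blank.

```python
def mean_min_valid(values, min_valid):
    """Mean of the non-missing values, or None if fewer than min_valid are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None  # too much missing data: leave the score blank
    return sum(valid) / len(valid)

# Ten-item scale, two answers missing (20%): the mean is calculated.
two_missing = [3, 4, None, 2, 5, 3, None, 4, 3, 4]
scale_mean = mean_min_valid(two_missing, 8)

# Pro-rated total score: the mean multiplied by k, the number of items.
k = len(two_missing)
scale_total = scale_mean * k

# Three answers missing (30%): below the threshold, so no score is produced.
three_missing = [3, 4, None, 2, None, 3, None, 4, 3, 4]
no_score = mean_min_valid(three_missing, 8)

print(scale_mean, scale_total, no_score)  # 3.5 35.0 None
```

The threshold of 8 is simply 80% of a ten-item scale, so it needs changing whenever the number of items or your tolerance for missing data changes.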

Why I am a Coaching Psychologist

Last semester, a student asked me “Why are you a Coaching Psychologist, and not a therapist, when you clearly know so much about psychotherapy?” (I paid him for the flattery later.) I gave a rather glib answer at the time, and since then I’ve been wanting to find time to marshal my thoughts and be honest and upfront about my reasons for preferring to be part of a young upstart of a discipline, rather than adopting a label with a century-long pedigree. Today’s the day.

I think Freud did the world a great service. He did more than most to illuminate how we have rather little conscious awareness of the causes of our own behaviours. Freud called this the unconscious mind, which isn’t a term I love, but his work served to orient society to the fact that we aren’t all rational word-and-number-crunching machines who simply do what’s best for ourselves and society. As it happens, Freud seems to have been most proud of a different aspect of his work, namely the ‘discovery’ of mental structures like the id and ego. Personally, I think these ideas are pretty near useless in practical terms, though they may be interesting to talk about.

Later in the twentieth century, Carl Rogers started the second huge school of therapy, and his central tenet was that it is the nature of the relationship between the therapist and the client which helps the client overcome problems. I think the evidence for a non-specific effect in psychotherapy is unarguable, but psychologists typically earn more money than counsellors or therapists, so I’d like to imagine that we can offer something different to justify the price tag.

What I mean to say is that I have a huge deal of respect for psychotherapy. I wish more people would see psychotherapists instead of seeing their family doctor and being prescribed anti-depressants, anxiolytics, and goodness knows what else. Psychotherapy works.

Issue 1: The applicability of the medical model to the behavioural domain

The Oxford English Dictionary defines therapy as, “The medical treatment of disease; curative medical or psychiatric treatment.” This probably fits with the idea most British (dare I say most European) folk have about therapy. The United Kingdom Council for Psychotherapy gives a slightly broader definition: “talking therapy  … helps people with emotional, social or mental health problems.” Definitions like these, to a greater or lesser extent, suggest that any ‘therapist’ is working to cure a disorder or disease. In fact, there is a huge debate in the worlds of psychology and therapy about the extent to which it makes sense to apply ‘the medical model’ of illness to the psychological domain. The evidence that there’s something chemically wrong with the brain that causes depression really isn’t very impressive. Some therapists will say they are helping fix a disorder, some dislike even the word ‘disorder’. Some psychologists will assess a client and then give a diagnosis like ‘major depressive disorder’, whilst others think this entirely inappropriate and bad science.

The very word ‘therapy’ however, comes from Greek and means ‘healing’. There’s no getting away from the fact that therapy was and is generally intended to help people who are struggling. When you see that someone has lost their job because they can’t get out of bed in the morning without hitting the bottle, it does feel to the observer like something must be ‘wrong’. Whether or not calling this a disease or a disorder is helpful from a scientific point of view is a debate for another day. In simple lay language, we have sympathy and we imagine that this person must have substantial pain. Some psychologists have always dedicated their time and energy to understanding those events and circumstances that mean people feel they are struggling with life. As a result, I believe, we live in a more humane society, with better ways of helping and supporting people who are seriously struggling to function in society. And let’s be honest, “seriously struggling to function” describes pretty much all of us at one time or another. I’ve done my fair share of this type of work, and I’ve had my fair share of these types of experiences.

What if John Doe was functioning well, with a successful job, but he wanted to be amongst the elite? What if he wanted to start a business that would fund a very comfortable lifestyle for himself and his family? What if Jane Doe, a very able student, felt really driven to be a great pianist, playing in concert halls? Would we want to say there is something ‘wrong’ or would we more often applaud their drive, their motivation, and their dedication? What if Peter wanted to lose weight, but found that he kept giving up on diets and exercise plans because his life was too busy with other commitments? What if John, Jane, or Peter felt that they currently didn’t have the psychological skills and wanted support from an expert in how to motivate themselves, how to learn faster, or how to improve their relationships with peers and customers? Would the natural response be to say “go and see a therapist”? Unless you are a therapist or psychologist you probably answered, “of course not.” (In fact, a really good therapist would often be able to suggest some quite useful things in these areas, though it might not be what they specialise in.)

Psychology is a very very broad discipline. Some psychologists can tell you about the most intimate behaviour of rats, whilst others analyse the patterns of behaviour shown by millions of people when they act together on Facebook. There have always been certain psychologists who have dedicated their lives to studying those people who seem to be getting on exceptionally well in life — the creative, the genius, the talented, the successful. Recently, much of this work has gained new momentum and has been given two labels. The basic science is often called Positive Psychology and the applied work of showing clients how to use these principles in their lives is usually called Coaching Psychology.

Issue 2: Going from ‘good’ to ‘great’ isn’t the same as going from ‘struggling’ to ‘good’

Remember Jane, who wanted to be a pianist? She’s done as many piano grades as it’s possible to do and she’s currently working with a piano teacher whose students have graced the stage at Carnegie Hall. Her cousin, Zach, aged twenty-five, wants to start learning the piano. Will Zach need the same teacher? My guess is that if Zach paid to see the same teacher, she’d probably refuse to see him because teaching beginner piano isn’t her speciality. In fact, the chances are that a really great teacher who often works with beginners will get faster progress out of Zach, because that teacher will know much more about the problems that trip up beginner pianists. It isn’t that the ‘beginners’ teacher is a lesser teacher, it’s that we each get good at helping people if we’ve worked with lots of people like them before.

Now, let’s not take this too far. It’s not a perfect analogy. The parallel is only approximate, though the relationship between coaching psychology and therapy is something like the above example. Therapists and coaching psychologists might sometimes use the same principles and even practices, but there are often considerable differences. If a therapist is working with someone in the depths of depression who can hardly get themselves out of bed in the morning, the therapist and the client might both describe this as an issue of motivation. Similarly, a coaching psychology client who is currently ‘only’ putting in 60-hour weeks might want to know how to increase his motivation. Simply lumping these challenges together and calling them ‘motivation’ is a bit like putting Zach and Jane in the same box and calling it ‘playing piano’. A decent coaching psychologist could probably help somewhat, in principle, with the client who’s depressed, and the therapist could probably make some decent suggestions for the guy who wants to have productivity superpowers, but that doesn’t mean it’s the best possible match. The great innovation of the Industrial Revolution was specialisation. It applies in psychology too.

 

The sin of the perpendicular pronoun

Many psychologists, myself included, want to make psychology more like the natural sciences. Psychology, as a scientific discipline, may be a couple of hundred years younger than chemistry or physics, but with cautious work we will be able to reach the same level of replicability and confidence in our findings, despite the complexity of our subject matter. Unfortunately (and for some reasons too complex to delve into here, perhaps deservedly), most scientists do not regard much of psychology as scientific. The result is that psychologists find themselves on the cusp of scientific respectability, and perhaps that is why psychologists are sometimes behind the curve when it comes to changes in scientific practice. One such change has recently occurred in science writing. It’s not a big thing, but it’s something we psychologists should be especially au fait with, and most of us seem to be pretending it didn’t happen. 

In the 1920s, British and American scientists began to move away from straightforward constructions like “we ran an experiment” in favour of unpleasant passive voice constructions like “an experiment was run”. Though this was never a complete take-over, and though there has recently been a strong trend to the reverse, high school psychology teachers and many university lecturers are still encouraging students to write in the passive voice.

To some extent, this continued pedagogic trend is understandable. Most psychology courses require no serious prior training in science and so students turn up having studied humanities and literature. In other words, they are often used to modes of writing where an individual is the focus of a narrative. Teachers must somehow encourage a subtle mental shift, to get students focussing on the procedure, the apparatus, the observations, and the data, rather than on their opinion that EEG machines are a bit tricky to calibrate.

A great many psychology lecturers therefore ban the use of the first person and encourage the use of the passive voice. This is a quick fix. Students can no longer write,

We didn’t like using the blue conductive gel because it was smelly, so we used the clear one.

and find themselves, as if by magic, writing,

The blue conductive gel was unacceptable to the participants due to its pungent odour and so another brand was substituted.

Well, not quite. In fact, what often happens is that students get their verbs in a knot and write utter garbage, but let’s be charitable for a moment.

In fact, since it’s accepted as standard practice in most psychology departments, most of us have no real first-hand experience of whether it’s our instruction not to use the first person or our other wonderful constructive feedback that improves students’ writing the most. We have conflated our variables.

I would have no problem with this little pedagogic foible if students saw through it later, just as a young chemist realises that ‘covalent’ and ‘ionic’ are useful shorthands for the ends of what is in fact a continuum. But they don’t. I have recently received two peer reviews of my work, and seen three peer reviews of other people’s work where “please avoid the use of the first person” was one of the main criticisms. Why? Are you scared that your false air of scientific respectability will somehow be punctured if we admit that the experiment didn’t just develop consciousness and run itself?

There are several problems with the passive voice, though the ones that annoy me are:

  • Constructions in the passive voice are often longer
  • It is sometimes unclear who is doing what when sentences are written in the passive voice
  • Agency is often given to inanimate things
  • The passive is often more complex and students then interpret a preference for the passive as a preference for linguistic pomposity and grandiloquence.
  • If we accept that the first person is terribly unprofessional, we are implying that some of the greatest scientists of all time were poor communicators. Watson and Crick’s groundbreaking 1953 paper on the structure of DNA was written in the first person. Their paradigm-changing paper started, “We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.” It was readable. Moore (2000) even suggests that it may owe some of its impact to the clean and readable nature of its prose. Darwin’s On the Origin of Species was written in the first person. Miller’s famous psychology paper “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information,” was written in the first person, with no fewer than 52 occurrences of the perpendicular pronoun.
  • Worst of all, restricting the use of the first person is a blunt pedagogic tool. Doing so doesn’t reorient students to care more about data than researchers. They still write “so and so said” instead of “so and so collected data on”.

This is not only a problem in psychology. Scientists from other disciplines have bemoaned the problem over the last decade or two, but in psychology it seems to be accepted knowledge, as though there were not even a debate to be had.

Let’s set the record straight.

The American Psychological Association’s Publication Manual has included a recommendation on the use of the passive voice at least since the fifth edition. Here’s what the sixth edition says:

Prefer the active voice.

Preferred:

We conducted the survey in a controlled setting.

Nonpreferred:

The survey was conducted in a controlled setting.

Not convinced? Here’s what Nature’s editorial board has to say on the issue:

Nature journals prefer authors to write in the active voice (“we performed the experiment…”) as experience has shown that readers find concepts and results to be conveyed more clearly if written directly. (Link)

In fact, psychologist Rupert Sheldrake reports on an informal survey he conducted of peer-reviewed journals, and the vast majority accepted manuscripts written in the first person, with a few, like Nature, preferring that style.

The alternative is clear. Science is done by scientists. There should be no shame in admitting this. Objectivity in science comes from replication by other teams — both theoretical and strict — not from the grammar we use to describe our work.

Now repeat after me … We conducted an experiment…

Data or datum? Let the data decide.

You might be surprised how often scientists, engineers, computer programmers, and statisticians argue over whether the English word data ought to be treated as a singular or a plural noun. Surely they should be spending their time making discoveries and inventing things? Ah, but what you are forgetting, dear naïve reader, is that scientists, programmers and their friends tend toward an acute interest in details. And of course we’re glad they do lean that way, so let’s forgive the navel-gazing. (I hope you’ll forgive my gazing at the navel of scientific discourse in this post.)

“The data were analysed using analysis of variance.”

“The data was analysed using analysis of variance.”

Which is right?

Distressingly passionate arguments have been made on both sides. Perhaps the most complete treatment is that made by Norman Gray, an astronomical scientist from the University of Glasgow. His post title tells you all you need to know: Data is a singular noun.

The arguments rehearsed by Dr Gray and others can be summarised thus:

  1. Data is a Latin word, and in Latin it is the neuter plural past participle of the verb dare, to give. If it was plural in Latin, it should be now in English.
  2. English shouldn’t be bound by the outmoded rules of Latin. We’re speaking English, after all, not Latin, and in English we’ve become used to data as a singular collective noun.

I have more sympathy with the line of argument in 2. We certainly shouldn’t blindly follow Latin grammar rules. Allow me to suggest some rules of thumb for good English grammar:

  1. Learn the rules before you break them.
  2. Don’t break commonly held grammatical rules for no reason at all.
  3. Remember C.S. Lewis’s words: ‘“Good English” is whatever educated people talk; so that what is good in one place or time would not be so in another.’ Which is to say, good grammar is better informed by common usage than arcane rules.

Oddly, Dr Gray and I agree on these points. After summarising how agendum has lost its singular and been replaced entirely by agenda, likewise stamina and media, he tells us,

When you read in the middle of a sentence ‘…the data are analysed by…’, you stumble: your subconscious grammatical consistency checks raise an alarm! – you have misparsed them (yes, like that).

Perhaps I’m weird but I don’t stumble. My subconscious raises not an iota of alarm. Perhaps this is why:

Google Scholar searches for the exact phrases “the data are” and “the data is” from the year 2000 return about 1,190,000 and 762,000 hits respectively (correct in April 2012). That’s 1.56 times as many uses of the plural. If we agree with C.S. Lewis that we should accept as ‘good’ whatever educated people routinely do, we should surely acknowledge that:

  1. Data should be treated as a plural noun in English.
  2. The plural usage only just wins, so we should probably not be distressed when colleagues and students treat data as singular.

The emotional scientist

Humility is the defining characteristic of science. To admit that you don’t know, early and often, is contrary to most people’s psychological make-up. It means that science, the dispassionate discipline, requires us to battle through emotions (like pride) to get at the truth.