Is data plural?

You might be surprised how often scientists, engineers, computer programmers, and statisticians argue over whether the English word data ought to be treated as a singular or a plural noun. Surely they should be spending their time making discoveries and inventing things? Ah, but what you are forgetting, dear naïve reader, is that scientists, programmers and their friends tend toward an acute interest in details. And of course we’re glad they do lean that way, so let’s forgive the navel-gazing. (I hope you’ll forgive my gazing at the navel of scientific discourse in this post.)

“The data were analysed using analysis of variance.”

“The data was analysed using analysis of variance.”

Which is right?

Distressingly passionate arguments have been made on both sides. Perhaps the most complete treatment is that made by Norman Gray, an astronomical scientist from the University of Glasgow. His post title tells you all you need to know: Data is a singular noun.

The arguments rehearsed by Dr Gray and others can be summarised thus:

  1. Data is a Latin word, and in Latin it is the neuter plural past participle of the verb dare, to give. If it was plural in Latin, it should be plural in English now.
  2. English shouldn’t be bound by the outmoded rules of Latin. We’re speaking English, after all, not Latin, and in English we’ve become used to data as a singular collective noun.

I have more sympathy with the second line of argument. We certainly shouldn’t blindly follow Latin grammar rules. Allow me to suggest some rules of thumb for good English grammar:

  1. Learn the rules before you break them.
  2. Don’t break commonly held grammatical rules for no reason at all.
  3. Remember C.S. Lewis’s words: ‘“Good English” is whatever educated people talk; so that what is good in one place or time would not be so in another.’ Which is to say, good grammar is better informed by common usage than by arcane rules.

Oddly, Dr Gray and I agree on these points. After summarising how agenda has entirely displaced its singular, agendum (and likewise stamina and media), he tells us:

“When you read in the middle of a sentence ‘…the data are analysed by…’, you stumble: your subconscious grammatical consistency checks raise an alarm! – you have misparsed them (yes, like that).”

Perhaps I’m weird, but I don’t stumble. My subconscious raises not an iota of alarm. Perhaps this is why:

Google Scholar searches for the exact phrases “the data are” and “the data is” from the year 2000 return about 1,190,000 and 762,000 hits respectively (correct as of April 2012). That’s 1.56 times as many uses of the plural. If we agree with C.S. Lewis that we should accept as ‘good’ whatever educated people routinely do, we should surely acknowledge that:

  1. Data should be treated as a plural noun in English.
  2. The plural usage only just wins, so we should probably not be distressed when colleagues and students use data in the singular.
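
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The variable names are my own, and the hit counts are simply the approximate figures quoted above, not fresh search results.

```python
# Approximate Google Scholar hit counts quoted in the post (April 2012).
plural_hits = 1_190_000    # exact phrase "the data are"
singular_hits = 762_000    # exact phrase "the data is"

# How many times more often the plural form appears than the singular.
ratio = plural_hits / singular_hits

# The plural form's share of all hits for the two phrases combined.
plural_share = plural_hits / (plural_hits + singular_hits)

print(f"plural-to-singular ratio: {ratio:.2f}")
print(f"plural share of all hits: {plural_share:.1%}")
```

On these figures the plural accounts for roughly three uses in five, which is a majority but hardly a landslide.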