For some researchers, monitoring citation statistics and journal 'impact factors' is an intensely serious business. In many German universities, for instance, the impact factors of the journals in which scientists publish their work are tallied, and the data plugged into formulae that directly influence the funding given to individual departments. Worldwide, citation statistics are increasingly being used as a convenient metric to assess the quality of scientists' work.

Researchers with expertise in bibliometric analysis have long pointed out the potential pitfalls, and go to considerable lengths to validate the data used in their studies. But when citation statistics get into the hands of non-experts, rigour frequently flies out of the window. Comparing the impact factors of journals is meaningless, for instance, if they serve different disciplines in which widely different citation practices may prevail. But in the drive to rate scientists' performance, such comparisons are sometimes carried out.

The effective monopoly supplier of such statistics is the Philadelphia-based ISI, formerly known as the Institute for Scientific Information, and now owned by the Thomson Corporation. The ISI cannot be held responsible for the uses to which its statistics are put, and its web pages include a commentary on the limitations of citation data.

However, the ISI is accountable for the accuracy of the statistics it publishes. Subscribers to the ISI newsletter Science Watch will notice that its latest issue contains an apology to Nature and the International Human Genome Sequencing Consortium. In statistics published on the ISI's website, citation counts for the landmark paper describing the consortium's sequencing of the human genome (Nature 409, 860–921; 2001) were so low that it was absent from the lists — of questionable use, even on a good day — of 'hot papers' in biology, which are published regularly in Science Watch.

When puzzled Nature staff examined the ISI's data, they found that citations to the paper had been grossly undercounted. The same, it emerged, applied to several other prominent papers authored by a consortium, rather than by a list of individuals. To its credit, the ISI reacted promptly, amending its website and commissioning a journalist to write an account of the episode for Science Watch. The paper by the International Human Genome Sequencing Consortium now sits at the top of the hot papers list.

That would be that, were it not for further investigations that reveal other problems, this time with journals' impact factors, a measure of the average number of citations per paper. A journal's 2000 impact factor, for instance, is calculated by first counting the total number of citations made in that year, across all publications scanned by the ISI, to papers in that journal that were published in the preceding two years. This figure is then divided by the total number of items the journal published over the same two-year period that are deemed by the ISI to be citable, usually original research papers and review articles.
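In symbols (a schematic rendering in our own notation, not the ISI's), the calculation for a journal $J$ is

$$\mathrm{IF}_{2000}(J) \;=\; \frac{C_{2000}(J;\,1998\text{--}1999)}{N_{\mathrm{citable}}(J;\,1998\text{--}1999)},$$

where $C_{2000}$ is the number of citations made in 2000, across all ISI-scanned publications, to items that $J$ published in 1998 and 1999, and $N_{\mathrm{citable}}$ is the number of those items the ISI deems citable. The arithmetic makes the sensitivity plain: with purely hypothetical figures, a journal receiving 10,000 such citations to 500 citable items would score 20.0, but if 50 front-matter items were mistakenly counted in the denominator, the figure would fall to roughly 18.2, even though the citation counts themselves were correct.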

Preliminary studies by Nature have uncovered errors in the denominators used by the ISI that would seem to invalidate some previously published impact factors. For some of the journals published by the Nature Publishing Group, for instance, the number of citable papers tallied by the ISI has been inaccurate, leading to spurious variation in impact factor from year to year.

It is not the purpose of this article to put the record straight; that would require further detailed analysis. And for many scientists, the accuracy of a particular journal's impact factor will not be their most pressing concern. But these examples highlight an even greater need than previously realized (by us, at least, we confess) to check the ISI's data. Researchers, policy-makers and publishers who depend heavily on citation statistics are urged to treat them with greater caution. And, it would seem, the ISI has some further investigation to do.