Sunday, February 22, 2009

Voodoo Meta-Analysis


In my previous post I ran some simulations that explored how various summary scores of cluster-wise correlation magnitude are affected by cluster size. I showed that the peak correlation and the two-stage correlation yield systematically higher correlation estimates than either the median or minimum correlation in a cluster. I also made a statement about the magnitude of the bias in such summary scores that was based on a misunderstanding of the "non-independence" error as described by Vul et al. in the "voodoo correlations" paper and in an in press book chapter by Vul and Kanwisher. I will return to my simulations and argue that they are indeed still informative, but first I want to discuss just what is meant by the "non-independence" error in neuroimaging, as defined by Vul and colleagues.

Understanding the "Non-Independence Error"

There is a common category of errors that often crops up in functional neuroimaging studies in which an ROI is selected on the basis of one statistical test and then a second non-independent test is carried out on the same data. This type of non-independence is often discussed in elementary neuroimaging tutorials and forums and is well-known and rather fiercely guarded against. It, moreover, involves two null hypothesis tests -- and therefore two statistical inferences, the second of which is biased to yield a result in favor of the experimenter's hypothesis. The category of errors referred to in Vul et al. subsumes, but is not limited to, these kinds of "two hypothesis test" errors.

Consider the case of a whole-brain correlation analysis. One has just carried out an analysis correlating some behavioral measure x with a measure of activity in every voxel in the brain. One has corrected for multiple comparisons and identified a number of "activated clusters" in the brain. So far so good. We have conducted one hypothesis test for each voxel in the brain. We are interested in finding where the significant clusters are located (if there are any at all) and we may also be interested in the magnitude of the correlations in those active clusters.

If we have corrected for multiple comparisons, then we may safely report the location of the clusters in x,y,z coordinates. What we may not do, according to Vul and colleagues, is report the magnitude of the correlation. Neither may we report the maximum of the cluster. Nor may we report the minimum of the cluster. We may not choose any voxel in the cluster randomly and report its value. Let me go further: we may not substitute the threshold (t = 0.6, say) to serve as a lower bound for the correlation magnitude. To report the magnitude of the correlation of a selected cluster, or any derivative measure thereof, is to commit the "non-independence error". [I note only in passing that if neuroimaging studies only ever reported the lower bound of a correlation (i.e. the threshold), no studies would ever report correlations greater than ~ 0.7].

One reason we may not (according to Vul et al.) report the magnitude of a correlation is that correlation estimates selected on the basis of a threshold t will, on average, be inflated relative to the "true" value. The reason for this is that above-threshold values are likely to have benefited from "favorable noise" and will therefore be biased upwards. The problem is akin to regression to the mean and is not specific to correlations or social neuroscience or even functional neuroimaging, per se. You can get an idea of the scope and generality of the concept in the recent chapter of Vul and Kanwisher -- which is an extended homily on the varieties of the "non-independence error" where you will, among other things, learn the virtue of not plotting your data:

“The most common, most simple, and most innocuous instance of non-independence occurs when researchers simply plot (rather than test) the signal change in a set of voxels that were selected based on that same signal change.” (pg 5)

Vul and Kanwisher are also critical of several authors for presenting bar plots indicating the pattern of means in a series of ROIs selected on the basis of a statistical contrast. We are told that such a presentation is "redundant" and "statistically guaranteed" (pg 11). I'll give you another example (this one I thought up all on my own) of Vul's non-independence error: reporting a correlation in the text of the results section and then also, quite redundantly and non-informatively, reporting the correlation separately in a table or figure legend. You see the range.

Before I begin with my critique of the Vul et al. meta-analysis, I just want to make it clear that the hypothesis that correlations in whole-brain analyses will tend to be inflated is quite reasonable. The other part of their hypothesis, that correlations are massively -- rather than, say, negligibly -- inflated needs to be backed up empirically. It is this aspect of their study -- the empirical part -- that I find unsatisfying (perhaps that is an understatement).

Voodoo Meta-Analysis and the Non-Independence Error

We have just remarked that in their in press chapter Vul and Kanwisher emphasize that the non-independence error is not limited to reporting of biased statistics, but may also involve the mere presentation of data that has been selected in a non-independent manner.

“Authors that show such graphs must usually recognize that it would be inappropriate to draw explicit conclusions from statistical tests on these data (as these tests are less common), but the graphs are presented regardless." (pg 8.)

Where else might we find an example of such a misleading presentation of data? Sometimes it turns out that the "non-independence error" is lurking in your own backyard. Take for instance the meta-analysis presented in the "voodoo correlations" paper by Vul et al. (in press). The thesis of this paper is quite straightforward. First, the authors surmise that correlations observed in brain imaging studies of social neuroscience are "impossibly high". Second, because the magnitudes of correlations are intrinsically important, scientists must also provide accurate estimates of the correlation magnitude -- something that is not necessarily guaranteed by null hypothesis testing alone.

To explore the question further, Vul et al. searched the literature for social neuroscience papers that reported brain-behavior correlations, and then sent a series of survey questions to the authors of the selected papers. On the basis of the authors' responses and other unknown considerations, they classified the papers as either using (good) independent methods or using (suspect) non-independent methods.

What constitutes a "non-independent" analysis, you ask? Studies classified as non-independent were ones that selected significant clusters and reported the magnitude of these activations based on a summary score (usually the mean or maximum value of the cluster). Let me be absolutely clear about this because there has been some confusion about this issue (I'm pointing at myself here). These studies did not perform two non-independent statistical tests. They performed one and only one correlation for every voxel in the brain. Because such analyses perform many correlations over the brain (tens of thousands), a correction for multiple comparisons is imposed, resulting in high statistical thresholds to achieve a nominal alpha value of 0.05. The key point is that in the Vul et al. meta-analysis, non-independent analyses are by and large synonymous with whole-brain analyses. That is a crucial element to the argument that follows, so take note of it.

An independent analysis, on the other hand, was defined as an analysis that used a region of interest (ROI) that was defined either anatomically, via an independent functional ROI or localizer, or through some combination of both. As a consequence, such independent analyses usually calculate just one or perhaps a handful of correlations, and therefore apply far more lenient statistical thresholds. For instance, in a large study with 37 subjects, such as the one by Matsuda and Komaki (2006, cited in Vul et al.), a correlation of 0.27 was declared statistically reliable at an alpha level of .05. A whole-brain analysis with the same number of subjects and a 0.001 alpha level would have required a correlation at least as great as 0.5 for a one-tailed test (in addition to whatever cluster extent threshold is applied). For a more typically sized 18 person study, the correlation would have had to be as large as 0.67 (one-tailed) to reach significance.
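The thresholds quoted above can be checked numerically. What follows is a minimal sketch, not anything taken from the papers under discussion. It assumes bivariate-normal data, so that under the null hypothesis the density of r is proportional to (1 - r^2)^((n - 4)/2), and it assumes the 0.27 figure refers to a one-tailed test. The function names null_tail_prob and critical_r are my own illustrative choices.

```python
import math


def null_tail_prob(r_crit, n, steps=4000):
    """One-tailed P(r >= r_crit) for n subjects when the true correlation is 0.

    Assumes bivariate-normal data, where the null density of r is
    (1 - r^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2).
    """
    k = (n - 4) / 2.0
    # Log of the Beta-function normalising constant B(1/2, (n - 2) / 2)
    log_beta = (math.lgamma(0.5) + math.lgamma((n - 2) / 2.0)
                - math.lgamma((n - 1) / 2.0))
    # Midpoint-rule integration of the unnormalised density over [r_crit, 1]
    h = (1.0 - r_crit) / steps
    area = sum((1.0 - (r_crit + (i + 0.5) * h) ** 2) ** k for i in range(steps))
    return area * h / math.exp(log_beta)


def critical_r(n, alpha):
    """Smallest r significant at one-tailed level alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if null_tail_prob(mid, n) > alpha:
            lo = mid  # tail is still heavier than alpha: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2.0


for n, alpha in [(37, 0.05), (37, 0.001), (18, 0.001)]:
    r = critical_r(n, alpha)
    print(f"n = {n}, one-tailed alpha = {alpha}: critical r = {r:.2f}")
```

Under these assumptions the three critical values should land close to the 0.27, 0.5 and 0.67 figures mentioned above. Any cluster extent threshold would come on top of this and is not modelled here.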

So What's Wrong with the Voodoo Meta-Analysis?

Let me count the ways.

Remember, the aim of the Vul et al. meta-analysis is to establish that non-independent methods produce massively -- not just marginally -- inflated correlations. The meta-analysis itself is fundamentally an empirical, rather than theoretical, endeavor. Let me remind you that studies classified as "non-independent" are all whole-brain analyses, and therefore involve corrections for multiple comparisons that necessitate a large correlation magnitude to achieve statistical significance. Those studies classified as independent do not impose such high thresholds. The upshot of this is that a whole-brain (non-independent) analysis will by definition never report a correlation less than about 0.5 (assuming a large 37 subject maximum sample). On the other hand, independent analyses, because of their greater sensitivity, will report correlations as low as 0.27 (assuming the same 37 subject maximum sample).

What does this tell us? The classification of papers into "non-independent" and "independent" groups was guaranteed to produce higher correlations on average for the former than for the latter group, irrespective of whatever genuine inflation of correlation magnitudes may exist in the former category.

The same result could have been produced with a random number simulation. Suppose I sample numbers randomly from the range -1 to 1. In a first run I sample a number, check to see if it's greater than 0.3, and if so store it in an array. I keep doing this until I've got about 25 values. In a second run I sample numbers from the same underlying distribution, but I only accept a number greater than 0.6. I then plot a histogram, showing how the first group of numbers (plotted in green) is shifted to the left of the second group of numbers (plotted in red). Note that I'll have to sample more numbers in the latter case to collect a comparable number of values, but that's OK as I have an inexhaustible supply of random numbers to draw from. Compare this to the Vul et al. literature search which found approximately equal (30 and 26, respectively) numbers of independent and non-independent analyses even though the relative frequencies of the two classes may be very different. But PubMed, like a random number generator, is inexhaustible.
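The thought experiment above is easy to run. Here is a minimal sketch of it, assuming a uniform distribution on the range -1 to 1 and 25 accepted values per group; the function name sample_above and the fixed seed are illustrative choices of mine.

```python
import random
import statistics


def sample_above(threshold, count, rng):
    """Draw from uniform(-1, 1) until `count` values exceed `threshold`.

    Returns the accepted values and the total number of draws required.
    """
    kept = []
    draws = 0
    while len(kept) < count:
        x = rng.uniform(-1.0, 1.0)
        draws += 1
        if x > threshold:
            kept.append(x)
    return kept, draws


rng = random.Random(0)  # fixed seed so the run is repeatable
n_values = 25           # assumed group size ("about 25" in the text)

lenient, lenient_draws = sample_above(0.3, n_values, rng)  # "independent" stand-in
strict, strict_draws = sample_above(0.6, n_values, rng)    # "non-independent" stand-in

print(f"threshold 0.3: mean = {statistics.mean(lenient):.2f}, draws = {lenient_draws}")
print(f"threshold 0.6: mean = {statistics.mean(strict):.2f}, draws = {strict_draws}")
```

Both groups come from the same underlying distribution, so any upward shift in the second group is produced by the acceptance threshold alone. The second group also typically needs more draws to fill, which is the point about the inexhaustible supply.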

There is a counterargument that Vul et al. might avail themselves of, however. They might argue that high thresholds and inflated correlations are inextricably linked. It is the high thresholds that lead to the inflated correlations in the first place. Unfortunately, the argument holds little water, as high thresholds would lead to high (significant) correlations even in the absence of any correlation "inflation", which happens to be consistent with the null hypothesis that the authors wish to reject (or persuade you to reject, as we shall see in the next section). Moreover, this argument, if seriously offered, would be a rather obvious example of "begging the question", a practice the authors strongly repudiate. Finally, the division of studies into the two groups is confounded by the differing sensitivities of the analyses, with non-independent studies sensitive only to larger magnitude correlations.


Voodoo Histogram


I would now like everyone to turn to page 14 of the "voodoo correlations" paper where you may get acquainted with the most famous histogram of 2009, the Christmas colored wonder showing the distribution of correlations among "independent" and "non-independent" studies that entered the Vul et al. survey. What is the purpose of this histogram? Before we answer that question, let us return to the central theses of Vul et al. First, correlation magnitudes matter. And, second, non-independent analyses produce grossly inflated correlations.

What evidence, other than a priori reasoning, do they adduce in favor of the inflation hypothesis? Well, as a starter, do they provide summary statistics, i.e. the mean or median correlation in the two groups? No. Do they perform a statistical test comparing the two samples for a shift in the central tendency using, for instance, a t-test or a non-parametric test of some kind? No. Do they carry out an analysis of the frequencies distributed over bins with a chi-square or equivalent statistical test? No. Finally, if correlation magnitudes matter, why does it appear that the authors make an exception to that rule in their own analysis, which fails to report an estimate of the difference in correlations between the two groups? After all, how are we to know how serious the error is, if there is one at all? Do we care if the bias in correlation magnitude is .001 or .05 or even .1? Probably not very much.

Now, the reason for the omission of any statistical test or summaries, I think, is that Vul et al., being virtuous abstainers from the "non-independence error", believed they could avoid its commission by eschewing a formal test -- and therefore insulate themselves against the charge of non-independence. Instead, they reasoned, "we'll just present a green/red colored histogram and let the human color perception system work its magic". (Sadly, since the authors used red and green squares in their histogram, color blind social neuroscientists are mystified as to what all the fuss is about). Sometimes, however, it is enough to plot one's data to be found guilty of the "non-independence error".

Let me remind you of a passage from Vul and Kanwisher (in press) which contains more of the wit and wisdom of Ed Vul.

"Authors that show such [non-independent] graphs must usually recognize that it would be inappropriate to draw explicit conclusions from statistical tests on these data (as these tests are less common), but the graphs are presented regardless. Unfortunately, the non-independence of these graphs is usually not explicitly noted, and often not noticed, so the reader is often not warned that the graphs should carry little inferential weight." (Vul and Kanwisher, in press, pg. 8)

I think that quote is rather a nice summing up of the sad affair of the "voodoo histogram". The thing was based on non-independent data selection (due to the differing thresholds between the two groups and sundry other reasons described below) but was nevertheless used to persuade the reader of the correctness of the authors' main hypothesis. In the end, we do not know what to conclude from this meta-analysis, having been presented with no evidence in favor of the central hypotheses put forth by the authors. That the evidence was selected in a non-independent manner in the first place, due to the disparity in the statistical thresholds across groups, has a strange self-referential quality to it that reminds me of one of those Russellian paradoxes about barbers or Cretans and so on.

Cataloguing some of the Voodoo

The more I look at Vul and colleagues' meta-analysis the more perfect little pearls of "non-independence" turn up in its soft tissue. In the following sections I am simply going to give you a taste.

1) Vul et al. classified studies as "independent" that selected voxels based on a functional localizer and then correlated a behavioral measure with data extracted from that ROI. The majority of such studies identified the ROI used for the secondary correlation analysis with a whole-brain t-test conducted at the group level in normalized space. It so happens that the magnitude of a t-statistic is influenced by both the difference of two sample means (or the difference between a sample mean and a constant) and the variance of the sample. Thus, ROIs identified in this manner will have taken advantage of favorable noise that will ensure both large effects and small variance. As Lieberman et al. cleverly point out in their rebuttal to "voodoo correlations", low variance will inevitably lead to range restriction, a phenomenon that has the effect of artificially deflating correlations. Therefore, the studies labelled "independent" that used whole-brain t-tests to identify ROIs (the majority of such studies) were virtually guaranteed to produce reduced correlations, and therefore constitute another example of the "non-independence error" unwittingly perpetrated by the Vul et al. meta-analysis.

2) The meta-analysis fails to identify which studies reported the peak magnitude of a cluster and which studies reported mean correlations. Vul et al. repeatedly insist that "correlation magnitudes matter" and if this is the case it would be important to distinguish between those two sets. You may refer to my previous blog entry to see that average measures of correlation magnitude in a cluster hew towards the threshold, which is generally around .6 or .65 for a whole-brain analysis. On the other hand, "peak" values in a cluster are (by definition) not representative of the regional magnitude of the correlation estimate and, moreover, suffer from the problem of regression to the mean. It is very likely that virtually all the correlation estimates exceeding .8 come from studies that used the peak magnitude of the cluster. This is important to know! Remember, Vul et al.'s argument isn't to say that reporting only peak magnitudes is a bad practice, it's to say that reporting any summary measure in a selected cluster will result in massively inflated correlations. No evidence for that assertion is provided in the meta-analysis and critical information as to which summary measure was used for each study is not reported.

3) Localization of an ROI using an independent contrast is an imperfect process. Just as there is noise in the estimation of correlation magnitudes so too is there noise in the estimation of the "true location" of a functional area. Thus, spatial variation in the locus of a functional ROI ensures that a subsequent estimate of correlation magnitude will be systematically biased downwards. It would take many repetitions of the same experiment in the same group of subjects to arrive at a sufficiently accurate estimate of the "true location" (insofar as such a thing exists) of a functional region to mitigate this spatial selection error.

4) Vul et al. do not consider the possibility that exploratory whole-brain correlation analyses are much more likely to find genuine large magnitude correlations than hypothesis-driven ROI analyses. Think about it. An ROI analysis is about confirming a specific hypothesis that an experimenter has about a brain-behavior relationship. It's a one-shot deal. If the researcher is wrong, he comes up empty. Whether a significant correlation is observed depends a lot on whether the scientist was right to look in that particular region in the first place. But remember, the brain is still a mysterious object and sometimes neuroscientists aren't quite sure where exactly to look. It is in these cases, generally, that they turn to whole-brain analyses. Consider the following. Suppose I have a hypothesis about the relationship between monetary greed and brain activity in the orbitofrontal cortex. I define an ROI and perform a correlation between brain activity and some measure of the behavior of interest (greed). The correlation turns out to be 0.6. Now, what is the probability that some other region in the brain has a higher correlation than the one I discovered? Well, since we know relatively little about the brain I submit that the probability is very close to 1. If that's the case, then for every ROI analysis that discovered a correlation with magnitude r, a corresponding whole-brain analysis, due to its exploratory nature, is likely to find those other regions that correlate more strongly with the behavioral measure than the ROI that was chosen on the basis of imperfect knowledge. The bottom line is that exploratory analyses have the opportunity, because they are exhaustive, to uncover the big correlations, while targeted ROI analyses are fundamentally limited by the experimenter's knowledge of the brain and the constraint of looking only in one location.

There are two parties looking to find a buried treasure. One party has a hazy hunch that the treasure is located in spot X. The other party, which is bank-rolled by the royal family, is composed of thousands of men who fan out all over the area, searching for the treasure in every conceivable place, leaving no stone unturned. If the first party's hunch is wrong, they strike out. The second party, however, by performing an exhaustive search, always succeeds in finding the treasure -- provided it exists.

5) A few more things to chew on. First, Vul et al.'s study included five Science and Nature studies combined, which accounted for 10% of all the studies (which means that these "big two" journals were vastly over-represented). Of the 14 correlations included from those 5 papers, 13 were in the non-independent group. Second, of the 135 correlations in the non-independent group, 22 came from a single study (study 11, see Vul et al. appendix). Of the 55 correlations that were greater than .7 in the non-independent group, a whopping 23% (13/55) came from this same study. The mean number of correlations from each study was 4.9 and the standard deviation was 4.4, meaning that 22 is nearly 4 standard deviations above the mean and is therefore an outlier by anyone's standard. Remember that correlations drawn from the same study are non-independent and therefore including 22 correlations from a single study -- especially when that study contributed a disproportionate number of correlations greater than 0.7 -- is rather dubious. Indeed, to avoid a non-independence error in this case, Vul et al. should have only chosen one correlation from each study -- and have chosen that correlation randomly.

6) The variance of a correlation estimate is related to the sample size of the study. And yet Vul et al. fail to report the sample sizes of the 54 studies that entered their analysis. This is a serious omission for obvious reasons that Vul et al. should have been attuned to. This is another potential commission of the non-independence error in Vul et al.

7) I'm getting tired so I will be brief on this last one. The selection criteria for the papers that entered the meta-analysis are poorly described. For instance, how did 5 Science and Nature studies get into the sample? Is that an accident? If not, what was the rationale for choosing all those high profile papers? A meta-analysis should either be exhaustive or otherwise take pains to achieve a representative sample -- and, if the latter, then it is incumbent on the authors to describe the selection criteria and methods in detail. For instance, were the persons who selected the papers blind to the hypothesis? And so on.
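To put a rough number on point 6, here is a minimal sketch of how the uncertainty of a correlation estimate shrinks with sample size. It uses the standard Fisher z approximation (standard error of atanh(r) roughly 1/sqrt(n - 3)), which is my choice of method and not something Vul et al. report. The observed r of 0.6 is purely illustrative, and the sample sizes of 18 and 37 are the ones used earlier in this post.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation r from n subjects.

    Uses the Fisher z-transform; the interval is an approximation that
    assumes bivariate-normal data.
    """
    z = math.atanh(r)             # Fisher z-transform of the observed r
    se = 1.0 / math.sqrt(n - 3)   # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (18, 37):
    lo, hi = fisher_ci(0.6, n)
    print(f"r = 0.6, n = {n}: approximate 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval for the smaller study is considerably wider, which is why pooling correlations without regard to sample size makes the resulting histogram hard to interpret.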


References

Vul, E., Harris, C., Winkielman, P., and Pashler, H. Voodoo Correlations in Social Neuroscience. Perspectives on Psychological Science. In Press.

Vul, E. and Kanwisher, N. Begging the Question: The Non-Independence Error in fMRI Data Analysis. Book Chapter. In Press.

Lieberman, M., Berkman, E., and Wager, T. Correlations in Social Neuroscience Aren't Voodoo: A Reply to Vul et al. Perspectives on Psychological Science. In Press.


Tuesday, February 17, 2009

Simulating Voodoo Correlations: How much voodoo, exactly, are we dealing with?

The recent article "Voodoo Correlations in Social Neuroscience" by Ed Vul and colleagues has gotten a lot of attention and has stimulated a great deal of discussion about statistical practices in functional neuroimaging. The main critique in the article by Vul involves a bias incurred when a correlation coefficient is re-computed by averaging over a cluster of active voxels that are selected from a whole-brain correlation analysis. Vul et al. correctly point out that the method will produce inflated estimates of the correlation magnitude. There have been several excellent replies to the original paper, including a detailed statistical rebuttal showing that the actual bias incurred by the two-stage correlation (henceforth: vul-correlation) is rather modest.

It occurred to me in thinking about this problem that the bias in the correlation magnitude should be related to the number of voxels included in the selected cluster. For instance, in the case of a 1 voxel cluster the bias is obviously zero since there is only one voxel to average over. How fast does this bias increase as a function of cluster volume in a typical fMRI data set with a typically complex spatial covariance structure? Consideration of the high correlation among voxels within a cluster led me to wonder about the true extent of bias in vul-correlations. For instance, in the most extreme case, where all voxels in a cluster are perfectly correlated, there is zero inflation due to averaging over voxels.

To explore these questions I ran some simulations with real world data. The data I used were from a study carried out on the old 4 Tesla magnet at UC Berkeley and consisted of a set of 27 spatially normalized and smoothed (7mm FWHM) contrasts in a verbal working memory experiment (delay period activation > baseline). The goal was to run many correlation analyses between the "real" contrast maps and a succession of randomly generated "behavioral scores". Thus, for each of 1000 iterations I sampled 27 values from a random normal distribution to create a set of random behavioral scores. I then computed the voxel-wise correlation between each set of scores and the set of 27 contrast maps. I then thresholded the resulting correlation maps at 0.6 (p = 0.001) and clustered the above-threshold voxels using FSL's "cluster" command. This resulted in 1000 thresholded (and clustered) statistical maps representing the correlation between a set of "real" contrast maps and 1000 randomly generated "behavioral scores".
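The procedure just described can be sketched in a few lines. This is a toy version under loud assumptions: the "contrast maps" here are independent Gaussian noise with 500 voxels, not the real smoothed 4 Tesla data, only 20 iterations are run instead of 1000, and the clustering step (done with FSL's "cluster" command in the actual analysis) is left out. The function names are my own.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_map(contrasts, scores):
    """Voxel-wise correlation; contrasts[s][v] is subject s at voxel v."""
    n_voxels = len(contrasts[0])
    return [pearson_r([subj[v] for subj in contrasts], scores)
            for v in range(n_voxels)]


def threshold_map(r_map, threshold=0.6):
    """Keep only voxels whose correlation exceeds the threshold."""
    return {v: r for v, r in enumerate(r_map) if r > threshold}


rng = random.Random(0)
n_subjects, n_voxels, n_iterations = 27, 500, 20  # toy sizes (assumption)

# Stand-in for the real contrast maps: spatially independent noise (assumption)
contrasts = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
             for _ in range(n_subjects)]

for it in range(n_iterations):
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]  # random "behavior"
    survivors = threshold_map(correlation_map(contrasts, scores))
    print(f"iteration {it}: {len(survivors)} voxels above r = 0.6")
```

Because the toy voxels are spatially independent, surviving voxels will not form the contiguous clusters that smoothed fMRI data produce. That spatial structure is exactly what the real contrast maps contribute to the simulation.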

Next, I loaded each of the 1000 statistical volumes and computed, for each active cluster, the minimum correlation in the cluster, the median correlation in the cluster, the maximum correlation in the cluster, and the two-stage vul-correlation. The vul-correlation was computed as follows: I extracted the matrix of values from the set of contrast maps for each cluster where (rows=number of subjects(27), columns=number of voxels in cluster) and averaged across columns, yielding a new vector of 27 values. I then recomputed the correlation coefficient between this averaged vector and the original randomly generated "behavioral variable" (all 1000 of which had been saved in a text file). Then I plotted cluster volume in cubic centimeters against its median, maximum, and vul-correlations. Here's the result.
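The four summary scores described above can be written down compactly. The following is a minimal sketch rather than the script I actually ran: it takes a cluster as a plain list of rows (one row per subject, one column per voxel), and the function name cluster_summaries, the dictionary keys, and the toy numbers are all illustrative.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_summaries(cluster_data, scores):
    """Summary scores for one cluster; cluster_data[s][v] = subject s, voxel v."""
    n_voxels = len(cluster_data[0])
    # One correlation per voxel in the cluster
    voxel_r = [pearson_r([row[v] for row in cluster_data], scores)
               for v in range(n_voxels)]
    # Two-stage correlation: average over voxels within each subject,
    # then correlate the averaged vector with the behavioral scores
    subject_means = [sum(row) / n_voxels for row in cluster_data]
    two_stage = pearson_r(subject_means, scores)
    median_r = statistics.median(voxel_r)
    return {
        "min": min(voxel_r),
        "median": median_r,
        "max": max(voxel_r),
        "two_stage": two_stage,
        "bias": two_stage - median_r,  # two-stage minus median correlation
    }


# Toy cluster: 5 subjects x 3 voxels (made-up numbers, for illustration only)
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
cluster = [
    [0.9, 1.4, 0.2],
    [2.1, 1.0, 2.5],
    [2.8, 3.9, 2.0],
    [4.4, 3.1, 4.6],
    [4.9, 5.5, 4.1],
]
print(cluster_summaries(cluster, scores))
```

Two limiting cases fall out directly: for a one-voxel cluster, or for a cluster whose voxels are perfectly correlated copies of one another, the two-stage correlation equals the voxel-wise correlation and the "bias" entry is zero.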





What you can see is that vul-correlation rapidly increases as a function of cluster volume, reaching asymptote at a correlation of about .73 and a cluster volume of roughly 2 cubic centimeters. You can see, however, that the maximum correlation, which is not a two-stage correlation, has almost the exact same functional profile. The median correlation within a cluster also increases somewhat, but not as high or as rapidly as the vul- and maximum- correlations.

To quantify the "bias" in the vul-correlation as a function of cluster size I plotted the difference between the vul-correlation and median correlation.




It is clear from this plot that the bias becomes maximal when the cluster size is approximately 3 cubic centimeters. That is, however, rather a large cluster by fMRI standards. For a 1 cubic centimeter cluster the bias is about .075 and for a 1/2 cubic centimeter cluster (approximately 20 3 x 3 x 3 mm voxels) the bias is about 0.06. I'm not sure whether that rises to the level of "voodoo". Perhaps voodoo of a Gilligan's Island variety. Minor voodoo, if you like.

Lastly, I examined the minimum correlation as a function of cluster size. Of course, the minimum correlation can never fall below the cluster threshold, which was .6. Thus, I thought that the minimum correlation might serve as a good lower bound for reporting correlation magnitudes. You can see from the plot below that for these random simulations, at least, the minimum correlation does not increase with cluster size. In fact, it tends to approach the correlation threshold, which is not surprising, as this is what would be expected in a noise distribution. This time I've plotted cluster volume on a log (base 2) scale for easier visualization of the trend.






So, what have I learned from this exercise? First, the amount of inflation incurred from a two-stage correlation (vul-correlation) increases as a function of cluster size. For smallish clusters (1/2 to 1 cubic centimeters) this bias is not that much, whereas for larger clusters the bias is as high as 0.1. Second, the maximum correlation has a nearly identical relation with cluster volume as does the vul-correlation. Finally, candidates for the reporting of cluster magnitudes could be the median or minimum correlations. The median correlation increases with cluster size, but not by much. The minimum correlation decreases with cluster size, but again not by much.

All in all, I think the problem identified by Vul et al. is a genuine one. Two-stage correlation estimates are inflated when compared to the median correlation within the cluster -- but not by that much. One reason for this is that the high threshold required to achieve significance in whole-brain analyses yields voxels that don't have much room to go up. In addition, the constituent voxels of a cluster are already highly correlated, so that the "truncation of the noise distribution" referred to by Vul et al. may be less than would be expected among truly independent voxels. So, perhaps, in the end the vul-correlation isn't so much a voodoo correlation as it is a vehicle for voodoo fame.



Thursday, May 8, 2008

"The Neural Data is More Sensitive than the Behavioral Data"

Before I get back to the "Four Ages of Functional Neuroimaging" I'd like to take a brief detour and talk a little bit about a phrase -- or a slogan, perhaps -- that one hears more and more in the neurosciences, namely, that: "the neural data is more sensitive than the behavioral data".

I'll give a little context. A speaker has just presented some data, say, on the relationship between hippocampal volume and a genetic polymorphism, or the effect of some drug on dopaminergic activity in the striatum. Impressive bar graphs are displayed, with big effects and little error bars. There is no doubt that the finding is Real, that such and such drug or such and such genetic polymorphism is having a measurable biological impact, and that it's interesting and worth studying, etc., etc.

Sometimes these biological data are presented alongside lots of "scatterplots" showing that the effects are also correlated with some behavioral measure, say, working memory capacity, or performance on the Wisconsin Card Sorting Task. If you have a biological finding and a scatterplot showing a relation to behavior, then you're golden. Everybody in the room is happy, even the cranky behavioral psychologist in the back.

But what if the speaker just presents the biological measure without the scatterplot, without the link to behavior? This is usually fine, provided no claims are made about behavioral relevance. Sometimes it really isn't that important to link the two. One is just trying to get a handle on the relationship between two neural variables (say gene X, and hippocampal volume) and no strong claims are made about causal links to some behavioral state. Someone else will figure that out, later. Sometimes, however, the speaker wants to make these strong claims, even without the scatterplot. Of course, the speaker would have liked to show the audience a nice brain-behavior correlation, and he or she almost certainly collected some behavioral index, but as occurs in science from time to time, the correlation failed to reach significance. And, thus, no scatterplot.

The talk concludes, the speaker having argued forcefully for the importance of drug X, because of its effect on brain system Y. The speaker goes on to say that the drug allows subjects to focus attention better and enhances working memory and general fluid cognition.

Hand goes up in the back -- it's the cranky behavioral psychologist. He has a kind of a gravelly voice and one has the distinct impression that he was asleep for most of the talk. Here's what he asks: "Did you measure any behavioral variables? Did administration of the drug have any effect on cognition, as measured by standard measures of memory, reaction time, etc?"

The speaker is ready for this. He is indeed smiling. He's been handling this question for years, and frankly, he's rather amused at the naivete of the questioner.

"Well", he or she says, "of course we had our subjects perform a whole battery of neuropsychological tests, cognitive tasks, and personality inventories, including the WCST, N-Back, Trails A, B, C, and D, the Simon task, the TPQ, the Sensation Seeking scale, the impulsivity scale, locus of control, etc. etc. but none of these measures were significantly correlated with our biological finding. Of course, this is no surprise, because as everybody knows the neural data is more sensitive than behavioral data." The cranky psychologist offers a slight grimace, but does not follow up with another question. Once again, the response worked its charm. After all, who is to argue? There was a big neural effect and no behavioral effect -- therefore, surely the neural data is indeed more sensitive than the behavioral data. Right?

But wait, one might ask: what is the neural data more sensitive to? That is surely an important question. Let's think. The neural data is more sensitive to neural differences (e.g. hippocampal volume) than the behavioral data is. That is true -- perfectly trivial but perfectly true. The converse is also -- trivially -- true: "The behavioral data is more sensitive to behavioral differences than the neural data is".

A more interesting statement would be as follows: "The neural data is more sensitive to behavioral differences than the behavioral data is". That would be a strong claim, but one that is rarely made. Instead, we get the stock "neural data is more sensitive than behavioral data" without any context or qualification. The problem is that this phrase, this slogan, this stock reply to the crabby behavioral psychologist, is empty of content and specificity.

Just to drive the point home, what if I told you that a stethoscope is more sensitive to differences in heart rate than any behavioral measure? Would you be surprised? But what if I went on to say that my heart rate measurements, because they are so sensitive, indicate that subjects with a faster heart rate live longer? But wait, asks the old guy in the back of the room, where is your behavioral evidence for that assertion (e.g. a measure of longevity)? Don't need any, because biological measures are more sensitive than behavioral measures.

Monday, August 13, 2007

The Four Ages of Functional Neuroimaging: Part 3

Rather than treat “cognition” as a separate realm where functions are described and diagrammed on sheets of paper, functional neuroimaging seeks to eliminate the mind-brain barrier, to deny that venerable dichotomy, and to shift the terrain from the ether of psychological abstraction to the material folds of the brain. It does not matter in the long run that the fusiform gyrus might not truly act as the unitary and modular processor of faces that the name (“FFA”) implies; rather, what is important is that this particular assertion about face processing is committed to a neural state of affairs, which is open to either empirical support or falsification. It is a hypothesis which bares itself to all, declining to hide in the protective shade provided by the term “neural correlate”. The FFA is not the neural correlate of the face processor – it is the face processor, that is its function. Thus, one might say that the end of the Silver Age of neuroimaging was characterized by an increasing willingness, bolstered by an accumulation of empirical support, to propose hypotheses about brain function that treated “cognition” as a thing to be described in neural terms, howsoever simplistic and inchoate, and to be informed by data derived from cognitive psychology, neuroscience, and functional neuroimaging itself.

The use of functional neuroimaging techniques in the study of the biological basis of human cognition and behavior is now entering a Golden Age. The necessary, but often atheoretical, project of “mapping” hypothesized cognitive functions onto discrete pieces of cortex is coming to an end. Rather than stating hypotheses in terms of models of cognition and then in effect “searching” for brain correlates (or proxies of cognitive components), many researchers are now taking an integrated approach, where hypotheses about functional anatomy are stated a priori, and imaging results are taken as evidence for or against a stated hypothesis. One no longer asks where a function is located but rather whether a hypothesized functional-anatomical correspondence provides an accurate picture of biological reality. In the areas where functional neuroimaging is having its greatest impact, it has managed to engage the interest of cognitive psychology and neuroscience. For instance, in long-term memory research, behavioral neuroscience, traditional cognitive psychology, and cognitive neuroscience researchers are increasingly involved in a unified pursuit, a joint conversation, centering on the role of the medial temporal lobe in memory. For example, the idea from psychology of a dichotomy between recollection and familiarity in long-term memory is being studied at all levels: in rats with implanted electrodes, with behavioral measures in psychological laboratories, and with human neuroimaging studies. Whereas in prior years neuroscience and psychology would be carrying out studies in isolation, each with its own idiosyncratic paradigms and nomenclature, the arrival of cognitive neuroscience and human neuroimaging is increasingly providing the bridge between psychological and brain research, and contributing to the emergence of a unified approach to a particular problem domain.

To give a more specific example of how the field is advancing, consider the hypothesis that the function of the inferior frontal gyrus is for the “selection of competing alternatives” in the context of word retrieval (Thompson-Schill et al. 1997). This is a classic “Silver Age” function-structure proposition. It identifies a fairly large region of cortex with a particular function, without an overarching model of word retrieval or a specification of contextual factors or neural interactions. On the other hand, this hypothesis makes a rather stark and forthright claim about the function of a particular brain region, and has led to a small industry on the role of the inferior frontal gyrus in semantic retrieval. Competing hypotheses have been adduced which have highlighted the distinction between “controlled retrieval” and “selection from competing alternatives”, and have led to the development of paradigms specifically designed to arbitrate between these two notions. The neuroanatomical specificity of these hypotheses has also vastly improved, with newer theories proposing functional subdivisions within the inferior frontal gyrus that have led to a more nuanced understanding of the function of the region in word retrieval. Interactions with posterior temporal cortex have also been explored, and links to ideas deriving from models of word retrieval, such as that of Levelt, have prominently entered the discussion. Thus, what began as an assertion about the functional role of a brain region has led by degrees to a far more sophisticated appreciation of the role of ventrolateral frontal lobe structures in word retrieval.

One might ask what possible bearing does any of this have on cognitive psychology, traditionally conceived as the study of the function of the mind? The answer to this question is, to paraphrase Coltheart, “none”. Physics has nothing to say about meta-physics. Likewise, the discussion of the functions of brain regions and their interactions tells us nothing about the mind – so long as one insists that the mind exists in a realm apart from the mundane exertions of the brain. Thus, the ongoing debates about the hippocampus and the inferior frontal gyrus are not about “mapping” from mind to brain, as it used to be. Rather, structure and function are inseparably linked, and the common practice now is to examine these two facets of brain organization as an integrated whole, just as the mechanic considers the radiator as a thing that serves a particular purpose, without bothering with intermediary “car-minds” and “car-mind processes” and other such ornaments of functionalism.

Wednesday, June 27, 2007

The Four Ages of Functional Neuroimaging: Part 2

A brief look at the evolution of functional neuroimaging over the last 50 years may offer some insight as to why we are just now grappling with the deepest philosophical issues surrounding functional neuroimaging and its relation to its sister disciplines. The history of functional neuroimaging can be roughly divided into four ages or epochs, which we might call: the age of iron, the age of bronze, the age of silver, and the age that we are currently in, or at least on the threshold of entering, the golden age. In the iron age, which lasted from approximately 1955 to 1975, scientists such as Seymour Kety, Lou Sokoloff, David Ingvar and others were the first to measure cerebral blood flow while a subject engaged in what they called “mental activity”. Subjects were asked to perform mental calculations, read silently, read aloud, count backwards and forwards, and this mental effort was revealed in the metabolic changes that were being observed in the brain. The Iron Age established that human thought had metabolic consequences that could be measured and localized to regions in the brain. The sophistication of these early studies was entirely on the physiological side, and certainly at this point the technology was sufficiently crude and unwieldy that it was not viewed as appropriate for the examination of the brain’s information processing capabilities.

In about 1982, human neuroimaging entered the Bronze Age, which heralded both technological and methodological advances in the field. Positron emission tomography (PET) with the O-15 tracer, combined with the logic of cognitive subtraction, opened up entirely new vistas in the potential for functional localization in the brain. Michael Posner and Marcus Raichle, moreover, showed that a collaboration between neuroscience and cognitive psychology was essential to studying the brain basis of cognition, and that the same tools and methods employed in experimental psychology – the reaction time subtraction logic of Donders and the additive factors methodology of Sternberg – could be reinvented in the context of this new multi-pixel dependent variable – brain activation. Attention, memory, language, perception, and mental imagery were all studied in the PET scanner, and new ideas relating these traditional concepts to activity in the brain were formed.

The Silver Age (approximately 1993-2000) brought with it an unprecedented expansion in neuroimaging research and vastly improved statistical methodology (an acronym, SPM or “statistical parametric mapping”), and with the emergence of fMRI, the field was no longer just for the tiny minority with access to an expensive PET scanner. Most would admit that it was during this period that the sheer number of “activation” studies, the proliferation of what has been derisively termed “blobology”, or “technicolor phrenology”, cast a certain pall over what was otherwise an extraordinary era of scientific advance. The problem was that many neuroimaging studies carried out during this era were of the “let's just do it and see what lights up” variety, while theory-driven research and hypotheses were not uncommonly eschewed. For any ad-hoc “psychological process” that one could invent, a researcher could be sure to find its “neural correlate” – in brilliant hues – somewhere in the cingulate gyrus, the insula or another numbered Brodmann area. Indeed, in the Silver Age of neuroimaging, there were no “failed studies”. It was probably during this period, however, that many cognitive psychologists and neuroscientists, watching interestedly from the sidelines, decided that functional neuroimaging was not worth the effort. Indeed, even from within the neuroimaging ranks, it was clear, as one toured the poster section at the annual Human Brain Mapping conference, that the only thing that outnumbered the colored blobs was the number of explanations for them.

Every new scientific field or endeavor experiences growing pains. The period of the late 1990’s in neuroimaging was both a necessary and inevitable step in the evolution of the field at large. An analogy might be made between this process and a similar one that occurs in human development, for example. A child in infancy learns the relations between the movements of the oral articulators and the sounds that such movements produce through a process known as “babbling”, an auditory-motor tuning process that proceeds through a kind of random exercise of the speech muscles. It is perhaps not too much of a stretch to say that during the Silver Age of neuroimaging research, a similar kind of “tuning” process was occurring whereby certain systematic relations between experimental manipulations or contexts on the one hand, and regions of brain activation on the other hand, were being worked out. The accumulation of studies pointing to some systematic relationship between a “cognitive process” and a corresponding brain region forges a link between a hypothesized function, on the one hand, and an anatomical location, on the other. As the number of studies pointing at a neural correlate of this or that cognitive function begins to mount, some brave and ambitious researcher decides to christen the anatomical area for its functional properties. Suddenly, the fusiform gyrus is not merely a bump on the ventral surface of the human brain but is the “fusiform face area”: structure and function are merged into a single moniker; or the anterior cingulate gyrus is no longer merely a name for a particular cerebral convolution, but has come to refer to function as well: “conflict resolution”. The renaming of parts of the brain to incorporate their specific function is a triumph of the silver age of neuroimaging. It is in principle no different than the labeling of the back part of the occipital lobe as “visual cortex”, on the basis of neurophysiology and lesion work.
Indeed, during the Silver Age many provisional labels were affixed to diverse structures of the brain, but only a very few of these “cognitropes”, if I may coin a phrase, managed to stick. But the ones that did stick are the labels that have mattered, that were reliable, and have been both the cornerstones and the targets of current hypotheses and theories in cognitive neuroscience.

Tuesday, June 26, 2007

The Four Ages of Functional Neuroimaging: Part 1

In the increasingly interdisciplinary world of the science of human behavior, a certain conversation can be heard again and again in the gathering places where the practitioners of neuroscience and cognitive psychology are occasionally found together. The neuroscientist, having patiently listened to a psychologist present his latest theoretical model – resplendent with the boxes and arrows constituting the “mental modules” of some particular piece of the cognitive system – shakes his head, wondering aloud what possible relevance these chalkboard chimeras might have to someone who studies the live matter of the brain. The cognitive psychologist of a certain stripe takes a similar view of the neuroscientists’ efforts, which he or she maintains shed very little light on the functional properties of the mind. Both camps argue for epistemological supremacy: cognitive psychology for the ghost in the machine, and neuroscience for the machine in the ghost. Somewhere in the middle, amid the crosstalk, straining to be heard above the din of argument, stands the cognitive neuroscientist, unsure of which camp he is addressing (or belongs to), but nevertheless confident that he holds the key, the answer to the debate. “We must study both”, he asserts. “We must study the ghost in the machine and the machine in the ghost!”

Unfortunately, the cognitive neuroscientist is not unlike the spurned and neglected offspring of two parents that despise each other and consider their child a wastrel and, ultimately, a mistake. Indeed, it is only after the cognitive neuroscientist succeeds in gaining the attention of his audience (in this hypothetical gathering) that psychology and neuroscience turn towards their wayward child and nod their heads and point their fingers in disdainful unison. If there is one thing they can agree on, it is that cognitive neuroscience and its favored technological toy -- functional magnetic resonance imaging (fMRI) -- have nothing to offer them. By the same token, functional neuroimaging and cognitive neuroscience, generally, taking the remarkable success of the movement as a self-evident mark of its scientific worth and validity, have never made a particularly sustained or rigorous effort to make the case for why functional neuroimaging matters to neuroscience and cognitive psychology.

In the last year, however, this has begun to change. Advocates of functional neuroimaging in a number of review papers have laid out a formal case for the legitimacy and relevance of the field. Most of the recent discussion, particularly articles by Henson (2005), Poldrack (2006) and De Zubicaray (2006), has been aimed at cognitive psychology, and has argued for both the relevance and theoretical benefits that functional neuroimaging can bring to the field of cognitive psychology. In a recent issue of the journal Cortex, however, Coltheart (2006) and Page (2006) forcefully argued the position that, essentially, functional neuroimaging has so far contributed nothing to cognitive psychology. A number of examples suggesting how neuroimaging had indeed informed cognitive psychology were then proffered in response, but Coltheart remained unconvinced. It is the purpose of the present article not to rehash the debate or contribute a technical innovation with regard to inference in functional neuroimaging, but rather to provide a slightly different perspective on the debate, and to show that, perhaps, this recent flare-up arises out of a basic disagreement (or confusion) over what such terms as “mind”, “mental”, “cognitive process”, and “cognitive theory” really mean.

An essay entitled: The Four Ages of Functional Neuroimaging

I am going to inaugurate this blog with an essay (in four parts) entitled: "The Four Ages of Functional Neuroimaging". The essay is about, among other things, the evolution of the science of functional neuroimaging, how neuroscience and cognitive psychology view cognitive neuroscience, and a bit about why functional neuroimaging is getting better all the time. I had considered sending this essay out for publication, but, really (as you will find out) the style is a bit much at times. Or perhaps not -- you be the judge.