Many of us who have recently collected data on MTurk have noticed an increase in duplicated GPS coordinates, and we have learned from other researchers that responses from these duplicates may be unreliable. Some scholars question whether the GPS method can actually detect unreliable responses reliably. Others are concerned that the unreliable responses I have seen (so far) in my data are outliers, and they wonder whether the issue is really pervasive. Here I propose a simple test with some specific predictions that should soon address these concerns.
As I have noted, the study in which I first found the anomaly is still collecting data. I am aiming for a total sample size of 1500. The existing analyses are based on the first third of my sample; the remaining two thirds are still being collected. One way to find out whether GPS duplicates are pervasive and whether the GPS method can detect unreliable responses is to apply my own procedure to the remaining two thirds of the data. If the following predictions come true in those remaining data, I think we will have strong evidence that duplicated GPS coordinates are pervasive (as opposed to being outliers) and that the GPS method can detect unreliable responses. I welcome thoughts and comments on these predictions, which I will be able to test once data collection is finished.
Testing the question of pervasiveness
So far, my impression from the survey here is that duplicates have become more and more common in a matter of weeks. In my existing data, with 578 participants, about half (!!!) of the responses come from duplicates. I would call a data set that is half contaminated pervasive. I predict that about 500 (if not more) of the roughly 1000 remaining responses will come from duplicates. If that holds, I think it is strong evidence that duplicates are rampant, at least in my own data.
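For readers who want to run the same check on their own data, here is a minimal sketch of how the share of duplicates could be computed. It assumes that a "duplicate" is any response whose latitude/longitude pair is shared with at least one other response; the field names `lat` and `lon`, the rounding precision, and the toy coordinates are all hypothetical and would need to be adapted to the actual export.

```python
from collections import Counter

def flag_gps_duplicates(responses, precision=4):
    """Return one boolean per response: True if its (lat, lon) pair
    is shared with at least one other response."""
    # Round so that tiny floating-point differences do not hide a match.
    # The precision of 4 decimals is an assumption, not a recommendation.
    keys = [(round(r["lat"], precision), round(r["lon"], precision))
            for r in responses]
    counts = Counter(keys)
    return [counts[key] > 1 for key in keys]

def duplicate_share(responses, precision=4):
    """Proportion of responses that come from duplicated coordinates."""
    flags = flag_gps_duplicates(responses, precision)
    return sum(flags) / len(flags) if flags else 0.0

# Made-up toy data, only to show the mechanics.
sample = [
    {"id": 1, "lat": 40.7128, "lon": -74.0060},
    {"id": 2, "lat": 40.7128, "lon": -74.0060},
    {"id": 3, "lat": 34.0522, "lon": -118.2437},
    {"id": 4, "lat": 41.8781, "lon": -87.6298},
]
flags = flag_gps_duplicates(sample)
share = duplicate_share(sample)
```

With the real data, the prediction above would be supported if `duplicate_share` comes out at about .5 or higher for the remaining responses.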
Testing the question of data quality
If the following predictions are largely supported, I think we will have strong evidence that data from duplicates are unreliable.
1. The reliability of known scales
I will test the reliability of the racial identification scale separately for duplicates and non-duplicates, based on what I saw in the first third of my data. I predict that, in the remaining data, the reliability of the scale will be between 0 and .2 for duplicates (i.e., close to .11) and greater than .8 for non-duplicates (i.e., close to .87).
I will also test the reliability of the symbolic threat scale. I predict that the reliability will be under .2 for duplicates (i.e., close to -.01) and greater than .7 for non-duplicates (i.e., close to .81).
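The reliability comparison can be sketched as follows. This assumes that "reliability" means Cronbach's alpha, which the text above does not state explicitly, and that each respondent's item scores are already reverse-coded where needed. The function names and the toy scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each given as a list
    of item scores of equal length. Assumes at least two items, at least
    two respondents, and non-zero variance of the total scores."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

def alpha_by_group(rows, is_duplicate):
    """Compute alpha separately for duplicates and non-duplicates."""
    groups = {"duplicates": [], "non_duplicates": []}
    for row, dup in zip(rows, is_duplicate):
        groups["duplicates" if dup else "non_duplicates"].append(row)
    return {name: cronbach_alpha(group) for name, group in groups.items()}

# Made-up toy scores on a three-item scale.
rows = [
    [1, 1, 1], [3, 3, 3], [5, 5, 5], [2, 2, 2],   # consistent answers
    [1, 5, 3], [5, 1, 3], [3, 3, 1], [2, 4, 5],   # inconsistent answers
]
is_duplicate = [False] * 4 + [True] * 4
alphas = alpha_by_group(rows, is_duplicate)
```

Note that alpha can be negative when items covary negatively on average, which is why a value such as -.01 is possible for the duplicates.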
2. The distribution of measures with known/expected distribution
I asked my participants about their feelings toward the KKK and the Nazi party on a scale from 0 = most unfavorable to 100 = most favorable, with 50 as the midpoint. I predict that the mean responses for the KKK and the Nazi party will be close to 60 for duplicates and close to 8.5 for non-duplicates.
3. Measures that are known/expected to be correlated
I predict that, for non-duplicates, the correlations of feelings toward liberal politicians with ideology and with party identification will be significant (p < .001), with magnitudes above .5 (i.e., close to r = -.59 and r = -.57). For duplicates, I predict that these relationships will be non-significant. For conservative politicians, the current data are a little messier, and I have some speculation about why that I will try to confirm later. I think it likely has something to do with whether the questions are coded in the same direction, but I am not making any predictions for conservative politicians for now.
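A rough sketch of the correlation check, run once per group, might look like this. The p-value here uses the Fisher z approximation so that the example needs only the standard library; that is a swapped-in approximation, and a t-based test from a statistics package would be the more usual choice. The coding of ideology (1 = very liberal to 7 = very conservative) and the toy values are assumptions for illustration only.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation.
    Requires n > 3 and |r| < 1."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up toy data: ideology and feelings toward liberal politicians (0-100).
ideology = [1, 2, 3, 4, 5, 6, 7, 2, 6, 4]
feeling_consistent = [90, 80, 70, 55, 40, 25, 10, 85, 30, 50]
feeling_noisy = [50, 60, 40, 55, 45, 60, 50, 40, 50, 50]

r_consistent = pearson_r(ideology, feeling_consistent)
p_consistent = approx_p_value(r_consistent, len(ideology))
r_noisy = pearson_r(ideology, feeling_noisy)
p_noisy = approx_p_value(r_noisy, len(ideology))
```

In the toy data the first pairing gives a strong negative correlation and the second gives one near zero, which mirrors the pattern predicted for non-duplicates and duplicates respectively.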
4. Qualitative responses
I predict that, for duplicates, “good” of any kind will appear in about 25% of responses (i.e., close to 26.6%) and “nice” of any kind will appear in about 20% (i.e., close to 21%). I predict that they will occur in less than 3% of responses for non-duplicates. I anticipate (though do not predict) that “very” will be much more common among duplicates. I also anticipate that most respondents who write “good” or “nice” but are not detected by the GPS method will give KKK or Nazi ratings close to 50 instead of below 10.
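The keyword rates above could be computed along these lines. This sketch assumes that "of any kind" means a case-insensitive match on the whole word anywhere in the open-ended response (so "Very good survey" counts, but "goodness" does not); that reading, the function name, and the toy responses are all assumptions.

```python
import re

def keyword_rate(texts, keyword):
    """Proportion of responses containing the keyword as a whole word,
    ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in texts if pattern.search(text))
    return hits / len(texts) if texts else 0.0

# Made-up toy responses, only to show the mechanics.
duplicate_texts = [
    "good",
    "Very good survey",
    "nice",
    "NICE study",
    "I thought the questions were clear",
]
non_duplicate_texts = [
    "The questions about politics were interesting",
    "No comments",
    "goodness, that was long",
]

good_rate_dup = keyword_rate(duplicate_texts, "good")
nice_rate_dup = keyword_rate(duplicate_texts, "nice")
good_rate_non = keyword_rate(non_duplicate_texts, "good")
```

The same function can be reused for “very” by passing a different keyword.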