Independent research · Political Belief Lab

Max Hui Bai, Ph.D

Scientist · Entrepreneur · Speaker

I study how social change — shifting populations, and now the rise of AI — reshapes what people believe and how they see one another. I also build things, including the Publish or Perish card game.

Curriculum Vitae → · Google Scholar ↗

Featured in The Washington Post · The New Yorker · Nature · WIRED

About

Max is the director of the Political Belief Lab, an independent research lab. He has advised organizations such as OpenAI and the White House and has delivered talks at universities and conferences, including the 2025 keynote at ALPSP.

He was previously a scientist at Stanford's Polarization and Social Change Lab and Stanford Impact Labs. Alongside research, he runs several small businesses as an entrepreneur.

Doctorate: Ph.D, Social Psychology — University of Minnesota, Twin Cities
Previously: Stanford University — PASCL & Stanford Impact Labs
Training: Alumnus, Stanford Ignite
Advised: OpenAI · The White House
Field: Political belief · social perception · prejudice · demographic change

AI and social change

What does a changing society do to the mind?

Society is being remade by two forces at once — demographic change and the arrival of AI. I study how each reshapes what people believe and how they see one another. The work runs along three threads.

Line 01

The psychology of social change

Majorities attend to the decline of their own group more than the growth of others — activating a collective existential threat and political backlash.

Line 02

Belief over identity

We evaluate leaders by what they believe, not their race or gender. Belief can even bend how we visually perceive a person's race.

Line 03

AI & new methods

Treating AI as both a subject and a tool — from real-time, AI-generated persuasion to new methods for catching bad data and untangling confounded effects.

Browse findings →

Entrepreneurship · Flagship

The Publish or Perish Game

A satirical card game about the grind of academic life: the grants, the rejections, and Reviewer #2 of it all. It turns the absurdity of the tenure track into something you can play at a table.

Card game · Kickstarter · Press-covered

View on Kickstarter ↗

On stage

Keynote — ALPSP, 2025

On data quality, AI, and the future of the social and behavioral sciences. Delivered to the Association of Learned & Professional Society Publishers.

Watch the talk ↗

Research findings

The work, framed as questions.

Each study began with something I genuinely wanted to know. Each entry gives the abstract and, where available, links to the paper and materials.

Selected publications · Peer-reviewed

2025 · Nature Comms · Can AI persuade us as well as another person can?

Large language models can now write fluent arguments — so can they move our politics? Across three pre-registered experiments with diverse samples of Americans (total N ≈ 4,800), messages generated by openly available LLMs significantly shifted readers’ attitudes across a range of policies, including polarized ones like an assault-weapons ban, a carbon tax, and a paid parental-leave program. The AI-generated messages were about as persuasive as messages crafted by laypeople. The findings carry direct implications for misinformation, political persuasion, and the governance of generative AI.

Read in Nature Communications ↗

2024 · JPSP · How are Asian and Black people stigmatized differently?

How different racial minorities experience racism differently remains underexplored. Twelve studies spotlight a racial asymmetry in dehumanization using a wide array of methods (experimental, archival, and computational) and data sources (online samples, word embeddings, and U.S. Bureau of Labor Statistics data): whereas Black people are more often subjected to animalistic dehumanization, Asian people are predominantly subjected to mechanistic dehumanization. The asymmetry emerges from the vantage points of both victims and perpetrators, and across domains from everyday language to perceptions of romantic relationships, crime, and business skill — introducing a new way to understand and unify Asian stereotypes. (with Xian Zhao)

Read in JPSP ↗

2022 · PSPB · How do political values and identity relate to support for political candidates?

Past studies on how political value (ideology) and identity (party) predict support for candidates often consider only citizens or only candidates, introducing omitted-variable problems. This paper introduces the multiple-matching perspective, which considers how a citizen's ideology and identity are matched by a candidate's ideology and party. Four studies reveal that the effect of ideology match is large, robust, and consistent; that candidates' ideology matters more than their party except in the final stage of a presidential race; and that party can guide citizens to support a candidate based on the candidate's ideology (Republicans supporting conservatives) more than the reverse.

Paper & materials ↗

2022 · GPIR · Why does the growth of the Muslim population provoke diverging responses from Republicans and Democrats?

Five experiments show that Republicans and Democrats respond to Muslim population growth with divergent reactions across three domains: threat perception, celebratory reaction, and emotion. Republicans more often perceive the growth as a threat to Christians and to American culture, law, and peace; they show fewer celebratory reactions, with less hope and pride and more anxiety and anger. These divergences are partially explained by ideology and media exposure — not by racial mechanisms or religious identity. Political leaning can shape reactions to demographic change in ways that go beyond a dominant group's concern for status.

Paper & materials ↗

2021 · JPSP · Do racism and sexism really undermine Black and female politicians?

Using large nationally diverse or representative samples (total N = 44,836), this work shows that citizens' prejudice does not usually benefit or undermine politicians from a particular demographic group, as many studies assumed. Instead, prejudice predicts support for conservative politicians and opposition to liberal ones — regardless of the politician's demographics. Racism and sexism negatively predict support for liberals like Obama, Clinton, and Sanders, and positively predict support for conservatives like Trump, Carson, and Fiorina, irrespective of race and gender. The pattern is confirmed experimentally and historically (1972–2016): prejudice translates into political preference primarily through the politician's ideology, not their demographic background.

Paper & materials ↗

2021 · SPPS · In White identity politics, does a candidate's race matter more than their ideology?

White Americans' racial identity can predict their sociopolitical attitudes, but it has been unclear whether the effects of White identity politics turn on candidates' ideology or race. Four studies with White American samples consistently support the ideology hypothesis: White identity predicts support for conservative politicians and opposition to liberal ones because of their ideology. Evidence is limited for the racial hypothesis — that White identity predicts support for White over Black politicians because of race. The findings clarify who actually benefits from the growing influence of White identity politics.

Paper & materials ↗

2021 · Longitudinal · What predicts change in a citizen's vote preference over time?

Much is known about what predicts vote preference, less about what predicts change in it. This paper focuses on judgments about the national economy in the recent past — sociotropic economic retrospections. Two longitudinal studies show that these retrospections (with partisanship, ideology, and incumbency) predict within-person changes in vote choice over time, and cross-lagged analyses suggest reciprocal effects. The link between economic retrospection and vote preference is more dynamic than past literature implies.

Paper & materials ↗

2021 · Experimental · Thou shalt not kill, unless it is not a human?

Research on moral dilemmas has examined personality and situational variables but relatively neglected the targets within the dilemmas. Four experiments manipulated the perceived dehumanization of targets. Dehumanized targets tended to make decisions easier and less emotional, and in some studies led to less deontological judgment. Though patterns were somewhat inconsistent across studies, overall the results suggest that a target's dehumanization can shape how people resolve moral dilemmas.

Paper & materials ↗

2020 · Working · What type of people become far-right extremists?

Across five studies, White Americans who regard their racial background as very important to their self-concept report higher levels of far-right extremism. The effect is particularly strong among people who believe society should be more hierarchical — those higher in social dominance orientation.

2020 · Working · How do White Americans react to the population decline of Whites?

Two studies examine whether a perceived numerical decline of the White population translates into a perceived existential threat, prompting defensive reactions. Correlational data show that a collective existential threat explains the link between perceived White decline and defensive political reactions (racial bias and conservatism). A second study replicates this experimentally by manipulating perceptions of decline and growth — suggesting that perceived in-group numerical decline uniquely shifts racial and political attitudes via heightened collective existential threat.

2012 · Early work · Do our names affect our preferences and behavior?

The name-letter effect refers to unconscious priming from one's own name. Study 1 found no correlation between students' last-name initials and when they took an exam; Study 2 replicated a relationship between last-name initial and GPA. Given methodological concerns and inconsistent results, the findings suggest the name initial is, at best, a very limited unconscious prime — if it operates at all.

Paper (PDF) ↗

Working papers & under review · Preprints

Preprint · 10 studies · Can you look like what you believe?

Our beliefs about society do not determine our race or our phenotype, yet they shape how others perceive both. Ten experiments introduce a belief-driven model: mind perception shapes identity perception, which shapes phenotype perception. People are more likely to identify a person as Black, and to see them as darker-skinned, when that person holds liberal rather than conservative beliefs. The first step is explained by perceivers' stereotypes; the second reflects biased visual perception, not biased memory. The theory generalizes to perceptions of sexual identity and replicates across three cultural contexts (U.S., Mexico, South Africa). Implications for racism in a supposedly "post-racial" society are discussed.

Preprint & materials ↗

Preprint · 5 studies · Can fake news we know is fake still change us?

Teaching people to recognize fake news as fake is a popular remedy, but this paper reveals its limits. In five experiments, participants who were explicitly told they were about to read a made-up article still came to believe its content and shifted their political preferences or behavioral intentions. The effects resisted corrective efforts and persisted over time, remaining observable two days and again nine days after exposure. The findings carry implications for misinformation research, media practice, polarization, and common research practices such as deception and debriefing.

Preprint & materials ↗

Preprint · N = 3,346 · Which matters more: symbolic threat or status threat?

Past theory holds that we dislike out-groups that challenge our in-group's social status — but that effect may be confounded by symbolic threat, since status-threatening groups are also usually seen as holding different values. Six experiments clarify that our feelings about others are determined more by symbolic threat than status threat, the latter losing its effect once the former is accounted for. The pattern holds for evaluations of hypothetical immigrants, countries, third parties, and individuals: feelings toward a person turn on whether they are perceived to share the in-group's beliefs, not whether they actually belong to it.

Preprint & materials ↗

Preprint · Pre-registered · How is implicit prejudice related to political preferences?

Explicit prejudice relates to explicit support for conservative and opposition to liberal politicians regardless of demographics — but how does prejudice relate to candidate evaluation implicitly? Four pre-registered experiments clarify that politicians' ideology, not race or gender, drives the association between prejudice and evaluation, whether prejudice is measured explicitly or implicitly. These preferences are driven primarily by citizens' preference for politicians who support inequality, and to a lesser extent the status quo.

Preprint & materials ↗

Preprint · UK & US · Who bought all the toilet paper during the pandemic?

Some people responded to the COVID-19 pandemic with panic buying, a poorly understood phenomenon. People who believe pandemic conspiracies may experience a heightened sense of powerlessness and vulnerability, and so may be especially prone to palliative, compensatory stockpiling. Supporting this, two studies (cross-sectional UK and longitudinal US) show that people who endorse COVID-19 conspiracy theories are more likely both to have stockpiled in the past and to stockpile in the future.

Preprint & materials ↗

Preprint · N = 13,274 · How do race and gender jointly shape the experience of stigma?

Five studies test three accounts of how race and gender jointly contribute to stigmatization: additive (race → racial stigma; gender → gender stigma), projective (race → gender stigma; gender → racial stigma), and multiplicative (race × gender → stigma). The evidence consistently supports the additive approach and disconfirms the multiplicative one. For the projective approach, an asymmetry emerges — race shapes gender stigmatization, but gender does not shape racial stigmatization — clarifying theories of double jeopardy.

Preprint & materials ↗

Diversity statement

Equity, from lived experience.

The richest 100 Americans own more than $2 trillion in assets — about 10% of U.S. GDP in 2021. A photo of that group would show almost entirely White men, a scene strikingly like an 18th-century gathering of the British aristocracy. Meanwhile, compared with White Americans, Black Americans are only two-thirds as likely to graduate from college, almost twice as likely to be unemployed, and over five times as likely to be incarcerated.

Social disparities run along many other axes too — gender, sexual orientation and identity, religion, immigration status. As an immigrant, a person of color, a neuroatypical scholar, and a queer person, I know firsthand how diversity, equity, and inclusion shape lives like mine, and how urgent it is to address them. Informed by that experience, I promote DEI as a researcher, a teacher, and an advocate.

Practice 01 · As a researcher

My research engages directly with race, gender, and politics, contributing to DEI by advancing our understanding of prejudice, identifying the factors that hinder progress, and offering recommendations for organizations. For example, my recent work offers a theory that support for a leader is determined more by the leader's beliefs than by their race or gender (Bai, 2021, JPSP). Many prejudiced citizens support Black and female candidates whose platforms do not challenge the status quo, while opposing White male candidates whose policies would advance equality. Prejudice can thus be disguised as support for minority right-wing leaders ("I vouched for a Black guy, so how can I be racist?"). The implication: diversity initiatives should weigh not only the demographics of those hired, but whether they will advocate for women and people of color.

I am also committed to making my field more inclusive in whose psychology it studies. In one paper proposing an asymmetry in racial dehumanization (Bai & Zhao, 2024, JPSP), we examine the question from the perspectives of Black and Asian participants, not only the majority's view. That commitment extends beyond the U.S.: in my work on the belief-driven model of social perception (Bai, unpublished), I collected data from the Global South, in Mexico and South Africa, regions rarely represented in social and behavioral science.

Finally, I work to diversify the scientific community itself. When proposing a conference symposium, I ensure women and scientists of color are included, and I form collaborations across backgrounds. To date, all of my mentees and research assistants have been women or students of color. As a researcher who studies diversity itself, I take its value to heart — and I look forward to helping lead my field toward becoming more diverse and inclusive.

Practice 02 · As a teacher

I will never forget my fourth-grade teacher's disparaging remarks during a bad episode of Tourette's in class. I remember my seventh-grade teacher telling me I did not belong in the city I lived in when she learned my family was Beipiao — migrant workers in Beijing. I did not come out as queer until college, because so many of my teachers had used slurs to describe queer people.

Teachers have a tremendous and lasting impact; what they say and how they shape a classroom profoundly affects how students see themselves. As a teacher, I am committed to an environment where diversity is celebrated, difference is valued, and everyone is included. I make that value explicit, adopt the latest strategies for DEI in the classroom, and build in the flexibility and availability that go a long way for marginalized students. I prioritize meeting with students, respond promptly, and am never "too busy." When a student falls behind, I offer one-on-one coaching to set Specific, Measurable, Achievable, Relevant, and Time-bound (SMART) goals, with extra support for those from diverse and marginalized backgrounds.

Practice 03 · As a social-justice advocate

As someone with several intersecting marginalized identities, I care deeply about social justice. I have volunteered at the Beijing LGBT Center and the Queer Student Cultural Center (QSCC) at the University of Minnesota, organizing events to build community and promote trust and positive images of queer people. As a group facilitator at the QSCC, I led discussions and supported individuals at various stages of coming out — including those still discovering or coming to terms with their sexual or gender identity.

I also advance social justice by communicating DEI-related science to the public. I have hosted workshops discussing research on diversity and racism with non-academic audiences, including seniors in nursing homes and leaders in private companies. Over the long term, I believe these efforts compound — creating more informed citizens who make progress toward DEI in everyday life.

In conclusion

I am committed to promoting diversity, equity, and inclusion as a researcher, a teacher, and a leader in my community — and I look forward to incorporating impactful DEI efforts into my work as a faculty member.

Blog · August 2018

The MTurk data-quality investigation.

In 2018 I noticed a wave of low-quality responses contaminating online survey data and organized an open, collaborative effort to measure it. These posts document the method and the evidence as it developed — along with the discussion they drew, preserved from the original site.

15 Aug 2018 · Proposed agenda and updates on the MTurk issue

Here is a proposed agenda for the current MTurk issue. The document also includes some updates on how many researchers have seen duplicates in their data so far. I invite comments and collaboration: z.umn.edu/MturkIssue

Discussion · preserved verbatim from the original site · spam removed

Jason · 9/17/2018 07:32:05 pm

Crowd-sourced work force platforms have always interested me. As I read about the alarm that you have raised for the potential about bad data on account of bots, I can't help but to wonder about what sort of financial impact that your posts have had on the income of workers on Amazon Mechanical Turk as some researchers opt to not post surveys on Amazon Mechanical Turk on account of the doubts that you have raised, but not sufficiently proven. Reading through the worker forums, many are commenting on the recent drop in work available following your posts. Several of the more seasoned workers are beginning to mention "lawsuit".

Have no doubt, I am certainly worried about surveys being flooded with bad data from bots of malicious workers.

However, in considering the fact that you raised the alarm before conducting a truly systematic study/survey of the data has been completed, and considering that many legitimate workers meticulously track statistics about work available and their own earnings from Amazon Mechanical Turk, are you potentially at risk for any sort of legal action on the behalf of these workers who can reasonably demonstrate a loss of wages on account of your public blog posts?

If these surveys were unpaid and completed strictly by volunteers, then I wouldn't be concerned. However, since your postings could potentially have an impact on the wages of thousands of workers shouldn't that require you to be more thorough and cautious about the validity of your claims before they are made public?

Helen · 9/20/2018 11:13:12 am

THANK YOU! YES! [APPLAUSE]

Sally · 9/20/2018 11:24:39 am

I too think that before you published or stated your opinion that you should of had some kind of proof. There are many honest hard working people that put in their time on Mturk to assist requesters with their surveys and most of it for underpaid wages. I have also looked at the forums and see the struggle they are having and the reports of less well paying surveys and other work since your article. For many of the workers this is their main source of income. I encourage any requesters, students, etc. that before making a decision not to use mturk that they themselves go to a couple of the different forums and get to know the people that work on Mturk and not to just take Jason's unproven opinion on bots.

13 Aug 2018 · A prospective test and some predictions regarding MTurk data contamination

Many of us who collected data from MTurk recently noticed an increase in duplicated GPS coordinates, and we have learned that the duplicates may produce unreliable responses. Some scholars question whether the GPS method can reliably detect unreliable responses; others wonder whether the unreliable responses I have seen so far are outliers, and whether the issue is pervasive. Here I propose a simple test with specific predictions that can soon answer these concerns.

The dataset where I initially found the anomaly is still collecting data (target N = 1,500). The existing analyses are based on the first third of the sample. One way to address whether GPS duplicates are pervasive — and whether the GPS method detects unreliable responses — is to apply my own procedure to the remaining two-thirds as it comes in. I welcome thoughts and comments on these predictions.

Testing pervasiveness. My impression is that duplicates have become more common over a matter of weeks. In my existing data of 578 participants, about half are duplicates. I predict that the remaining (roughly 1,000) responses will contain about 500 (if not more) from duplicates; if that holds, it would be strong evidence that duplicates are rampant, at least in my own data.

Testing data quality. If the following predictions are preponderantly supported, we have strong evidence that responses from duplicates are unreliable:

Reliability of known scales. For duplicates, the reliability of the racial-identification scale should fall around .11; for non-duplicates, above .8 (near .87). The reliability of the symbolic-threat scale should fall under .2 for duplicates (near −.01) and above .7 for non-duplicates (near .81).
Distribution of measures with known distributions. On feelings toward the KKK and the Nazi party (0–100, 50 = midpoint), I predict duplicates average near 60 and non-duplicates near 8.5.
Measures known to correlate. For non-duplicates, feelings toward liberal politicians should correlate with ideology and party ID near −.59 and −.57 (p < .001); for duplicates, non-significant. Conservative politicians are messier and I make no prediction for them yet.
Qualitative responses. For duplicates, "good" should appear about 25% of the time (near 26.6%) and "nice" about 20% (near 21%); for non-duplicates, under 3%.

10 Aug 2018 · A proposed procedure for testing the evidentiary value of responses from duplicated GPS sources

Many of us have found a notable amount of duplicated GPS (not IP) in our recent MTurk data. Circumstantial evidence suggests these responses are random and unreliable. The following procedure tests the evidentiary value of responses from duplicates and determines whether they are truly random. I invite open comments; I will revise based on feedback and distribute the survey the following Friday.

The four proposed tests:

Reliability of known scales. Pick a well-validated scale with known reliability — ideally one with 50% reverse-coded items — and report reliability for duplicates and non-duplicates separately.
Distribution of known measures. Pick a measure with a known or expected skewed distribution (the more skewed the better) and report it for duplicates and non-duplicates separately.
Relationships between variables known to correlate. Pick two strongly correlated measures, ideally both with 50% reverse-coded items and on different survey pages.
Frequency of suspicious key words. Duplicates tend to enter "good," "nice," and "very" regardless of what is asked. Count how many come from duplicates versus non-duplicates, focusing on "good" and "nice."
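Test 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis actually run: it assumes Cronbach's alpha as the reliability index, a 1 to 7 response scale, and hypothetical field names (`items`, `duplicate`), and the toy rows are made up rather than survey data. It also shows why straight-lining hurts a scale with reverse-coded items: once the reverse-keyed items are flipped back, a respondent who gives nearly the same answer to every item no longer looks internally consistent.

```python
from statistics import pvariance


def reverse_code(score, low=1, high=7):
    """Flip a reverse-keyed item on a low..high scale (1-7 is an assumption)."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, one row of item scores per respondent."""
    k = len(rows[0])
    columns = list(zip(*rows))                      # one tuple per item
    item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        return float("nan")                         # alpha is undefined here
    return (k / (k - 1)) * (1 - item_var / total_var)


def alpha_by_group(responses, reverse_items, low=1, high=7):
    """Score reverse-keyed items, then compute alpha separately for each group."""
    groups = {"non-duplicates": [], "duplicates": []}
    for resp in responses:
        scored = [reverse_code(v, low, high) if i in reverse_items else v
                  for i, v in enumerate(resp["items"])]
        key = "duplicates" if resp["duplicate"] else "non-duplicates"
        groups[key].append(scored)
    return {name: cronbach_alpha(rows)
            for name, rows in groups.items() if len(rows) > 1}


# Toy rows (made up): items 2 and 3 are reverse-keyed.
toy = [
    {"duplicate": False, "items": [2, 2, 6, 6]},
    {"duplicate": False, "items": [4, 4, 4, 4]},
    {"duplicate": False, "items": [6, 6, 2, 2]},
    {"duplicate": False, "items": [5, 5, 3, 3]},
    {"duplicate": True, "items": [7, 7, 7, 6]},     # near straight-lining
    {"duplicate": True, "items": [1, 1, 2, 1]},
    {"duplicate": True, "items": [4, 5, 4, 4]},
]
alphas = alpha_by_group(toy, reverse_items={2, 3})
print(alphas)
```

Population and sample variance give the same alpha as long as one is used for both the items and the total, which is why `pvariance` is used throughout.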

For tests 1–3, I also plan to analyze a random subset of non-duplicates of comparable N, so non-duplicates gain no unfair advantage from higher sample size.
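The size-matched comparison might look like the sketch below, assuming a simple random draw without replacement; the function name and the seed value are hypothetical choices made here, and the respondent IDs are stand-ins. The counts come from this post (282 duplicates, 296 non-duplicates).

```python
import random


def matched_subsample(non_duplicates, n_target, seed=2018):
    """Randomly draw n_target non-duplicate responses without replacement."""
    if n_target > len(non_duplicates):
        raise ValueError("not enough non-duplicates to match the duplicate N")
    rng = random.Random(seed)              # fixed seed so the draw is reproducible
    return rng.sample(non_duplicates, n_target)


non_dup_ids = list(range(296))             # stand-in respondent IDs
subset = matched_subsample(non_dup_ids, 282)
print(len(subset))
```

Repeating the draw with several seeds and reporting the range of results would show how much the comparison depends on which subset happens to be drawn.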

Conclusion. We would have strong evidence that GPS-duplicate responses have limited evidentiary value if a preponderance of analyses show a large discrepancy between duplicates and non-duplicates on the reliability of known scales, the distribution of known measures, the correlation between variables known to correlate, and the frequency of suspicious key words.

Footnote. Among 282 responses from duplicates in my still-collecting survey, "good" appears 75 times (26.6%), "nice" 59 times (21%), and "very" 19 times (6.7%). Among 296 non-duplicate responses, "good" shows up only twice, "nice" once, and "very" not at all — and those few appear to be random responses the GPS method missed, with KKK/Nazi ratings near 50 rather than the typical 8–9.
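The keyword count in test 4 can be sketched as follows. Two rules here are assumptions rather than the post's stated method: each response counts at most once per keyword (consistent with the reported percentages, since 75 / 282 rounds to 26.6%), and matching is whole-word and case-insensitive. The toy answers are made up.

```python
import re


def keyword_rates(responses, keywords=("good", "nice", "very")):
    """For each keyword, count responses containing it as a whole word (any case)."""
    n = len(responses)
    rates = {}
    for kw in keywords:
        pattern = re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE)
        hits = sum(1 for text in responses if pattern.search(text))
        rates[kw] = (hits, round(100 * hits / n, 1))   # (count, percent)
    return rates


toy_answers = ["GOOD", "very nice!", "The survey was fine.", "good good"]
print(keyword_rates(toy_answers))
print(round(100 * 75 / 282, 1))   # footnote arithmetic for "good": 26.6
```

Whole-word matching keeps words like "goodness" from counting; whether that is desirable depends on how the open-ended item was worded.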

08 Aug 2018 · Evidence that a large amount of low-quality MTurk responses can be detected with repeated GPS coordinates

Suggested citation — Bai, H. (2018). Evidence that a large amount of low-quality responses on MTurk can be detected with repeated GPS coordinates. https://www.maxhuibai.com/blog/evidence-that-responses-from-repeating-gps-are-random

Background. In the past day or two, I discovered that a large number of responses in my latest MTurk survey appear to be random — and that a large portion of them have repeating GPS locations. A relatively large number of social psychologists have also noticed a drop in MTurk data quality and detected concerning patterns with the GPS method.

What is being done. I am organizing an effort to determine the scale and nature of the problem, and I need help from as many labs as possible. One quick check for contamination: search your data for the latitude fragment 88639831 (the final digit may differ due to rounding). This location appeared across multiple surveys; if it is in your data, examine how many participants share repeating GPS coordinates and consider analyzing or excluding them separately. Responses with this — and any other repeating — GPS location appear to be random.
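The quick check above might be scripted as below. This is a sketch under assumptions: the column names `lat` and `lon` are hypothetical (match them to your own export), coordinates are kept as the strings read from the file because numeric parsing or spreadsheet tools may round trailing digits, and dropping the fragment's final digit to tolerate rounding is a choice made here, so any hit should still be inspected by hand. The rows are made up.

```python
from collections import Counter

# Fragment from the post, minus its final digit so a differently rounded
# last digit still matches (an assumption; may add false positives).
FRAGMENT = "88639831"[:-1]


def repeating_gps_flags(rows, lat_key="lat", lon_key="lon"):
    """True for each row whose exact (lat, lon) string pair occurs more than once."""
    pairs = [(row[lat_key].strip(), row[lon_key].strip()) for row in rows]
    counts = Counter(pairs)
    return [counts[pair] > 1 for pair in pairs]


def contains_fragment(row, lat_key="lat", fragment=FRAGMENT):
    """True if the latitude string contains the suspect fragment anywhere."""
    return fragment in row[lat_key]


rows = [  # made-up coordinates
    {"lat": "10.886398312", "lon": "-20.5"},
    {"lat": "10.886398312", "lon": "-20.5"},
    {"lat": "12.345", "lon": "-67.89"},
]
print(repeating_gps_flags(rows))
print([contains_fragment(r) for r in rows])
```

Substring matching also catches cases where the fragment sits in the middle of a longer number, as in the "xx.88639831xxx" pattern reported in the comments.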

Scale of impact. So far, roughly 90 studies or more from around the world (mostly North America and Europe) appear to be affected, with contamination dating back as early as March 2018.

Evidence that repeaters are random. In a survey still collecting data (N = 578), 282 (48.8%) have duplicated GPS. Analyzed separately:

Reliability of known scales. The racial-identification scale's reliability is .87 for non-repeaters and .11 for repeaters; the symbolic-threat scale is .81 versus −.01. Repeaters appear to straight-line their responses regardless of reverse-coding.
Known distributions. Feelings toward the KKK and the Nazi party (0–100) average 8.90 and 8.21 for non-repeaters, but 60.82 and 60.02 for repeaters — more consistent with random responding than genuine affinity.
Variables known to correlate. Feelings toward liberal politicians correlate with ideology and party ID at −.59 and −.57 for non-repeaters (both p < .001), but are non-significant for repeaters. For conservative politicians, the non-repeater correlations are .53 and .48; for repeaters, ideology holds but party ID does not.
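The group-wise correlation check can be sketched as below, assuming Pearson's r and hypothetical field names (`duplicate`, `ideology`, `liberal_feeling`); the toy rows are made up. It computes r only; the p-values reported above would need a significance test, for example SciPy's `scipy.stats.pearsonr`. On Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r for two equal-length lists; fails if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_group(rows, x_key, y_key):
    """Compute Pearson r between two measures separately for each group."""
    result = {}
    for label, is_dup in (("non-duplicates", False), ("duplicates", True)):
        group = [row for row in rows if row["duplicate"] == is_dup]
        result[label] = pearson_r([r[x_key] for r in group],
                                  [r[y_key] for r in group])
    return result


# Toy rows (made up): ideology 1-7 and a 0-100 feeling toward a liberal politician.
toy = (
    [{"duplicate": False, "ideology": x, "liberal_feeling": 100 - 10 * x}
     for x in range(1, 8)]
    + [{"duplicate": True, "ideology": x, "liberal_feeling": y}
       for x, y in [(1, 50), (7, 50), (1, 60), (7, 60), (4, 55)]]
)
print(correlation_by_group(toy, "ideology", "liberal_feeling"))
```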

The predictive power of known variables does not hold up for repeaters as it does for non-repeaters — evidence that repeaters, whatever they are, are not giving meaningful responses.

Measures included a four-item racial-identification scale, a three-item symbolic-threat scale, a single-item political-ideology measure (1 = very liberal to 7 = very conservative), and a single-item party-identification measure (1 = strong Democrat to 7 = strong Republican). Hypotheses, analysis plan, and data sharing were pre-registered.

Discussion · preserved verbatim from the original site · spam removed

Young-Jae Cha · 8/9/2018 01:53:01 am

Dear Max,

I appreciate your thoughtful notice about the possibility of data contamination. In my recent dataset (01/08), I found a data with "xx.88639831xxx" (I erased real numbers by using x intentionally). In this case, "88639831" is in the middle of consecutive numbers. Do you see this is the sign of a response from the bot?

Max · 8/9/2018 06:54:03 am

I would concern about it. Do their responses tend to look random and give non sensible answers to open ended responses (e.g, "GOOD" "very nice!")? Also, if you find that established scales do not work well on them, I would worry

Nick · 8/9/2018 08:09:37 am

Note that repeated IP addresses can be caused by odd defaults, e.g., take this story: http://theweek.com/articles/624040/how-internet-mapping-glitch-turned-kansas-farm-into-digital-hell

Max · 8/9/2018 11:01:28 am

I think that is a possibility, but in the current issue, it seems that data from repeating GPS usually are far more random than non-repeaters, so I think that is a concern. GPS itself, I think, is just an artifact that happened to be helpful to identify likely problem responses...

Sam · 8/9/2018 09:22:29 am

In some of the repeating coordinates I've found in my most recent dataset it seems like whoever has done this is also spoofing these coordinates. For example, I have 4 respondents whose coordinates all lead to the middle of the Cheney reservoir in Kansas. Others I've found lead to places in the middle of nowhere in Venezuela, old warehouses, and other places that seem unlikely (or entirely impossible) to have any computers at all, much less internet access.

Max · 8/9/2018 11:06:51 am

I agree! IP is not the most reliable way to easily identify them yet, although some people said that it can be done

John burger · 8/9/2018 02:10:59 pm

geo-ip is not nearly as reliable as you might think.

Coordinates in the middle of bodies of water is exactly what you should expect if the IP-to-geo map has low precision. Max Mind started doIng that after the article came out that someone else linked above:
http://theweek.com/articles/624040/how-internet-mapping-glitch-turned-kansas-farm-into-digital-hell

Elizabeth · 8/10/2018 09:34:08 am

This was helpful. From the initial email, I began to check my data. Became suspicious when I saw ~10 respondents from the Cheney Reservoir noted above in each survey. However, the article John noted and lack of other suspicious patterns in the data cleared concerns.

Max · 8/12/2018 03:37:17 pm

I think the duplicated GPS is just a trace left by whoever is doing this. It is certainly possible that these responses come from real people, but their data appear to be very different, at least in my own data and in this scholar's: http://timryan.web.unc.edu/2018/08/12/data-contamination-on-mturk/. I have proposed the following procedure to test the evidentiary value of data from duplicates more systematically, and I welcome any thoughts and comments:
https://www.maxhuibai.com/blog/a-proposed-procedure-for-testing-the-evidentiary-value-of-responses-from-duplicated-gps-sources-comments-invited
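Any check of this kind starts by marking which responses share coordinates. The following is a minimal Python sketch of that first step, not the procedure in the linked post. It flags every response whose latitude/longitude pair appears more than once and reports the share of a study that comes from duplicated locations. The column names `LocationLatitude`/`LocationLongitude` are an assumption about how a Qualtrics export labels these fields, and the example coordinates are made up.

```python
from collections import Counter

# Assumed Qualtrics export column names; adjust to match your own file.
LAT, LON = "LocationLatitude", "LocationLongitude"


def flag_duplicate_gps(rows):
    """Return one boolean per row: True if its (lat, lon) pair occurs more than once.

    Rows with a missing coordinate are never flagged.
    """
    coords = [(row.get(LAT), row.get(LON)) for row in rows]
    counts = Counter(c for c in coords if None not in c)
    return [None not in c and counts[c] > 1 for c in coords]


def duplicate_share(flags):
    """Fraction of rows that come from a duplicated location."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical example data (made-up coordinates, stored as strings as in a CSV).
rows = [
    {"id": "R1", LAT: "37.7500", LON: "-97.8200"},
    {"id": "R2", LAT: "37.7500", LON: "-97.8200"},
    {"id": "R3", LAT: "44.9700", LON: "-93.2600"},
]
flags = flag_duplicate_gps(rows)
print(flags)                             # [True, True, False]
print(round(duplicate_share(flags), 2))  # 0.67
```

Coordinates are compared exactly as exported, so "37.75" and "37.7500" count as different locations. Normalize them first if your export mixes formats.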

Phelim (Prolific) · 8/9/2018 10:28:01 am

Just posting in case there are concerns about this being an issue beyond MTurk. We haven't seen any evidence of similar bot-like accounts on Prolific (I'm Prolific's CTO). We have several processes in place to prevent these types of accounts, including but not limited to the following:

1) Every account needs a unique non-VOIP phone number to verify.
2) We restrict signups based on IP and ISP (e.g., we allow common residential ISPs but block low-trust IPs/ISPs).
3) We limit the number of accounts that can use the same IP address and the same machine, to prevent duplicate accounts.
4) We limit the number of unique IPs per "HIT" (study).
5) PayPal and Circle accounts for getting paid must be unique to a participant account. This means that in order to have 2 participant accounts that get paid, you would also need to have 2 PayPal accounts. PayPal and Circle also have steps to prevent duplicate accounts.
6) We take any data-quality reports very seriously. Whenever researchers have suspicions about accounts, they can report the relevant participant IDs to us, and we investigate the individual accounts as well as any shared patterns between them.
7) We analyse our internal data to monitor for unusual usage patterns.

If any Prolific users have concerns about bots, data-quality, or any other questions feel free to get in touch. We take these issues very seriously and do everything we can to make sure any data collected on our platform is trustworthy and reliable.

Max · 8/9/2018 11:09:09 am

Hi Phelim, thank you very much for your message! It is really reassuring to know that Prolific is taking this very seriously. Do you have access to any data you could check for repeating GPS coordinates (not IP addresses)?

Phelim (Prolific) · 8/10/2018 03:11:53 am

We don't record GPS data, I'm afraid, although we'll look into recording it and working with any researchers who record it for our participants.

J · 8/9/2018 03:28:51 pm

I'm curious whether the following questions have been discussed:

For the studies being affected, what are the MTurk presets (e.g. location, previous hits, rejection rate, payment rate)?

Are people finding that TurkPrime is not detecting duplicate IP addresses or confirming locations accurately?

T · 8/9/2018 05:57:24 pm

I'm also curious about the HIT presets. Without appropriate presets, low quality workers are to be expected.

Brad Turner · 8/10/2018 08:37:43 am

Yes, I'd appreciate clarity on the qualifications used as well. Also, I have to assume the supposed bots or their operators can take the unique completion codes generated at the end of the Qualtrics survey and paste them back into MTurk. Can you confirm?

Kristin Broussard · 8/10/2018 08:50:32 am

Today I checked one of my recent data sets collected through TurkPrime and found a huge number of repeat GPS coordinates (including 44 with .88639831 that passed 3 attention checks in the survey).

Brad Turner · 8/10/2018 10:23:53 am

Hi Kristin, are the data from the repeat GPS coordinates good or bad? This may be a red herring. Qualtrics estimates GPS data only to the city level, except for GPS-enabled mobile devices. Max shows that some GPS duplicators are likely bots or scripted users, but repeated coordinates are not necessarily a concern on their own.
https://www.qualtrics.com/support/survey-platform/data-and-analysis-module/data/download-data/understanding-your-dataset/

Kristin · 8/15/2018 04:19:33 am

Hi Brad,
Yes, the data is bad. The GPS is just a marker that's useful for pulling cases with bad data. The real giveaway seems to be the responses to open-ended questions (e.g., nonsensical responses such as "good," "very," and "nice").

Also, as Max suggested, the reliabilities seem to be lower for these flagged potential fraudsters/bots (although not necessarily low in an absolute sense), and, as Tim Ryan noted, a high number entered "30" as their age on a type-in question and most chose "male" for their gender.

I also want to note that I collected data for 4 different projects on MTurk and TurkPrime this summer, and only one data set seems to be highly affected, even after planned exclusions for failing attention checks, etc. One data set seems completely unaffected (after cleaning for attention checks), and another had only about 30 suspicious cases that passed attention checks.
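You can test the observation that reliabilities are lower among flagged cases directly by computing Cronbach's alpha separately for flagged and unflagged respondents. The sketch below is a plain-Python version of the standard alpha formula. It assumes each respondent's scale items are already numeric and reverse-scored. `alpha_by_flag` is a hypothetical helper, and its flags could come from duplicate GPS coordinates or from any other screening rule.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_by_flag(rows, flags):
    """Return (alpha for flagged rows, alpha for unflagged rows)."""
    flagged = [r for r, f in zip(rows, flags) if f]
    others = [r for r, f in zip(rows, flags) if not f]
    return cronbach_alpha(flagged), cronbach_alpha(others)


# Hypothetical two-item scale; flags might come from duplicate GPS coordinates.
scores = [[1, 2], [1, 1], [2, 1], [2, 2], [3, 3], [3, 3]]
flags = [True, False, True, False, True, False]
a_flagged, a_clean = alpha_by_flag(scores, flags)
print(round(a_flagged, 2), round(a_clean, 2))  # 0.67 1.0
```

As the comment above notes, a lower alpha in the flagged group is suggestive rather than conclusive, especially when the flagged group is small.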

Cori Faklaris · 8/9/2018 06:04:14 pm

Saw your post on Twitter and it rang a bell for me. I have also seen duplicate IP addresses in MTurk data, and they are all ridiculously specific in their decimal places, but that might be an issue with how they are assigned or recorded. I decided to look at the pattern of the specific responses as you did. Unlike in your case, though, my open-ended responses weren't suspiciously out of context, and the Likert responses didn't deviate noticeably from the aggregate. So I don't think our cases involve the same issue. I wonder if you were targeted by a script due to the subject matter?

ctr · 8/10/2018 10:00:12 am

Some of my open-ended qualitative questions require an answer. Although the 88639831 data look like outright junk, possibly produced by a bot or, more likely, by a human with poor English skills and low attention, the other repeating GPS coordinates appear to be mostly good data.

Mark · 8/10/2018 11:12:21 am

Per this excellent article from Yale Law, you should generate completion codes so that you can match the data you receive to payment requests on MTurk.
https://library.law.yale.edu/news/administering-qualtrics-surveys-mechanical-turk

Delete any other responses, which are far more likely to be bots.

Once matched, every data point corresponds to a *specific* MTurk user ID. To set up an account and get that ID, the user must provide an SSN or other tax ID information, so each account is uniquely identified.

While GPS or IP might be easy to spoof, tax ID information is not. That means the problem shrinks from a bot providing many bogus responses to individual users not really trying. Those users *can* come in groups (e.g., a community of people from a large college population just trying to get some extra cash) that share IP or GPS identifiers.

When we've used MTurk presets of a 95% approval rate, 500 or more HITs completed, and (usually, for language purposes) a US-only location, we've found such data problems to be under 2%.

Reject those workers.

If they are bots, you will never hear from them again: following up with you costs them more time than they'll earn from the survey. And you'll lower their MTurk approval rate, which helps filter them out of future studies.

If they are real people they will likely email you after rejection complaining about the negative effect on their account and lack of payment. You'll very quickly be able to tell that they are not bots.
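Here is a sketch of the matching Mark describes. It assumes that each respondent sees a random completion code on the last page of the survey, that the code is saved with their response, and that you have the list of codes workers submitted on MTurk. The function names and the `completion_code` column are hypothetical, and this is not necessarily the setup the Yale article uses. The sketch also reports codes submitted more than once, which bears on Brad's question about bots pasting codes back into MTurk.

```python
import secrets
from collections import Counter


def new_completion_code(nbytes=4):
    """Random hex code to show on the survey's last page (hypothetical scheme)."""
    return secrets.token_hex(nbytes).upper()


def _norm(code):
    """Normalize a code for comparison: strip whitespace, uppercase, None -> ''."""
    return (code or "").strip().upper()


def match_responses(survey_rows, submitted_codes, key="completion_code"):
    """Split survey rows by whether their code was submitted on MTurk.

    Returns (matched, unmatched, reused), where `reused` holds codes that
    were submitted more than once and may have been shared between accounts.
    """
    counts = Counter(_norm(c) for c in submitted_codes if _norm(c))
    reused = {c for c, n in counts.items() if n > 1}
    matched, unmatched = [], []
    for row in survey_rows:
        target = matched if _norm(row.get(key)) in counts else unmatched
        target.append(row)
    return matched, unmatched, reused


# Hypothetical example: one valid response, one unmatched code, one row with no code.
rows = [{"completion_code": "A1B2C3D4"}, {"completion_code": "ffff0000"}, {}]
matched, unmatched, reused = match_responses(rows, ["a1b2c3d4 ", "A1B2C3D4"])
print(len(matched), len(unmatched), reused)  # 1 2 {'A1B2C3D4'}
```

Following Mark's advice, the `unmatched` rows are the ones to delete. A code in `reused` means more than one MTurk submission points to the same survey response.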

David Rand · 8/10/2018 11:21:15 am

We have not experienced this problem. I believe it's because we begin our HITs by having the workers transcribe a paragraph of handwritten text (essentially, complete a captcha). This is a trivial approach to screen out bots. I recommend it!

T · 8/11/2018 04:39:52 am

TurkPrime just released a report on the issue, analyzing 100,000 MTurk studies.

"In the last 24 hours, we have worked to determine whether there is a growing problem of multiple submissions from the same geolocation. In reviewing over 100,000 studies that have been launched on TurkPrime, we see that the rate of submissions from duplicate geolocations typically bounced from less than 1% to 2.5% within a study."

Over 97% of all studies had 2.5% or fewer duplicates, and they acknowledge duplicates "could be explained by people submitting surveys from the same building, office, internet service provider, or even the same city"

http://blog.turkprime.com/2018/08/concerns-about-bots-on-mechanical-turk.html

billy · 8/21/2018 11:20:14 am

Typical knee-jerk reaction. All they can do is block duplicate GPS, which does nothing but exclude hundreds of legitimate participants for no reason other than wanting more privacy.

Katherine H. · 8/11/2018 07:38:57 am

I looked through some data I collected in February-March and found a few repeating GPS coordinates from the Cheney reservoir in Kansas. From what I can tell, they have the same IP addresses and different MTurk IDs.

Side note: My university IRB asks that I check the box to anonymize data collected on MTurk. When I remember to do this, I don't collect IP addresses or GPS coordinates, which makes it more difficult for researchers to use this method to detect bots.

Julien (Lucid) · 8/14/2018 03:34:54 pm

Thanks for posting. I work on human and bot fraud detection at Lucid, and may have some pointers to avoid this.

First, IP address is a rather poor bot-detection mechanism. IP addresses are very simple to spoof, and you can always use a VPN to mask your true origin. Using IP as your only deduplication mechanism is just playing whack-a-mole. For the record, MaxMind places the IP in the "middle of nowhere" in Kansas when the incoming IP is absent or unreadable, so you can treat those locations as invalid. Second, open text input is easily scripted by bots, so it's usually an easy tell.

We do a couple of things to address these problems: no duplicate IP address, cookie, or participant ID can enter the same survey twice. We consider this table stakes for online research, so we offer it free of charge. Additionally, for the issue at hand, we use tools to detect gibberish, ungrammatical inputs, or even duplicate answers between participants. While researchers tend to avoid open-ended questions, they have proven to be a very good fraud-detection mechanism.
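Here is a rough sketch of the two checks described above. The first refuses any entry that shares an IP address, cookie, or participant ID with an earlier entry. The second groups participants who gave the same open-ended answer. This is not Lucid's implementation: the field names are hypothetical, and the text normalization (lowercasing, dropping punctuation) is a simple stand-in for real gibberish detection. Short generic answers such as "good" will also collide among genuine participants, so treat matches as leads to inspect, not as proof of fraud.

```python
import re
from collections import defaultdict

ID_FIELDS = ("ip", "cookie", "participant_id")  # hypothetical field names


def dedupe_entries(rows, fields=ID_FIELDS):
    """Keep the first row per identifier; drop any later row sharing one."""
    seen = {f: set() for f in fields}
    kept, dropped = [], []
    for row in rows:
        ids = {f: row.get(f) for f in fields}
        if any(v is not None and v in seen[f] for f, v in ids.items()):
            dropped.append(row)
            continue
        for f, v in ids.items():
            if v is not None:
                seen[f].add(v)
        kept.append(row)
    return kept, dropped


def duplicate_open_text(rows, key="open_text"):
    """Map each normalized answer given by 2+ rows to the indices of those rows."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        text = re.sub(r"[^a-z0-9 ]", "", (row.get(key) or "").lower())
        text = " ".join(text.split())  # collapse repeated whitespace
        if text:
            groups[text].append(i)
    return {t: idx for t, idx in groups.items() if len(idx) > 1}


# Hypothetical example rows (documentation-range IP addresses).
rows = [
    {"ip": "192.0.2.1", "cookie": "a", "participant_id": "p1", "open_text": "Very nice!"},
    {"ip": "192.0.2.1", "cookie": "b", "participant_id": "p2", "open_text": "very  nice"},
    {"ip": "198.51.100.7", "cookie": "c", "participant_id": "p3", "open_text": "I liked it."},
]
kept, dropped = dedupe_entries(rows)
print([r["participant_id"] for r in kept])  # ['p1', 'p3']
print(duplicate_open_text(rows))            # {'very nice': [0, 1]}
```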

We're happy to advise further on best security practices in online research.

www.luc.id

John Burger · 8/18/2018 09:56:03 am

MaxMind also uses the middle of Kansas when all it can tell about the ISP is that it's in the US. This will happen increasingly often with GDPR and other privacy regulations.

Sean · 8/17/2018 02:53:22 pm

My co-authors and I have just posted the following working paper to SSRN that investigates the root cause of this issue. Importantly, we find no evidence of bots.
https://ssrn.com/abstract=3233954

Alex · 8/28/2018 06:57:33 am

Hello,

While this seems like a good method to track scammers, it severely disadvantages those who use MTurk from home with their partners. Based on this study, my partner and I have both received rejections from a requester who used this method.
Sincerely,
Alex

Wade · 9/20/2018 02:16:06 pm

Please report these worker IDs to Amazon so that the problem can be corrected.

I would suggest using a 99% approval rate with at least 5,000 HITs if you want quality results.
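The presets suggested in this thread (95% approval, 500+ HITs, and US-only in one comment; 99% approval and 5,000 HITs in another) map onto MTurk's built-in system qualifications. The sketch below builds the list you would pass as `QualificationRequirements` when creating a HIT, for example through boto3's MTurk client. It assumes the system qualification IDs below are still current, so check them against the MTurk API reference before use. The HIT-count qualification counts *approved* HITs, which is not quite the same as HITs completed.

```python
# System qualification type IDs as listed in the MTurk API reference
# (assumed current; verify before relying on them).
PERCENT_ASSIGNMENTS_APPROVED = "000000000000000000L0"
NUMBER_HITS_APPROVED = "00000000000000000040"
WORKER_LOCALE = "00000000000000000071"


def qualification_requirements(min_approval_pct=95, min_hits_approved=500, country="US"):
    """Build a QualificationRequirements list for an MTurk HIT."""
    return [
        {
            "QualificationTypeId": PERCENT_ASSIGNMENTS_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approval_pct],
        },
        {
            "QualificationTypeId": NUMBER_HITS_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_hits_approved],
        },
        {
            "QualificationTypeId": WORKER_LOCALE,
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        },
    ]


moderate = qualification_requirements()         # 95%, 500 approved HITs, US
strict = qualification_requirements(99, 5000)   # 99%, 5,000 approved HITs, US
# Not executed here, e.g.:
# boto3.client("mturk").create_hit(..., QualificationRequirements=strict)
```

Stricter presets shrink the worker pool, so the trade-off is data quality against recruitment speed.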

unanimous · 7/23/2019 12:41:35 pm

You know what? This study is ridiculous. Because of it, many legitimate MTurkers who take their time finishing surveys are being punished. That example of the Nazi crap is just an opinion. How can you judge someone from an opinion? If a question has a right or wrong answer, then judge it that way.

That question is just like asking whether someone believes Trump or not and judging their answer based on whether you believe in Trump. That's ridiculous! That's why it is called an OPINION!!!

Application materials

Paying it forward.

Like many others, I have written many applications on my academic journey, and I was lucky to receive helpful comments and example materials from people who walked the path before me. To return the favor, I am posting materials I have used — the ones that were accepted, since they are likely more useful than the ones that weren't.

Job applications
Postdoctoral position, Stanford University
The application materials I submitted for my postdoctoral scientist position at Stanford.
Open PDF ↗

Graduate school
Ph.D applications
Personal statements and supporting materials from my graduate-school applications. Coming soon.

Grants
Funded grant proposals
Selected proposals that were awarded funding. Coming soon.

A note

If you are early on this path and a particular kind of example would help, reach out — I am glad to share what I can.

Code & tools

Small things for R.

Utilities I built to make my own analyses faster, shared in case they save you time too.

regtable

Regression tables from lm objects

Generates clean, formatted regression tables directly from your fitted lm models.

Documentation →

cor.csv()

Correlation tables to Excel

Builds a correlation table from a data frame and writes it to Excel — with means and SDs, rounded to .00, and significance markings (*** < .001, ** < .01, * < .05, + < .10).
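For readers who don't use R, here is a small Python sketch of the formatting conventions described for cor.csv(): coefficients rounded to two decimals with the leading zero dropped (the .00 style), the significance markers listed above, and means and SDs. It is not a port of the R script, whose internals aren't shown here. It assumes you already have r and p for each pair of variables (for example from `scipy.stats.pearsonr`), and the function names are hypothetical.

```python
from statistics import mean, stdev


def sig_marker(p):
    """Significance marker: *** < .001, ** < .01, * < .05, + < .10."""
    for cutoff, mark in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.10, "+")):
        if p < cutoff:
            return mark
    return ""


def drop_leading_zero(x):
    """Round to two decimals and drop the leading zero, e.g. 0.46 -> '.46'."""
    s = f"{x:.2f}"
    if s.startswith("0."):
        return s[1:]
    if s.startswith("-0."):
        return "-" + s[2:]
    return s


def format_r(r, p):
    """One correlation cell, e.g. format_r(0.46, 0.0004) -> '.46***'."""
    return drop_leading_zero(r) + sig_marker(p)


def describe(values):
    """Mean and SD, each rounded to two decimals."""
    return f"{mean(values):.2f}", f"{stdev(values):.2f}"


print(format_r(0.46, 0.0004), format_r(-0.12, 0.08))  # .46*** -.12+
print(describe([1, 2, 3]))                            # ('2.00', '1.00')
```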

Download R script →