Bias Toward Languages Studied in NLP Peer Reviews

First systematic study of language-of-study bias in NLP peer review, finding that papers studying non-English languages face bias rates roughly 40× higher than English-only papers.

Jul 2025 – Apr 2026 · Koç University · Paper: arXiv:2604.07119 · Code & data: GGLAB-KU/LOBSTER
  • First systematic characterization of language-of-study (LoS) bias — when reviewers evaluate a paper based on the language(s) it studies rather than its scientific merit.
  • Released LOBSTER, a human-annotated dataset of 534 review segments labeled for negative bias, positive bias, or no bias.
  • Benchmarked four LLMs as bias detectors; the best (Gemini 3.1 Pro) reaches 87.37 macro F1 on the 3-way classification task (see the evaluation sketch after this list).
  • Applied the detector to 15,645 reviews across six NLP venues (EMNLP 2023/24/25, ACL 2025, ARR 2024, COLING/NAACL 2025).
  • Identified four subcategories of negative bias; demanding unjustified cross-lingual generalization dominates (~62% of negative-bias cases).
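
A minimal sketch of the 3-way evaluation above, using scikit-learn's macro-averaged F1. The example gold labels and predictions are hypothetical; the label set mirrors LOBSTER's negative bias / positive bias / no bias annotation scheme.

```python
from sklearn.metrics import f1_score

LABELS = ["negative_bias", "positive_bias", "no_bias"]

# Hypothetical gold annotations and detector predictions for review segments.
gold = ["no_bias", "negative_bias", "no_bias", "positive_bias", "no_bias"]
pred = ["no_bias", "negative_bias", "negative_bias", "positive_bias", "no_bias"]

# Macro F1 averages the per-class F1 scores, weighting all three classes
# equally regardless of how often each class occurs in the data.
macro_f1 = f1_score(gold, pred, labels=LABELS, average="macro")
print(f"macro F1: {100 * macro_f1:.2f}")
```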

Bias rates by paper language scope

Language scope              Bias rate
English-only                    0.37%
Single non-English (avg.)      14.79%
Chinese                        10.50%
Specified multilingual          4.18%
Unspecified multilingual        0.34%
Language-agnostic               0.30%

Papers studying a single non-English language face bias rates roughly 40× higher than English-only papers (14.79% vs. 0.37%), and negative bias consistently outweighs positive bias across venues.
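
The headline ratio follows directly from the table: 14.79% / 0.37% ≈ 40. Below is a minimal sketch of the per-scope aggregation, assuming detector outputs paired with each paper's language scope; the record format and field names are hypothetical, not the paper's actual pipeline.

```python
from collections import Counter

# Hypothetical per-review records: (language scope of the paper,
# whether the detector flagged any biased segment in the review).
reviews = [
    ("English-only", False),
    ("Single non-English", True),
    ("Single non-English", False),
    ("Language-agnostic", False),
]

totals, biased = Counter(), Counter()
for scope, flagged in reviews:
    totals[scope] += 1
    biased[scope] += flagged  # True counts as 1, False as 0

# Bias rate per scope = flagged reviews / total reviews in that scope.
for scope, n in totals.items():
    print(f"{scope:<22} {biased[scope] / n:.2%}")
```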