LLM Collusion and Identifiability in Code Generation
Investigates whether large language models can recognize or attribute code authored by themselves or other LLMs.
| May 2025 – Mar 2026 (discontinued) | Code: ebarkhordar/llm-collusion (archived) |
- Investigates whether large language models can recognize or attribute code authored by themselves or other LLMs.
- Implemented three attribution tasks: Self-Recognition, Target Identification, and Full Attribution.
- Evaluated across multiple benchmarks (MBPP, HumanEval, DS-1000) with code generated by GPT-5, Claude, Gemini, Grok, and DeepSeek.
- Built a scalable experimentation pipeline for multi-model code generation, attribution testing, and statistical analysis.
- Fine-tuned lightweight judge models to identify model-specific code signatures, highlighting risks of collusion and leakage in code evaluation.