TY  - JOUR
AU  - Saidov, Bobur
AU  - Barakhnin, Vladimir
AU  - Fayzullaeva, Zarnigor
AU  - Ibragimov, Umid
AU  - Tursunov, Ulugbek
PY  - 2026
TI  - UzNER: A Human-Reviewed Benchmark for Uzbek Named Entity Recognition With Gazetteer-Augmented Transformer Models
JF  - Journal of Computer Science
VL  - 22
IS  - 6
DO  - 10.3844/jcssp.2026.1894.1911
UR  - https://thescipub.com/abstract/jcssp.2026.1894.1911
AB  - UzNER-100K is a large-scale human-reviewed benchmark for Uzbek named entity recognition with 100,000 training sentences, 18 fine-grained entity types and 200,083 entity mentions across 114,269 sentences in total. The corpus was constructed through an LLM-assisted, expert-reviewed annotation pipeline that achieved strong reliability on the main audit subset while substantially reducing corpus-construction effort. The benchmark includes a standard test split, a gold-audited subset and a hard subset designed to stress long, ambiguous and structurally complex cases. We evaluate 10 Uzbek NER systems spanning recurrent, monolingual Uzbek, multilingual transformer and hybrid architectures. The best model, XLM-R + Gazetteer + CRF, reaches 91.03 Micro-F1 on the standard test set, 89.67 on the gold-audited subset and 83.21 on the hard subset. Quality control included a dedicated inter-annotator agreement audit, achieving 91.3% span-level agreement, 93.7% entity-type agreement, and a Cohen’s Kappa of 0.914. In addition, a qualitative native-speaker assessment confirmed the linguistic naturalness of the model outputs while highlighting remaining challenges in legal, administrative, and event-related expressions.