TY  - JOUR
AU  - Saidov, Bobur 
AU  - Barakhnin, Vladimir 
AU  - Fayzullaeva, Zarnigor 
AU  - Ibragimov, Umid 
AU  - Tursunov, Ulugbek 
PY  - 2026
TI  - UzNER: A Human-Reviewed Benchmark for Uzbek Named Entity Recognition With Gazetteer-Augmented Transformer Models
JF  - Journal of Computer Science
VL  - 22
IS  - 6
DO  - 10.3844/jcssp.2026.1894.1911
UR  - https://thescipub.com/abstract/jcssp.2026.1894.1911
AB  - UzNER-100K is a large-scale human-reviewed benchmark for Uzbek named entity recognition with 100,000 training sentences, 18 fine-grained entity types and 200,083 entity mentions across 114,269 sentences in total. The corpus was constructed through an LLM-assisted, expert-reviewed annotation pipeline that achieved strong reliability on the main audit subset while substantially reducing corpus-construction effort. The benchmark includes a standard test split, a gold-audited subset and a hard subset designed to stress long, ambiguous and structurally complex cases. We evaluate 10 Uzbek NER systems spanning recurrent, monolingual Uzbek, multilingual transformer and hybrid architectures. The best model, XLM-R + Gazetteer + CRF, reaches 91.03 Micro-F1 on the standard test set, 89.67 on the gold-audited subset and 83.21 on the hard subset. Quality control included a dedicated inter-annotator agreement audit, achieving 91.3% span-level agreement, 93.7% entity-type agreement, and a Cohen's kappa of 0.914. In addition, a qualitative native-speaker assessment confirmed the linguistic naturalness of the model outputs while highlighting remaining challenges in legal, administrative, and event-related expressions.