cat ~/research/index.md
research
Papers that came out of questions I keep returning to: reliability, evaluation, and how to make AI systems easier to inspect and reason about.
- PAPERS: 20
- YEAR: 2026
- RELATED: → /projects
════════════════════════════════════════════════════════════════════════════
№ 01
2026
State-level, token-attribution, and deletion-validation analysis of XLM-R and NLLB-200 encoder internals against multilingual naturalistic fMRI (LPPC, English/French/Chinese) isolates attention-side rather than FFN-semantic computations behind SHARED-greater-than-SPECIFIC alignment, including extension into auditory cortex.
// operational relevance
Shows which internal transformer computations actually drive shared cross-lingual alignment, giving multilingual model developers a mechanistic target instead of a single scalar benchmark.
authors: Ali Uyar
multilingual-nlp brain-alignment mechanistic-analysis
№ 02
2026
ROI-first encoding analysis on the Le Petit Prince multilingual fMRI corpus decomposes XLM-R and NLLB-200 encoder features into leave-target-out SHARED and orthogonalized SPECIFIC components, with SHARED-greater-than-SPECIFIC holding semantically across all six model-by-language tests under Holm correction.
// operational relevance
Provides a reproducible subspace decomposition that teams can reuse to audit whether multilingual encoders carry cross-lingual content structure versus language-specific residuals.
authors: Ali Uyar
multilingual-nlp fmri-encoding representation-analysis
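The Holm correction named in this entry can be illustrated with a minimal sketch of the standard step-down procedure. The function name `holm_reject` and the six p-values are hypothetical and are not results from the paper; the only detail taken from the entry is that six tests are corrected together.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: True where the hypothesis is rejected."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # against alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: the first failure stops all later rejections
    return reject


# Hypothetical p-values for six model-by-language tests (made up).
example_p = [0.001, 0.004, 0.006, 0.010, 0.020, 0.030]
all_hold = all(holm_reject(example_p))
```

With these made-up values every test survives correction, which is the shape of claim the entry makes. A single large p-value early in the sorted order would block every later rejection.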
№ 03
2026
Semantics-fixed protocol that separates operational output contracts (strict JSON, exact quotation) from shared QA semantics via dense-delta residualization on top of a frozen model, recovering deployable single-overlay adapters that match direct-trained behavior on schema validity and quote exactness without degrading answer F1.
// operational relevance
Lets teams ship and audit reusable output-contract adapters as residuals on a frozen base instead of re-training whole behaviors from scratch.
authors: Ali Uyar
frozen-models adapter-residuals auditable-evaluation
№ 04
2026
Provenance-tracked runtime that revokes persisted descendants of suspicious roots and replays only dirty state-writing events, with coarse-rollback fallback; in a frozen 8-chain matrix and a live qwen2.5:14b confirmation it is the only method that drives residual explicit attack success to zero while retaining benign state, cutting post-detection cost from 17 to 9 extra LLM calls versus rollback on retrieval memory.
// operational relevance
Gives operators a surgical post-compromise recovery primitive for poisoned agent memory that removes attacker influence without wiping useful carry-forward state.
authors: Ali Uyar
prompt-injection-recovery agent-memory provenance
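The revocation step in this entry can be pictured as a reachability pass over a provenance graph. The sketch below is an assumption about how such a pass might look, not the paper's runtime: the `parents` mapping, the item names, and `revoke_descendants` are all hypothetical, and the replay of dirty state-writing events is not modeled.

```python
from collections import defaultdict, deque


def revoke_descendants(parents, suspicious_roots):
    """Return the suspicious roots plus every item transitively derived from them.

    `parents` maps each persisted item id to the ids it was derived from
    (a hypothetical provenance record).
    """
    # Invert the parent links so we can walk from a root to its descendants.
    children = defaultdict(list)
    for item, sources in parents.items():
        for source in sources:
            children[source].append(item)

    revoked = set(suspicious_roots)
    queue = deque(suspicious_roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in revoked:
                revoked.add(child)
                queue.append(child)
    return revoked


# Hypothetical agent memory: a fetched web page is later flagged as suspicious.
parents = {
    "web_page": [],
    "user_note": [],
    "summary": ["web_page"],
    "plan": ["summary", "user_note"],
    "todo": ["user_note"],
}
revoked = revoke_descendants(parents, ["web_page"])
retained = set(parents) - revoked
```

Here `summary` and `plan` are revoked along with `web_page`, while `user_note` and `todo` are retained. That contrast with a coarse rollback, which would discard the benign items too, is the point the entry makes.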
№ 05
2026
Locked within-family Gemma testbed for transplanting single-turn function-calling behavior via fixed sparse, dense, and steering modules under frozen manifests and exact JSON grading. Sparse transfer is real and partially localized, but a parameter-matched dense module wins the multiseed primary metric (0.2177 vs 0.1658 strict success).
// operational relevance
Gives teams a deterministic, claim-audited harness for measuring post-training capability edits without LLM judges or retraining recipients.
authors: Ali Uyar
function-calling deterministic-eval model-editing
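A deterministic exact-JSON grader of the kind this entry describes could look like the sketch below. The call schema (`name`, `arguments`), the sample outputs, and the function `strict_success` are hypothetical illustrations; the paper's actual grading rules are not given in this listing.

```python
import json


def strict_success(model_output: str, expected: dict) -> bool:
    """True only if the whole output parses as JSON and equals the expected call."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        # Any surrounding prose or malformed JSON counts as a failure.
        return False
    # Dict comparison ignores key order but requires identical keys and values.
    return parsed == expected


# Hypothetical expected function call and three model outputs.
expected = {"name": "get_weather", "arguments": {"city": "Paris"}}
outputs = [
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',        # exact match
    'Sure! {"name": "get_weather", "arguments": {"city": "Paris"}}',  # prose prefix
    '{"name": "get_weather", "arguments": {"city": "Lyon"}}',         # wrong argument
]
grades = [strict_success(o, expected) for o in outputs]
strict_rate = sum(grades) / len(grades)
```

Because the grade is a pure function of the output string, repeated runs over a frozen manifest give identical scores, with no LLM judge in the loop.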
№ 06
2026
Paired answer-phase sparse surrogate over two residual-stream layers of Gemma 3 4B PT/IT cuts answer-token KL by 28.3% and recovers 0.186 of the PT-to-IT capability gap, while frozen 5-feature capability masks collapse to the null and verbosity-subtraction masks lengthen outputs instead of shortening them. Includes matched baselines, ablations, and prompt-level paired bootstrap intervals.
// operational relevance
Sets a held-out, bootstrap-bounded standard for separating capability from verbosity in instruction-tuning interventions before they ship.
authors: Ali Uyar
instruction-tuning sparse-interventions held-out-eval
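A prompt-level paired bootstrap interval can be sketched as follows. This uses a plain percentile interval over resampled per-prompt differences. The bootstrap variant the paper actually uses is not stated here, so the function `paired_bootstrap_ci`, its defaults, and the scores are assumptions made for illustration.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile CI for the mean per-prompt difference (a minus b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    # Pairing: each prompt contributes one difference, resampled as a unit.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


# Hypothetical per-prompt scores for an intervention and a matched baseline.
intervention = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.9]
baseline = [0.6, 0.5, 0.5, 0.7, 0.4, 0.5, 0.6, 0.6]
low, high = paired_bootstrap_ci(intervention, baseline)
```

Resampling differences rather than the two score lists independently keeps each prompt's difficulty matched across conditions, which typically tightens the interval when scores are correlated.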
№ 07
2026
Offline future-token probes across transformer depth initialize a sparse top-m layer router for multi-token prediction on a frozen backbone, beating the strongest screening-selected dense finalist on held-out top-1 and speculative-draft acceptance under a stage-gated protocol with mandatory probes and a final 1B rerun.
// operational relevance
Turns a cheap one-time probe pass into a routing prior that improves speculative-draft acceptance on a frozen backbone without unfreezing or growing the adaptation recipe.
authors: Ali Uyar
frozen-backbone multi-token-prediction speculative-decoding
№ 08
2026
Introduces criterion fidelity as a distinct validity axis for LLM judges using criterion families (base, meaning-preserving paraphrases, counterfactuals) that dissociate from ordinary accuracy, with strict criterion-emphasis prompting shown to be an unreliable fix.
// operational relevance
Gives evaluation teams a concrete probe to detect when judge pipelines are tracking surface wording instead of the stated criterion before shipping them as reviewers.
authors: Ali Uyar
llm-evaluation judge-validity reliability
№ 09
2026
Linear probes on mid-reasoning hidden states in fixed-question, changed-context multi-hop QA recover a binary evidence-sufficiency signal distinct from final-answer correctness and pre-generation query uncertainty, supported by title-masked robustness with mixed cross-model replication.
// operational relevance
Gives retrieval and agent systems an internal signal for when a reasoning model actually has enough evidence to answer, separate from whether it happens to answer correctly.
authors: Ali Uyar
interpretability multi-hop-qa hidden-state-probes
№ 10
2026
Locked, auditable pipeline analyzing 5,332 same-gene ORF/CRISPR Cell Painting pairs under fixed preprocessing, bootstrap consensus, and prespecified robustness checks. Regime map is dominated by asymmetric and ambiguous states, shifts with effect-magnitude deciles, and yields null retrieval utility despite 536 significant Reactome terms.
// operational relevance
Demonstrates an auditable locked-pipeline protocol that bounds what morphology-derived perturbation claims can be defended when analytic flexibility is removed.
authors: Ali Uyar
locked-pipeline robustness negative-result
№ 11
2026
Audit that separates outcome readout from process verification for hidden-state verifiers by holding the extractor family and scorer class fixed and varying only the supervision target across arithmetic DAG execution and grounded Horn-style logic. Strict same-step flips and PVS show LocalValidity supervision improves process-verification evidence while first-invalid-step localization remains metric-dependent and transfer-sensitive.
// operational relevance
Gives evaluation teams a measurement-first audit object for step-level verifiers so trace-level outcome readout cannot be mistaken for reliable process verification in production reasoning pipelines.
authors: Ali Uyar
hidden-state-verifiers process-verification counterfactual-audit
№ 12
2026
ODCV-Bench study on Qwen2.5-7B-Instruct that localizes the earliest severity-3 executed step, rewrites it into a minimal safe counterfactual, and trains LoRA-DPO and chosen-only SFT adapters on next-turn pairs. Held-out evaluation exposes a capability-safety bottleneck where sparse unsafe train support limits what localized preference post-training can learn.
// operational relevance
Gives agent-safety teams a disciplined held-out evaluation pipeline and a concrete warning: preference-based safety post-training can be upstream-bottlenecked by the base model's own unsafe behavioral horizon during on-policy data generation.
authors: Ali Uyar
agent-safety preference-optimization held-out-evaluation
№ 13
2026
Full-scope evaluation of a locked control-calibrated acceptance rule for top-down causal abstractions across a planted symbolic generator, a miniature IOI transformer, and GPT-2-small IOI. Across primary and quantized robustness codebooks, no supported abstraction class is certified, with structured failure modes in eligibility geometry and null-gap retention.
// operational relevance
Gives interpretability teams an audited negative-result template and concrete reporting rules (frontier-domain exclusions, configured-versus-realized coverage) before claiming a model implements a proposed algorithm.
authors: Ali Uyar
mechanistic-interpretability acceptance-criteria null-calibration
№ 14
2026
Deterministic KV-cache corruption protocols and distillation-driven repair operators with seed/holdout replication and bootstrap confidence intervals.
// operational relevance
Improves serving stability by making KV-cache failures reproducible, diagnosable, and repairable.
authors: Ali Uyar
kv-cache distillation robustness
№ 15
2026
Train-certify-verify pipeline with counterexample-guided constrained repair and reproducibility gates for fail-closed evidence in production sign-off.
// operational relevance
Turns model repair into a verifiable workflow teams can trust during production sign-off.
authors: Ali Uyar
llm-reliability reproducibility certification
№ 16
2026
Claim auditing with non-identifiability certificates to provide verifiable interpretability evidence and reduce overclaiming.
// operational relevance
Reduces interpretability overclaims by requiring evidence that can be independently checked.
authors: Ali Uyar
interpretability auditing safety
№ 17
2026
Compute-matched discover-then-distill pipeline with ablation suite and retention-transfer diagnostics for clear separation of adaptation and consolidation effects.
// operational relevance
Clarifies how to separate short-term adaptation from durable model improvement in production tuning loops.
authors: Ali Uyar
test-time-adaptation distillation evaluation
№ 18
2026
Rate-matched diagnostics with strong controls and deterministic artifact packaging to detect instability under gain and scaling changes.
// operational relevance
Helps teams detect brittle model behavior before it becomes a production incident.
authors: Ali Uyar
mechanistic-analysis llm stability
№ 19
2026
Deterministic audits for semantics-preserving formatting effects in LLM QA with no-truncation and semantics gates.
// operational relevance
Provides auditable reliability evidence when prompt formatting changes could alter outcomes.
authors: Ali Uyar
tokenization evaluation auditability
№ 20
2026
Technical report documenting the CIS reliability approach, evaluation criteria, and implementation notes for production-grade AI systems.
// operational relevance
Translates deterministic reliability design into a practical report teams can adopt in real delivery environments.
authors: Ali Uyar
technical-report reliability systems