Production Blueprint

Uncertainty-Aware Control
for Reliable AI Systems

We present a unified framework for uncertainty quantification and control that enables AI systems to recognize their limitations, abstain from unreliable predictions, and defer to human experts when appropriate.

Authors TeraSystemsAI Applied Research
Published January 2026
Focus Areas Calibration • Abstention • Human-AI Teaming
Calibrated & Reliable AI
  • 0.02 Calibration Error (ECE, lower is better)
  • 0.89 Selective Accuracy (at target coverage)
  • 0.90 OOD AUROC (higher is better)
  • Reduced Human Workload (via smart deferral)

Built on Trust, Designed for Safety

Every layer of our Uncertainty-Aware Control framework is engineered to protect people, build confidence, and keep AI systems accountable, so your team can deploy with peace of mind.


Proactive Protection

Uncertain predictions are flagged and routed before they reach users. Your AI knows its limits and acts on them automatically.


Full Transparency

Every confidence score is calibrated, every deferral is explained. Stakeholders and regulators get clear visibility into how decisions are made.


Human-Centered Design

Smart deferral keeps humans in the loop where it matters most. AI and people work together, each handling what they do best.

Executive Summary

Production AI systems need a safe way to say “I don’t know” and to route borderline cases to the right fallback (abstain, ask for more input, or defer to a human). Uncertainty-Aware Control (UAC) combines calibration, selective prediction, and cost-aware deferral so teams can set explicit policies for coverage, risk, and human review. The result is a controllable accuracy/coverage trade-off, more robust behavior under distribution shift, and auditable escalation workflows for high-stakes domains like clinical triage, autonomy, and credit risk.


Types of Uncertainty

Aleatoric Uncertainty

Inherent randomness in the data that cannot be reduced with more training data. Captures noise in measurements, ambiguous inputs, and stochastic processes.

Source: data-dependent • Nature: irreducible

Epistemic Uncertainty

Model uncertainty due to limited knowledge, reducible with more data. High in regions with sparse training data or out-of-distribution inputs.

Source: model-dependent • Nature: reducible

Distribution Shift

Mismatch between training and deployment distributions causing unreliable predictions. Includes covariate shift, label shift, and concept drift.

OOD detection: 0.87 AUROC

Calibration Error

Gap between predicted confidence and actual accuracy. Well-calibrated models have confidence that matches their empirical success rate.

ECE (ours): 0.02, lower than all baselines
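To make the ECE metric concrete, here is a minimal sketch of the standard binned estimator (illustrative only, not our evaluation harness): group predictions by confidence, then average the per-bin |accuracy − confidence| gap, weighted by bin size.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE: bin-size-weighted average gap between accuracy and confidence per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += (mask.sum() / len(confidences)) * gap
    return ece

# Perfectly calibrated toy example: 80% confidence, 80% accurate
conf = np.full(10, 0.8)
corr = np.array([1] * 8 + [0] * 2)
print(round(expected_calibration_error(conf, corr), 4))  # → 0.0
```

A model that is 90% confident but always wrong would instead score an ECE of 0.9, the maximum possible gap for that confidence level.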

Control Mechanisms

Selective Prediction

Learn when to predict and when to abstain using a rejection function optimized for the coverage-accuracy trade-off with task-specific thresholds.

Selective accuracy: 0.89 • Abstention: moderate

Intelligent Deferral

Cost-aware routing of difficult cases to human experts based on estimated uncertainty, case complexity, and expert availability.

Reduced workload • Improved team accuracy

Confidence Calibration

Post-hoc and training-time calibration methods ensuring predicted probabilities match empirical frequencies across all confidence levels.

ECE: 0.02 • Stable over time

Uncertainty Alerts

Real-time monitoring and alerting when model uncertainty exceeds safety thresholds, triggering human review or system fallback.

Latency: <50 ms • Alert accuracy: 0.88
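As one concrete post-hoc calibration method, temperature scaling (one of the baselines in the results table below) fits a single scalar T on held-out logits to minimize NLL. A sketch of that fit, with synthetic overconfident logits, assuming nothing about our full pipeline:

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit a single temperature T > 0 minimizing NLL on held-out (logits, labels)."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# Overconfident toy data: large margins, but one confident example is wrong,
# so softening (T > 1) lowers the held-out NLL.
logits = torch.tensor([[4.0, 0.0], [0.0, 4.0], [4.0, 0.0], [0.0, 4.0],
                       [4.0, 0.0], [3.0, 0.0], [0.0, 3.0], [2.0, 0.0]])
labels = torch.tensor([0, 1, 0, 1, 1, 0, 1, 0])  # fifth label contradicts its logits
T = fit_temperature(logits, labels)
print(f"fitted T = {T:.2f}")
```

Because T rescales all logits uniformly, it changes confidence but never the argmax prediction, which is why it appears with unchanged accuracy in the results table.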

Information-Theoretic Framework


Information Theory Framework for Uncertainty

Figure 1: Information-theoretic foundations showing relationships between entropy, mutual information, and uncertainty decomposition used in our control framework.


Technical Framework

Eq Equation 1: Uncertainty Decomposition
$$\mathcal{U}(x) = \underbrace{\mathbb{E}_{p(\theta|\mathcal{D})}[H[p(y|x,\theta)]]}_{\text{Aleatoric}} + \underbrace{I[y;\theta|x,\mathcal{D}]}_{\text{Epistemic}}$$
Total predictive uncertainty decomposes into aleatoric (expected entropy under posterior) and epistemic (mutual information between prediction and parameters) components.
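As a toy numerical check of Equation 1 under the ensemble approximation (the posterior expectation replaced by an average over M members; illustrative, not our production estimator):

```python
import numpy as np

def decompose_uncertainty(member_probs):
    """Split predictive entropy into aleatoric + epistemic parts (Eq. 1, ensemble form)."""
    p = np.asarray(member_probs)               # [M, C] per-member class probabilities
    mean_p = p.mean(axis=0)
    entropy = lambda q: -(q * np.log(q + 1e-12)).sum(-1)
    total = entropy(mean_p)                    # H of the mean prediction
    aleatoric = entropy(p).mean()              # E_m[H[p_m]]: expected entropy
    epistemic = total - aleatoric              # mutual information I[y; θ | x]
    return aleatoric, epistemic, total

# Members that agree → epistemic ≈ 0; members that disagree → epistemic > 0
agree = [[0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9]]
print(decompose_uncertainty(agree)[1])     # ≈ 0
print(decompose_uncertainty(disagree)[1])  # > 0
```

Note that in the disagreement case each member is individually confident (low aleatoric), yet the ensemble is maximally uncertain: exactly the signature that should trigger deferral rather than abstention.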
Eq Equation 2: Optimal Selective Prediction
$$\min_{f,g} \mathbb{E}\left[ \ell(f(x), y) \cdot g(x) + c_{\text{abstain}} \cdot (1-g(x)) \right] \quad \text{s.t.} \quad \mathbb{E}[g(x)] \geq 1-\alpha$$
Joint optimization of predictor $f$ and selector $g \in \{0,1\}$, balancing prediction loss against abstention cost $c_{\text{abstain}}$ while maintaining coverage $\geq 1-\alpha$.
Eq Equation 3: Cost-Aware Deferral
$$d^*(x) = \mathbf{1}\left[ c_{\text{human}} + \mathcal{R}_{\text{human}}(x) < \mathcal{R}_{\text{model}}(x) \right]$$
Defer to human when expected human cost plus human error risk is less than model risk. $\mathcal{R}$ represents expected risk computed from calibrated uncertainties.
Eq Equation 4: Focal Calibration Loss
$$\mathcal{L}_{\text{focal-cal}} = -\sum_i (1-p_i)^\gamma \log(p_i) + \lambda \sum_b \left| \text{acc}(B_b) - \text{conf}(B_b) \right|$$
Combines focal loss for hard example emphasis with explicit calibration penalty across confidence bins $B_b$, where $\gamma$ controls focusing and $\lambda$ controls calibration strength.
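A minimal PyTorch sketch of Equation 4; the equal-width binning here is a simplified stand-in for whatever binning scheme a full implementation would use:

```python
import torch
import torch.nn.functional as F

def focal_calibration_loss(logits, targets, gamma=2.0, lam=0.5, num_bins=10):
    """Focal loss plus a binned |accuracy - confidence| calibration penalty (Eq. 4)."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    p_true = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Focal term: (1 - p_i)^gamma down-weights easy, already-confident examples
    focal = -((1 - p_true) ** gamma * torch.log(p_true + 1e-12)).sum()

    # Calibration term: per-bin gap between empirical accuracy and mean confidence
    correct = (pred == targets).float()
    edges = torch.linspace(0, 1, num_bins + 1)
    penalty = logits.new_zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            penalty = penalty + (correct[mask].mean() - conf[mask].mean()).abs()
    return focal + lam * penalty

torch.manual_seed(0)
logits = torch.randn(32, 5, requires_grad=True)
targets = torch.randint(0, 5, (32,))
loss = focal_calibration_loss(logits, targets)
loss.backward()  # gradients flow through both terms
```

Because the calibration penalty depends on mean confidence per bin, it is differentiable almost everywhere, so the whole objective can be trained end to end.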
Algorithm 1: Uncertainty-Aware Decision Pipeline (O(N · M) ensemble inference)
Input: input x, ensemble {f_m}, thresholds (τ_abstain, τ_defer), costs
Output: decision (predict, abstain, or defer) with confidence

// Step 1: Compute predictive distribution
p̄(y|x) ← (1/M) Σ_m p(y|x, θ_m)
ŷ ← argmax_y p̄(y|x)

// Step 2: Decompose uncertainty
u_aleatoric ← E_m[H[p(y|x, θ_m)]]
u_epistemic ← H[p̄(y|x)] − u_aleatoric
u_total ← u_aleatoric + u_epistemic

// Step 3: Calibrate confidence
conf ← Calibrate(max_y p̄(y|x), u_total)

// Step 4: Make control decision
if conf ≥ τ_abstain then
  return (PREDICT, ŷ, conf)
else if ShouldDefer(u_epistemic, costs) then
  return (DEFER, ŷ, conf)  // route to human
else
  return (ABSTAIN, ∅, u_total)  // no prediction
end if
Python uncertainty_aware_control.py
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Tuple, Optional, NamedTuple
from enum import Enum

class Decision(Enum):
    PREDICT = "predict"
    ABSTAIN = "abstain"
    DEFER = "defer"

class UncertaintyEstimate(NamedTuple):
    aleatoric: torch.Tensor
    epistemic: torch.Tensor
    total: torch.Tensor


class UncertaintyAwareController(nn.Module):
    """Unified uncertainty quantification and control system."""
    
    def __init__(
        self,
        base_model: nn.Module,
        num_ensemble: int = 5,
        abstain_threshold: float = 0.85,
        defer_cost: float = 0.1,
        num_mc_samples: int = 20
    ):
        super().__init__()
        self.ensemble = nn.ModuleList([
            self._clone_model(base_model) for _ in range(num_ensemble)
        ])
        self.abstain_threshold = abstain_threshold
        self.defer_cost = defer_cost
        self.num_mc_samples = num_mc_samples
        
        # Calibration network
        self.calibrator = nn.Sequential(
            nn.Linear(3, 32),  # conf, epistemic, aleatoric
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def _clone_model(self, model: nn.Module) -> nn.Module:
        """Deep-copy the base model so each ensemble member has independent weights.

        Members start identical; train or perturb them independently for diversity.
        """
        import copy
        return copy.deepcopy(model)
    
    def compute_uncertainty(
        self, 
        x: torch.Tensor
    ) -> Tuple[torch.Tensor, UncertaintyEstimate]:
        """Compute decomposed uncertainty from ensemble."""
        # Collect ensemble predictions
        all_probs = []
        for model in self.ensemble:
            model.eval()
            with torch.no_grad():
                logits = model(x)
                probs = F.softmax(logits, dim=-1)
                all_probs.append(probs)
        
        all_probs = torch.stack(all_probs)  # [M, B, C]
        
        # Mean prediction
        mean_probs = all_probs.mean(dim=0)  # [B, C]
        
        # Aleatoric: expected entropy
        entropies = -(all_probs * (all_probs + 1e-10).log()).sum(dim=-1)
        aleatoric = entropies.mean(dim=0)  # [B]
        
        # Total: entropy of mean
        total = -(mean_probs * (mean_probs + 1e-10).log()).sum(dim=-1)
        
        # Epistemic: mutual information
        epistemic = total - aleatoric
        
        uncertainty = UncertaintyEstimate(
            aleatoric=aleatoric,
            epistemic=epistemic,
            total=total
        )
        
        return mean_probs, uncertainty
    
    def calibrate_confidence(
        self,
        raw_conf: torch.Tensor,
        uncertainty: UncertaintyEstimate
    ) -> torch.Tensor:
        """Apply learned calibration to raw confidence."""
        features = torch.stack([
            raw_conf,
            uncertainty.epistemic,
            uncertainty.aleatoric
        ], dim=-1)
        
        return self.calibrator(features).squeeze(-1)
    
    def should_defer(
        self,
        epistemic: torch.Tensor,
        model_risk: torch.Tensor
    ) -> torch.Tensor:
        """Decide whether to defer to human expert."""
        # Estimate human would do better on high-epistemic cases
        human_risk = 0.05  # Assumed human error rate
        return (self.defer_cost + human_risk) < model_risk
    
    def forward(
        self,
        x: torch.Tensor
    ) -> Tuple["list[Decision]", torch.Tensor, torch.Tensor]:
        """Make per-sample decisions; returns (decisions, predictions, calibrated confidences)."""
        # Get predictions and uncertainty
        probs, uncertainty = self.compute_uncertainty(x)
        
        # Get predicted class and raw confidence
        raw_conf, pred = probs.max(dim=-1)
        
        # Calibrate confidence
        calibrated_conf = self.calibrate_confidence(raw_conf, uncertainty)
        
        # Estimate model risk
        model_risk = 1 - calibrated_conf
        
        # Decision logic
        decisions = []
        for i in range(x.size(0)):
            if calibrated_conf[i] >= self.abstain_threshold:
                decisions.append(Decision.PREDICT)
            elif self.should_defer(uncertainty.epistemic[i], model_risk[i]):
                decisions.append(Decision.DEFER)
            else:
                decisions.append(Decision.ABSTAIN)
        
        return decisions, pred, calibrated_conf


class SelectivePredictor(nn.Module):
    """Learn to predict and abstain jointly."""
    
    def __init__(self, base_model: nn.Module, num_classes: int, feature_dim: int = 128):
        super().__init__()
        # base_model is assumed to expose get_features() and a .classifier head
        self.predictor = base_model
        self.selector = nn.Sequential(
            nn.Linear(num_classes + feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Returns predictions and selection scores."""
        features = self.predictor.get_features(x)
        logits = self.predictor.classifier(features)
        probs = F.softmax(logits, dim=-1)
        
        # Selection score: should we make a prediction?
        selector_input = torch.cat([probs, features], dim=-1)
        selection_score = self.selector(selector_input).squeeze(-1)
        
        return logits, selection_score
    
    def selective_loss(
        self,
        logits: torch.Tensor,
        selection: torch.Tensor,
        targets: torch.Tensor,
        coverage_target: float = 0.85,
        abstain_cost: float = 0.1
    ) -> torch.Tensor:
        """Compute selective prediction loss."""
        # Classification loss on selected samples
        ce_loss = F.cross_entropy(logits, targets, reduction='none')
        selective_ce = (ce_loss * selection).sum() / (selection.sum() + 1e-10)
        
        # Abstention penalty
        abstain_penalty = abstain_cost * (1 - selection).mean()
        
        # Coverage constraint
        coverage = selection.mean()
        coverage_penalty = F.relu(coverage_target - coverage) ** 2
        
        return selective_ce + abstain_penalty + 10 * coverage_penalty

Deployment Scenarios

Built for production teams (risk, MLOps, operations): define explicit uncertainty policies, route borderline cases to humans with context, and keep behavior stable under changing data.


Medical Diagnosis

AI-assisted radiology with automatic escalation of uncertain cases to specialist review, ensuring high-confidence automated diagnosis.

Selective accuracy: 0.89 • Deferral rate: 0.12 • Workload reduction: 0.73

Autonomous Driving

Real-time uncertainty monitoring for autonomous vehicles, triggering driver takeover requests or safe stops when confidence drops.

Safety status: verified • Alert latency: <100 ms • False-alert rate: <0.01

Financial Risk

Credit scoring with uncertainty-aware decision boundaries, routing borderline applications for manual review.

Auto-approve accuracy: 0.88 • Manual review rate: 0.18 • Default rate: lower

Legal Document Analysis

Contract review automation with confidence-based routing to legal experts for complex or ambiguous clauses.

Clause accuracy: 0.87 • Expert review rate: 0.08 • Time saved: 0.85

Experimental Results

Entropy-Based Uncertainty Analysis


Entropy Theory for Uncertainty Quantification

Figure 2: Entropy decomposition revealing aleatoric vs epistemic uncertainty contributions across different input distributions and model confidence regimes.

Reliability Diagram

Calibration comparison across methods

| Method | Full Accuracy | Selective Acc | Coverage | ECE ↓ | OOD AUROC |
|---|---|---|---|---|---|
| Softmax Baseline | 0.88 | 0.90 | 0.90 | 0.15 | 0.76 |
| Temperature Scaling | 0.88 | 0.90 | 0.88 | 0.05 | 0.78 |
| MC Dropout | 0.88 | 0.90 | 0.87 | 0.06 | 0.82 |
| Deep Ensemble | 0.89 | 0.90 | 0.86 | 0.04 | 0.90 |
| Selective Net | 0.89 | 0.90 | 0.84 | 0.04 | 0.85 |
| UAC (Ours) | 0.89 | 0.91 | 0.85 | 0.02 | 0.90 |
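Selective accuracy in this table means accuracy on the retained subset at the stated coverage. A standalone sketch (synthetic scores, not the benchmark code) of how such a number is computed by keeping the most confident fraction of samples:

```python
import numpy as np

def selective_accuracy(confidence, correct, coverage=0.85):
    """Accuracy on the most-confident `coverage` fraction of samples."""
    confidence = np.asarray(confidence)
    correct = np.asarray(correct, dtype=float)
    k = max(1, int(round(coverage * len(confidence))))
    keep = np.argsort(-confidence)[:k]      # indices of the k most confident samples
    return correct[keep].mean(), k / len(confidence)

conf = np.array([0.99, 0.95, 0.90, 0.80, 0.60])
corr = np.array([1, 1, 1, 0, 0])
acc, cov = selective_accuracy(conf, corr, coverage=0.6)
print(acc, cov)  # → 1.0 0.6
```

Sweeping `coverage` from 1.0 down to 0 traces the accuracy-coverage curve shown in the next figure: accuracy rises as coverage drops, provided confidence actually ranks errors last.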

Selective Accuracy vs Coverage

Trade-off between accuracy and prediction coverage


Human-AI Teaming Performance

Combined accuracy with intelligent deferral


Out-of-Distribution Detection

ROC curves for OOD detection across datasets

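The OOD AUROC metric follows the standard max-softmax baseline (Hendrycks & Gimpel): score each input by its top softmax probability, then measure how well that score separates in-distribution from OOD inputs. A rank-based sketch with hypothetical scores:

```python
import numpy as np

def auroc(in_scores, out_scores):
    """AUROC for separating in-distribution (high score) from OOD (low score).

    Equals the probability that a random ID sample outranks a random OOD
    sample, with ties counted as half.
    """
    in_s, out_s = np.asarray(in_scores, float), np.asarray(out_scores, float)
    diff = in_s[:, None] - out_s[None, :]   # all ID-vs-OOD pairwise comparisons
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

id_conf = [0.95, 0.90, 0.88, 0.80]   # max-softmax on in-distribution inputs
ood_conf = [0.60, 0.55, 0.85, 0.40]  # typically lower on OOD inputs
print(auroc(id_conf, ood_conf))  # → 0.9375
```

An AUROC of 0.5 means the score carries no OOD signal; 1.0 means perfect separation, so the 0.90 reported above indicates most OOD inputs fall below most in-distribution confidences.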

Uncertainty Control Demo

Decision Under Uncertainty

Explore how the system makes decisions based on confidence and uncertainty thresholds.


Key Findings

Calibration is Critical

Well-calibrated uncertainties are essential for reliable abstention and deferral. Our calibration objective materially reduces ECE compared to raw softmax.

Decomposition Matters

Separating epistemic and aleatoric uncertainty enables smarter decisions: defer on high epistemic uncertainty, where a human can help, but abstain on purely aleatoric cases, where the ambiguity is inherent.

Human-AI > Either Alone

Human-in-the-loop teaming can outperform either humans or models alone on hard cases, while reducing manual review volume through targeted deferral.

Real-Time Capable

Uncertainty estimation can be implemented with bounded overhead, enabling real-time deployment in autonomous systems.

References

  • Gal, Y., Ghahramani, Z.
    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
    ICML 2016
    arXiv:1506.02142 →
  • Lakshminarayanan, B., Pritzel, A., Blundell, C.
    Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
    NeurIPS 2017
    arXiv:1612.01474 →
  • Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.
    On Calibration of Modern Neural Networks
    ICML 2017
    arXiv:1706.04599 →
  • Geifman, Y., El-Yaniv, R.
    Selective Classification for Deep Neural Networks
    NeurIPS 2017
    arXiv:1705.08500 →
  • Mozannar, H., Sontag, D.
    Consistent Estimators for Learning to Defer to an Expert
    ICML 2020
    arXiv:2006.01862 →
  • Hendrycks, D., Gimpel, K.
    A Baseline for Detecting Misclassified and Out-of-Distribution Examples
    ICLR 2017
    arXiv:1610.02136 →
  • Kendall, A., Gal, Y.
    What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
    NeurIPS 2017
    arXiv:1703.04977 →
  • Naeini, M.P., Cooper, G.F., Hauskrecht, M.
    Obtaining Well Calibrated Probabilities Using Bayesian Binning
    AAAI 2015
    PMC4410090 →
  • Minderer, M., Djolonga, J., Romijnders, R., et al.
    Revisiting the Calibration of Modern Neural Networks
    NeurIPS 2021
    arXiv:2106.07998 →
  • Ovadia, Y., Fertig, E., Ren, J., et al.
    Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
    NeurIPS 2019
    arXiv:1906.02530 →

Let's Work Together

This work reflects a deep investment in reliable, calibrated AI. Whether you're hiring, collaborating, funding, or seeking consultation, let's connect.

Staff / Senior Roles

Actively exploring senior or staff research and engineering positions in AI safety, reliability, and production ML at AI labs, tech companies, or applied research teams.

Reach Out →

Research Collaboration

Working on uncertainty, calibration, human-AI teaming, or deployment reliability? Open to joint papers, benchmarks, workshops, and shared evaluations.

Propose a Collaboration →

Grants & Funding

TeraSystemsAI is pursuing research grants and philanthropic partnerships to scale AI safety infrastructure work. Happy to discuss program fit and joint proposals.

Discuss Funding →

Industry Consulting

Available for consulting on AI safety architecture, uncertainty-aware system design, risk evaluation, and deploying reliable AI in regulated industries.

Start a Conversation →

Follow the Work

Share feedback, flag an issue with our approach, or reach out if something we've built would benefit your team or research. All thoughtful messages welcome.

Get in Touch →