Overview
Production Readiness
0.45
Novelty Score
0.55
Cost Impact Score
0.6
Citation Count
1
Why It Matters For Business
AI techniques can reduce tail latency, improve throughput, lower storage errors, and cut datacenter costs, but require guardrails and staged deployment to avoid regressions and privacy risks.
Summary TLDR
This 68-page survey maps the two-way interaction between AI and operating systems. It summarizes how traditional ML, large language models (LLMs), and agent systems improve OS subsystems (scheduling, I/O, storage, memory, networking, security, GUI/CLI, ops/tuning, verification, education). It also explains how OS designs (kernel-bypass, modular kernels, memory and scheduler interfaces) accelerate AI workloads (short- and long-context inference, distributed training, edge inference). The paper lists representative systems, quantifies several empirical gains (I/O latency, tail latency, storage throughput, error reduction, energy/TCO), identifies pitfalls (model drift, overhead, explainability,
Problem Statement
Modern OSs face growing heterogeneity and dynamic workloads that break static heuristics. At the same time, AI methods (ML, LLMs, agents) can automate and optimize OS decisions, but existing work is fragmented and raises new overhead, reliability, and governance issues. The paper surveys techniques and gaps in both "AI for OS" and "OS for AI" to guide engineering and research.
Main Contribution
Categorize research into two directions: AI for OS (apply AI inside OS) and OS for AI (OS changes to support AI workloads).
Survey representative systems across kernel subsystems and the OS ecosystem, summarizing goals, methods, and measured impacts.
Distill common evaluation axes, engineering patterns, deployment suggestions, and a three-stage roadmap: AI-powered, AI-refactored, AI-driven OSs.
Identify core risks (model drift, runtime overhead, explainability, privacy) and propose guardrails that combine rules with AI, modular kernels, and unified toolchains.
Key Findings
Lightweight ML in the kernel can sharply improve I/O predictability and throughput.
A production-focused ML pipeline can deliver sub-microsecond decisions and reduce latency compared with heuristics.
Learned indexes at the block layer can cut redundant work and shrink tail latency drastically.
ML methods can improve NVM reliability and device lifetime.
System-level autotuning with ML can lower datacenter memory cost with small performance impact.
LLM-powered automation boosts kernel testing and vulnerability discovery but is not flawless.
Results
I/O latency (avg)
Accuracy
Flash translation double reads
Device endurance (NVM)
Datacenter memory TCO
Kernel fuzzing / vulnerability discovery
Who Should Care
What To Try In 7 Days
Run a lightweight ML pilot on a hot I/O path (for example, by simulating a LinnOS-style predictor) and measure average and tail latency improvements.
Audit logs and telemetry to build a small dataset for anomaly detection or failure prediction, as groundwork for a Desh-style pipeline.
Prototype a simple LLM-assisted DevOps flow for kernel config changes using AutoOS or BYOS ideas in a sandbox.
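The first pilot above can be rehearsed entirely in simulation before touching a real I/O path. The sketch below is a toy under stated assumptions, not LinnOS itself: the latency model, the 1000 µs slow cutoff, the 32-deep queue, and the replica failover are all invented for illustration, and a one-feature decision stump stands in for the small neural network a real system would use. It learns a queue-depth threshold from labeled samples, redirects I/Os predicted to be slow, and compares P99 latency with and without the predictor.

```python
import math
import random

SLOW_US = 1000  # assumed cutoff separating fast from slow I/Os


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * p / 100)
    return ordered[max(rank, 1) - 1]


def service_latency_us(queue_depth, rng):
    """Hypothetical device model: deep queues usually trigger a long stall."""
    latency = 100 + 20 * queue_depth + rng.uniform(0, 50)
    if queue_depth >= 30 and rng.random() < 0.9:
        latency += 5000
    return latency


def fit_stump(samples):
    """Pick threshold t minimizing errors of 'predict slow iff depth >= t'."""
    best_t, best_err = None, None
    for t in range(33):  # t == 32 means "never predict slow"
        err = sum((q >= t) != slow for q, slow in samples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def simulate(n, rng, threshold=None):
    """Return n latencies; with a threshold, predicted-slow I/Os fail over."""
    latencies = []
    for _ in range(n):
        q = rng.randrange(32)
        if threshold is not None and q >= threshold:
            q = rng.randrange(32)  # replica with an independent queue
        latencies.append(service_latency_us(q, rng))
    return latencies


rng = random.Random(0)
train = []
for _ in range(5000):
    depth = rng.randrange(32)
    train.append((depth, service_latency_us(depth, rng) > SLOW_US))

threshold = fit_stump(train)
baseline = simulate(20000, rng)
guarded = simulate(20000, rng, threshold)
print("learned threshold:", threshold)
print("P99 baseline (us):", round(percentile(baseline, 99)))
print("P99 with predictor (us):", round(percentile(guarded, 99)))
```

On a real system the features, labels, and failover target would come from measured traces, and the comparison should report both average and P99 latency against the existing heuristic.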
Agent Features
Memory
- external memory vectors (semantic memory)
- context window as short-term memory
- KV cache retrieval
Planning
- multi-step reasoning
- tree-of-thought prompting
- state-machine orchestration
Tool Use
- LLM + symbolic executor
- system tool invocation
- fuzzers and validators
Frameworks
- AIOS
- OSAgent
- AIOS-Agent
- LSFS
- CoRE
Is Agentic
true
Architectures
- single-agent
- multi-agent
- memory-enhanced agents
Collaboration
- role-specialized agents
- agent orchestration pipelines
- multi-agent grading/education
Optimization Features
Token Efficiency
- attention reuse (AttentionStore)
- paged attention / memory paging for long context
- KV cache management
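The paging idea behind the token-efficiency items above can be illustrated with a small block table, in the spirit of OS virtual memory. This is a minimal sketch with hypothetical names (PagedKVCache, append_token, release); it tracks only block bookkeeping and stores no tensors, so it shows the allocation scheme rather than any particular system's implementation.

```python
class PagedKVCache:
    """Maps each sequence's token positions onto fixed-size physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # physical block ids not in use
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored so far

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (block_id, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free:
                raise MemoryError("no free KV blocks; preempt or evict")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the free list."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=2, block_size=4)
for _ in range(5):
    slot = cache.append_token("request-a")
print("blocks used by request-a:", len(cache.tables["request-a"]))
print("free blocks:", len(cache.free))
```

Because blocks are allocated on demand, a sequence wastes at most one partially filled block, which is the property that lets long-context serving pack more requests into the same memory.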
Infra Optimization
- library OS / Demikernel for microsecond datacenter paths
- modular kernels and per-application OS instances
- software-defined far memory
Model Optimization
- quantization
- model distillation
- lightweight NN for kernel paths
System Optimization
- communication-computation overlap
- GPU-initiated I/O (BaM)
- device-aware scheduling
Training Optimization
- federated/continuous retraining
- noise filtering and period-based labeling
- domain-specific data curation
Inference Optimization
- kernel-bypass and in-kernel inference
- quantized sub-µs inference (Heimdall)
- adaptive batching and preemption (ExeGPT, XSched)
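Quantized inference, listed above, replaces floating-point arithmetic with small integers so that a model can fit a tight latency budget. The sketch below shows generic symmetric int8 quantization of a single dot product; it is an illustration only, not Heimdall's implementation, and the function names and example weights are assumptions.

```python
def quantize(values, bits=8):
    """Symmetric quantization: return (integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def int_dot(q_weights, w_scale, q_inputs, x_scale):
    """Integer-only accumulation, rescaled to a float once at the end."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))
    return acc * w_scale * x_scale


weights = [0.5, -1.0, 0.25]   # example values, not from the paper
inputs = [2.0, 1.0, -4.0]

exact = sum(w * x for w, x in zip(weights, inputs))
q_w, s_w = quantize(weights)
q_x, s_x = quantize(inputs)
approx = int_dot(q_w, s_w, q_x, s_x)
print("exact:", exact, "quantized:", approx)
```

The inner loop touches only integers, which is what makes this style of inference attractive inside a kernel path where floating-point use is restricted or costly.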
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- Model drift: learned policies degrade as hardware and workloads change.
- Inference overhead: kernel-embedded models must meet tight latency budgets.
- Explainability: LLMs and agents can hallucinate or produce unsafe patches.
- Data scarcity: representative, labeled OS traces are hard to collect and share.
- Engineering complexity: legacy kernels lack modular hooks for safe AI integration.
When Not To Use
- In hard real-time kernel paths with strict determinism requirements.
- On resource-limited embedded devices without inference acceleration.
- When high-quality, representative telemetry is unavailable.
- In environments requiring provable, formally verified behavior without AI fallback.
Failure Modes
- Hallucinated or incorrect code patches from LLMs causing regressions.
- Model drift leading to performance regressions or SLO violations.
- Inference-induced contention that increases tail latency.
- Adversarial inputs that trigger incorrect scheduling or security alerts.
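One way to contain the drift and regression failure modes above is a rule-based guardrail that falls back to a heuristic when the learned policy starts mispredicting. The sketch below is a minimal illustration under assumptions: the class name GuardedPolicy, the window size, and the error budget are hypothetical and do not come from the paper.

```python
from collections import deque


class GuardedPolicy:
    """Uses a learned model until its recent error rate exceeds a budget."""

    def __init__(self, model, heuristic, window=100, max_error_rate=0.2):
        self.model = model
        self.heuristic = heuristic
        self.max_error_rate = max_error_rate
        self.outcomes = deque(maxlen=window)  # True marks a misprediction
        self.enabled = True

    def decide(self, features):
        """Return the model's decision, or the heuristic's after a trip."""
        policy = self.model if self.enabled else self.heuristic
        return policy(features)

    def record(self, predicted, actual):
        """Feed back ground truth; disable the model if it drifts."""
        self.outcomes.append(predicted != actual)
        if len(self.outcomes) == self.outcomes.maxlen:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate > self.max_error_rate:
                self.enabled = False

    def reset(self):
        """Re-enable the model, for example after retraining."""
        self.outcomes.clear()
        self.enabled = True


policy = GuardedPolicy(model=lambda f: True, heuristic=lambda f: False,
                       window=10, max_error_rate=0.2)
for _ in range(10):
    policy.record(predicted=True, actual=True)    # model is accurate
for _ in range(3):
    policy.record(predicted=True, actual=False)   # workload shifts
print("model enabled:", policy.enabled)
print("decision now:", policy.decide({"queue_depth": 7}))
```

Keeping the fallback decision in plain rules means the worst case is bounded by the existing heuristic, which fits a staged deployment where the model earns trust gradually.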
Core Entities
Models
- GPT-4
- LLaMA
- Gemini
- MLP
- LSTM
- Autoencoder
- Random Forest
Metrics
- P99 latency
- average latency
- throughput (QPS)
- accuracy
- energy/TCO
- device endurance
- code coverage
Datasets
- Microsoft/Alibaba/Tencent I/O traces (Heimdall)
- Linux kernel bug corpus (LinuxFLBench)
- VulnLoc
- SAN2VULN
Benchmarks
- UnixBench (AutoOS experiments)
- SLURM benchmarks (Chronus)
- Kernel fuzzing coverage (ECG, KernelGPT)

