<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-CN"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://weinan.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://weinan.io/" rel="alternate" type="text/html" hreflang="zh-CN" /><updated>2026-04-12T00:04:37+00:00</updated><id>https://weinan.io/feed.xml</id><title type="html">阿男的小窝</title><subtitle>阿男的技术博客：Linux 内核、云原生、Java/Rust 与系统编程笔记。</subtitle><author><name>阿男</name></author><entry><title type="html">当Linux内核不再「迁就」PostgreSQL：一次抢占模型变更引发的性能风暴</title><link href="https://weinan.io/2026/04/08/linux-kernel-7-preempt-lazy-postgresql-performance.html" rel="alternate" type="text/html" title="当Linux内核不再「迁就」PostgreSQL：一次抢占模型变更引发的性能风暴" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://weinan.io/2026/04/08/linux-kernel-7-preempt-lazy-postgresql-performance</id><content type="html" xml:base="https://weinan.io/2026/04/08/linux-kernel-7-preempt-lazy-postgresql-performance.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<blockquote>
  <p>一个调度标志位的改变，如何让数据库吞吐量瞬间腰斩？</p>
</blockquote>

<h2 id="引言从完美运行到性能腰斩">引言：从”完美运行”到”性能腰斩”</h2>

<p>想象一下这样的场景：你的数据库服务器刚刚升级了最新的Linux Kernel 7.0，期待着更好的性能和安全性。然而，上线后监控图表却显示了一个触目惊心的画面——PostgreSQL的吞吐量在毫无征兆的情况下<strong>骤降了将近一半</strong>。</p>

<p>这背后的根源，直指Linux内核调度器在Kernel 7.0中引入的一次重大变更——<strong>惰性抢占（PREEMPT_LAZY）</strong> 模型<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。本文将深入技术底层，剖析这次性能衰退的来龙去脉，并探讨其背后的设计哲学冲突。</p>

<h2 id="一linux抢占模型速览吞吐量-vs-响应时间的权衡">一、Linux抢占模型速览：吞吐量 vs. 响应时间的权衡</h2>

<p>要理解这个问题，我们首先需要明白Linux内核是如何决定”何时暂停一个任务，让另一个任务运行”的。这个决策过程被称为<strong>抢占（Preemption）</strong>。多年来，Linux内核提供了几种抢占模式，在<strong>系统吞吐量</strong>和<strong>交互响应时间</strong>之间做出权衡。</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">抢占模式</th>
      <th style="text-align: left">核心机制</th>
      <th style="text-align: left">特点</th>
      <th style="text-align: left">典型场景</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>PREEMPT_NONE</strong></td>
      <td style="text-align: left">任务仅在时间片用完或主动让出时被抢占。</td>
      <td style="text-align: left"><strong>吞吐量最高</strong>，但响应延迟可能较大。</td>
      <td style="text-align: left">服务器、批处理系统</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>PREEMPT_VOLUNTARY</strong></td>
      <td style="text-align: left">在内核代码的”检查点”（如<code class="language-plaintext highlighter-rouge">cond_resched()</code>）主动让出CPU。</td>
      <td style="text-align: left">吞吐量与响应时间的<strong>折中方案</strong>。</td>
      <td style="text-align: left">通用发行版内核默认选项</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>PREEMPT_FULL</strong></td>
      <td style="text-align: left">除了极少数临界区（如持有自旋锁），几乎任何地方都可抢占。</td>
      <td style="text-align: left"><strong>响应延迟极低</strong>，适合桌面和多媒体应用。</td>
      <td style="text-align: left">桌面系统、需要低延迟的场景</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>PREEMPT_RT</strong> (实时补丁)</td>
      <td style="text-align: left">进一步将自旋锁变为可抢占，提供硬实时能力。</td>
      <td style="text-align: left"><strong>确定性响应</strong>，但吞吐量有损耗。</td>
      <td style="text-align: left">工业控制、音视频处理</td>
    </tr>
  </tbody>
</table>

<p>对于绝大多数Linux发行版内核，默认采用的是<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>模式。而像PostgreSQL这样的数据库，则极度依赖于<code class="language-plaintext highlighter-rouge">PREEMPT_NONE</code>或<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>带来的高吞吐量特性。</p>

<p>下图展示了不同抢占模型在性能特性上的定位：</p>

<pre><code class="language-mermaid">graph LR
    subgraph "抢占模型的演化与权衡"
        A[PREEMPT_NONE&lt;br/&gt;服务器优化] --&gt;|引入检查点| B[PREEMPT_VOLUNTARY&lt;br/&gt;折中方案]
        B --&gt;|全面可抢占| C[PREEMPT_FULL&lt;br/&gt;桌面优化]
        C --&gt;|硬实时| D[PREEMPT_RT&lt;br/&gt;实时系统]
        B -.-&gt;|v7.0新增| E[PREEMPT_LAZY&lt;br/&gt;简化内核]
    end
    
    style A fill:#90EE90
    style B fill:#FFD700
    style C fill:#FFB6C1
    style D fill:#FF6347
    style E fill:#87CEEB
    
    classDef throughput fill:#90EE90,stroke:#333,stroke-width:2px
    classDef latency fill:#FFB6C1,stroke:#333,stroke-width:2px
    classDef balanced fill:#FFD700,stroke:#333,stroke-width:2px
    classDef newmodel fill:#87CEEB,stroke:#333,stroke-width:3px
</code></pre>

<p><strong>关键特性对比：</strong></p>

<ul>
  <li>🟢 <strong>PREEMPT_NONE/VOLUNTARY</strong>：高吞吐，PostgreSQL的最佳拍档</li>
  <li>🔵 <strong>PREEMPT_LAZY</strong>：试图保持高吞吐，同时简化内核</li>
  <li>🔴 <strong>PREEMPT_FULL/RT</strong>：低延迟优先，牺牲部分吞吐</li>
</ul>

<h3 id="cond_resched一个权宜之计"><code class="language-plaintext highlighter-rouge">cond_resched()</code>：一个权宜之计</h3>

<p>在<code class="language-plaintext highlighter-rouge">PREEMPT_NONE</code>模式下，如果一个内核线程执行了过长的循环，可能会导致其他任务”饿死”。为了解决这个问题，内核开发者在代码中插入了数百个<code class="language-plaintext highlighter-rouge">cond_resched()</code>调用<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。这就像在高速公路上设置的临时检查站——内核线程运行到这里时，会主动”看一眼”是否有更高优先级的任务需要CPU，如果有，就主动让出。</p>

<p>但这终究是一个<strong>启发式（heuristic）的权宜之计</strong>：它依赖于开发者”猜”对哪里需要插入检查点，而且这些额外的检查点本身也会带来性能开销。</p>

<h2 id="二kernel-70-的改变惰性抢占的登场">二、Kernel 7.0 的改变：惰性抢占的登场</h2>

<p>Kernel 7.0 的调度器迎来了一次重大重构。维护者Peter Zijlstra引入了一个新的抢占模式——<strong>PREEMPT_LAZY（惰性抢占）</strong>。在commit <a href="https://github.com/torvalds/linux/commit/7dadeaa6e851"><code class="language-plaintext highlighter-rouge">7dadeaa6e851</code></a>中，他详细解释了引入这一机制的三个核心原因<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>：</p>

<blockquote>
  <p>The introduction of PREEMPT_LAZY was for multiple reasons:</p>

  <ul>
    <li>PREEMPT_RT suffered from over-scheduling, hurting performance compared to !PREEMPT_RT.</li>
    <li>the introduction of (more) features that rely on preemption; like folio_zero_user() which can do large memset() without preemption checks.</li>
    <li>the endless and uncontrolled sprinkling of cond_resched() – mostly cargo cult or in response to poor to replicate workloads.</li>
  </ul>
</blockquote>

<p>简单来说，核心目标是<strong>简化内核代码，并为最终移除所有的<code class="language-plaintext highlighter-rouge">cond_resched()</code>铺平道路</strong>。</p>

<p>在支持<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code>的架构（包括x86和ARM64）上，传统的<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>选项已从配置菜单中移除<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>。在<a href="https://github.com/torvalds/linux/blob/master/kernel/Kconfig.preempt"><code class="language-plaintext highlighter-rouge">kernel/Kconfig.preempt</code></a>中可以看到：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>config PREEMPT_VOLUNTARY
	bool "Voluntary Kernel Preemption (Desktop)"
	depends on !ARCH_HAS_PREEMPT_LAZY
	depends on !ARCH_NO_PREEMPT
</code></pre></div></div>

<h3 id="技术核心两个标志位的故事">技术核心：两个标志位的故事</h3>

<p><code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code>的实现非常巧妙，它引入了两个关键的线程标志位。在commit <a href="https://github.com/torvalds/linux/commit/26baa1f1c4bd"><code class="language-plaintext highlighter-rouge">26baa1f1c4bd</code></a>中，Peter Zijlstra描述了这一基础设施<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>：</p>

<blockquote>
  <p>Add the basic infrastructure to split the TIF_NEED_RESCHED bit in two.
Either bit will cause a resched on return-to-user, but only
TIF_NEED_RESCHED will drive IRQ preemption.</p>
</blockquote>

<p>具体来说：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>（紧急标志）</strong>：设置此标志意味着<strong>必须立即抢占</strong>当前任务。这通常用于高优先级实时任务被唤醒的场景。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED_LAZY</code>（惰性标志）</strong>：设置此标志意味着”最好”抢占当前任务，但<strong>不是现在</strong>。这用于普通的调度公平性考虑。</li>
</ol>

<p>在commit <a href="https://github.com/torvalds/linux/commit/7c70cb94d29c"><code class="language-plaintext highlighter-rouge">7c70cb94d29c</code></a>中，Peter Zijlstra进一步说明了工作机制<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>：</p>

<blockquote>
  <p>This LAZY bit will be promoted to the full NEED_RESCHED bit on tick.
As such, the average delay between setting LAZY and actually
rescheduling will be TICK_NSEC/2.</p>

  <p>In short, Lazy preemption will delay preemption for fair class but
will function as Full preemption for all the other classes, most
notably the realtime (RR/FIFO/DEADLINE) classes.</p>
</blockquote>

<p><strong>工作机制：</strong></p>

<ul>
  <li><strong>大多数情况</strong>：当一个普通的高优先级任务被唤醒时，调度器只会设置<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED_LAZY</code>标志，而不是传统的<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>。</li>
  <li><strong>检查点行为改变</strong>：在<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>模式下，<code class="language-plaintext highlighter-rouge">cond_resched()</code>会检查<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>标志并立即让出CPU。但在新的惰性模式下，<strong><code class="language-plaintext highlighter-rouge">cond_resched()</code>不再检查惰性标志</strong>。</li>
  <li><strong>最终抢占</strong>：当前任务会继续运行，直到下一个<strong>时钟中断（timer tick）</strong> 到来。此时，内核会检查惰性标志，如果被设置，则将其”升级”为紧急标志，并触发抢占。</li>
</ul>

<p>内核在<a href="https://github.com/torvalds/linux/blob/master/kernel/sched/core.c"><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></a>中实现了这一机制：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">__always_inline</span> <span class="kt">int</span> <span class="nf">get_lazy_tif_bit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">dynamic_preempt_lazy</span><span class="p">())</span>
		<span class="k">return</span> <span class="n">TIF_NEED_RESCHED_LAZY</span><span class="p">;</span>

	<span class="k">return</span> <span class="n">TIF_NEED_RESCHED</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">resched_curr_lazy</span><span class="p">(</span><span class="k">struct</span> <span class="n">rq</span> <span class="o">*</span><span class="n">rq</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">__resched_curr</span><span class="p">(</span><span class="n">rq</span><span class="p">,</span> <span class="n">get_lazy_tif_bit</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>

<p>在时钟中断处理中，惰性标志会被升级为常规的重调度标志：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="k">if</span> <span class="p">(</span><span class="n">dynamic_preempt_lazy</span><span class="p">()</span> <span class="o">&amp;&amp;</span> <span class="n">tif_test_bit</span><span class="p">(</span><span class="n">TIF_NEED_RESCHED_LAZY</span><span class="p">))</span>
		<span class="n">resched_curr</span><span class="p">(</span><span class="n">rq</span><span class="p">);</span>
</code></pre></div></div>

<h3 id="改变前后的对比">改变前后的对比</h3>

<p><strong>Kernel 6.x (PREEMPT_VOLUNTARY)</strong>：高优先级任务醒来 → 设置 <code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code> → 当前任务运行到下一个<code class="language-plaintext highlighter-rouge">cond_resched()</code> → <strong>立即让出CPU</strong>。</p>

<p><strong>Kernel 7.0 (PREEMPT_LAZY)</strong>：高优先级任务醒来 → 设置 <code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED_LAZY</code> → 当前任务<strong>忽略所有<code class="language-plaintext highlighter-rouge">cond_resched()</code>检查点</strong> → 继续运行直到<strong>时钟中断</strong>（例如几毫秒后）→ 升级标志，让出CPU。</p>

<p>下面的时序图展示了这两种模式的关键差异：</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant TS as 任务调度器
    participant CT as 当前任务
    participant CP as cond_resched()
    participant TI as 时钟中断
    
    Note over TS,TI: Kernel 6.x (PREEMPT_VOLUNTARY)
    TS-&gt;&gt;CT: 设置 TIF_NEED_RESCHED
    CT-&gt;&gt;CT: 继续执行...
    CT-&gt;&gt;CP: 运行到检查点
    CP-&gt;&gt;CP: 检查 TIF_NEED_RESCHED
    CP--&gt;&gt;TS: 立即让出CPU (快速响应)
    
    Note over TS,TI: Kernel 7.0 (PREEMPT_LAZY)
    TS-&gt;&gt;CT: 设置 TIF_NEED_RESCHED_LAZY
    CT-&gt;&gt;CT: 继续执行...
    CT-&gt;&gt;CP: 运行到检查点
    CP-&gt;&gt;CP: 忽略 LAZY 标志
    CT-&gt;&gt;CT: 继续执行...
    CT-&gt;&gt;TI: 时钟中断到达
    TI-&gt;&gt;TI: 升级 LAZY → NEED_RESCHED
    TI--&gt;&gt;TS: 触发抢占 (延迟响应)
</code></pre>

<p>简单来说，<strong>内核将抢占决策权从”代码中的分散检查点”收拢到了”调度器的时钟中断”中</strong>。这简化了内核，但也意味着一个任务在被抢占前，可能会运行更长时间。</p>

<h2 id="三postgresql-的自旋锁机制一场对低延迟的极致追求">三、PostgreSQL 的自旋锁机制：一场对低延迟的极致追求</h2>

<p>那么，为什么内核的这个改动会让PostgreSQL”崩溃”呢？答案藏在PostgreSQL为了极致性能而设计的<strong>自旋锁（Spinlock）</strong> 机制中。</p>

<h3 id="自旋而不是睡眠">自旋，而不是睡眠</h3>

<p>在PostgreSQL的源代码<a href="https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/s_lock.c"><code class="language-plaintext highlighter-rouge">src/backend/storage/lmgr/s_lock.c</code></a>中，我们可以看到其自旋锁的实现逻辑<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>。当一个进程尝试获取一个已被其他进程持有的自旋锁时，它不会立即进入睡眠状态（这会导致上下文切换，开销巨大），而是会执行一个<strong>紧凑的循环，反复检查锁是否已被释放</strong>。这个过程被称为<strong>自旋（spinning）</strong>。</p>

<p>PostgreSQL的代码注释清楚地说明了这一点：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
 * When waiting for a contended spinlock we loop tightly for awhile, then
 * delay using pg_usleep() and try again.  Preferably, "awhile" should be a
 * small multiple of the maximum time we expect a spinlock to be held.  100
 * iterations seems about right as an initial guess.  However, on a
 * uniprocessor the loop is a waste of cycles, while in a multi-CPU scenario
 * it's usually better to spin a bit longer than to call the kernel, so we try
 * to adapt the spin loop count depending on whether we seem to be in a
 * uniprocessor or multiprocessor.
 */</span>
</code></pre></div></div>

<p>实际的自旋锁实现：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">s_lock</span><span class="p">(</span><span class="k">volatile</span> <span class="n">slock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span> <span class="kt">int</span> <span class="n">line</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">func</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">SpinDelayStatus</span> <span class="n">delayStatus</span><span class="p">;</span>

	<span class="n">init_spin_delay</span><span class="p">(</span><span class="o">&amp;</span><span class="n">delayStatus</span><span class="p">,</span> <span class="n">file</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">func</span><span class="p">);</span>

	<span class="k">while</span> <span class="p">(</span><span class="n">TAS_SPIN</span><span class="p">(</span><span class="n">lock</span><span class="p">))</span>
	<span class="p">{</span>
		<span class="n">perform_spin_delay</span><span class="p">(</span><span class="o">&amp;</span><span class="n">delayStatus</span><span class="p">);</span>
	<span class="p">}</span>

	<span class="n">finish_spin_delay</span><span class="p">(</span><span class="o">&amp;</span><span class="n">delayStatus</span><span class="p">);</span>

	<span class="k">return</span> <span class="n">delayStatus</span><span class="p">.</span><span class="n">delays</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>PostgreSQL的设计哲学是：<strong>自旋锁保护的临界区代码</strong>应该<strong>极其短小</strong>，通常只是修改几个指针或标志位。因此，持有锁的时间预期只有几十个CPU指令周期。在这种情况下，<strong>“自旋等待”几乎总是比”睡眠唤醒”更快</strong>。</p>

<h3 id="当被误解的自旋锁遭遇更懒的内核">当”被误解”的自旋锁遭遇”更懒”的内核</h3>

<p>问题在于，PostgreSQL的自旋锁机制对内核的抢占行为有一个<strong>强烈的隐含假设</strong>：</p>

<blockquote>
  <p><strong>“我已经把临界区做得非常短了。因此，当我持有自旋锁时，请千万不要抢占我。让我赶紧执行完，释放锁，比让其他CPU上的几十个线程一起自旋空转要好得多。”</strong></p>
</blockquote>

<p>在旧的<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>模式下，内核”尊重”了这个假设。虽然理论上任何地方都可能被抢占，但实际情况是，由于临界区极短，在它内部触发抢占的概率微乎其微。</p>

<p>但在Kernel 7.0的<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code>模式下，情况发生了根本性的变化。虽然临界区很短，但<strong>现在，持锁进程在释放锁之前，更有可能”撞上”时钟中断</strong>。</p>

<p>让我们一步步推演这个灾难场景：</p>

<ol>
  <li><strong>CPU 0</strong>上的进程A获得自旋锁L，开始执行临界区代码。</li>
  <li><strong>此时</strong>，由于某些原因（例如时间片即将用完，或有其他任务被唤醒），调度器为CPU 0设置了<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED_LAZY</code>标志。</li>
  <li>进程A继续执行，它并不知道自己被标记了。它快速执行着临界区代码，眼看就要完成了。</li>
  <li><strong>然而</strong>，时钟中断发生了。Kernel 7.0的中断处理程序检查到惰性标志，并将其<strong>升级为紧急抢占标志</strong>。</li>
  <li><strong>内核执行抢占</strong>：进程A的上下文被保存，它被”踢出”CPU。而它<strong>手上还死死握着那把锁L</strong>。</li>
  <li>现在，其他CPU（如CPU 1, CPU 2, …）上的进程B、C、D想要获取锁L。它们执行<code class="language-plaintext highlighter-rouge">TAS</code>操作，发现锁被占用，于是<strong>开始自旋</strong>。</li>
  <li>这些进程在用户态疯狂地自旋、自旋、自旋……<strong>消耗着宝贵的CPU周期，却什么有用的工作都没做</strong>。</li>
  <li>进程A虽然被抢占了，但由于它持有锁，且可能优先级不高，调度器迟迟没有让它重新运行。</li>
  <li>最终，经过漫长的等待（对CPU而言），进程A被重新调度，释放了锁。但此时，整个系统的CPU时间已经被无意义的自旋消耗殆尽。</li>
</ol>

<p>下图展示了这个灾难性的时序：</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant CPU0 as CPU 0 (进程A)
    participant Lock as 自旋锁L
    participant Sched as 调度器
    participant CPU1 as CPU 1 (进程B)
    participant CPU2 as CPU 2 (进程C)
    
    CPU0-&gt;&gt;Lock: 获取锁L
    activate Lock
    CPU0-&gt;&gt;CPU0: 执行临界区代码
    
    Note over Sched: 设置 TIF_NEED_RESCHED_LAZY
    Sched--&gt;&gt;CPU0: (标记，但不立即抢占)
    
    CPU0-&gt;&gt;CPU0: 继续执行临界区...
    
    Note over CPU0: 时钟中断到达！
    Sched-&gt;&gt;CPU0: 升级标志，强制抢占
    Note right of CPU0: 被换出 (仍持有锁L!)
    
    Note over CPU1,CPU2: 其他CPU上的进程尝试获取锁
    
    CPU1-&gt;&gt;Lock: TAS_SPIN(lock)
    Lock--&gt;&gt;CPU1: 失败 (锁被占用)
    CPU1-&gt;&gt;CPU1: 自旋等待...
    CPU1-&gt;&gt;CPU1: 自旋等待...
    
    CPU2-&gt;&gt;Lock: TAS_SPIN(lock)
    Lock--&gt;&gt;CPU2: 失败 (锁被占用)
    CPU2-&gt;&gt;CPU2: 自旋等待...
    CPU2-&gt;&gt;CPU2: 自旋等待...
    
    Note over CPU1,CPU2: CPU空转，浪费算力！
    
    CPU1-&gt;&gt;CPU1: 继续自旋...
    CPU2-&gt;&gt;CPU2: 继续自旋...
    
    Note over Sched,CPU0: 经过漫长等待...
    Sched-&gt;&gt;CPU0: 重新调度进程A
    CPU0-&gt;&gt;CPU0: 完成临界区
    CPU0-&gt;&gt;Lock: 释放锁L
    deactivate Lock
    
    CPU1-&gt;&gt;Lock: TAS_SPIN(lock)
    Lock--&gt;&gt;CPU1: 成功！
    activate Lock
    Note over CPU1,CPU2: 终于可以继续工作了
</code></pre>

<h2 id="四修复方案内核的设计立场与rseq时间片扩展">四、修复方案：内核的设计立场与RSEQ时间片扩展</h2>

<p>面对PostgreSQL的性能问题，调度器维护者Peter Zijlstra在commit <a href="https://github.com/torvalds/linux/commit/476e8583ca16"><code class="language-plaintext highlighter-rouge">476e8583ca16</code></a>中坚定地在x86架构上启用了PREEMPT_LAZY<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">8</a></sup>，提交信息非常简洁：</p>

<blockquote>
  <p>sched, x86: Enable Lazy preemption</p>

  <p>Add the TIF bit and select the Kconfig symbol to make it go.</p>
</blockquote>

<p>这一决定背后的设计理念可以从commit <a href="https://github.com/torvalds/linux/commit/7dadeaa6e851"><code class="language-plaintext highlighter-rouge">7dadeaa6e851</code></a>中看出<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>：引入<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code>的核心目标是<strong>简化内核代码，最终移除所有<code class="language-plaintext highlighter-rouge">cond_resched()</code>调用</strong>。这是一个正确的技术方向，体现了内核社区的长期愿景：</p>

<ol>
  <li><strong>简化内核</strong>：消除内核代码中数百个启发式的<code class="language-plaintext highlighter-rouge">cond_resched()</code>检查点</li>
  <li><strong>统一调度</strong>：将抢占决策集中到调度器，而非分散在代码各处</li>
  <li><strong>明确责任</strong>：如果用户空间程序依赖特定的抢占行为来保证性能，应该通过显式的内核接口来声明需求，而非依赖隐式假设</li>
</ol>

<h3 id="官方解决方案让postgresql使用rseq时间片扩展">官方解决方案：让PostgreSQL使用RSEQ时间片扩展</h3>

<p>Peter Zijlstra和Thomas Gleixner给出的解决方案是：<strong>让PostgreSQL使用Kernel 7.0中新增的RSEQ（Restartable Sequences）时间片扩展功能</strong><sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">9</a></sup>。</p>

<p><strong>什么是RSEQ？</strong> RSEQ是一种允许用户空间程序与内核安全地协作，执行一系列原子操作的机制。</p>

<p><strong>时间片扩展是什么？</strong> 这是Thomas Gleixner在2025年12月提交的一系列补丁引入的新特性。在commit <a href="https://github.com/torvalds/linux/commit/d7a5da7a0f7f"><code class="language-plaintext highlighter-rouge">d7a5da7a0f7f</code></a>的用户空间API文档中，明确说明了其目的<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">10</a></sup>：</p>

<blockquote>
  <p>This allows a thread to request a time slice extension when it enters a
critical section to avoid contention on a resource when the thread is
scheduled out inside of the critical section.</p>
</blockquote>

<p>这正是为了解决像PostgreSQL这样的应用在持锁期间被抢占导致的性能问题而设计的！</p>

<p>Linux内核在<a href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/rseq.h"><code class="language-plaintext highlighter-rouge">include/uapi/linux/rseq.h</code></a>中定义了相关接口<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">11</a></sup>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * rseq_slice_ctrl - Time slice extension control structure
 * ...
 */</span>
<span class="k">struct</span> <span class="n">rseq_slice_ctrl</span> <span class="p">{</span>
	<span class="k">union</span> <span class="p">{</span>
		<span class="n">__u32</span>		<span class="n">all</span><span class="p">;</span>
		<span class="k">struct</span> <span class="p">{</span>
			<span class="n">__u8</span>	<span class="n">request</span><span class="p">;</span>
			<span class="n">__u8</span>	<span class="n">granted</span><span class="p">;</span>
			<span class="n">__u16</span>	<span class="n">__reserved</span><span class="p">;</span>
		<span class="p">};</span>
	<span class="p">};</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="n">rseq</span> <span class="p">{</span>
	<span class="c1">// ...</span>
	<span class="k">struct</span> <span class="n">rseq_slice_ctrl</span> <span class="n">slice_ctrl</span><span class="p">;</span>
	<span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>其效果是：<strong>当该线程持有关键锁（即处于RSEQ临界区）时，内核调度器将暂时”无视”针对它的惰性抢占标志，不会在时钟中断时强行抢占它</strong>。这相当于PostgreSQL向内核宣告：”给我几十微秒，我马上就完事，别打断我。”</p>

<p>这完美地解决了我们之前分析的”持锁被抢”的困境。PostgreSQL可以获得它梦寐以求的”短时不可抢占”保证，同时内核也可以继续朝着更简洁、更统一的调度架构演进。</p>

<p>PostgreSQL社区需要在其代码中集成RSEQ时间片扩展的支持。这需要修改PostgreSQL锁管理器（<code class="language-plaintext highlighter-rouge">s_lock.c</code>）的实现，在获取自旋锁前请求时间片扩展，释放锁后清除请求，从而避免在持锁期间被抢占。</p>

<h3 id="如何使用rseq时间片扩展">如何使用RSEQ时间片扩展</h3>

<p>根据内核文档<sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">10</a></sup>，应用程序需要按以下步骤启用这个功能：</p>

<ol>
  <li><strong>注册RSEQ</strong>：通过<code class="language-plaintext highlighter-rouge">rseq()</code>系统调用注册一个用户空间内存区域</li>
  <li><strong>启用时间片扩展</strong>：通过<code class="language-plaintext highlighter-rouge">prctl()</code>启用该功能：</li>
</ol>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prctl</span><span class="p">(</span><span class="n">PR_RSEQ_SLICE_EXTENSION</span><span class="p">,</span> <span class="n">PR_RSEQ_SLICE_EXTENSION_SET</span><span class="p">,</span>
      <span class="n">PR_RSEQ_SLICE_EXT_ENABLE</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>

<ol>
  <li><strong>请求扩展</strong>：在进入临界区前，在<code class="language-plaintext highlighter-rouge">rseq-&gt;slice_ctrl.request</code>字段设置请求位</li>
  <li><strong>检查授权</strong>：内核会在<code class="language-plaintext highlighter-rouge">rseq-&gt;slice_ctrl.granted</code>字段返回是否授权</li>
</ol>

<p>下图展示了RSEQ时间片扩展的完整工作流程：</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant App as 用户态应用(PostgreSQL)
    participant RSEQ as RSEQ结构(共享内存)
    participant Kernel as 内核调度器
    participant Timer as 时钟中断
    
    Note over App,Kernel: 初始化阶段
    App-&gt;&gt;Kernel: rseq() 系统调用
    Kernel--&gt;&gt;RSEQ: 分配共享内存区域
    App-&gt;&gt;Kernel: prctl(PR_RSEQ_SLICE_EXTENSION_SET)
    Kernel--&gt;&gt;App: 启用成功
    
    Note over App,Timer: 运行时：进入临界区
    App-&gt;&gt;RSEQ: 设置 slice_ctrl.request = 1
    App-&gt;&gt;App: 获取自旋锁
    App-&gt;&gt;App: 执行临界区代码...
    
    Note over Kernel,Timer: 调度压力出现
    Kernel-&gt;&gt;Kernel: 设置 TIF_NEED_RESCHED_LAZY
    Timer-&gt;&gt;Kernel: 时钟中断到达
    
    Kernel-&gt;&gt;RSEQ: 检查 slice_ctrl.request
    alt 请求有效且无其他待处理工作
        Kernel-&gt;&gt;RSEQ: 设置 slice_ctrl.granted = 1
        Kernel-&gt;&gt;Kernel: 忽略抢占，允许继续运行
        Note over Kernel: 授予时间片扩展
    else 有待处理工作或其他条件不满足
        Kernel-&gt;&gt;RSEQ: 拒绝请求 (granted = 0)
        Kernel-&gt;&gt;App: 执行抢占
    end
    
    Note over App,Timer: 完成临界区
    App-&gt;&gt;App: 释放自旋锁
    App-&gt;&gt;RSEQ: 清除 slice_ctrl.request
    RSEQ--&gt;&gt;Kernel: 下次时钟中断时清除 granted
</code></pre>

<p>这个机制的核心实现在commit <a href="https://github.com/torvalds/linux/commit/dfb630f548a7"><code class="language-plaintext highlighter-rouge">dfb630f548a7</code></a>中，由Thomas Gleixner详细说明了授权决策过程<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">12</a></sup>：只有在从中断返回用户态、且没有其他待处理工作（如信号）时，才会授予时间片扩展。</p>

<h2 id="五postgresql与linux内核的协作历史numa案例">五、PostgreSQL与Linux内核的协作历史：NUMA案例</h2>

<p>有趣的是，PostgreSQL和Linux内核之间的互动并非总是冲突。一个很好的协作案例发生在2025年，当时PostgreSQL 18引入了新的NUMA内省功能。</p>

<p>在开发过程中，PostgreSQL开发者发现了Linux内核中<code class="language-plaintext highlighter-rouge">do_pages_stat()</code>函数的一个长期存在的bug（自2010年起）<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">13</a></sup>。这个bug影响所有在64位内核上运行32位用户空间的系统。PostgreSQL开发者Christoph Berg提交了内核修复<a href="https://github.com/torvalds/linux/commit/10d04c26ab2b"><code class="language-plaintext highlighter-rouge">10d04c26ab2b</code></a>：</p>

<blockquote>
  <p>Discovered while working on PostgreSQL 18’s new NUMA introspection code.</p>

  <p>For arrays with more than 16 entries, the old code would incorrectly
advance the pages pointer by 16 words instead of 16 compat_uptr_t.</p>
</blockquote>

<p>同时，PostgreSQL也在自己的代码中实现了规避措施，在commit <a href="https://github.com/postgres/postgres/commit/7fe2f67c7c9"><code class="language-plaintext highlighter-rouge">7fe2f67c7c9</code></a>中限制了<code class="language-plaintext highlighter-rouge">numa_move_pages</code>请求的大小<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">14</a></sup>：</p>

<blockquote>
  <p>This is a long-standing kernel bug (since 2010), affecting pretty much
all kernels, so it’ll take time until all systems get a fixed kernel.
Luckily, we can work around the issue by chunking the requests the same
way do_pages_stat() does, at least on affected systems.</p>
</blockquote>

<p>这个案例展示了<strong>开源项目之间健康的协作模式</strong>：发现问题后，同时修复内核bug并在应用层实现兼容性处理，确保在旧内核上也能正常工作。</p>

<h2 id="六总结与展望一次痛苦的蜕变">六、总结与展望：一次痛苦的蜕变</h2>

<p>Kernel 7.0与PostgreSQL的这次”冲突”，并非谁的错，而是计算机系统设计中的一个经典矛盾：<strong>通用操作系统的演进 vs. 特定领域应用的极致优化</strong>。</p>

<ul>
  <li><strong>对Linux而言</strong>：<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code>是一次勇敢的”自我简化”手术。它摒弃了历史包袱，为未来几十年的调度器发展奠定了基础<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">15</a></sup>。尽管短期内带来了阵痛，但方向是正确的。</li>
  <li><strong>对PostgreSQL而言</strong>：这次事件是一次警醒。它揭示了自己过去一直依赖的”在<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code>下不会被抢占”的假设，其实只是一个美丽而脆弱的巧合。拥抱RSEQ等新内核机制，将使其性能模型更加健壮和可移植。</li>
</ul>

<p>这次性能腰斩事件，本质上是<strong>两个高度复杂的系统在”无锁化”和”抢占”的边缘地带，发生的一次深刻的碰撞</strong>。它再次证明了一个朴素的真理：在系统软件的世界里，没有银弹。每一个看似微小的”优化”，都可能在其他地方掀起惊涛骇浪。而解决之道，不在于互相指责和回退，而在于更深层次的<strong>协作与适配</strong>。</p>

<p>最终，一个更简洁、更强大的Linux内核，和一个更健壮、更高效的PostgreSQL，都将从这个痛苦的蜕变中诞生。</p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>LKML讨论：《Re: [PATCH v3 00/20] sched: EEVDF and latency-nice and/or slice-attr》，讨论了抢占模型变化对数据库工作负载的影响。参见：<a href="https://lkml.kernel.org/r/20241007075055.555778919@infradead.org">https://lkml.kernel.org/r/20241007075055.555778919@infradead.org</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Peter Zijlstra，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/7dadeaa6e851"><code class="language-plaintext highlighter-rouge">7dadeaa6e851</code></a> — <em>sched: Further restrict the preemption modes</em>。详细说明了引入PREEMPT_LAZY的三个核心原因，以及为何限制PREEMPT_NONE和PREEMPT_VOLUNTARY。完整提交信息：<a href="https://patch.msgid.link/20251219101502.GB1132199@noisy.programming.kicks-ass.net">https://patch.msgid.link/20251219101502.GB1132199@noisy.programming.kicks-ass.net</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Linux内核文档，《preempt-locking.rst》，详细说明了内核抢占模型的演化和<code class="language-plaintext highlighter-rouge">cond_resched()</code>的使用。参见：<a href="https://github.com/torvalds/linux/blob/master/Documentation/locking/preempt-locking.rst">Documentation/locking/preempt-locking.rst</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Paul E. McKenney，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/78c2ce0fd6dd"><code class="language-plaintext highlighter-rouge">78c2ce0fd6dd</code></a> — <em>scftorture: Update due to x86 not supporting none/voluntary preemption</em>。明确说明”As of v7.0-rc1, architectures that support preemption, including x86 and arm64, no longer support CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY.” 链接：<a href="https://patch.msgid.link/20260303235903.1967409-4-paulmck@kernel.org">https://patch.msgid.link/20260303235903.1967409-4-paulmck@kernel.org</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>Peter Zijlstra，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/26baa1f1c4bd"><code class="language-plaintext highlighter-rouge">26baa1f1c4bd</code></a> — <em>sched: Add TIF_NEED_RESCHED_LAZY infrastructure</em>。说明：”Add the basic infrastructure to split the TIF_NEED_RESCHED bit in two.” 链接：<a href="https://lkml.kernel.org/r/20241007075055.219540785@infradead.org">https://lkml.kernel.org/r/20241007075055.219540785@infradead.org</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Peter Zijlstra，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/7c70cb94d29c"><code class="language-plaintext highlighter-rouge">7c70cb94d29c</code></a> — <em>sched: Add Lazy preemption model</em>。说明：”This LAZY bit will be promoted to the full NEED_RESCHED bit on tick. As such, the average delay between setting LAZY and actually rescheduling will be TICK_NSEC/2.” 链接：<a href="https://lkml.kernel.org/r/20241007075055.331243614@infradead.org">https://lkml.kernel.org/r/20241007075055.331243614@infradead.org</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>PostgreSQL源码 <a href="https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/s_lock.c"><code class="language-plaintext highlighter-rouge">src/backend/storage/lmgr/s_lock.c</code></a> — 自旋锁的实现，包括<code class="language-plaintext highlighter-rouge">s_lock()</code>函数和相关注释，说明了为何选择自旋而非立即睡眠。 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p>Peter Zijlstra，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/476e8583ca16"><code class="language-plaintext highlighter-rouge">476e8583ca16</code></a> — <em>sched, x86: Enable Lazy preemption</em>。在x86架构上启用PREEMPT_LAZY的关键提交。链接：<a href="https://lkml.kernel.org/r/20241007075055.555778919@infradead.org">https://lkml.kernel.org/r/20241007075055.555778919@infradead.org</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p>LKML patch series：《[PATCH 00/14] Restartable Sequences: selftests, time-slice extension》，Thomas Gleixner提出RSEQ时间片扩展机制，共14个补丁。链接：<a href="https://lkml.kernel.org/r/20251215155615.870031952@linutronix.de">https://lkml.kernel.org/r/20251215155615.870031952@linutronix.de</a> <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p>Thomas Gleixner，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/d7a5da7a0f7f"><code class="language-plaintext highlighter-rouge">d7a5da7a0f7f</code></a> — <em>rseq: Add fields and constants for time slice extension</em>。在用户空间API文档（<code class="language-plaintext highlighter-rouge">Documentation/userspace-api/rseq.rst</code>）中说明：”This allows a thread to request a time slice extension when it enters a critical section to avoid contention on a resource when the thread is scheduled out inside of the critical section.” 链接：<a href="https://patch.msgid.link/20251215155708.669472597@linutronix.de">https://patch.msgid.link/20251215155708.669472597@linutronix.de</a> <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:14:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:15" role="doc-endnote">
      <p>Linux内核UAPI头文件 <a href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/rseq.h"><code class="language-plaintext highlighter-rouge">include/uapi/linux/rseq.h</code></a> — 定义了<code class="language-plaintext highlighter-rouge">struct rseq_slice_ctrl</code>和相关的RSEQ时间片扩展接口。相关的<code class="language-plaintext highlighter-rouge">prctl()</code>接口定义在commit <a href="https://github.com/torvalds/linux/commit/28621ec2d46c"><code class="language-plaintext highlighter-rouge">28621ec2d46c</code></a>中。 <a href="#fnref:15" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:16" role="doc-endnote">
      <p>Thomas Gleixner，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/dfb630f548a7"><code class="language-plaintext highlighter-rouge">dfb630f548a7</code></a> — <em>rseq: Implement rseq_grant_slice_extension()</em>。详细说明了时间片扩展的授权决策逻辑：”The decision is made in two stages. First an inline quick check to avoid going into the actual decision function.” 链接：<a href="https://patch.msgid.link/20251215155709.195303303@linutronix.de">https://patch.msgid.link/20251215155709.195303303@linutronix.de</a> <a href="#fnref:16" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>Christoph Berg，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/10d04c26ab2b"><code class="language-plaintext highlighter-rouge">10d04c26ab2b</code></a> — <em>mm/migrate: fix do_pages_stat in compat mode</em>。说明：”Discovered while working on PostgreSQL 18’s new NUMA introspection code.” 修复了一个自2010年以来的内核bug。链接：<a href="https://lkml.kernel.org/r/aGREU0XTB48w9CwN@msg.df7cb.de">https://lkml.kernel.org/r/aGREU0XTB48w9CwN@msg.df7cb.de</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>Tomas Vondra，PostgreSQL提交 <a href="https://github.com/postgres/postgres/commit/7fe2f67c7c9"><code class="language-plaintext highlighter-rouge">7fe2f67c7c9</code></a> — <em>Limit the size of numa_move_pages requests</em>。PostgreSQL侧对内核bug的规避措施。讨论：<a href="https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de">https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>Linux内核UAPI头文件 <a href="https://github.com/torvalds/linux/blob/master/include/uapi/linux/rseq.h"><code class="language-plaintext highlighter-rouge">include/uapi/linux/rseq.h</code></a> — 定义了<code class="language-plaintext highlighter-rouge">struct rseq_slice_ctrl</code>和相关的RSEQ时间片扩展接口。 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Linux内核调度的时钟心跳：定时器中断、抢占与实时性的权衡</title><link href="https://weinan.io/2026/04/07/linux-kernel-scheduling-timer-interrupt-preemption.html" rel="alternate" type="text/html" title="Linux内核调度的时钟心跳：定时器中断、抢占与实时性的权衡" /><published>2026-04-07T00:00:00+00:00</published><updated>2026-04-07T00:00:00+00:00</updated><id>https://weinan.io/2026/04/07/linux-kernel-scheduling-timer-interrupt-preemption</id><content type="html" xml:base="https://weinan.io/2026/04/07/linux-kernel-scheduling-timer-interrupt-preemption.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<blockquote>
  <p>内核如何决定”现在该换谁运行了”？</p>
</blockquote>

<h2 id="引言操作系统的心跳">引言：操作系统的”心跳”</h2>

<p>当你的Linux系统同时运行着数百个进程，内核是如何决定在什么时刻暂停一个任务、让另一个任务运行的？这个看似简单的问题，背后隐藏着操作系统设计中最核心的权衡：<strong>公平性 vs. 实时性</strong>，<strong>吞吐量 vs. 响应延迟</strong>。</p>

<p>答案的关键在于一个持续跳动的”心跳”——<strong>定时器中断（Timer Interrupt）</strong>。它就像一个永不停歇的闹钟，每隔几毫秒就提醒内核：”该检查一下，是不是要换个任务运行了？”</p>

<p>但这只是故事的一部分。本文将深入Linux内核的调度子系统，揭示定时器中断在任务调度中的真实角色，以及它与抢占机制、实时操作系统的微妙关系。</p>

<h2 id="一定时器中断调度的驱动力还是可选项">一、定时器中断：调度的驱动力还是可选项？</h2>

<h3 id="11-传统观点定时器中断是调度的核心">1.1 传统观点：定时器中断是调度的核心</h3>

<p>在经典的操作系统教科书中，任务调度的基本模型是这样的：</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant HW as 硬件定时器
    participant IRQ as 中断控制器
    participant Kernel as 内核调度器
    participant Task as 当前任务
    
    Note over HW: 每隔固定时间(如1ms)
    HW-&gt;&gt;IRQ: 产生定时器中断
    IRQ-&gt;&gt;Kernel: 触发中断处理程序
    Kernel-&gt;&gt;Kernel: scheduler_tick()
    Kernel-&gt;&gt;Kernel: 检查时间片是否用完
    alt 时间片用完
        Kernel-&gt;&gt;Task: 设置 TIF_NEED_RESCHED
        Kernel-&gt;&gt;Kernel: 触发任务切换
    else 继续运行
        Kernel-&gt;&gt;Task: 返回继续执行
    end
</code></pre>

<p>这个模型在Linux早期版本（以及许多教学用的简化内核）中是准确的：</p>

<ol>
  <li><strong>硬件定时器</strong>（如x86的PIT或APIC timer）每隔固定时间（称为一个”tick”，通常是1ms或10ms）产生中断</li>
  <li>内核的<strong>时钟中断处理程序</strong>被调用</li>
  <li>调度器检查当前任务的<strong>时间片（time slice）</strong>是否用完</li>
  <li>如果用完，设置”需要重新调度”标志，在中断返回时触发任务切换</li>
</ol>

<h3 id="12-现代linuxtickless与动态时钟">1.2 现代Linux：Tickless与动态时钟</h3>

<p>然而，现代Linux内核（特别是启用了<strong>CONFIG_NO_HZ_FULL</strong>的系统）引入了”tickless”模式<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>，彻底改变了这个模型：</p>

<p><strong>传统模式（HZ=1000）</strong>：即使CPU完全空闲，每秒也会产生1000次定时器中断。</p>

<p><strong>Tickless模式</strong>：当CPU只运行一个任务且没有定时器到期时，内核会<strong>完全停止周期性的时钟中断</strong>，只在以下情况下设置定时器：</p>
<ul>
  <li>有定时器事件需要处理</li>
  <li>调度器需要检查任务状态</li>
  <li>RCU需要进行grace period处理</li>
</ul>

<p>这意味着：<strong>定时器中断不是调度的必要条件，而是一种优化手段</strong>。</p>

<pre><code class="language-mermaid">graph LR
    subgraph "传统定时器模式 (CONFIG_HZ_PERIODIC)"
        A[1ms] --&gt; B[中断]
        B --&gt; C[1ms]
        C --&gt; D[中断]
        D --&gt; E[1ms]
        E --&gt; F[中断]
    end
    
    subgraph "Tickless模式 (CONFIG_NO_HZ_FULL)"
        G[运行中...] -.-&gt;|仅在需要时| H[中断]
        H -.-&gt;|可能很长时间| I[中断]
    end
    
    style B fill:#FF6347
    style D fill:#FF6347
    style F fill:#FF6347
    style H fill:#87CEEB
    style I fill:#87CEEB
</code></pre>

<h3 id="13-那么调度到底在哪里发生">1.3 那么，调度到底在哪里发生？</h3>

<p>Linux内核中，任务切换（调用<code class="language-plaintext highlighter-rouge">schedule()</code>函数）可以在以下几个<strong>调度点（scheduling point）</strong>发生：</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">调度点</th>
      <th style="text-align: left">触发条件</th>
      <th style="text-align: left">是否依赖定时器中断</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>中断返回</strong></td>
      <td style="text-align: left">从任何中断（包括时钟中断）返回用户态时，检查<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>标志</td>
      <td style="text-align: left">部分依赖</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>系统调用返回</strong></td>
      <td style="text-align: left">系统调用结束返回用户态前</td>
      <td style="text-align: left">不依赖</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>主动调用</strong></td>
      <td style="text-align: left">任务调用<code class="language-plaintext highlighter-rouge">schedule()</code>、<code class="language-plaintext highlighter-rouge">yield()</code>或阻塞在I/O上</td>
      <td style="text-align: left">不依赖</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>抢占点</strong></td>
      <td style="text-align: left">内核代码中的<code class="language-plaintext highlighter-rouge">preempt_enable()</code>或<code class="language-plaintext highlighter-rouge">cond_resched()</code></td>
      <td style="text-align: left">不依赖</td>
    </tr>
  </tbody>
</table>

<p><strong>结论</strong>：定时器中断<strong>不是唯一的调度驱动力</strong>，但它是保证<strong>公平性和防止任务饿死</strong>的关键机制。</p>

<h2 id="二深入内核代码定时器中断如何触发调度">二、深入内核代码：定时器中断如何触发调度</h2>

<h3 id="21-时钟中断的处理路径">2.1 时钟中断的处理路径</h3>

<p>在Linux内核中，时钟中断的处理流程如下（以x86-64为例）：</p>

<p><strong>硬件中断 → 中断处理程序 → 调度器检查</strong></p>

<p>关键函数调用链（基于Linux 6.x/7.x）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// arch/x86/kernel/time.c - 时钟中断入口</span>
<span class="kt">void</span> <span class="n">__irq_entry</span> <span class="nf">smp_apic_timer_interrupt</span><span class="p">(</span><span class="k">struct</span> <span class="n">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">entering_irq</span><span class="p">();</span>
    <span class="n">trace_local_timer_entry</span><span class="p">(</span><span class="n">LOCAL_TIMER_VECTOR</span><span class="p">);</span>
    <span class="n">local_apic_timer_interrupt</span><span class="p">();</span>  <span class="c1">// 处理本地APIC定时器</span>
    <span class="n">trace_local_timer_exit</span><span class="p">(</span><span class="n">LOCAL_TIMER_VECTOR</span><span class="p">);</span>
    <span class="n">exiting_irq</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>调用链继续：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>local_apic_timer_interrupt()
  └─&gt; tick_handle_periodic() 或 hrtimer_interrupt()  // 取决于是否启用高精度定时器
      └─&gt; update_process_times()
          └─&gt; scheduler_tick()  // 调度器的时钟处理函数
</code></pre></div></div>

<h3 id="22-调度器的时钟心跳scheduler_tick">2.2 调度器的时钟心跳：<code class="language-plaintext highlighter-rouge">scheduler_tick()</code></h3>

<p>这是调度器在每个时钟中断中被调用的核心函数，定义在<a href="https://github.com/torvalds/linux/blob/master/kernel/sched/core.c"><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></a>中<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
 * This function gets called by the timer code, with HZ frequency.
 * We call it with interrupts disabled.
 */</span>
<span class="kt">void</span> <span class="nf">scheduler_tick</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">cpu</span> <span class="o">=</span> <span class="n">smp_processor_id</span><span class="p">();</span>
    <span class="k">struct</span> <span class="n">rq</span> <span class="o">*</span><span class="n">rq</span> <span class="o">=</span> <span class="n">cpu_rq</span><span class="p">(</span><span class="n">cpu</span><span class="p">);</span>
    <span class="k">struct</span> <span class="n">task_struct</span> <span class="o">*</span><span class="n">curr</span> <span class="o">=</span> <span class="n">rq</span><span class="o">-&gt;</span><span class="n">curr</span><span class="p">;</span>
    
    <span class="c1">// 更新运行队列时钟</span>
    <span class="n">update_rq_clock</span><span class="p">(</span><span class="n">rq</span><span class="p">);</span>
    
    <span class="c1">// 调用当前调度类的 task_tick 方法</span>
    <span class="n">curr</span><span class="o">-&gt;</span><span class="n">sched_class</span><span class="o">-&gt;</span><span class="n">task_tick</span><span class="p">(</span><span class="n">rq</span><span class="p">,</span> <span class="n">curr</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    
    <span class="c1">// 检查是否需要触发负载均衡</span>
    <span class="n">trigger_load_balance</span><span class="p">(</span><span class="n">rq</span><span class="p">);</span>
    
    <span class="c1">// ... 其他统计和处理</span>
<span class="p">}</span>
</code></pre></div></div>

<p>关键点：</p>
<ol>
  <li><strong>每个CPU独立处理</strong>：<code class="language-plaintext highlighter-rouge">scheduler_tick()</code>在每个CPU上独立运行</li>
  <li><strong>调度类多态</strong>：通过<code class="language-plaintext highlighter-rouge">task_tick</code>回调，不同调度策略（CFS、RT、DEADLINE）有不同的处理逻辑</li>
  <li><strong>不直接切换任务</strong>：这个函数只<strong>标记</strong>是否需要重新调度，真正的切换在中断返回时发生</li>
</ol>

<h3 id="23-cfs调度类的时钟处理">2.3 CFS调度类的时钟处理</h3>

<p>对于普通任务（<code class="language-plaintext highlighter-rouge">SCHED_OTHER</code>），调度器使用<strong>完全公平调度器（CFS）</strong>。在<a href="https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c"><code class="language-plaintext highlighter-rouge">kernel/sched/fair.c</code></a>中<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">task_tick_fair</span><span class="p">(</span><span class="k">struct</span> <span class="n">rq</span> <span class="o">*</span><span class="n">rq</span><span class="p">,</span> <span class="k">struct</span> <span class="n">task_struct</span> <span class="o">*</span><span class="n">curr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">queued</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">cfs_rq</span> <span class="o">*</span><span class="n">cfs_rq</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">sched_entity</span> <span class="o">*</span><span class="n">se</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">curr</span><span class="o">-&gt;</span><span class="n">se</span><span class="p">;</span>
    
    <span class="n">for_each_sched_entity</span><span class="p">(</span><span class="n">se</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">cfs_rq</span> <span class="o">=</span> <span class="n">cfs_rq_of</span><span class="p">(</span><span class="n">se</span><span class="p">);</span>
        <span class="n">entity_tick</span><span class="p">(</span><span class="n">cfs_rq</span><span class="p">,</span> <span class="n">se</span><span class="p">,</span> <span class="n">queued</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="c1">// ... NUMA平衡等</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">entity_tick</span><span class="p">(</span><span class="k">struct</span> <span class="n">cfs_rq</span> <span class="o">*</span><span class="n">cfs_rq</span><span class="p">,</span> <span class="k">struct</span> <span class="n">sched_entity</span> <span class="o">*</span><span class="n">curr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">queued</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// 更新当前任务的虚拟运行时间</span>
    <span class="n">update_curr</span><span class="p">(</span><span class="n">cfs_rq</span><span class="p">);</span>
    
    <span class="c1">// 检查是否需要抢占</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">cfs_rq</span><span class="o">-&gt;</span><span class="n">nr_running</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">check_preempt_tick</span><span class="p">(</span><span class="n">cfs_rq</span><span class="p">,</span> <span class="n">curr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>虚拟运行时间（vruntime）</strong> 是CFS的核心概念：</p>
<ul>
  <li>每个任务都有一个<code class="language-plaintext highlighter-rouge">vruntime</code>，表示它已经”使用”了多少CPU时间（按优先级加权）</li>
  <li>调度器总是选择<code class="language-plaintext highlighter-rouge">vruntime</code>最小的任务运行</li>
  <li><code class="language-plaintext highlighter-rouge">check_preempt_tick()</code>检查当前任务的<code class="language-plaintext highlighter-rouge">vruntime</code>是否明显大于队列中其他任务，如果是，则设置<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code></li>
</ul>

<h2 id="三抢占机制何时真正切换任务">三、抢占机制：何时真正切换任务？</h2>

<h3 id="31-抢占标志位tif_need_resched">3.1 抢占标志位：<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code></h3>

<p>设置这个标志只是”建议”内核应该切换任务，但何时真正切换取决于<strong>抢占模型</strong>：</p>

<pre><code class="language-mermaid">flowchart TD
    A[scheduler_tick 检测到需要切换] --&gt; B[设置 TIF_NEED_RESCHED]
    B --&gt; C{当前在哪里?}
    
    C --&gt;|用户态| D[立即抢占&lt;br/&gt;在中断返回时]
    C --&gt;|内核态| E{抢占模型?}
    
    E --&gt;|PREEMPT_NONE| F[等待系统调用返回&lt;br/&gt;或显式调度点]
    E --&gt;|PREEMPT_VOLUNTARY| G[等待 cond_resched&lt;br/&gt;检查点]
    E --&gt;|PREEMPT_FULL| H[几乎立即抢占&lt;br/&gt;除非持有自旋锁]
    
    style D fill:#90EE90
    style F fill:#FFD700
    style G fill:#FFB6C1
    style H fill:#FF6347
</code></pre>

<h3 id="32-中断返回路径实际的切换点">3.2 中断返回路径：实际的切换点</h3>

<p>在x86-64架构上，中断返回时的处理（<a href="https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S"><code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code></a>）<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ENTRY(interrupt_return)
    // ... 保存寄存器等
    
    testl $_TIF_NEED_RESCHED, %edi  // 检查是否需要重新调度
    jz restore_regs_and_return       // 如果不需要,直接返回
    
    // 需要调度
    call schedule                     // 调用调度器
    
restore_regs_and_return:
    // ... 恢复寄存器并返回用户态
    iretq
</code></pre></div></div>

<p><strong>关键点</strong>：</p>
<ul>
  <li>如果返回<strong>用户态</strong>，总是会检查并响应<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code></li>
  <li>如果返回<strong>内核态</strong>，取决于配置的抢占模型</li>
</ul>

<h2 id="四rtos-vs-通用linux调度哲学的根本差异">四、RTOS vs. 通用Linux：调度哲学的根本差异</h2>

<h3 id="41-实时操作系统的调度特点">4.1 实时操作系统的调度特点</h3>

<p>你的草稿中提到了一个关键区别：<strong>RTOS的核心不是时间片轮转，而是基于优先级的抢占</strong><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>。</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">特征</th>
      <th style="text-align: left">Linux (CFS)</th>
      <th style="text-align: left">RTOS (如FreeRTOS)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>调度目标</strong></td>
      <td style="text-align: left">公平性：确保所有任务都能获得CPU时间</td>
      <td style="text-align: left">确定性：最高优先级任务必须最快响应</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>时间片</strong></td>
      <td style="text-align: left">动态计算的虚拟运行时间</td>
      <td style="text-align: left">相同优先级才使用时间片轮转</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>抢占延迟</strong></td>
      <td style="text-align: left">毫秒级（取决于抢占模型）</td>
      <td style="text-align: left">微秒级（优先级抢占几乎立即发生）</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Tickless</strong></td>
      <td style="text-align: left">支持（省电）</td>
      <td style="text-align: left">部分RTOS支持，但优先保证实时性</td>
    </tr>
  </tbody>
</table>

<h3 id="42-preempt_rt将linux变成rtos">4.2 PREEMPT_RT：将Linux变成RTOS</h3>

<p>Linux的<strong>PREEMPT_RT补丁</strong><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>通过以下改造，将通用内核变成硬实时系统：</p>

<p><strong>关键技术1：中断线程化</strong></p>

<pre><code class="language-mermaid">sequenceDiagram
    participant HW as 硬件中断
    participant Handler as 中断处理程序(顶半部)
    participant Thread as 内核中断线程
    participant Sched as 实时调度器
    participant RT as 高优先级RT任务
    
    Note over HW,RT: 标准Linux模式
    HW-&gt;&gt;Handler: 中断到来
    activate Handler
    Handler-&gt;&gt;Handler: 长时间处理（关闭抢占）
    deactivate Handler
    Note right of Handler: RT任务必须等待
    
    Note over HW,RT: PREEMPT_RT模式
    HW-&gt;&gt;Handler: 中断到来
    Handler-&gt;&gt;Thread: 唤醒中断线程（极快）
    Thread-&gt;&gt;Sched: 进入就绪队列
    Sched-&gt;&gt;Sched: 比较优先级
    alt RT任务优先级更高
        Sched-&gt;&gt;RT: 立即运行RT任务
    else 中断线程优先级更高
        Sched-&gt;&gt;Thread: 运行中断处理
    end
</code></pre>

<p><strong>关键技术2：自旋锁变互斥锁</strong></p>

<p>标准Linux中的<code class="language-plaintext highlighter-rouge">spinlock</code>在PREEMPT_RT下被替换为<code class="language-plaintext highlighter-rouge">rt_mutex</code>（支持优先级继承），避免了高优先级任务在自旋锁上空转的问题。</p>

<h2 id="五lazy抢占kernel-70的新权衡">五、Lazy抢占：Kernel 7.0的新权衡</h2>

<p>你的另一篇文章分析的<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>正是这个权衡的最新演化：</p>

<p><strong>传统<code class="language-plaintext highlighter-rouge">PREEMPT_VOLUNTARY</code></strong>：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 内核代码中散布的检查点</span>
<span class="k">if</span> <span class="p">(</span><span class="n">need_resched</span><span class="p">())</span>  <span class="c1">// 检查 TIF_NEED_RESCHED</span>
    <span class="n">schedule</span><span class="p">();</span>       <span class="c1">// 立即让出CPU</span>
</code></pre></div></div>

<p><strong>新的<code class="language-plaintext highlighter-rouge">PREEMPT_LAZY</code></strong>：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 设置惰性标志</span>
<span class="n">set_tsk_need_resched_lazy</span><span class="p">(</span><span class="n">current</span><span class="p">);</span>

<span class="c1">// cond_resched() 不再检查惰性标志</span>
<span class="c1">// 只在时钟中断时升级为紧急标志</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tick_happened</span> <span class="o">&amp;&amp;</span> <span class="n">test_lazy_flag</span><span class="p">())</span>
    <span class="n">set_tsk_need_resched</span><span class="p">(</span><span class="n">current</span><span class="p">);</span>  <span class="c1">// 升级为紧急抢占</span>
</code></pre></div></div>

<p><strong>设计哲学转变</strong>：</p>
<ul>
  <li><strong>旧模式</strong>：通过代码中的启发式检查点实现”礼貌让出”</li>
  <li><strong>新模式</strong>：将抢占决策集中到调度器的时钟中断中，简化内核但增加了抢占延迟</li>
</ul>

<p>这个改变导致PostgreSQL性能下降的原因，正是因为它破坏了数据库自旋锁对”临界区内不会被抢占”的隐含假设。</p>

<h2 id="六总结调度是一门平衡的艺术">六、总结：调度是一门平衡的艺术</h2>

<p>回到最初的问题：<strong>Linux内核是否核心依赖定时器中断来进行任务调度？</strong></p>

<p><strong>答案是分层的</strong>：</p>

<ol>
  <li><strong>理论上</strong>：不依赖。系统调用返回、主动让出、I/O阻塞等都可以触发调度。</li>
  <li><strong>实践上</strong>：依赖。定时器中断是保证公平性、防止任务饿死、更新调度统计的关键机制。</li>
  <li><strong>现代内核</strong>：可选。Tickless模式下，单任务运行时可以完全没有周期性中断。</li>
  <li><strong>实时系统</strong>：弱依赖。RTOS更依赖事件驱动的抢占，时钟中断仅用于时间片轮转。</li>
</ol>

<p><strong>关键技术实现</strong>：</p>
<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">scheduler_tick()</code></strong>：每个时钟中断调用，更新vruntime，检查是否需要抢占</li>
  <li><strong><code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>标志</strong>：建议切换的信号，但何时响应取决于抢占模型</li>
  <li><strong>中断返回路径</strong>：实际任务切换的执行点</li>
  <li><strong>抢占模型</strong>：决定了内核态代码的可中断性</li>
</ul>

<p><strong>设计权衡</strong>：</p>
<ul>
  <li>频繁的时钟中断 → 更好的公平性和响应，但更高的开销</li>
  <li>Tickless → 省电和减少干扰，但需要更复杂的调度逻辑</li>
  <li>全抢占 → 低延迟，但吞吐量可能下降</li>
  <li>惰性抢占 → 简化内核，但需要应用层适配（如使用RSEQ）</li>
</ul>

<p>这些权衡没有”完美答案”，只有针对不同场景的”合适选择”。这也是为什么从通用服务器到硬实时系统，Linux提供了如此丰富的调度配置选项。</p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Linux内核文档，《Reducing OS jitter due to per-cpu kthreads》，详细说明了NO_HZ_FULL模式的设计和使用。参见：<a href="https://github.com/torvalds/linux/blob/master/Documentation/timers/no_hz.rst">Documentation/timers/no_hz.rst</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Linux内核源码 <a href="https://github.com/torvalds/linux/blob/master/kernel/sched/core.c"><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></a> — <code class="language-plaintext highlighter-rouge">scheduler_tick()</code>函数是时钟中断调用调度器的入口点，更新运行队列时钟并调用调度类的task_tick回调。 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Linux内核源码 <a href="https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c"><code class="language-plaintext highlighter-rouge">kernel/sched/fair.c</code></a> — CFS调度器的实现，包括<code class="language-plaintext highlighter-rouge">task_tick_fair()</code>和虚拟运行时间的更新逻辑。 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Linux内核源码 <a href="https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S"><code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code></a> — x86-64架构的中断返回路径，包括<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED</code>标志检查和调度调用。 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>FreeRTOS文档，《The FreeRTOS Kernel》，说明了基于优先级的抢占式调度机制。参见：<a href="https://www.freertos.org/implementation/a00008.html">https://www.freertos.org/implementation/a00008.html</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Linux PREEMPT_RT项目，《Real-Time Linux Wiki》，详细介绍了实时补丁的实现原理，包括中断线程化和优先级继承互斥锁。参见：<a href="https://wiki.linuxfoundation.org/realtime/start">https://wiki.linuxfoundation.org/realtime/start</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Peter Zijlstra，Linux内核提交 <a href="https://github.com/torvalds/linux/commit/7c70cb94d29c"><code class="language-plaintext highlighter-rouge">7c70cb94d29c</code></a> — <em>sched: Add Lazy preemption model</em>。引入了惰性抢占标志<code class="language-plaintext highlighter-rouge">TIF_NEED_RESCHED_LAZY</code>，改变了传统的抢占检查机制。链接：<a href="https://lkml.kernel.org/r/20241007075055.331243614@infradead.org">https://lkml.kernel.org/r/20241007075055.331243614@infradead.org</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">IDT 与 SYSCALL：差异、演化、Linux 实现与性能</title><link href="https://weinan.io/2026/03/30/x86-64-syscall-idt-linux-kernel-sdm.html" rel="alternate" type="text/html" title="IDT 与 SYSCALL：差异、演化、Linux 实现与性能" /><published>2026-03-30T00:00:00+00:00</published><updated>2026-03-30T00:00:00+00:00</updated><id>https://weinan.io/2026/03/30/x86-64-syscall-idt-linux-kernel-sdm</id><content type="html" xml:base="https://weinan.io/2026/03/30/x86-64-syscall-idt-linux-kernel-sdm.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p>全文分三部分：</p>

<ul>
  <li><strong>IDT 与 <code class="language-plaintext highlighter-rouge">SYSCALL</code> 的机制差异与历史脉络</strong></li>
  <li><strong>x86-64 Linux 上从 <code class="language-plaintext highlighter-rouge">syscall</code> 指令到内核服务的执行路径</strong>（对照 SDM 与 <code class="language-plaintext highlighter-rouge">arch/x86</code>）</li>
  <li><strong>经 IDT 的入核与 <code class="language-plaintext highlighter-rouge">SYSCALL</code> 入核在开销与实现上的对比</strong></li>
</ul>

<p>硬件叙述以 Intel <em>Software Developer’s Manual</em>（Volume 3A 等）为准，软件以 Linux 主线 <code class="language-plaintext highlighter-rouge">arch/x86</code> 为准；引用标号见文末 <strong>References</strong>。</p>

<hr />

<h2 id="主题一idt-与-syscall-的区别与演化">主题一：IDT 与 <code class="language-plaintext highlighter-rouge">SYSCALL</code> 的区别与演化</h2>

<h3 id="谁在决定内核入口">谁在决定内核入口</h3>

<ul>
  <li><strong>异常、硬件中断、<code class="language-plaintext highlighter-rouge">INT n</code></strong>：CPU 用 <strong>IDT（Interrupt Descriptor Table）</strong> 按 <strong>向量号</strong> 取门描述符，再按架构规则完成特权级与栈等处理；OS 负责 <strong>填表</strong> 并用 <strong><code class="language-plaintext highlighter-rouge">LIDT</code></strong> 之类加载 <strong>IDTR</strong>。该路径与一组 <strong>MSR</strong> 配合编程的 <strong><code class="language-plaintext highlighter-rouge">SYSCALL</code> 入核</strong>是两套并存机制<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">SYSCALL</code>（64 位长模式下的系统调用主路径之一）</strong>：CPU 根据 <strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code>、<code class="language-plaintext highlighter-rouge">IA32_LSTAR</code>、<code class="language-plaintext highlighter-rouge">IA32_FMASK</code></strong> 等 <strong>MSR</strong> 切到 ring 0 并跳转到 <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code> 指向的 RIP</strong>，<strong>不查 IDT</strong><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>。</li>
</ul>

<p>二者都是架构规定的入口协议，但针对的事件类别不同：前者服务 <strong>异步/异常类事件</strong> 的统一交付，后者服务 <strong>用户态主动发起的系统调用</strong> 的专用快速通道。</p>

<h3 id="64-位模式下的-idt-索引">64 位模式下的 IDT 索引</h3>

<p>在 <strong>64-bit / IA-32e</strong> 下，门描述符为 <strong>16 字节</strong>；向量 <em>k</em> 对应表项在 IDT 中的字节偏移为 <strong>k × 16</strong>（与 legacy 模式下 8 字节项不同）<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。</p>

<p>手册在 64-bit mode IDT gate 处写道<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>：</p>

<blockquote>
  <p>In 64-bit mode, the IDT index is formed by scaling the interrupt vector by 16. The first eight bytes (bytes 7:0) of a 64-bit mode interrupt gate are similar but not identical to legacy 32-bit interrupt gates. The type field (bits 11:8 in bytes 7:4) is described in Table 3-2. The Interrupt Stack Table (IST) field (bits 4:0 in bytes 7:4) is used by the stack switching mechanisms described in Section 6.14.5, “Interrupt Stack Table.” Bytes 11:8 hold the upper 32 bits of the target RIP (interrupt segment offset) in canonical form.</p>
</blockquote>

<h3 id="对照表">对照表</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">特性</th>
      <th style="text-align: left">经 IDT 的路径</th>
      <th style="text-align: left"><code class="language-plaintext highlighter-rouge">SYSCALL</code> 路径</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">典型触发</td>
      <td style="text-align: left">硬件中断、CPU 异常、<code class="language-plaintext highlighter-rouge">INT n</code>（含历史上的 <code class="language-plaintext highlighter-rouge">int 0x80</code>）</td>
      <td style="text-align: left">用户态执行 <strong><code class="language-plaintext highlighter-rouge">syscall</code></strong></td>
    </tr>
    <tr>
      <td style="text-align: left">入口定位</td>
      <td style="text-align: left">CPU 按向量查 <strong>IDT 门</strong></td>
      <td style="text-align: left">CPU 读 <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code> 等 MSR</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">门/MSR 语义</td>
      <td style="text-align: left">类型、DPL、IST、段选择子等 <strong>由 CPU 解释</strong></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">STAR</code>/<code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">FMASK</code> 组合</strong>，由 OS 预编程</td>
    </tr>
    <tr>
      <td style="text-align: left">是否使用 IDT</td>
      <td style="text-align: left">是</td>
      <td style="text-align: left"><strong>否</strong>（本条目不讨论 FRED 等后续扩展）</td>
    </tr>
  </tbody>
</table>

<h3 id="与系统调用号--内核函数的关系">与「系统调用号 → 内核函数」的关系</h3>

<p>抽象上都可说成 <strong>编号映射到处理逻辑</strong>：IDT 用 <strong>中断向量</strong>，系统调用用 <strong><code class="language-plaintext highlighter-rouge">RAX</code> 中的调用号</strong>。<br />
<strong>差别在于</strong>：IDT 的查表与跳转是 <strong>CPU 事件交付的一部分</strong>；而 <strong><code class="language-plaintext highlighter-rouge">RAX → __x64_sys_*</code></strong> 属于 <strong>内核在进入 <code class="language-plaintext highlighter-rouge">do_syscall_64</code> 之后的纯软件分发</strong>，处理器并不解析“系统调用号”的语义。</p>

<h4 id="三条不同的表--入口--快车道">三条不同的「表 / 入口 / 快车道」</h4>

<p>将机制分为以下三层（可与 <strong>上文「对照表」</strong>、<strong>下文「机制层对比」</strong> 对照阅读）：</p>

<ul>
  <li>
    <p><strong>IDT（及经其投递的中断/异常/<code class="language-plaintext highlighter-rouge">INT n</code>）</strong>
 由 CPU 规定、面向<strong>全体异步与异常事件</strong>的 <strong>通用交付协议</strong>：功能全、约束多，不以“最短一次用户主动系统调用”为唯一优化目标<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。</p>
  </li>
  <li>
    <p><strong>系统调用分发（软件）</strong>
 Linux 仍保留 <strong><code class="language-plaintext highlighter-rouge">sys_call_table[]</code></strong>，方便 <strong>trace</strong> 等子系统解析符号地址；<strong>64 位主路径</strong>上则由 <strong><code class="language-plaintext highlighter-rouge">x64_sys_call()</code> 的 <code class="language-plaintext highlighter-rouge">switch (nr)</code></strong> 落到 <strong><code class="language-plaintext highlighter-rouge">__x64_sys_*</code></strong>。无论数组还是 <strong><code class="language-plaintext highlighter-rouge">switch</code></strong>，都属于 <strong><code class="language-plaintext highlighter-rouge">syscall</code> 已经进核之后</strong> 的普通控制流，<strong>不是 CPU 替代的 IDT 查表</strong><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</p>
  </li>
  <li>
    <p><strong>系统调用硬件快车道（<code class="language-plaintext highlighter-rouge">SYSCALL</code> + 若干 MSR）</strong>
 <strong>入口 <code class="language-plaintext highlighter-rouge">RIP</code> 与 <code class="language-plaintext highlighter-rouge">CS</code>/<code class="language-plaintext highlighter-rouge">SS</code>/<code class="language-plaintext highlighter-rouge">RFLAGS</code> 掩码</strong>由 <strong><code class="language-plaintext highlighter-rouge">STAR</code>/<code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">FMASK</code>（及 <code class="language-plaintext highlighter-rouge">EFER.SCE</code>）</strong> 预编程；这是在 <strong>不进 IDT</strong> 的前提下完成的 <strong><code class="language-plaintext highlighter-rouge">ring 3 → ring 0</code> 专用序列</strong><sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>。<strong><code class="language-plaintext highlighter-rouge">__x64_sys_*</code> 分发</strong>在这一硬件入核序列完成之后，才由 <strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code> / <code class="language-plaintext highlighter-rouge">x64_sys_call</code></strong> 等以 <strong>普通内核控制流</strong>执行<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</p>
  </li>
</ul>

<h3 id="一条简化的演化脉络x86--linux-相关">一条简化的演化脉络（x86 / Linux 相关）</h3>

<ul>
  <li><strong>80386 及保护模式</strong>：<strong>IDT</strong> 与 <strong><code class="language-plaintext highlighter-rouge">INT n</code></strong> 成为统一的异常/中断/软中断交付入口；内核通过设置向量 <em>n</em> 的门，把控制流交给对应处理例程。</li>
  <li><strong>32 位 Linux</strong>：用户态系统调用长期使用 <strong><code class="language-plaintext highlighter-rouge">int 0x80</code></strong>，即 <strong>CPU 查 IDT 向量 0x80</strong> 进入内核（仍属 IDT 路径）<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">7</a></sup>。</li>
  <li><strong>约 Pentium II / Pro 一代</strong>：Intel 引入 <strong><code class="language-plaintext highlighter-rouge">SYSENTER</code>/<code class="language-plaintext highlighter-rouge">SYSEXIT</code></strong>，配合 <strong>MSR</strong> 提供另一条 <strong>不经 IDT 门描述符的</strong> 快速进核通道（Linux 在 <strong>32 位兼容路径</strong>等场景仍会碰到与 <strong><code class="language-plaintext highlighter-rouge">SYSENTER</code>/<code class="language-plaintext highlighter-rouge">SYSCALL</code></strong> 相关的入口约定）<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">8</a></sup>。</li>
  <li><strong>x86-64（AMD64 / Intel 64）</strong>：架构在 <strong>长模式</strong>下提供 <strong><code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code></strong>（由 <strong><code class="language-plaintext highlighter-rouge">IA32_EFER.SCE</code></strong> 等控制使能，细节以 SDM 为准）。<strong>64 位 Linux 用户态</strong>通常通过 <strong>glibc 等内联 <code class="language-plaintext highlighter-rouge">syscall</code></strong>，内核入口落在 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong><sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">9</a></sup>。</li>
  <li><strong>并存</strong>：今日 64 位内核仍可能为 <strong>32 位进程</strong> 保留 <strong><code class="language-plaintext highlighter-rouge">int 0x80</code> / <code class="language-plaintext highlighter-rouge">SYSENTER</code> / 兼容入口</strong>（向量与实现见内核头文件与 <code class="language-plaintext highlighter-rouge">entry_64_compat</code> 等）；<strong>本文明细以 64 位 <code class="language-plaintext highlighter-rouge">syscall</code> 主线为主</strong>。</li>
</ul>

<hr />

<h2 id="主题二x86-64-linux-上-syscall-从-cpu-到内核的完整机制">主题二：x86-64 Linux 上 <code class="language-plaintext highlighter-rouge">syscall</code> 从 CPU 到内核的完整机制</h2>

<h3 id="三层结构总览">三层结构（总览）</h3>

<ul>
  <li><strong>CPU（SDM）</strong>：用户态约定 <strong><code class="language-plaintext highlighter-rouge">RAX</code>=调用号</strong>、参数寄存器后执行 <strong><code class="language-plaintext highlighter-rouge">syscall</code></strong>。硬件将 <strong><code class="language-plaintext highlighter-rouge">RIP → RCX</code>、<code class="language-plaintext highlighter-rouge">RFLAGS → R11</code></strong>，按 <strong>MSR</strong> 加载 <strong><code class="language-plaintext highlighter-rouge">CS</code>/<code class="language-plaintext highlighter-rouge">SS</code>/<code class="language-plaintext highlighter-rouge">RIP</code></strong>，并令 <strong><code class="language-plaintext highlighter-rouge">RFLAGS &lt;- RFLAGS &amp; ~IA32_FMASK</code></strong>；<strong>不保存 <code class="language-plaintext highlighter-rouge">RSP</code></strong>、不向栈压帧。</li>
  <li><strong>内核入口 <code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong>（<code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code>）：<strong><code class="language-plaintext highlighter-rouge">swapgs</code></strong>、切换到 <strong>per-CPU 内核栈</strong>，在栈上构造 <strong><code class="language-plaintext highlighter-rouge">struct pt_regs</code></strong>，再 <strong><code class="language-plaintext highlighter-rouge">call do_syscall_64</code></strong>。</li>
  <li><strong>分发与返回</strong>：<strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code></strong> → <strong><code class="language-plaintext highlighter-rouge">x64_sys_call</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">switch (nr)</code></strong> → 各 <strong><code class="language-plaintext highlighter-rouge">__x64_sys_*</code></strong>。返回时若满足契约则 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong>，否则 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong>。</li>
</ul>

<p>对比 <strong>IDT 路径</strong>：<strong>IDT</strong> 处理「向量 → 硬件按门交付」；<strong><code class="language-plaintext highlighter-rouge">syscall</code></strong> 处理「寄存器约定 + <strong>MSR</strong> 指定 <strong><code class="language-plaintext highlighter-rouge">RIP</code></strong> → <strong>软件</strong>补全栈帧再交付」。</p>

<h3 id="syscall-与-msr多寄存器协同而非单一-lstar"><code class="language-plaintext highlighter-rouge">SYSCALL</code> 与 MSR：多寄存器协同，而非单一 <code class="language-plaintext highlighter-rouge">LSTAR</code></h3>

<p><strong>MSR（Model Specific Register）</strong> 指通过 <strong><code class="language-plaintext highlighter-rouge">RDMSR</code>/<code class="language-plaintext highlighter-rouge">WRMSR</code></strong> 访问的 <strong>按编号独立编址</strong> 的一类寄存器；体系结构里与 <code class="language-plaintext highlighter-rouge">SYSCALL</code> 相关的常量名 <strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code>、<code class="language-plaintext highlighter-rouge">IA32_LSTAR</code>、<code class="language-plaintext highlighter-rouge">IA32_FMASK</code></strong> 等各自对应不同 MSR 地址与语义。长模式下执行 <strong><code class="language-plaintext highlighter-rouge">SYSCALL</code></strong> 时，处理器按 <strong><code class="language-plaintext highlighter-rouge">IA32_EFER.SCE</code></strong> 判定该机制是否可用，再从 <strong><code class="language-plaintext highlighter-rouge">STAR</code>/<code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">FMASK</code></strong> 读出 CS/SS、目标 RIP 与 RFLAGS 掩码<sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:11:2" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>。</p>

<p>SDM 在 <strong><code class="language-plaintext highlighter-rouge">STAR</code>/<code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">FMASK</code> 布局</strong>处写明<sup id="fnref:11:3" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>：</p>

<blockquote>
  <p>See Figure 5-14 for the layout of IA32_STAR, IA32_LSTAR and IA32_FMASK.</p>
</blockquote>

<p>并在同一节给出 <strong><code class="language-plaintext highlighter-rouge">RIP</code> 取自 <code class="language-plaintext highlighter-rouge">IA32_LSTAR</code>、<code class="language-plaintext highlighter-rouge">RFLAGS</code> 与 <code class="language-plaintext highlighter-rouge">IA32_FMASK</code> 的组合关系</strong>（正文 <strong>「CPU 侧（与 Vol.3A §5.8.8 等一致）」</strong> 一节另有逐句引文）。</p>

<p>Linux 在 <strong>64 位内核引导路径</strong>中与上述分工对齐：<strong><code class="language-plaintext highlighter-rouge">syscall_init()</code></strong> 写 <strong><code class="language-plaintext highlighter-rouge">MSR_STAR</code></strong>（用户/内核段选择子约定），再调用 <strong><code class="language-plaintext highlighter-rouge">idt_syscall_init()</code></strong> 写 <strong><code class="language-plaintext highlighter-rouge">MSR_LSTAR</code></strong>（<code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code>）与 <strong><code class="language-plaintext highlighter-rouge">MSR_SYSCALL_MASK</code></strong>（对应 <strong><code class="language-plaintext highlighter-rouge">IA32_FMASK</code></strong>）<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="cm">/* The default user and kernel segments */</span>
	<span class="n">wrmsr</span><span class="p">(</span><span class="n">MSR_STAR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">__USER32_CS</span> <span class="o">&lt;&lt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">|</span> <span class="n">__KERNEL_CS</span><span class="p">);</span>

	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">cpu_feature_enabled</span><span class="p">(</span><span class="n">X86_FEATURE_FRED</span><span class="p">))</span>
		<span class="n">idt_syscall_init</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">idt_syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_LSTAR</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_64</span><span class="p">);</span>
	<span class="cm">/* ia32_enabled() / SYSENTER_* / MSR_CSTAR 分支：见 common.c 全文 */</span>
	<span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_SYSCALL_MASK</span><span class="p">,</span>
	       <span class="n">X86_EFLAGS_CF</span><span class="o">|</span><span class="n">X86_EFLAGS_PF</span><span class="o">|</span><span class="n">X86_EFLAGS_AF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_ZF</span><span class="o">|</span><span class="n">X86_EFLAGS_SF</span><span class="o">|</span><span class="n">X86_EFLAGS_TF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_IF</span><span class="o">|</span><span class="n">X86_EFLAGS_DF</span><span class="o">|</span><span class="n">X86_EFLAGS_OF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_IOPL</span><span class="o">|</span><span class="n">X86_EFLAGS_NT</span><span class="o">|</span><span class="n">X86_EFLAGS_RF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_AC</span><span class="o">|</span><span class="n">X86_EFLAGS_ID</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>内核里 <strong><code class="language-plaintext highlighter-rouge">MSR_SYSCALL_MASK</code></strong> 与手册 <strong><code class="language-plaintext highlighter-rouge">IA32_FMASK</code></strong> 对应同一类编程接口；<strong><code class="language-plaintext highlighter-rouge">idt_syscall_init()</code></strong> 在 <strong><code class="language-plaintext highlighter-rouge">MSR_LSTAR</code> 与兼容路径 MSRs</strong> 之间的分支仍以 <code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code> 为准，<strong>「内核源码摘录（与上表对应）」</strong> 一节给出与当前主线一致的更长摘录。</p>

<p>从机制上概括：<strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code> 只给出 ring-0 入口 <code class="language-plaintext highlighter-rouge">RIP</code></strong>；<strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code> 给出 <code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code> 使用的 CS/SS 选择子场</strong>；<strong><code class="language-plaintext highlighter-rouge">IA32_FMASK</code> 规定 <code class="language-plaintext highlighter-rouge">RFLAGS</code> 在进入时被清除的位</strong>；<strong><code class="language-plaintext highlighter-rouge">IA32_EFER.SCE</code> 使能整条 <code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code> 路径</strong><sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:11:4" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>。三颗 MSR 与总开关共同构成 SDM <strong>Figure 5-14</strong> 所描述的配置平面，操作系统需一并初始化，而不是仅写 <strong><code class="language-plaintext highlighter-rouge">LSTAR</code></strong> 一项。</p>

<h3 id="长模式专用syscall-与-sysret--三颗-msr-如何协同工作">长模式专用：<code class="language-plaintext highlighter-rouge">SYSCALL</code> 与 <code class="language-plaintext highlighter-rouge">SYSRET</code> —— 三颗 MSR 如何协同工作</h3>

<h4 id="核心概念三个-msr-各司其职">核心概念：三个 MSR 各司其职</h4>

<p>在 x86-64 长模式下，<code class="language-plaintext highlighter-rouge">syscall</code> 和 <code class="language-plaintext highlighter-rouge">sysret</code> 指令依赖三个 MSR（模型特定寄存器）来完成用户态到内核态、再回到用户态的完整流程。可以这样理解：</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">MSR 寄存器</th>
      <th style="text-align: left">作用</th>
      <th style="text-align: left">类比</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>IA32_STAR</strong></td>
      <td style="text-align: left">告诉 CPU：进入内核时用什么段（CS/SS），返回用户时用什么段</td>
      <td style="text-align: left"><strong>门禁卡的双重配置</strong>——进去刷A区，出来刷B区</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>IA32_LSTAR</strong></td>
      <td style="text-align: left">告诉 CPU：内核的入口函数地址在哪里</td>
      <td style="text-align: left"><strong>紧急出口的指向标</strong>——从这里进内核</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>IA32_FMASK</strong></td>
      <td style="text-align: left">告诉 CPU：进入内核时，RFLAGS 寄存器里哪些位要强制清零</td>
      <td style="text-align: left"><strong>安检过滤器</strong>——某些标志位不能带进内核</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p><strong>重要说明</strong>：本文只讨论 <strong>IA-32e 长模式</strong>下带 <code class="language-plaintext highlighter-rouge">REX.W</code> 的 <code class="language-plaintext highlighter-rouge">syscall</code>/<code class="language-plaintext highlighter-rouge">sysret</code> 指令，不涉及 <code class="language-plaintext highlighter-rouge">IA32_CSTAR</code> 和 <code class="language-plaintext highlighter-rouge">SYSENTER</code>/<code class="language-plaintext highlighter-rouge">SYSEXIT</code> 等其他机制。</p>
</blockquote>

<hr />

<h4 id="流程图一条系统调用的完整旅程">流程图：一条系统调用的完整旅程</h4>

<p>下面这个流程图展示了从<strong>用户态执行 <code class="language-plaintext highlighter-rouge">syscall</code></strong> 到<strong>内核处理</strong>再到<strong>返回用户态</strong>的完整过程。每个框里都注明了“此时谁在读/写哪个 MSR”。</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant OS as 操作系统(启动时)
    participant User as 用户态程序
    participant CPU as CPU硬件
    participant Kernel as 内核态代码

    Note over OS: 操作系统启动时，预先配置 MSR
    OS-&gt;&gt;CPU: IA32_EFER.SCE = 1 (开启 syscall 支持)
    OS-&gt;&gt;CPU: IA32_STAR = 入核/出核的 CS/SS 选择子
    OS-&gt;&gt;CPU: IA32_LSTAR = 内核入口地址
    OS-&gt;&gt;CPU: IA32_FMASK = RFLAGS 清零掩码

    Note over User: 用户态准备系统调用
    User-&gt;&gt;User: RAX = 系统调用号，参数存入 RDI/RSI/RDX/R10/R8/R9
    User-&gt;&gt;User: RSP 指向用户栈

    User-&gt;&gt;CPU: 执行 syscall 指令

    Note over CPU: syscall 指令的硬件自动行为
    CPU-&gt;&gt;CPU: RCX = 用户态下一条指令的 RIP
    CPU-&gt;&gt;CPU: R11 = 用户态完整 RFLAGS
    CPU-&gt;&gt;CPU: RIP = IA32_LSTAR (读 MSR)
    CPU-&gt;&gt;CPU: CS/SS = IA32_STAR 入核位域
    CPU-&gt;&gt;CPU: RFLAGS = RFLAGS &amp; (~IA32_FMASK) (按 FMASK 清零)
    
    Note over CPU: 特权级从 Ring 3 切换到 Ring 0
    CPU-&gt;&gt;Kernel: 跳转到 LSTAR 指向的内核入口

    Note over Kernel: 内核处理系统调用
    Kernel-&gt;&gt;Kernel: swapgs (切换到内核 GS)
    Kernel-&gt;&gt;Kernel: 手动切换 RSP 到内核栈
    Kernel-&gt;&gt;Kernel: 保存完整寄存器到内核栈 (形成 pt_regs)
    Kernel-&gt;&gt;Kernel: 根据 RAX 查 sys_call_table 分发
    Kernel-&gt;&gt;Kernel: 执行具体内核函数，返回值写入 RAX
    Kernel-&gt;&gt;Kernel: 恢复寄存器，准备返回

    Kernel-&gt;&gt;CPU: 执行 sysretq 指令

    Note over CPU: sysret 指令的硬件自动行为
    CPU-&gt;&gt;CPU: CS/SS = IA32_STAR 出核位域
    CPU-&gt;&gt;CPU: RIP = RCX (恢复用户态返回地址)
    CPU-&gt;&gt;CPU: RFLAGS = R11 (恢复用户态标志位)

    Note over CPU: 特权级从 Ring 0 切换回 Ring 3
    CPU-&gt;&gt;User: 跳转到用户态返回地址

    Note over User: 继续执行，RAX 中为系统调用返回值
</code></pre>

<hr />

<h4 id="关键要点避免踩坑">关键要点（避免踩坑）</h4>

<h5 id="syscall-不会自动切换-rsp"><code class="language-plaintext highlighter-rouge">syscall</code> 不会自动切换 RSP</h5>
<ul>
  <li>用户栈指针（RSP）<strong>不会</strong>被 <code class="language-plaintext highlighter-rouge">syscall</code> 指令改变。</li>
  <li>内核必须在入口代码中<strong>手动切换</strong>到内核栈（通常用 <code class="language-plaintext highlighter-rouge">swapgs</code> + 写 <code class="language-plaintext highlighter-rouge">rsp</code>）。</li>
  <li>这意味着：<strong>RSP 的保存和恢复是软件的责任</strong>，硬件不管。</li>
</ul>

<h5 id="sysret-的契约"><code class="language-plaintext highlighter-rouge">sysret</code> 的「契约」</h5>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sysret</code> 指令假设：
    <ul>
      <li><strong>RCX</strong> 中存放着用户态的返回地址（由 <code class="language-plaintext highlighter-rouge">syscall</code> 自动保存）。</li>
      <li><strong>R11</strong> 中存放着用户态的 RFLAGS（由 <code class="language-plaintext highlighter-rouge">syscall</code> 自动保存）。</li>
    </ul>
  </li>
  <li><strong>如果内核代码不小心破坏了 RCX 或 R11，就不能再用 <code class="language-plaintext highlighter-rouge">sysret</code> 返回</strong>，必须改用 <code class="language-plaintext highlighter-rouge">iret</code> 路径。</li>
</ul>

<h5 id="返回值约定">返回值约定</h5>
<ul>
  <li>系统调用的返回值<strong>必须放在 RAX</strong> 中。</li>
  <li>这是用户态和内核态的约定，<code class="language-plaintext highlighter-rouge">sysret</code> 不会动 RAX。</li>
</ul>

<hr />

<h4 id="与-int-0x80--idt-路径的对比可选扩展">与 <code class="language-plaintext highlighter-rouge">int 0x80</code> + IDT 路径的对比（可选扩展）</h4>

<p>如果你想理解为什么这套机制比 <code class="language-plaintext highlighter-rouge">int 0x80</code> 快，可以这样对比：</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">动作</th>
      <th style="text-align: left"><code class="language-plaintext highlighter-rouge">int 0x80</code>（老方法）</th>
      <th style="text-align: left"><code class="language-plaintext highlighter-rouge">syscall</code>（新方法）</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">保存返回地址</td>
      <td style="text-align: left">压栈（内存访问）</td>
      <td style="text-align: left">存 RCX（寄存器）</td>
    </tr>
    <tr>
      <td style="text-align: left">保存 RFLAGS</td>
      <td style="text-align: left">压栈（内存访问）</td>
      <td style="text-align: left">存 R11（寄存器）</td>
    </tr>
    <tr>
      <td style="text-align: left">查找入口</td>
      <td style="text-align: left">查内存中的 IDT 表</td>
      <td style="text-align: left">读 MSR 寄存器（CPU 内部）</td>
    </tr>
    <tr>
      <td style="text-align: left">切换栈</td>
      <td style="text-align: left">硬件自动切（TSS 机制）</td>
      <td style="text-align: left">软件手动切（更灵活）</td>
    </tr>
    <tr>
      <td style="text-align: left">保存段寄存器</td>
      <td style="text-align: left">硬件自动保存 5 个</td>
      <td style="text-align: left">根本不保存（因为用不上）</td>
    </tr>
    <tr>
      <td style="text-align: left">返回指令</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">iret</code>（重量级）</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">sysret</code>（轻量级）</td>
    </tr>
  </tbody>
</table>

<p><strong>核心结论</strong>：<code class="language-plaintext highlighter-rouge">syscall</code> 快，不是因为它“做的事少”，而是因为它“用寄存器代替了内存”，并且“去掉了历史包袱”。</p>

<p>在 <code class="language-plaintext highlighter-rouge">syscall</code>/<code class="language-plaintext highlighter-rouge">sysret</code> 机制中，最核心的 MSR 寄存器是以下<strong>三个</strong>：</p>

<hr />

<h4 id="核心三颗-msr">核心三颗 MSR</h4>

<table>
  <thead>
    <tr>
      <th style="text-align: left">MSR 名称</th>
      <th style="text-align: left">地址</th>
      <th style="text-align: left">作用</th>
      <th style="text-align: left">读/写时机</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>IA32_STAR</strong></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">0xC0000081</code></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">syscall</code>/<code class="language-plaintext highlighter-rouge">sysret</code> 各自的 CS、SS 怎么取</strong>由 <strong>Figure 5-14</strong> 规定的<strong>不同位域</strong>决定（<strong><code class="language-plaintext highlighter-rouge">syscall</code></strong> 用入核场、<strong><code class="language-plaintext highlighter-rouge">sysret</code>（长模式）</strong>用出核场；<strong>不是</strong>「高 32 位=内核段、低 32 位=用户段」这种对半分）</td>
      <td style="text-align: left">操作系统启动时写入一次</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>IA32_LSTAR</strong></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">0xC0000082</code></td>
      <td style="text-align: left">存储<strong>内核入口地址</strong>：<br /><code class="language-plaintext highlighter-rouge">syscall</code> 指令执行后 RIP 跳转的目标</td>
      <td style="text-align: left">操作系统启动时写入一次</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>IA32_FMASK</strong></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">0xC0000084</code></td>
      <td style="text-align: left">存储<strong>RFLAGS 掩码</strong>：<br />进入内核时，RFLAGS 中对应位被强制清零</td>
      <td style="text-align: left">操作系统启动时写入一次</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="辅助-msr">辅助 MSR</h4>

<p>还有一个<strong>前提条件</strong>相关的 MSR：</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">MSR 名称</th>
      <th style="text-align: left">地址</th>
      <th style="text-align: left">作用</th>
      <th style="text-align: left">说明</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>IA32_EFER</strong></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">0xC0000080</code></td>
      <td style="text-align: left">第 0 位（SCE 位）必须为 1</td>
      <td style="text-align: left">否则 <code class="language-plaintext highlighter-rouge">syscall</code> 指令会触发 <code class="language-plaintext highlighter-rouge">#UD</code> 异常</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="一句话总结">一句话总结</h4>

<blockquote>
  <p><strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code> 管“段”（权限），<code class="language-plaintext highlighter-rouge">IA32_LSTAR</code> 管“地址”（去哪），<code class="language-plaintext highlighter-rouge">IA32_FMASK</code> 管“标志位”（环境），三颗 MSR 配合 <code class="language-plaintext highlighter-rouge">IA32_EFER.SCE</code> 开关，共同决定了 <code class="language-plaintext highlighter-rouge">syscall</code> 的完整行为。</strong></p>
</blockquote>

<h3 id="端到端序列示意">端到端序列（示意）</h3>

<pre><code class="language-mermaid">sequenceDiagram
    participant User as 用户态进程
    participant CPU as CPU硬件
    participant Kernel as Linux内核
    User-&gt;&gt;User: RAX=nr，RDI/RSI/RDX/R10/R8/R9 为 arg0–arg5
    User-&gt;&gt;CPU: 执行 syscall
    CPU-&gt;&gt;CPU: RCX←返回点 RIP，R11←RFLAGS
    CPU-&gt;&gt;CPU: RIP←IA32_LSTAR；RFLAGS 按 IA32_FMASK 清零若干位
    CPU-&gt;&gt;Kernel: 进入 entry_SYSCALL_64
    Kernel-&gt;&gt;Kernel: swapgs，切内核栈，推 pt_regs
    Kernel-&gt;&gt;Kernel: do_syscall_64，x64_sys_call 按 nr 分发
    Kernel-&gt;&gt;Kernel: 写回 RAX 返回值或负 errno
    Kernel-&gt;&gt;Kernel: 可 SYSRET 则 SYSRET，否则 IRET
    CPU-&gt;&gt;User: 回到用户态，自 RCX 所指指令继续
</code></pre>

<h4 id="与上图步骤对应的内核代码linuxarchx86">与上图步骤对应的内核代码（<code class="language-plaintext highlighter-rouge">linux/arch/x86</code>）</h4>

<p>序列图里最前段由用户态约定（glibc / vDSO 等内联 <strong><code class="language-plaintext highlighter-rouge">syscall</code></strong>，见 <strong>man <code class="language-plaintext highlighter-rouge">syscall(2)</code></strong><sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">9</a></sup>）；其后为 CPU 根据 <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code>/<code class="language-plaintext highlighter-rouge">IA32_FMASK</code>/<code class="language-plaintext highlighter-rouge">IA32_STAR</code></strong> 的行为，内核侧在启动时写 MSR（<strong><code class="language-plaintext highlighter-rouge">idt_syscall_init()</code></strong> 等，见 <strong>「内核源码摘录（与上表对应）」</strong> 与 <sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup>）。自 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code> 起</strong> 按下述代码块列举，惯例与 <strong><code class="language-plaintext highlighter-rouge">/Users/weli/works/bootimage-example/LINUX_X86_64_ENTRY_AND_PT_REGS.md</code></strong> 一致：围栏第一行为 <strong><code class="language-plaintext highlighter-rouge">起始行:结束行:arch/…/文件</code></strong>（相对 <strong><code class="language-plaintext highlighter-rouge">linux/</code></strong> 源码树根；本文行号依 <strong><code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code></strong>）。</p>

<p><strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code>（<code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code>）</strong> — <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code></strong> 指向此处：<strong><code class="language-plaintext highlighter-rouge">swapgs</code></strong>、装入 <strong><code class="language-plaintext highlighter-rouge">cpu_current_top_of_stack</code></strong>、<strong><code class="language-plaintext highlighter-rouge">pt_regs</code></strong> 布局压栈、<strong><code class="language-plaintext highlighter-rouge">PUSH_AND_CLEAR_REGS</code></strong>、<strong><code class="language-plaintext highlighter-rouge">movq %rsp,%rdi</code></strong> / <strong><code class="language-plaintext highlighter-rouge">movslq %eax,%rsi</code></strong>、<strong><code class="language-plaintext highlighter-rouge">call do_syscall_64</code></strong>。</p>

<pre><code class="language-87:121:arch/x86/entry/entry_64.S">SYM_CODE_START(entry_SYSCALL_64)
	UNWIND_HINT_ENTRY
	ENDBR

	swapgs
	/* tss.sp2 is scratch space. */
	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp

SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
	ANNOTATE_NOENDBR

	/* Construct struct pt_regs on stack */
	pushq	$__USER_DS				/* pt_regs-&gt;ss */
	pushq	PER_CPU_VAR(cpu_tss_rw + TSS_sp2)	/* pt_regs-&gt;sp */
	pushq	%r11					/* pt_regs-&gt;flags */
	pushq	$__USER_CS				/* pt_regs-&gt;cs */
	pushq	%rcx					/* pt_regs-&gt;ip */
SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
	pushq	%rax					/* pt_regs-&gt;orig_ax */

	PUSH_AND_CLEAR_REGS rax=$-ENOSYS

	/* IRQs are off. */
	movq	%rsp, %rdi
	/* Sign extend the lower 32bit as syscall numbers are treated as int */
	movslq	%eax, %rsi

	/* clobbers %rax, make sure it is after saving the syscall nr */
	IBRS_ENTER
	UNTRAIN_RET
	CLEAR_BRANCH_HISTORY

	call	do_syscall_64		/* returns with IRQs disabled */
</code></pre>

<p><strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code>（前半）、<code class="language-plaintext highlighter-rouge">do_syscall_x64</code>、<code class="language-plaintext highlighter-rouge">x64_sys_call</code>（<code class="language-plaintext highlighter-rouge">arch/x86/entry/syscall_64.c</code>）</strong> — 与上引 <strong>112–114</strong> 行入参一致；合法系统调用号下 <strong><code class="language-plaintext highlighter-rouge">regs-&gt;ax</code></strong> 在 <strong><code class="language-plaintext highlighter-rouge">do_syscall_x64</code> → <code class="language-plaintext highlighter-rouge">x64_sys_call</code></strong> 链上更新。</p>

<pre><code class="language-87:100:arch/x86/entry/syscall_64.c">__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
{
	add_random_kstack_offset();
	nr = syscall_enter_from_user_mode(regs, nr);

	instrumentation_begin();

	if (!do_syscall_x64(regs, nr) &amp;&amp; !do_syscall_x32(regs, nr) &amp;&amp; nr != -1) {
		/* Invalid system call, but still a system call. */
		regs-&gt;ax = __x64_sys_ni_syscall(regs);
	}

	instrumentation_end();
	syscall_exit_to_user_mode(regs);
</code></pre>

<pre><code class="language-53:67:arch/x86/entry/syscall_64.c">static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
{
	/*
	 * Convert negative numbers to very high and thus out of range
	 * numbers for comparisons.
	 */
	unsigned int unr = nr;

	if (likely(unr &lt; NR_syscalls)) {
		unr = array_index_nospec(unr, NR_syscalls);
		regs-&gt;ax = x64_sys_call(regs, unr);
		return true;
	}
	return false;
}
</code></pre>

<pre><code class="language-34:41:arch/x86/entry/syscall_64.c">#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
	switch (nr) {
	#include &lt;asm/syscalls_64.h&gt;
	default: return __x64_sys_ni_syscall(regs);
	}
}
</code></pre>

<p><strong><code class="language-plaintext highlighter-rouge">__x64_sys_*</code> 原型、<code class="language-plaintext highlighter-rouge">sys_call_table[]</code>、生成 <code class="language-plaintext highlighter-rouge">syscalls_64.h</code></strong> — 各 <strong><code class="language-plaintext highlighter-rouge">__x64_sys_*</code></strong> 实现分布在 <strong><code class="language-plaintext highlighter-rouge">kernel/</code></strong>、<strong><code class="language-plaintext highlighter-rouge">fs/</code></strong> 等；编号表 <strong><code class="language-plaintext highlighter-rouge">arch/x86/entry/syscalls/syscall_64.tbl</code></strong>，<strong>Kbuild</strong> 生成 <strong><code class="language-plaintext highlighter-rouge">arch/x86/include/generated/asm/syscalls_64.h</code></strong>（<strong><code class="language-plaintext highlighter-rouge">$(out)</code></strong> 见下）。</p>

<pre><code class="language-12:14:arch/x86/entry/syscall_64.c">#define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
#define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
#include &lt;asm/syscalls_64.h&gt;
</code></pre>

<pre><code class="language-28:31:arch/x86/entry/syscall_64.c">#define __SYSCALL(nr, sym) __x64_##sym,
const sys_call_ptr_t sys_call_table[] = {
#include &lt;asm/syscalls_64.h&gt;
};
</code></pre>

<pre><code class="language-1:3:arch/x86/entry/syscalls/Makefile"># SPDX-License-Identifier: GPL-2.0
out := arch/$(SRCARCH)/include/generated/asm
uapi := arch/$(SRCARCH)/include/generated/uapi/asm
</code></pre>

<pre><code class="language-8:9:arch/x86/entry/syscalls/Makefile">syscall32 := $(src)/syscall_32.tbl
syscall64 := $(src)/syscall_64.tbl
</code></pre>

<pre><code class="language-53:55:arch/x86/entry/syscalls/Makefile">$(out)/syscalls_64.h: abis := common,64
$(out)/syscalls_64.h: $(syscall64) $(systbl) FORCE
	$(call if_changed,systbl)
</code></pre>

<p><strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 快路径与 <code class="language-plaintext highlighter-rouge">IRET</code> 慢路径</strong> — <strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code></strong> 末尾 <strong><code class="language-plaintext highlighter-rouge">return true</code></strong> 且 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">testb %al,%al</code></strong> 成功则 <strong><code class="language-plaintext highlighter-rouge">sysretq</code></strong>；否则 <strong><code class="language-plaintext highlighter-rouge">jmp</code> / <code class="language-plaintext highlighter-rouge">jz</code></strong> 汇入 <strong><code class="language-plaintext highlighter-rouge">swapgs_restore_regs_and_return_to_usermode</code></strong> 后经 <strong><code class="language-plaintext highlighter-rouge">iretq</code></strong>。</p>

<pre><code class="language-102:140:arch/x86/entry/syscall_64.c">	/*
	 * Check that the register state is valid for using SYSRET to exit
	 * to userspace.  Otherwise use the slower but fully capable IRET
	 * exit path.
	 */

	/* XEN PV guests always use the IRET path */
	if (cpu_feature_enabled(X86_FEATURE_XENPV))
		return false;

	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
	if (unlikely(regs-&gt;cx != regs-&gt;ip || regs-&gt;r11 != regs-&gt;flags))
		return false;

	/* CS and SS must match the values set in MSR_STAR */
	if (unlikely(regs-&gt;cs != __USER_CS || regs-&gt;ss != __USER_DS))
		return false;

	if (unlikely(regs-&gt;ip &gt;= TASK_SIZE_MAX))
		return false;

	if (unlikely(regs-&gt;flags &amp; (X86_EFLAGS_RF | X86_EFLAGS_TF)))
		return false;

	/* Use SYSRET to exit to userspace */
	return true;
</code></pre>

<pre><code class="language-123:166:arch/x86/entry/entry_64.S">	/*
	 * Try to use SYSRET instead of IRET if we're returning to
	 * a completely clean 64-bit userspace context.  If we're not,
	 * go to the slow exit path.
	 * In the Xen PV case we must use iret anyway.
	 */

	ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \
		"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV

	/*
	 * We win! This label is here just for ease of understanding
	 * perf profiles. Nothing jumps here.
	 */
syscall_return_via_sysret:
	IBRS_EXIT
	POP_REGS pop_rdi=0

	/*
	 * Now all regs are restored except RSP and RDI.
	 * Save old stack pointer and switch to trampoline stack.
	 */
	movq	%rsp, %rdi
	movq	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
	UNWIND_HINT_END_OF_STACK

	pushq	RSP-RDI(%rdi)	/* RSP */
	pushq	(%rdi)		/* RDI */

	/*
	 * We are on the trampoline stack.  All regs except RDI are live.
	 * We can do future final exit work right here.
	 */
	STACKLEAK_ERASE_NOCLOBBER

	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

	popq	%rdi
	popq	%rsp
SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL)
	ANNOTATE_NOENDBR
	swapgs
	CLEAR_CPU_BUFFERS
	sysretq
</code></pre>

<pre><code class="language-559:580:arch/x86/entry/entry_64.S">SYM_CODE_START_LOCAL(common_interrupt_return)
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
	IBRS_EXIT
#ifdef CONFIG_XEN_PV
	ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
#endif
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
	ALTERNATIVE "", "jmp .Lpti_restore_regs_and_return_to_usermode", X86_FEATURE_PTI
#endif

	STACKLEAK_ERASE
	POP_REGS
	add	$8, %rsp	/* orig_ax */
	UNWIND_HINT_IRET_REGS

.Lswapgs_and_iret:
	swapgs
	CLEAR_CPU_BUFFERS
	/* Assert that the IRET frame indicates user mode. */
	testb	$3, 8(%rsp)
	jnz	.Lnative_iret
	ud2
</code></pre>

<pre><code class="language-640:659:arch/x86/entry/entry_64.S">.Lnative_iret:
	UNWIND_HINT_IRET_REGS
	/*
	 * Are we returning to a stack segment from the LDT?  Note: in
	 * 64-bit mode SS:RSP on the exception stack is always valid.
	 */
#ifdef CONFIG_X86_ESPFIX64
	testb	$4, (SS-RIP)(%rsp)
	jnz	native_irq_return_ldt
#endif

SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
	ANNOTATE_NOENDBR // exc_double_fault
	/*
	 * This may fault.  Non-paranoid faults on return to userspace are
	 * handled by fixup_bad_iret.  These include #SS, #GP, and #NP.
	 * Double-faults due to espfix64 are handled in exc_double_fault.
	 * Other faults here are fatal.
	 */
	iretq
</code></pre>

<p>从 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong> 经 <strong><code class="language-plaintext highlighter-rouge">ALTERNATIVE</code></strong> 失败分支也会落到 <strong><code class="language-plaintext highlighter-rouge">swapgs_restore_regs_and_return_to_usermode</code></strong>，最终 <strong><code class="language-plaintext highlighter-rouge">iretq</code></strong>（上引 <strong>559–580</strong>、<strong>640–659</strong> 行；完整标签关系见 <sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup>）。</p>

<p>本地树路径：<strong><code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code></strong>（与主线 <code class="language-plaintext highlighter-rouge">torvalds/linux</code> 同源时行号一致；若你本地的 fork 有差异，以 <strong><code class="language-plaintext highlighter-rouge">git blame</code></strong> / 实际文件为准。）</p>

<h3 id="cpu-侧与-vol3a-588-等一致">CPU 侧（与 Vol.3A §5.8.8 等一致）</h3>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">RIP</code>（下一条指令）→ <code class="language-plaintext highlighter-rouge">RCX</code></strong>；<strong><code class="language-plaintext highlighter-rouge">RFLAGS</code> → <code class="language-plaintext highlighter-rouge">R11</code></strong><sup id="fnref:3:5" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">RIP</code></strong> 来自 <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code></strong>；<strong><code class="language-plaintext highlighter-rouge">CS</code>/<code class="language-plaintext highlighter-rouge">SS</code></strong> 的选择子与 <strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code></strong> 的位域布局按 SDM Figure 5-14<sup id="fnref:3:6" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">RFLAGS &lt;- RFLAGS &amp; ~IA32_FMASK</code></strong>。Linux 在 <strong><code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">idt_syscall_init()</code></strong> 中向 <strong><code class="language-plaintext highlighter-rouge">MSR_SYSCALL_MASK</code></strong> 写入含 <strong><code class="language-plaintext highlighter-rouge">X86_EFLAGS_IF</code></strong> 等位，使进入内核后 <strong><code class="language-plaintext highlighter-rouge">IF</code> 通常被清除</strong><sup id="fnref:3:7" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup>。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">SYSCALL</code> 不改变 <code class="language-plaintext highlighter-rouge">RSP</code></strong>；<strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 也不恢复 <code class="language-plaintext highlighter-rouge">RSP</code></strong>，栈由内核显式管理<sup id="fnref:3:8" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>。</li>
</ol>

<p>同一节（§5.8.8）对 <code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code> 的英文原文可对照如下<sup id="fnref:11:5" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>：</p>

<blockquote>
  <p>For SYSCALL, the processor saves RFLAGS into R11 and the RIP of the next instruction into RCX; it then gets the privilege-level 0 target code segment, instruction pointer, stack segment, and flags as follows:</p>

  <p>Target instruction pointer — Reads a 64-bit address from IA32_LSTAR. (The WRMSR instruction ensures that the value of the IA32_LSTAR MSR is canonical.)<br />
Flags — The processor sets RFLAGS to the logical-AND of its current value with the complement of the value in the IA32_FMASK MSR.</p>
</blockquote>

<blockquote>
  <p>The SYSCALL instruction does not save the stack pointer, and the SYSRET instruction does not restore it. It is likely that the OS system-call handler will change the stack pointer from the user stack to the OS stack. If so, it is the responsibility of software first to save the user stack pointer.</p>
</blockquote>

<p>（手册在「gets the … as follows」之后对 <strong>Target code segment</strong>、<strong>Stack segment</strong> 等另有逐条说明，此处摘入与 <strong><code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">FMASK</code></strong> 及 <strong>RSP</strong> 最直接相关的句子；完整列举见 <sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> 中 <strong>§5.8.8</strong> 与 <strong>Figure 5-14</strong>。）</p>

<h3 id="linux-侧源码锚点">Linux 侧（源码锚点）</h3>

<table>
  <thead>
    <tr>
      <th>内容</th>
      <th>文件与要点</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">STAR</code>/<code class="language-plaintext highlighter-rouge">LSTAR</code>/<code class="language-plaintext highlighter-rouge">SYSCALL_MASK</code> 初始化</strong></td>
      <td><code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code>：<code class="language-plaintext highlighter-rouge">syscall_init()</code>、<code class="language-plaintext highlighter-rouge">idt_syscall_init()</code></td>
    </tr>
    <tr>
      <td><strong>入口汇编</strong></td>
      <td><code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code>：<code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code>（<code class="language-plaintext highlighter-rouge">swapgs</code>、<code class="language-plaintext highlighter-rouge">pt_regs</code>、<code class="language-plaintext highlighter-rouge">do_syscall_64</code>、若可则 <code class="language-plaintext highlighter-rouge">sysretq</code>）</td>
    </tr>
    <tr>
      <td><strong>C 分发与 <code class="language-plaintext highlighter-rouge">SYSRET</code>/<code class="language-plaintext highlighter-rouge">IRET</code> 判定</strong></td>
      <td><code class="language-plaintext highlighter-rouge">arch/x86/entry/syscall_64.c</code>：<code class="language-plaintext highlighter-rouge">do_syscall_64</code>、<code class="language-plaintext highlighter-rouge">x64_sys_call</code>；<strong><code class="language-plaintext highlighter-rouge">sys_call_table[]</code></strong> 仍存在于镜像中，<strong>主路径分发</strong>为 <strong><code class="language-plaintext highlighter-rouge">switch</code></strong></td>
    </tr>
  </tbody>
</table>

<h3 id="内核源码摘录与上表对应">内核源码摘录（与上表对应）</h3>

<p>下列片段与主线 Linux 树一致，便于和 SDM 对照阅读<sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">10</a></sup><sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup><sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</p>

<p><code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code> — <code class="language-plaintext highlighter-rouge">idt_syscall_init()</code> 中写入 <strong><code class="language-plaintext highlighter-rouge">MSR_LSTAR</code></strong> 与 <strong><code class="language-plaintext highlighter-rouge">MSR_SYSCALL_MASK</code></strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">idt_syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_LSTAR</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_64</span><span class="p">);</span>
	<span class="cm">/* ... IA32_SYSENTER_* and ia32_enabled() branches omitted ... */</span>
	<span class="cm">/*
	 * Flags to clear on syscall; clear as much as possible
	 * to minimize user space-kernel interference.
	 */</span>
	<span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_SYSCALL_MASK</span><span class="p">,</span>
	       <span class="n">X86_EFLAGS_CF</span><span class="o">|</span><span class="n">X86_EFLAGS_PF</span><span class="o">|</span><span class="n">X86_EFLAGS_AF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_ZF</span><span class="o">|</span><span class="n">X86_EFLAGS_SF</span><span class="o">|</span><span class="n">X86_EFLAGS_TF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_IF</span><span class="o">|</span><span class="n">X86_EFLAGS_DF</span><span class="o">|</span><span class="n">X86_EFLAGS_OF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_IOPL</span><span class="o">|</span><span class="n">X86_EFLAGS_NT</span><span class="o">|</span><span class="n">X86_EFLAGS_RF</span><span class="o">|</span>
	       <span class="n">X86_EFLAGS_AC</span><span class="o">|</span><span class="n">X86_EFLAGS_ID</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">arch/x86/entry/entry_64.S</code> — <code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code> 入口（硬件不压栈后，由这里构造 <strong><code class="language-plaintext highlighter-rouge">pt_regs</code></strong> 并调用 <strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code></strong>）：</p>

<pre><code class="language-asm">SYM_CODE_START(entry_SYSCALL_64)
	swapgs
	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
	/* Construct struct pt_regs on stack */
	pushq	$__USER_DS				/* pt_regs-&gt;ss */
	pushq	PER_CPU_VAR(cpu_tss_rw + TSS_sp2)	/* pt_regs-&gt;sp */
	pushq	%r11					/* pt_regs-&gt;flags */
	pushq	$__USER_CS				/* pt_regs-&gt;cs */
	pushq	%rcx					/* pt_regs-&gt;ip */
	pushq	%rax					/* pt_regs-&gt;orig_ax */
	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
	movq	%rsp, %rdi
	movslq	%eax, %esi
	call	do_syscall_64		/* returns with IRQs disabled */
</code></pre>

<p><code class="language-plaintext highlighter-rouge">arch/x86/entry/syscall_64.c</code> — <strong><code class="language-plaintext highlighter-rouge">sys_call_table[]</code> 注释</strong>与 <strong><code class="language-plaintext highlighter-rouge">x64_sys_call()</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">switch</code></strong> 分发：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
 * The sys_call_table[] is no longer used for system calls, but
 * kernel/trace/trace_syscalls.c still wants to know the system
 * call address.
 */</span>
<span class="cp">#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
</span><span class="kt">long</span> <span class="nf">x64_sys_call</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">nr</span><span class="p">)</span>
<span class="p">{</span>
	<span class="k">switch</span> <span class="p">(</span><span class="n">nr</span><span class="p">)</span> <span class="p">{</span>
	<span class="cp">#include</span> <span class="cpf">&lt;asm/syscalls_64.h&gt;</span><span class="cp">
</span>	<span class="nl">default:</span> <span class="k">return</span> <span class="n">__x64_sys_ni_syscall</span><span class="p">(</span><span class="n">regs</span><span class="p">);</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>同文件 <strong><code class="language-plaintext highlighter-rouge">do_syscall_64()</code></strong> — 前半dispatch、末尾返回值决定 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong> 与 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong>（以下与中版内核树连续片段一致，仅删去空白行以便排版）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Returns true to return using SYSRET, or false to use IRET */</span>
<span class="n">__visible</span> <span class="n">noinstr</span> <span class="n">bool</span> <span class="nf">do_syscall_64</span><span class="p">(</span><span class="k">struct</span> <span class="n">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">nr</span><span class="p">)</span>
<span class="p">{</span>
	<span class="n">add_random_kstack_offset</span><span class="p">();</span>
	<span class="n">nr</span> <span class="o">=</span> <span class="n">syscall_enter_from_user_mode</span><span class="p">(</span><span class="n">regs</span><span class="p">,</span> <span class="n">nr</span><span class="p">);</span>
	<span class="n">instrumentation_begin</span><span class="p">();</span>
	<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">do_syscall_x64</span><span class="p">(</span><span class="n">regs</span><span class="p">,</span> <span class="n">nr</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">do_syscall_x32</span><span class="p">(</span><span class="n">regs</span><span class="p">,</span> <span class="n">nr</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="n">nr</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
		<span class="n">regs</span><span class="o">-&gt;</span><span class="n">ax</span> <span class="o">=</span> <span class="n">__x64_sys_ni_syscall</span><span class="p">(</span><span class="n">regs</span><span class="p">);</span>
	<span class="p">}</span>
	<span class="n">instrumentation_end</span><span class="p">();</span>
	<span class="n">syscall_exit_to_user_mode</span><span class="p">(</span><span class="n">regs</span><span class="p">);</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">cpu_feature_enabled</span><span class="p">(</span><span class="n">X86_FEATURE_XENPV</span><span class="p">))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">regs</span><span class="o">-&gt;</span><span class="n">cx</span> <span class="o">!=</span> <span class="n">regs</span><span class="o">-&gt;</span><span class="n">ip</span> <span class="o">||</span> <span class="n">regs</span><span class="o">-&gt;</span><span class="n">r11</span> <span class="o">!=</span> <span class="n">regs</span><span class="o">-&gt;</span><span class="n">flags</span><span class="p">))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">regs</span><span class="o">-&gt;</span><span class="n">cs</span> <span class="o">!=</span> <span class="n">__USER_CS</span> <span class="o">||</span> <span class="n">regs</span><span class="o">-&gt;</span><span class="n">ss</span> <span class="o">!=</span> <span class="n">__USER_DS</span><span class="p">))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">regs</span><span class="o">-&gt;</span><span class="n">ip</span> <span class="o">&gt;=</span> <span class="n">TASK_SIZE_MAX</span><span class="p">))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
	<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">regs</span><span class="o">-&gt;</span><span class="n">flags</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">X86_EFLAGS_RF</span> <span class="o">|</span> <span class="n">X86_EFLAGS_TF</span><span class="p">)))</span>
		<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
	<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<hr />

<h2 id="主题三经-idt-的路径与-syscall-路径的性能与开销">主题三：经 IDT 的路径与 <code class="language-plaintext highlighter-rouge">SYSCALL</code> 路径的性能与开销</h2>

<p><strong><code class="language-plaintext highlighter-rouge">syscall</code> 相对 <code class="language-plaintext highlighter-rouge">int</code> + IDT 更快，主要不是因为“少查一次内存里的表”</strong>，而是因为 <strong><code class="language-plaintext highlighter-rouge">int</code> 走 IDT 门与异常/中断类交付</strong>，含 <strong>门与特权相关检查、中断帧布局</strong>，返回侧又常配合 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong>；<strong><code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code></strong> 针对系统调用做了裁剪。内核里的 <strong>调用号分发</strong>发生在两条路径<strong>入核之后</strong>，不是整体差距的主因。</p>

<h3 id="路径对比示意">路径对比（示意）</h3>

<pre><code class="language-mermaid">graph TD
    subgraph 快路径_syscall
    A[用户态] --&gt;|syscall| B[CPU]
    B --&gt;|读 LSTAR/STAR/FMASK| C[内核入口 entry_SYSCALL_64]
    C --&gt;|do_syscall_64 + x64_sys_call| D[__x64_sys_*]
    end

    subgraph 传统路径_int0x80
    E[用户态] --&gt;|int 0x80| F[CPU]
    F --&gt;|经 IDT 向量门| G[中断门入口]
    G --&gt;|中断类交付与返回| H[内核处理]
    end
</code></pre>

<pre><code class="language-mermaid">graph TD
    subgraph 快路径_syscall
    A1[用户态] --&gt;|syscall| B1[CPU]
    B1 --&gt;|从 MSR 取入口| C1[内核入口]
    C1 --&gt;|软件分发| D1[__x64_sys_* 等]
    end

    subgraph 慢路径_int_idt
    E1[用户态] --&gt;|int 0x80| F1[CPU]
    F1 --&gt;|查 IDT| G1[IDT 门]
    G1 --&gt;|特权与栈检查 + 转入处理程序| H1[内核入口]
    H1 --&gt;|再做软件分发| I1[具体例程]
    end
</code></pre>

<h3 id="机制层对比">机制层对比</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">特性</th>
      <th style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">int 0x80</code> + IDT</strong></th>
      <th style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">syscall</code> + MSR</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">核心机制</td>
      <td style="text-align: left">软件中断，走 <strong>异常/中断类交付</strong></td>
      <td style="text-align: left"><strong>系统调用专用指令</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">入口</td>
      <td style="text-align: left">CPU <strong>按向量查 IDT 门</strong></td>
      <td style="text-align: left">CPU <strong>从 MSR 取目标 <code class="language-plaintext highlighter-rouge">RIP</code> 等</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">特权与门</td>
      <td style="text-align: left"><strong>DPL、门类型</strong> 等</td>
      <td style="text-align: left"><strong>不经同一套 IDT 门</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">硬件保存的现场</td>
      <td style="text-align: left"><strong>中断/异常帧</strong>（含段与标志等，因事件与模式而异）</td>
      <td style="text-align: left"><strong>主要为 <code class="language-plaintext highlighter-rouge">RCX</code>/<code class="language-plaintext highlighter-rouge">R11</code> 的返回契约</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">返回</td>
      <td style="text-align: left">常见 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong></td>
      <td style="text-align: left">条件满足时 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong>，否则 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong></td>
    </tr>
  </tbody>
</table>

<h3 id="单次查表与整条路径">单次查表与整条路径</h3>

<p><strong>硬件对 IDT 的一次访问</strong>与 <strong>内核对 <code class="language-plaintext highlighter-rouge">switch (nr)</code> 的几条指令</strong>各自都很快；差别主要来自 <strong>整条入核/出核</strong>：多保存了哪些状态、是否经过 <strong>IDT 门语义</strong>、返回是 <strong><code class="language-plaintext highlighter-rouge">IRET</code> 全功能</strong>还是 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 窄契约</strong>、以及 Linux 在出口是否 <strong>回退到 <code class="language-plaintext highlighter-rouge">IRET</code></strong>。</p>

<h3 id="入核与出核int-0x80-与-syscall-的步骤对照">入核与出核：<code class="language-plaintext highlighter-rouge">int 0x80</code> 与 <code class="language-plaintext highlighter-rouge">syscall</code> 的步骤对照</h3>

<p>下表沿用在 <strong>IDT + <code class="language-plaintext highlighter-rouge">IRET</code></strong> 与 <strong><code class="language-plaintext highlighter-rouge">SYSCALL</code> + <code class="language-plaintext highlighter-rouge">SYSRET</code>（及 Linux 可能回退的 <code class="language-plaintext highlighter-rouge">IRET</code>）</strong> 之间做对照的常见写法；其中 <strong><code class="language-plaintext highlighter-rouge">int</code> 路径的栈帧</strong>以 <strong>64 位长模式</strong>下向内核栈压入的字段为准（<strong>SS、RSP、RFLAGS、CS、RIP</strong> 及可能的错误码等）<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>，与 legacy 保护模式下部分教材中的“多段寄存器”示意图并不完全同形。</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">动作</th>
      <th style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">int 0x80</code>（经 IDT，<code class="language-plaintext highlighter-rouge">IRET</code> 返回）</strong></th>
      <th style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">syscall</code>（<code class="language-plaintext highlighter-rouge">SYSRET</code> 快路径；条件不满足则 <code class="language-plaintext highlighter-rouge">IRET</code>）</strong></th>
      <th style="text-align: left">性能与实现上的含义</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>特权级切换</strong></td>
      <td style="text-align: left">Ring 3 → Ring 0</td>
      <td style="text-align: left">Ring 3 → Ring 0</td>
      <td style="text-align: left"><strong>两者都必须发生</strong>；不是时间差的主要来源。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>栈切换</strong></td>
      <td style="text-align: left">与 <strong>TSS / IST</strong> 等绑定的 <strong>中断交付</strong> 语义下切到 <strong>内核栈</strong></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">swapgs</code></strong>，再由软件把 <strong><code class="language-plaintext highlighter-rouge">RSP</code></strong> 切到 <strong>per-CPU 内核栈顶</strong><sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">int</code> 走通用中断模型的硬件路径；<code class="language-plaintext highlighter-rouge">syscall</code> 由内核显式维护 <strong><code class="language-plaintext highlighter-rouge">RSP</code></strong>，与 <strong>“<code class="language-plaintext highlighter-rouge">SYSCALL</code> 不改 <code class="language-plaintext highlighter-rouge">RSP</code>”</strong> 的硬件契约一致<sup id="fnref:3:9" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>硬件自动保存</strong></td>
      <td style="text-align: left"><strong>向栈压中断帧</strong>（长模式典型含 <strong>SS、RSP、RFLAGS、CS、RIP</strong>；另视向量压错误码）<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></td>
      <td style="text-align: left"><strong>不向栈压帧</strong>；仅用 <strong><code class="language-plaintext highlighter-rouge">RCX</code>/<code class="language-plaintext highlighter-rouge">R11</code></strong> 等约定配合 <strong>MSR</strong> 改变 <strong><code class="language-plaintext highlighter-rouge">RIP</code>/特权级/<code class="language-plaintext highlighter-rouge">RFLAGS</code> 掩码</strong><sup id="fnref:3:10" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">int</code> 在硬件一侧完成较多现场记录；<code class="language-plaintext highlighter-rouge">syscall</code> 把栈上工作留到 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong>。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>软件补全现场</strong></td>
      <td style="text-align: left">入口例程继续保存其余寄存器、建 <strong><code class="language-plaintext highlighter-rouge">pt_regs</code></strong></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">PUSH_AND_CLEAR_REGS</code> 等</strong>补齐 <strong><code class="language-plaintext highlighter-rouge">pt_regs</code></strong><sup id="fnref:9:3" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup></td>
      <td style="text-align: left">进入 <strong>C 分发</strong>前，两条路径通常都要把通用寄存器镜像补全。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>权限 / 门检查</strong></td>
      <td style="text-align: left"><strong>IDT 门</strong>的 <strong>DPL、类型</strong> 等与 <strong><code class="language-plaintext highlighter-rouge">INT n</code></strong> 相关的一致检查</td>
      <td style="text-align: left"><strong>不经</strong>与 <strong><code class="language-plaintext highlighter-rouge">int</code> 同一条</strong> 门描述符路径</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">int</code> 多一层 <strong>IDT 门禁</strong> 语义的固定成本。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>返回时现场恢复</strong></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">IRET</code></strong> 从栈帧恢复 <strong>SS、RSP、RFLAGS、CS、RIP</strong> 等</td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong>：<strong><code class="language-plaintext highlighter-rouge">RIP←RCX</code>、<code class="language-plaintext highlighter-rouge">RFLAGS←R11</code></strong>（窄）；否则走 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong><sup id="fnref:10:3" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup></td>
      <td style="text-align: left"><strong><code class="language-plaintext highlighter-rouge">IRET</code></strong> 通用、重；<strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong> 轻，但 Linux 在 <strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code></strong> 中细查与 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 契约</strong>是否仍可满足<sup id="fnref:10:4" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</td>
    </tr>
  </tbody>
</table>

<p>同一组维度在 <strong><code class="language-plaintext highlighter-rouge">syscall</code> 专题</strong>里也可以压缩理解：宏观上都要完成 <strong>ring 切换与寄存器约定</strong>，微观上 <strong><code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code> 把可由专用指令“包办”的部分收紧</strong>，<strong><code class="language-plaintext highlighter-rouge">int</code>/IDT/<code class="language-plaintext highlighter-rouge">IRET</code></strong> 为覆盖全体中断/异常类型保留更宽的默认行为。</p>

<h4 id="与上表对应的三个技术要点64-位长模式">与上表对应的三个技术要点（64 位长模式）</h4>

<p>以下三点承接 <strong>上文「入核与出核」对照表</strong>，用语与 <strong>IA-32e 长模式</strong> 下的栈帧布局及当前 <strong>Linux <code class="language-plaintext highlighter-rouge">arch/x86/entry</code></strong> 实现一致。</p>

<ul>
  <li>
    <p><strong>硬件保存的寄存器现场不同</strong>
 <strong><code class="language-plaintext highlighter-rouge">INT n</code> 经 IDT</strong> 时走 <strong>通用中断/异常交付</strong>：在 <strong>64 位长模式</strong>下，CPU 向 <strong>当前特权级 0 栈</strong> 压入 <strong>SS、RSP、RFLAGS、CS、RIP</strong> 及视向量而定的 <strong>错误码</strong> 等，与同一条 <strong>IRET</strong> 恢复约定兼容、并由<strong>全体 IDT 向量</strong>共享这一框架<sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。<strong><code class="language-plaintext highlighter-rouge">SYSCALL</code></strong> <strong>不向栈压帧</strong>，仅用 <strong><code class="language-plaintext highlighter-rouge">RCX</code>、<code class="language-plaintext highlighter-rouge">R11</code></strong> 分别保留 <strong><code class="language-plaintext highlighter-rouge">RIP</code>、<code class="language-plaintext highlighter-rouge">RFLAGS</code> 的返回契约信息</strong>；通用寄存器与 <strong><code class="language-plaintext highlighter-rouge">RSP</code> 等</strong>由 <strong><code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code></strong> 等 <strong>软件路径</strong> 写入 <strong><code class="language-plaintext highlighter-rouge">struct pt_regs</code></strong><sup id="fnref:3:11" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:9:4" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">11</a></sup>。</p>
  </li>
  <li>
    <p><strong>是否经过 IDT 与 DPL 检查</strong>
 <strong><code class="language-plaintext highlighter-rouge">INT n</code></strong> 根据 <strong>门描述符</strong> 做 <strong>DPL、门类型</strong> 等与 <strong>软件中断</strong> 相关的一致性检查<sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。<strong><code class="language-plaintext highlighter-rouge">SYSCALL</code></strong> <strong>不读取 IDT 门</strong>：<strong>CPL 0 入口 <code class="language-plaintext highlighter-rouge">RIP</code></strong>、<strong>段与 <code class="language-plaintext highlighter-rouge">RFLAGS</code> 掩码</strong>由 <strong><code class="language-plaintext highlighter-rouge">IA32_LSTAR</code>、<code class="language-plaintext highlighter-rouge">IA32_STAR</code>、<code class="language-plaintext highlighter-rouge">IA32_FMASK</code></strong> 及 <strong><code class="language-plaintext highlighter-rouge">IA32_EFER.SCE</code></strong> 预先约定<sup id="fnref:3:12" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup><sup id="fnref:11:6" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">5</a></sup>；<strong>合法性</strong>依赖 <strong>OS 对这些 MSR 与 GDT 项的初始化</strong>以及内核入口实现。</p>
  </li>
  <li>
    <p><strong>返回路径的恢复范围</strong>
 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong> 从栈上 <strong>中断帧</strong> 恢复 <strong>SS、RSP、RFLAGS、CS、RIP</strong> 等，<strong>语义覆盖完整</strong><sup id="fnref:1:8" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。<strong><code class="language-plaintext highlighter-rouge">SYSRET</code></strong>（长模式下 <strong><code class="language-plaintext highlighter-rouge">REX.W</code></strong>）在契约成立时仅从 <strong><code class="language-plaintext highlighter-rouge">RCX</code>、<code class="language-plaintext highlighter-rouge">R11</code></strong> 恢复 <strong><code class="language-plaintext highlighter-rouge">RIP</code>、<code class="language-plaintext highlighter-rouge">RFLAGS</code></strong>，<strong>用户态 <code class="language-plaintext highlighter-rouge">CS</code>/<code class="language-plaintext highlighter-rouge">SS</code></strong> 按 <strong><code class="language-plaintext highlighter-rouge">IA32_STAR</code></strong> 出核位域装载<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>。<strong>Linux</strong> 在 <strong><code class="language-plaintext highlighter-rouge">do_syscall_64</code></strong> 中若判定 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 契约</strong>不成立或须走通用返回路径，则 <strong>改用 <code class="language-plaintext highlighter-rouge">IRET</code></strong><sup id="fnref:10:5" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</p>
  </li>
</ul>

<h3 id="数量级举例">数量级举例</h3>

<p>在常见 x86-64 桌面平台上，对 <strong><code class="language-plaintext highlighter-rouge">getpid</code> 类极短系统调用</strong>做周期计数，<strong><code class="language-plaintext highlighter-rouge">int 0x80</code></strong> 有时可达约 <strong>二百周期</strong>量级，<strong><code class="language-plaintext highlighter-rouge">syscall</code></strong> 多在约 <strong>数十至百余周期</strong>量级，可差数倍。结果强依赖 <strong>CPU、微架构、是否实际走 <code class="language-plaintext highlighter-rouge">SYSRET</code> 与测量方法</strong>；定量的结论应在目标机上用 <strong><code class="language-plaintext highlighter-rouge">perf</code> 等</strong>重复测量。</p>

<h3 id="小结">小结</h3>

<ul>
  <li><strong>IDT</strong>：通用 <strong>事件交付</strong> 机制，优先保证覆盖面与一致性，<strong>不以最短系统调用为唯一目标</strong>。</li>
  <li><strong>系统调用分发</strong>：<strong><code class="language-plaintext highlighter-rouge">x64_sys_call</code> 的 <code class="language-plaintext highlighter-rouge">switch</code></strong> 为主路径；<strong><code class="language-plaintext highlighter-rouge">sys_call_table[]</code></strong> 仍服务 <strong>观测/枚举</strong> 等需求；二者都在 <strong><code class="language-plaintext highlighter-rouge">syscall</code> 已进核之后</strong> 执行。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">SYSCALL</code> + MSR</strong>：系统调用 <strong>专用</strong>硬件入口协议；真正缩短的是 <strong>经 MSR 的入核与在条件允许时的 <code class="language-plaintext highlighter-rouge">SYSRET</code> 返回</strong>，不是“少做一次 C 层分发”。</li>
  <li><strong>Linux</strong>：即便从 <strong><code class="language-plaintext highlighter-rouge">syscall</code></strong> 入核，仍可能在出口选用 <strong><code class="language-plaintext highlighter-rouge">IRET</code></strong>，与 <strong><code class="language-plaintext highlighter-rouge">SYSRET</code> 契约</strong>及历史、安全问题有关<sup id="fnref:10:6" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">6</a></sup>。</li>
</ul>

<hr />

<h2 id="建议的自修顺序">建议的自修顺序</h2>

<ul>
  <li>SDM：<strong>中断/异常与 IDT</strong>、<strong><code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code></strong>。</li>
  <li>Linux：<strong><code class="language-plaintext highlighter-rouge">common.c</code>（MSR）→ <code class="language-plaintext highlighter-rouge">entry_64.S</code> → <code class="language-plaintext highlighter-rouge">syscall_64.c</code></strong>。</li>
  <li>对照阅读：<code class="language-plaintext highlighter-rouge">entry_64.S</code> 与 <code class="language-plaintext highlighter-rouge">syscall_64.c</code>，结合文末 References。</li>
</ul>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html">Intel® 64 and IA-32 Architectures SDM — Combined Volumes</a> - 官方总入口（含 Volume 3 系统编程）；文中 IDT 64-bit 描述与中断/异常机制以此为准 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:1:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://wiki.osdev.org/Interrupt_Descriptor_Table">OSDev Wiki — Interrupt Descriptor Table</a> - IDT 结构与模式差异的教学索引 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.felixcloutier.com/x86/syscall">x86 Instruction Reference — SYSCALL</a> - 指令级语义（<code class="language-plaintext highlighter-rouge">RCX</code>/<code class="language-plaintext highlighter-rouge">R11</code>、<code class="language-plaintext highlighter-rouge">LSTAR</code>、<code class="language-plaintext highlighter-rouge">FMASK</code>、<code class="language-plaintext highlighter-rouge">RSP</code> 不保存） <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:3:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:3:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:3:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:3:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a> <a href="#fnref:3:9" class="reversefootnote" role="doc-backlink">&#8617;<sup>10</sup></a> <a href="#fnref:3:10" class="reversefootnote" role="doc-backlink">&#8617;<sup>11</sup></a> <a href="#fnref:3:11" class="reversefootnote" role="doc-backlink">&#8617;<sup>12</sup></a> <a href="#fnref:3:12" class="reversefootnote" role="doc-backlink">&#8617;<sup>13</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://www.felixcloutier.com/x86/sysret">x86 Instruction Reference — SYSRET</a> - <code class="language-plaintext highlighter-rouge">SYSRET</code> 返回语义与 <code class="language-plaintext highlighter-rouge">RSP</code> 处理约束 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>正文所引 <strong>Intel SDM 英文原文</strong>出自 <em>Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1</em>（约 <strong>§6.14</strong> 64-bit IDT gate、<strong>§5.8.8</strong> <code class="language-plaintext highlighter-rouge">SYSCALL</code>/<code class="language-plaintext highlighter-rouge">SYSRET</code>）；完整手册见 <sup id="fnref:1:9" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> 的官方下载入口 <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:11:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:11:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:11:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:11:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:11:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/syscall_64.c">Linux Source — arch/x86/entry/syscall_64.c</a> - <code class="language-plaintext highlighter-rouge">do_syscall_64</code>、<code class="language-plaintext highlighter-rouge">x64_sys_call</code> 与 <code class="language-plaintext highlighter-rouge">SYSRET/IRET</code> 判定 <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:10:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:10:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:10:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:10:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://www.kernel.org/doc/html/latest/arch/x86/entry_64.html">Linux Kernel Documentation — entry_64</a> - x86 多入口说明（含 <code class="language-plaintext highlighter-rouge">entry_INT80_compat</code>、<code class="language-plaintext highlighter-rouge">system_call</code> 等） <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://www.felixcloutier.com/x86/sysenter">Intel x86 Instruction Set Reference — SYSENTER</a> - <code class="language-plaintext highlighter-rouge">SYSENTER/SYSEXIT</code> 的历史快速调用路径 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://man7.org/linux/man-pages/man2/syscall.2.html">man7 — syscall(2)</a> - Linux 用户态系统调用 ABI 与调用约定说明 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/common.c">Linux Source — arch/x86/kernel/cpu/common.c</a> - <code class="language-plaintext highlighter-rouge">syscall_init()</code> / <code class="language-plaintext highlighter-rouge">idt_syscall_init()</code> 与 <code class="language-plaintext highlighter-rouge">MSR_SYSCALL_MASK</code> 初始化 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:8:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/entry_64.S">Linux Source — arch/x86/entry/entry_64.S</a> - <code class="language-plaintext highlighter-rouge">entry_SYSCALL_64</code> 路径（<code class="language-plaintext highlighter-rouge">swapgs</code>、<code class="language-plaintext highlighter-rouge">pt_regs</code>、<code class="language-plaintext highlighter-rouge">sysretq</code>） <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:9:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:9:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">RDD 编程模型：从 Bash 脚本到分布式数据集的技术映射</title><link href="https://weinan.io/2026/03/29/rdd-bash-mapreduce-spark.html" rel="alternate" type="text/html" title="RDD 编程模型：从 Bash 脚本到分布式数据集的技术映射" /><published>2026-03-29T00:00:00+00:00</published><updated>2026-03-29T00:00:00+00:00</updated><id>https://weinan.io/2026/03/29/rdd-bash-mapreduce-spark</id><content type="html" xml:base="https://weinan.io/2026/03/29/rdd-bash-mapreduce-spark.html"><![CDATA[<p>RDD（Resilient Distributed Dataset，弹性分布式数据集）是 Apache Spark 的核心抽象。本文通过将 RDD 编程模型与经典的 Bash 脚本管道、MapReduce 计算范式进行系统类比，帮助开发者建立从单机脚本思维到分布式数据处理的平滑过渡。文章涵盖执行模型、操作分类、容错机制及实际代码对比。</p>

<hr />

<h2 id="1-引言">1. 引言</h2>

<p>在单机环境中，Bash 脚本通过管道组合文本处理工具（如 <code class="language-plaintext highlighter-rouge">grep</code>、<code class="language-plaintext highlighter-rouge">sort</code>、<code class="language-plaintext highlighter-rouge">uniq</code>、<code class="language-plaintext highlighter-rouge">wc</code>）完成数据处理任务。在分布式环境中，RDD 提供了类似的函数式 API，但将执行扩展到集群，并引入了<strong>惰性求值</strong>与<strong>容错机制</strong>。</p>

<p>理解 RDD 的一种有效方式是将其视为「<strong>分布式版的 Bash 管道</strong>」，其中每个命令对应一个转换操作，管道的末端对应一个触发执行的动作。</p>

<hr />

<h2 id="2-核心概念映射">2. 核心概念映射</h2>

<h3 id="21-执行模型对比">2.1 执行模型对比</h3>

<table>
  <thead>
    <tr>
      <th>概念</th>
      <th>Bash</th>
      <th>RDD</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>数据源</td>
      <td>文件、标准输入</td>
      <td><code class="language-plaintext highlighter-rouge">textFile()</code>、<code class="language-plaintext highlighter-rouge">parallelize()</code></td>
    </tr>
    <tr>
      <td>中间结果</td>
      <td>管道传递或临时文件</td>
      <td>RDD 引用，可缓存</td>
    </tr>
    <tr>
      <td>操作类型</td>
      <td>立即执行的命令</td>
      <td>转换（Transformation）与动作（Action）</td>
    </tr>
    <tr>
      <td>执行触发</td>
      <td>命令输入即执行</td>
      <td>动作调用时触发 DAG 执行</td>
    </tr>
    <tr>
      <td>并行性</td>
      <td>单进程，需手动 <code class="language-plaintext highlighter-rouge">&amp;</code></td>
      <td>自动分片并行</td>
    </tr>
    <tr>
      <td>容错</td>
      <td>脚本退出或重试</td>
      <td>基于血缘（Lineage）自动重建</td>
    </tr>
  </tbody>
</table>

<h3 id="22-操作类比">2.2 操作类比</h3>

<table>
  <thead>
    <tr>
      <th>功能</th>
      <th>Bash</th>
      <th>RDD</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>过滤行</td>
      <td><code class="language-plaintext highlighter-rouge">grep pattern</code></td>
      <td><code class="language-plaintext highlighter-rouge">filter(_.contains(pattern))</code></td>
    </tr>
    <tr>
      <td>提取字段</td>
      <td><code class="language-plaintext highlighter-rouge">cut -d',' -f2</code></td>
      <td><code class="language-plaintext highlighter-rouge">map(_.split(",")(1))</code></td>
    </tr>
    <tr>
      <td>排序</td>
      <td><code class="language-plaintext highlighter-rouge">sort</code></td>
      <td><code class="language-plaintext highlighter-rouge">sortBy()</code></td>
    </tr>
    <tr>
      <td>聚合计数</td>
      <td><code class="language-plaintext highlighter-rouge">uniq -c</code></td>
      <td><code class="language-plaintext highlighter-rouge">reduceByKey(_ + _)</code></td>
    </tr>
    <tr>
      <td>限制输出</td>
      <td><code class="language-plaintext highlighter-rouge">head -n</code></td>
      <td><code class="language-plaintext highlighter-rouge">take(n)</code></td>
    </tr>
    <tr>
      <td>保存结果</td>
      <td><code class="language-plaintext highlighter-rouge">&gt; output.txt</code></td>
      <td><code class="language-plaintext highlighter-rouge">saveAsTextFile(path)</code></td>
    </tr>
    <tr>
      <td>变量存储</td>
      <td><code class="language-plaintext highlighter-rouge">var=$(command)</code></td>
      <td><code class="language-plaintext highlighter-rouge">val rdd = transformation</code></td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="3-示例分析web-访问日志处理">3. 示例分析：Web 访问日志处理</h2>

<h3 id="31-业务场景">3.1 业务场景</h3>

<p>分析 Web 服务器日志，统计状态码为 404 的请求中，出现次数最多的前 5 个 URL 路径。</p>

<h3 id="32-bash-脚本实现">3.2 Bash 脚本实现</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 过滤状态码为404的行，提取URL路径，统计并排序</span>
<span class="nb">grep</span> <span class="s2">" 404 "</span> access.log | <span class="se">\</span>
<span class="nb">awk</span> <span class="s1">'{print $7}'</span> | <span class="se">\</span>
<span class="nb">sort</span> | <span class="se">\</span>
<span class="nb">uniq</span> <span class="nt">-c</span> | <span class="se">\</span>
<span class="nb">sort</span> <span class="nt">-nr</span> | <span class="se">\</span>
<span class="nb">head</span> <span class="nt">-5</span>
</code></pre></div></div>

<p><strong>执行特点</strong>：</p>

<ul>
  <li>每条命令立即执行</li>
  <li>中间结果通过管道在内存中传递</li>
  <li>单机顺序处理</li>
</ul>

<h3 id="33-rdd-实现">3.3 RDD 实现</h3>

<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">logRDD</span> <span class="k">=</span> <span class="nv">sc</span><span class="o">.</span><span class="py">textFile</span><span class="o">(</span><span class="s">"hdfs://cluster/logs/access.log"</span><span class="o">)</span>

<span class="k">val</span> <span class="nv">top404Urls</span> <span class="k">=</span> <span class="n">logRDD</span>
  <span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="nv">line</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">" 404 "</span><span class="o">))</span>          <span class="c1">// 等价于 grep</span>
  <span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="nv">line</span><span class="o">.</span><span class="py">split</span><span class="o">(</span><span class="s">" "</span><span class="o">)(</span><span class="mi">6</span><span class="o">))</span>                 <span class="c1">// 等价于 awk，提取URL</span>
  <span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="n">url</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">url</span><span class="o">,</span> <span class="mi">1</span><span class="o">))</span>                            <span class="c1">// 准备计数</span>
  <span class="o">.</span><span class="py">reduceByKey</span><span class="o">(</span><span class="k">_</span> <span class="o">+</span> <span class="k">_</span><span class="o">)</span>                              <span class="c1">// 等价于 uniq -c</span>
  <span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">swap</span><span class="o">)</span>                                     <span class="c1">// 交换键值以便排序</span>
  <span class="o">.</span><span class="py">sortByKey</span><span class="o">(</span><span class="n">ascending</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span>                    <span class="c1">// 等价于 sort -nr</span>
  <span class="o">.</span><span class="py">take</span><span class="o">(</span><span class="mi">5</span><span class="o">)</span>                                         <span class="c1">// 等价于 head -5</span>

<span class="nv">top404Urls</span><span class="o">.</span><span class="py">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">)</span>
</code></pre></div></div>

<p><strong>执行特点</strong>：</p>

<ul>
  <li>所有转换（filter、map、reduceByKey）构建 DAG，不立即执行</li>
  <li><code class="language-plaintext highlighter-rouge">take(5)</code> 作为动作触发分布式计算</li>
  <li>数据自动分片，并行处理</li>
  <li>节点故障时自动基于血缘重算</li>
</ul>

<hr />

<h2 id="4-执行机制深入">4. 执行机制深入</h2>

<h3 id="41-惰性求值lazy-evaluation">4.1 惰性求值（Lazy Evaluation）</h3>

<p>Bash 采用<strong>渴望求值</strong>（Eager Evaluation），每个命令立即执行：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 立即执行 grep，再执行 wc</span>
<span class="nb">grep</span> <span class="s2">"ERROR"</span> app.log | <span class="nb">wc</span> <span class="nt">-l</span>
</code></pre></div></div>

<p>RDD 采用<strong>惰性求值</strong>，只有动作调用时才执行：</p>

<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">errors</span> <span class="k">=</span> <span class="nv">logRDD</span><span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"ERROR"</span><span class="o">))</span>  <span class="c1">// 仅记录转换</span>
<span class="k">val</span> <span class="nv">count</span> <span class="k">=</span> <span class="nv">errors</span><span class="o">.</span><span class="py">count</span><span class="o">()</span>                       <span class="c1">// 触发执行</span>
</code></pre></div></div>

<p><strong>优势</strong>：</p>

<ul>
  <li>允许执行计划优化（如谓词下推）</li>
  <li>避免不必要的数据扫描</li>
  <li>支持中间结果缓存复用</li>
</ul>

<h3 id="42-缓存机制类比">4.2 缓存机制类比</h3>

<table>
  <thead>
    <tr>
      <th>Bash</th>
      <th>RDD</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>中间结果写入临时文件</td>
      <td><code class="language-plaintext highlighter-rouge">rdd.cache()</code> 或 <code class="language-plaintext highlighter-rouge">rdd.persist()</code></td>
    </tr>
    <tr>
      <td>复用需重新读取文件</td>
      <td>缓存保留在内存/磁盘供后续复用</td>
    </tr>
    <tr>
      <td>手动清理临时文件</td>
      <td>自动 LRU 或显式 <code class="language-plaintext highlighter-rouge">unpersist()</code></td>
    </tr>
  </tbody>
</table>

<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">intermediate</span> <span class="k">=</span> <span class="nv">logRDD</span><span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"404"</span><span class="o">))</span>
<span class="nv">intermediate</span><span class="o">.</span><span class="py">cache</span><span class="o">()</span>                         <span class="c1">// 类似写入临时文件</span>
<span class="k">val</span> <span class="nv">count</span> <span class="k">=</span> <span class="nv">intermediate</span><span class="o">.</span><span class="py">count</span><span class="o">()</span>             <span class="c1">// 首次计算并缓存</span>
<span class="k">val</span> <span class="nv">sample</span> <span class="k">=</span> <span class="nv">intermediate</span><span class="o">.</span><span class="py">take</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>           <span class="c1">// 从缓存直接读取</span>
</code></pre></div></div>

<hr />

<h2 id="5-容错机制">5. 容错机制</h2>

<h3 id="51-bash-的容错">5.1 Bash 的容错</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 简单的重试逻辑</span>
<span class="k">for </span>i <span class="k">in</span> <span class="o">{</span>1..3<span class="o">}</span><span class="p">;</span> <span class="k">do
    </span><span class="nb">grep</span> <span class="s2">"ERROR"</span> app.log <span class="o">&gt;</span> result.txt <span class="o">&amp;&amp;</span> <span class="nb">break
    sleep </span>5
<span class="k">done</span>
</code></pre></div></div>

<h3 id="52-rdd-的容错血缘-lineage">5.2 RDD 的容错（血缘 Lineage）</h3>

<p>RDD 记录每个转换操作的血缘关系。当分区数据丢失时，系统自动从源头或缓存重建：</p>

<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">rdd1</span> <span class="k">=</span> <span class="nv">sc</span><span class="o">.</span><span class="py">textFile</span><span class="o">(</span><span class="s">"data.txt"</span><span class="o">)</span>      <span class="c1">// 源头</span>
<span class="k">val</span> <span class="nv">rdd2</span> <span class="k">=</span> <span class="nv">rdd1</span><span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">contains</span><span class="o">(</span><span class="s">"key"</span><span class="o">))</span> <span class="c1">// 转换1</span>
<span class="k">val</span> <span class="nv">rdd3</span> <span class="k">=</span> <span class="nv">rdd2</span><span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">split</span><span class="o">(</span><span class="s">","</span><span class="o">)(</span><span class="mi">0</span><span class="o">))</span>      <span class="c1">// 转换2</span>
<span class="k">val</span> <span class="nv">result</span> <span class="k">=</span> <span class="nv">rdd3</span><span class="o">.</span><span class="py">count</span><span class="o">()</span>                 <span class="c1">// 动作</span>

<span class="c1">// 若某分区在计算 count 时丢失，Spark 根据血缘从 data.txt 重新计算 rdd1→rdd2→rdd3 的该分区</span>
</code></pre></div></div>

<hr />

<h2 id="6-思维模型总结">6. 思维模型总结</h2>

<table>
  <thead>
    <tr>
      <th>思维维度</th>
      <th>Bash 模型</th>
      <th>RDD 模型</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>数据视角</td>
      <td>文本流</td>
      <td>分区集合</td>
    </tr>
    <tr>
      <td>操作视角</td>
      <td>命令链</td>
      <td>转换链 + 动作触发</td>
    </tr>
    <tr>
      <td>执行视角</td>
      <td>立即顺序执行</td>
      <td>延迟并行执行</td>
    </tr>
    <tr>
      <td>容错视角</td>
      <td>脚本退出</td>
      <td>血缘自动重建</td>
    </tr>
    <tr>
      <td>扩展视角</td>
      <td>手动分片、<code class="language-plaintext highlighter-rouge">xargs</code></td>
      <td>自动分片、动态资源</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="7-结论">7. 结论</h2>

<p>RDD 可以视为<strong>分布式、容错、惰性求值的 Bash 管道</strong>。它将 Bash 脚本中「命令 → 管道 → 重定向」的模型，扩展为「转换 → 血缘 → 动作」的分布式计算模型。对于熟悉单机文本处理的开发者，通过这种类比可以快速理解：</p>

<ul>
  <li><strong>转换</strong> = 管道中的命令（如 filter、map）</li>
  <li><strong>动作</strong> = 触发执行的命令（如 count、collect）</li>
  <li><strong>缓存</strong> = 临时文件复用</li>
  <li><strong>血缘</strong> = 自动化的错误重试机制</li>
</ul>

<p>这种映射不仅有助于降低学习曲线，也为设计高效的分布式数据处理流程提供了清晰的思维框架。</p>

<hr />

<h2 id="附录操作对照表">附录：操作对照表</h2>

<table>
  <thead>
    <tr>
      <th>操作类型</th>
      <th>Bash 命令</th>
      <th>RDD 方法</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>读取文件</td>
      <td><code class="language-plaintext highlighter-rouge">cat file.txt</code></td>
      <td><code class="language-plaintext highlighter-rouge">sc.textFile(path)</code></td>
    </tr>
    <tr>
      <td>过滤</td>
      <td><code class="language-plaintext highlighter-rouge">grep pattern</code></td>
      <td><code class="language-plaintext highlighter-rouge">filter(predicate)</code></td>
    </tr>
    <tr>
      <td>映射</td>
      <td><code class="language-plaintext highlighter-rouge">awk '{print $1}'</code></td>
      <td><code class="language-plaintext highlighter-rouge">map(func)</code></td>
    </tr>
    <tr>
      <td>扁平映射</td>
      <td><code class="language-plaintext highlighter-rouge">xargs -n1</code></td>
      <td><code class="language-plaintext highlighter-rouge">flatMap(func)</code></td>
    </tr>
    <tr>
      <td>聚合</td>
      <td><code class="language-plaintext highlighter-rouge">sort \| uniq -c</code></td>
      <td><code class="language-plaintext highlighter-rouge">reduceByKey(_ + _)</code></td>
    </tr>
    <tr>
      <td>排序</td>
      <td><code class="language-plaintext highlighter-rouge">sort -k2 -nr</code></td>
      <td><code class="language-plaintext highlighter-rouge">sortByKey()</code></td>
    </tr>
    <tr>
      <td>限制</td>
      <td><code class="language-plaintext highlighter-rouge">head -n</code></td>
      <td><code class="language-plaintext highlighter-rouge">take(n)</code></td>
    </tr>
    <tr>
      <td>保存</td>
      <td><code class="language-plaintext highlighter-rouge">&gt; output.txt</code></td>
      <td><code class="language-plaintext highlighter-rouge">saveAsTextFile(path)</code></td>
    </tr>
    <tr>
      <td>计数</td>
      <td><code class="language-plaintext highlighter-rouge">wc -l</code></td>
      <td><code class="language-plaintext highlighter-rouge">count()</code></td>
    </tr>
    <tr>
      <td>变量赋值</td>
      <td><code class="language-plaintext highlighter-rouge">var=$(cmd)</code></td>
      <td><code class="language-plaintext highlighter-rouge">val rdd = transformation</code></td>
    </tr>
  </tbody>
</table>

<hr />

<p><strong>文档版本</strong>：1.0<br />
<strong>适用场景</strong>：RDD 编程入门、技术培训、思维模型转换</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[RDD（Resilient Distributed Dataset，弹性分布式数据集）是 Apache Spark 的核心抽象。本文通过将 RDD 编程模型与经典的 Bash 脚本管道、MapReduce 计算范式进行系统类比，帮助开发者建立从单机脚本思维到分布式数据处理的平滑过渡。文章涵盖执行模型、操作分类、容错机制及实际代码对比。]]></summary></entry><entry><title type="html">从 Java task_server 到 Rust（htyts / htyproc）：用 AI 推进迁移，用 GitHub CI 与基础设施兜住 E2E</title><link href="https://weinan.io/2026/03/22/rust-task-server-migration-ai-ci-e2e.html" rel="alternate" type="text/html" title="从 Java task_server 到 Rust（htyts / htyproc）：用 AI 推进迁移，用 GitHub CI 与基础设施兜住 E2E" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://weinan.io/2026/03/22/rust-task-server-migration-ai-ci-e2e</id><content type="html" xml:base="https://weinan.io/2026/03/22/rust-task-server-migration-ai-ci-e2e.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p>记录将现网 Java task_server、proc_server 与共享契约迁到 huiwing workspace（htyts_models + htyts + htyproc）的过程：如何用 Cursor 里的迁移计划驱动迭代、如何复用 AuthCore/htycommons，以及如何用 GitHub Actions、Diesel 迁移、Docker Compose 与 AuthCore 联调测试把回归成本压进流水线。</p>

<h2 id="背景计划里写什么代码里落什么">背景：计划里写什么，代码里落什么</h2>

<p>迁移前在 Cursor 里整理了一份结构化计划（<code class="language-plaintext highlighter-rouge">rust_迁移_task_ts_proc</code>），核心不是「逐文件翻译 Java」，而是先把<strong>契约</strong>钉死：</p>

<ul>
  <li><strong>task_server</strong> → 对外仍是 <code class="language-plaintext highlighter-rouge">**/api/v1/ts/**</code>：任务 CRUD、<code class="language-plaintext highlighter-rouge">one_pending_task</code> / <code class="language-plaintext highlighter-rouge">one_zombie_task</code>、分页列表，以及原 Quartz 承载的<strong>课程通知</strong>（改为 Rust 侧调度 + HTTP 调 htyuc/htykc）。</li>
  <li><strong>proc_server</strong> → <code class="language-plaintext highlighter-rouge">htyproc</code>：拉取 pending、按 <code class="language-plaintext highlighter-rouge">TaskType</code> 分发、与 Ts/Ai/Ngx/Uc/Ws 等下游 HTTP 对齐。</li>
  <li><strong>共享数据</strong> → <code class="language-plaintext highlighter-rouge">htyts_models</code>：与现网 JSON 兼容的 <code class="language-plaintext highlighter-rouge">ReqTask</code>、payload、<code class="language-plaintext highlighter-rouge">DbTask</code> 行结构；PostgreSQL + Redis（<code class="language-plaintext highlighter-rouge">TS_</code> 前缀等与 Java 一致），并优先复用 <strong>AuthCore <code class="language-plaintext highlighter-rouge">htycommons</code></strong> 的 <code class="language-plaintext highlighter-rouge">HtyResponse</code>、Axum 提取器、JWT 等，而不是在私有仓库里再造一套「长得像」的协议。</li>
</ul>

<p>计划里同时写清了<strong>仓库边界</strong>：任务域专有逻辑默认闭环在 huiwing；对 AuthCore 的改动要满足开源仓库的兼容、通用与安全预期——这一条直接影响了「哪些代码进 <code class="language-plaintext highlighter-rouge">htyts_models</code>、哪些只借鉴模式」。</p>

<h3 id="改造前后进程与-crate-边界">改造前后：进程与 crate 边界</h3>

<pre><code class="language-mermaid">flowchart LR
  subgraph before [改造前 Java]
    direction TB
    TS[task_server&lt;br/&gt;JAX-RS /api/v1/ts]
    PR[proc_server&lt;br/&gt;TaskProcessor]
    TC[task_commons&lt;br/&gt;DbTask ReqTask Redis]
    TS --&gt; TC
    PR --&gt; TC
  end

  subgraph after [改造后 Rust / huiwing]
    direction TB
    HTYTS[htyts&lt;br/&gt;Axum /api/v1/ts]
    HPROC[htyproc&lt;br/&gt;拉取与处理器]
    HM[htyts_models&lt;br/&gt;契约与 Diesel]
    HC[htycommons&lt;br/&gt;AuthCore]
    HTYTS --&gt; HM
    HPROC --&gt; HM
    HTYTS --&gt; HC
    HPROC --&gt; HC
  end

  TC -.-&gt;|契约对齐&lt;br/&gt;HTTP 路径与 JSON| HM
</code></pre>

<h3 id="迁移后的运行时结构与现网调用关系">迁移后的运行时结构（与现网调用关系）</h3>

<pre><code class="language-mermaid">flowchart TB
  subgraph clients [调用方]
    WEB[htymusic / htyadmin]
    NGXL[OpenResty / Lua]
  end

  subgraph htyts_crate [htyts]
    API["/api/v1/ts"]
    KC["/api/v1/ts/kc 课程通知调度"]
  end

  PG[(PostgreSQL&lt;br/&gt;dbtask)]
  RD[(Redis&lt;br/&gt;TS_ payload)]

  subgraph htyproc_crate [htyproc]
    LOOP[拉取 one_pending_task]
    HANDLERS[按 TaskType 处理]
  end

  subgraph downstream [下游 HTTP]
    UC[htyuc]
    WS[htyws]
    KC2[htykc]
    AISVC[ai 服务]
    NGX[ngx / OpenResty]
  end

  WEB --&gt; API
  NGXL --&gt; API
  API --&gt; PG
  API --&gt; RD
  KC --&gt; UC
  KC --&gt; KC2

  LOOP --&gt;|GET pending| API
  HANDLERS --&gt; UC
  HANDLERS --&gt; WS
  HANDLERS --&gt; KC2
  HANDLERS --&gt; AISVC
  HANDLERS --&gt; NGX
  HANDLERS --&gt;|POST update_task| API
</code></pre>

<h2 id="实际落到仓库里的变更2026-03-22-前后">实际落到仓库里的变更（2026-03-22 前后）</h2>

<p>主线已合入 <code class="language-plaintext highlighter-rouge">huiwing</code> 的 <code class="language-plaintext highlighter-rouge">main</code>（含 <a href="https://github.com/alchemy-studio/huiwing/pull/1631">PR #1631</a> 的合并），与本次主题相关的提交大致可以读成三层：</p>

<ol>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">feat(rust): task_server / proc_server 迁移</code></strong><br />
在工作区引入 <code class="language-plaintext highlighter-rouge">htyts_models</code>、<code class="language-plaintext highlighter-rouge">htyts</code>、<code class="language-plaintext highlighter-rouge">htyproc</code>，把 Java 侧任务 API 与处理器迁到 Rust，并与现有 workspace（Axum、<code class="language-plaintext highlighter-rouge">htycommons</code>、依赖版本）对齐。</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">feat(htyts_models): Diesel migrations + schema/DbTaskRow</code> + <code class="language-plaintext highlighter-rouge">refactor(htyts): move DB ops into htyts_models</code></strong><br />
用 <strong>Diesel</strong> 管理 <code class="language-plaintext highlighter-rouge">dbtask</code> 表结构（<code class="language-plaintext highlighter-rouge">migrations/</code>、<code class="language-plaintext highlighter-rouge">diesel.toml</code>、<code class="language-plaintext highlighter-rouge">schema.rs</code>），把数据库访问集中到 <code class="language-plaintext highlighter-rouge">htyts_models</code>（风格上对齐 <code class="language-plaintext highlighter-rouge">htyuc_models</code>），<code class="language-plaintext highlighter-rouge">htyts</code> 通过重导出与 handler 调用；E2E 侧改为 <strong><code class="language-plaintext highlighter-rouge">diesel migration run</code></strong>，去掉维护一份独立 <code class="language-plaintext highlighter-rouge">init.sql</code> 的漂移风险。实现细节上顺带处理了与 workspace 里 <strong>reqwest 版本</strong>相关的 URL 拼装（例如用 <code class="language-plaintext highlighter-rouge">url::form_urlencoded</code> 等与现有服务一致）。</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">ci: HTYTS + AuthCore 周更联调与本地 docker-compose</code></strong></p>
    <ul>
      <li><strong>GitHub Actions</strong>：轻量的 <code class="language-plaintext highlighter-rouge">rust-ts.yml</code> 仍在 <strong>PR/push</strong> 上跑：Postgres + Redis + <code class="language-plaintext highlighter-rouge">diesel_cli</code> + 迁移 + <code class="language-plaintext highlighter-rouge">cargo test -p htyts --test ts_e2e_http</code>。</li>
      <li><strong>重任务</strong>：单独 workflow <strong>仅 <code class="language-plaintext highlighter-rouge">schedule</code>（例如每周）+ <code class="language-plaintext highlighter-rouge">workflow_dispatch</code></strong>，clone <strong>AuthCore</strong>、双库迁移、构建并启动 <strong>htyuc</strong>、再跑依赖真实 UC 的集成测试；避免把「起一整条 AuthCore 链」绑在每一次 PR 上。</li>
      <li><strong>本地</strong>：<code class="language-plaintext highlighter-rouge">docker-compose.authcore-e2e.yml</code> + <code class="language-plaintext highlighter-rouge">scripts/run-authcore-e2e-docker.sh</code>，用固定宿主机端口起双 Postgres + Redis，脚本里处理 <strong><code class="language-plaintext highlighter-rouge">LOGGER_LEVEL</code></strong>、<strong><code class="language-plaintext highlighter-rouge">env -u CARGO_TARGET_DIR</code> 构建 htyuc</strong>（避免 IDE 注入的 target 目录导致找不到二进制）等踩坑点。</li>
    </ul>
  </li>
</ol>

<p>合并顺序上，<code class="language-plaintext highlighter-rouge">#1631</code> 把 Diesel/DB 重构与 CI 演进合进主线后，又与已存在的 AuthCore 联调提交并存于历史里；若只看「最终能力」，可以理解为：<strong>主线同时具备 Diesel 管理的任务表、轻量 HTTP E2E、以及与 AuthCore UC 的可选联调路径</strong>。</p>

<h3 id="数据层diesel-迁移与-crate-分工">数据层：Diesel 迁移与 crate 分工</h3>

<pre><code class="language-mermaid">flowchart LR
  subgraph repo [仓库内]
    MIG[htyts_models/migrations]
    SCH[schema.rs + DbTaskRow]
    OPS[impl 于 models.rs]
    MIG --&gt; SCH
    SCH --&gt; OPS
  end

  subgraph consumers [使用者]
    H[htyts handlers]
    H --&gt; OPS
  end

  subgraph ci [CI / 本地]
    D[diesel migration run]
    D --&gt; MIG
  end
</code></pre>

<h2 id="我们怎样用-ai-完成重写而不是胡写">我们怎样用 AI 完成「重写」而不是「胡写」</h2>

<p>结合上面那份计划，实际协作方式更接近下面几条，而不是「一句话生成整个仓库」：</p>

<ol>
  <li>
    <p><strong>计划即边界</strong><br />
把 Java 包路径、现网路由、Redis 前缀、与 htykc/htyuc 的调用关系写进计划后，后续无论是拆 crate 还是写 handler，都有一个<strong>可对照的清单</strong>，减少模型自由发挥。</p>
  </li>
  <li>
    <p><strong>契约优先于行数</strong><br />
先固定 <code class="language-plaintext highlighter-rouge">ReqTask</code> / <code class="language-plaintext highlighter-rouge">HtyResponse</code> / 错误码与 Java 行为一致，再补实现；AI 适合批量生成样板与对称的 handler，但<strong>字段名、状态机、与 UC 的 JWT 语义</strong>需要人眼对照现网或集成测试。</p>
  </li>
  <li>
    <p><strong>迭代式纠偏</strong><br />
例如联调 UC 时发现：<code class="language-plaintext highlighter-rouge">verify_jwt</code> 在 UC 侧会查 <strong>Redis 里是否存有与 <code class="language-plaintext highlighter-rouge">token_id</code> 对应的完整 JWT</strong>——本地随手 <code class="language-plaintext highlighter-rouge">jwt_encode_token</code> 出来的串并不会过校验；最终 E2E 改为走 <strong><code class="language-plaintext highlighter-rouge">login_with_password</code>（fixture 用户）</strong> 拿「已在 UC Redis 里登记」的 token，这是对<strong>真实协议</strong>的修正，而不是计划里一开始就能写全的细节。</p>
  </li>
  <li>
    <p><strong>Review 仍然是闸门</strong><br />
AI 加速的是起草与重构 diff；合并进 <code class="language-plaintext highlighter-rouge">main</code> 仍走 PR、CI 绿灯与人工扫一眼安全面（密钥、日志、对外 HTTP）。</p>
  </li>
</ol>

<h3 id="人机协作工作流计划驱动">人机协作工作流（计划驱动）</h3>

<pre><code class="language-mermaid">flowchart TD
  PL[迁移计划与契约清单&lt;br/&gt;Java 路径 / Redis 前缀 / 路由]
  ISS[拆解为可执行 Issue 与 Prompt]
  AI[AI 起草实现与重构]
  PR[PR 与自动化测试]
  PL --&gt; ISS
  ISS --&gt; AI
  AI --&gt; PR
  PR --&gt; RV{评审与对照现网}
  RV --&gt;|需纠偏| ISS
  RV --&gt;|通过| MAIN[合并 main]
</code></pre>

<h2 id="github-ci-与基础设施e2e-分两层做">GitHub CI 与基础设施：E2E 分两层做</h2>

<ul>
  <li>
    <p><strong>默认 CI（每次 PR）</strong><br />
Docker services 起 Postgres + Redis，迁移到最新 schema，跑 <strong><code class="language-plaintext highlighter-rouge">ts_e2e_http</code></strong>。成本可控、反馈快，适合防止「改 handler 把契约改断」。</p>
  </li>
  <li>
    <p><strong>AuthCore 联调（周更 / 手动）</strong><br />
需要第二套 PG（UC 库）、UC 的 <code class="language-plaintext highlighter-rouge">diesel</code> + fixture SQL、以及 release 级 <strong>htyuc</strong> 进程；测试用例标成 <strong><code class="language-plaintext highlighter-rouge">#[ignore]</code></strong>，只在专门 workflow 或本地脚本里加 <code class="language-plaintext highlighter-rouge">--ignored</code> 跑。这样<strong>不把重依赖强加给每个贡献者</strong>，又能在主干上周期性验证「HTYTS + HTYUC + 同一 <code class="language-plaintext highlighter-rouge">JWT_KEY</code>」这条真实链路。</p>
  </li>
  <li>
    <p><strong>本地 Docker Compose</strong><br />
与 CI 同源的思路：compose 只负责<strong>基础设施</strong>，业务进程（htyuc、Rust 测试）仍在宿主机用 cargo 跑，便于调试日志与 attach；脚本把环境变量、端口、以及 UC 启动条件写死成可重复的一步。</p>
  </li>
</ul>

<h3 id="ci-与-e2e轻量-pr-与重联调分流">CI 与 E2E：轻量 PR 与重联调分流</h3>

<pre><code class="language-mermaid">flowchart TB
  subgraph every_pr [每次 PR / push — rust-ts.yml]
    E1[GitHub Actions job]
    E2[Service: Postgres + Redis]
    E3[diesel_cli + migration run]
    E4["cargo test ts_e2e_http"]
    E1 --&gt; E2 --&gt; E3 --&gt; E4
  end

  subgraph weekly [周更或手动 — htyts-authcore-weekly.yml]
    W1[checkout huiwing + AuthCore]
    W2[双 Postgres + Redis]
    W3[UC migrate + fixture SQL]
    W4[build htyuc release + 启动]
    W5["cargo test ts_e2e_authcore_http -- --ignored"]
    W1 --&gt; W2 --&gt; W3 --&gt; W4 --&gt; W5
  end

  subgraph docker_local [本地复现]
    D1[docker-compose.authcore-e2e]
    D2[run-authcore-e2e-docker.sh]
    D1 --&gt; D2
  end

  every_pr -.-&gt;|快速回归契约| MR[合并信心]
  weekly -.-&gt;|真实 UC verify 链路| MR
  docker_local -.-&gt;|与 CI 同构调试| MR
</code></pre>

<h3 id="联调链sudo-校验走-htyuc概念">联调链：sudo 校验走 HTYUC（概念）</h3>

<pre><code class="language-mermaid">sequenceDiagram
  participant C as 客户端
  participant TS as htyts
  participant R as TS Redis 缓存
  participant UC as htyuc

  C-&gt;&gt;TS: create_task + HtySudoerToken
  alt 缓存未命中且 TOKEN_VERIFY=true
    TS-&gt;&gt;UC: POST verify_jwt_token
    UC-&gt;&gt;UC: JWT 与 UC Redis 一致则 r=true
    UC--&gt;&gt;TS: HtyResponse
    TS-&gt;&gt;R: 写入 sudo 缓存
  else 缓存命中
    TS-&gt;&gt;R: 读 TS_SUDO_T
  end
  TS--&gt;&gt;C: 201 / 业务响应
</code></pre>

<h2 id="小结">小结</h2>

<p>这一轮迁移的本质是：<strong>用一份明确的迁移计划约束 AI 与人工的分工</strong>，用 <strong>Diesel + CI 迁移步骤</strong>约束 schema 与运行时一致，再用 <strong>分层 E2E（轻量每次跑、重联调周期跑 + 本地 compose）</strong> 把「像 Java 一样能跑」变成可重复验证的事实。若你也在做「Java 服务 → Rust + 现网契约不变」，最值得提前投资的往往是<strong>契约文档与 CI 里的数据库/迁移</strong>，其次才是具体某一层的代码行数。</p>

<hr />

<p><em>仓库：<code class="language-plaintext highlighter-rouge">alchemy-studio/huiwing</code>；开源基础设施：<code class="language-plaintext highlighter-rouge">alchemy-studio/AuthCore</code>。文中涉及的 PR、workflow 与脚本以仓库当前 <code class="language-plaintext highlighter-rouge">main</code> 为准。</em></p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">__stack_chk_guard 深入解析：原理、示例与 musl/glibc 代码路径</title><link href="https://weinan.io/2026/03/21/stack-chk-guard-musl-glibc.html" rel="alternate" type="text/html" title="__stack_chk_guard 深入解析：原理、示例与 musl/glibc 代码路径" /><published>2026-03-21T00:00:00+00:00</published><updated>2026-03-21T00:00:00+00:00</updated><id>https://weinan.io/2026/03/21/stack-chk-guard-musl-glibc</id><content type="html" xml:base="https://weinan.io/2026/03/21/stack-chk-guard-musl-glibc.html"><![CDATA[<p><code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 是 GCC/Clang 栈保护（SSP, Stack Smashing Protector）机制中的核心变量之一。很多人在反汇编里见过它，但常见误解是：它是不是内核变量、什么时候初始化、为什么能拦截栈溢出。本文基于一个具体示例和运行时实现路径，把这些问题串起来说明。</p>

<h2 id="一概念说明__stack_chk_guard-是什么">一、概念说明：<code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 是什么</h2>

<p><code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 本质上是 canary（金丝雀）参考值。编译器在函数入口和出口自动插入检查逻辑：</p>

<ol>
  <li>函数入口：把 guard 值保存到当前函数栈帧。</li>
  <li>函数返回前：比较栈中的副本和原始 guard。</li>
  <li>不一致：调用 <code class="language-plaintext highlighter-rouge">__stack_chk_fail()</code>，进程立即终止。</li>
</ol>

<p>这个机制的安全意义是：攻击者若想覆盖返回地址，通常必须先破坏 canary，而 canary 一旦被改写，函数返回前就会被检测出来。</p>

<h2 id="二谁在做这件事编译器与-c-库分工">二、谁在做这件事：编译器与 C 库分工</h2>

<p>栈保护不是单一组件完成，而是协同机制：</p>

<ul>
  <li>编译器（GCC/Clang）：负责插桩，自动生成“保存 canary / 校验 canary”代码。</li>
  <li>libc（musl/glibc）：负责初始化 guard，并提供失败处理函数 <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code>。</li>
</ul>

<p>所以需要明确：<code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 变量本体属于用户态运行时，不是“内核维护的全局变量”。内核通常只在进程启动时通过 <code class="language-plaintext highlighter-rouge">AT_RANDOM</code> 等渠道提供随机熵。</p>

<h2 id="三具体例子一个会溢出的登录函数">三、具体例子：一个会溢出的登录函数</h2>

<p>下面是一个最小示例（故意保留不安全写法）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
</span>
<span class="kt">void</span> <span class="nf">login</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">password</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">is_admin</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="n">strcpy</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">password</span><span class="p">);</span>  <span class="c1">// 无边界检查，存在溢出风险</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="s">"secret"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">is_admin</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">puts</span><span class="p">(</span><span class="n">is_admin</span> <span class="o">?</span> <span class="s">"welcome admin"</span> <span class="o">:</span> <span class="s">"bad password"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="31-不启用栈保护时">3.1 不启用栈保护时</h3>

<p>如果输入超过 <code class="language-plaintext highlighter-rouge">buffer</code> 容量，溢出会继续覆盖相邻栈数据，严重时可改写返回地址，形成控制流劫持入口。</p>

<h3 id="32-启用栈保护后">3.2 启用栈保护后</h3>

<p>编译器会在 <code class="language-plaintext highlighter-rouge">login</code> 的序言保存 canary，在尾声做比较。如果输入过长导致 canary 被覆盖，返回前触发失败处理，进程中止。典型编译方式：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc <span class="nt">-O2</span> <span class="nt">-fstack-protector</span> <span class="nt">-o</span> demo demo.c
</code></pre></div></div>

<p>更激进版本：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc <span class="nt">-O2</span> <span class="nt">-fstack-protector-all</span> <span class="nt">-o</span> demo demo.c
</code></pre></div></div>

<p>典型失败输出（glibc 环境常见）：</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>*** stack smashing detected ***: terminated
Aborted (core dumped)
</code></pre></div></div>

<h3 id="33-简单场景说明按输入长度看行为">3.3 简单场景说明（按输入长度看行为）</h3>

<p>这个例子可以直接用三种输入理解：</p>

<ol>
  <li>输入 <code class="language-plaintext highlighter-rouge">secret</code>（6 字节）<br />
<code class="language-plaintext highlighter-rouge">buffer[8]</code> 能完整容纳，不发生溢出，canary 不变，函数正常返回。</li>
  <li>输入 <code class="language-plaintext highlighter-rouge">12345678</code>（8 字节）<br />
刚好写满缓冲区边界，通常也不会覆盖 canary，函数正常返回。</li>
  <li>输入 <code class="language-plaintext highlighter-rouge">123456789</code>（9 字节及以上）<br />
超出缓冲区后继续向后写，极易覆盖 canary；函数尾声比较失败，调用 <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code> 终止进程。</li>
</ol>

<p>对应内存上的直觉是：想碰到返回地址，先要经过 canary 槽位；canary 先变，程序就先终止。</p>

<h2 id="四代码分析从汇编模式到运行时路径">四、代码分析：从汇编模式到运行时路径</h2>

<p>不同架构指令细节不同，但总体结构一致。可抽象为：</p>

<pre><code class="language-asm">; 函数入口
load guard -&gt; reg
store reg -&gt; [stack_canary_slot]

; ... 函数主体 ...

; 函数返回前
load [stack_canary_slot] -&gt; reg1
load guard -&gt; reg2
cmp reg1, reg2
jne __stack_chk_fail
ret
</code></pre>

<p>这解释了为什么该机制能拦住大量“覆盖返回地址”的经典栈溢出：覆盖路径上必须先穿过 canary 槽位。</p>

<h2 id="五__stack_chk_guard-何时会变化">五、<code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 何时会变化</h2>

<p>正常情况下，guard 在进程启动早期初始化一次，随后应保持稳定。运行中发现 guard 改变，通常意味着以下之一：</p>

<ul>
  <li>发生了严重内存破坏（例如任意地址写、全局区越界）。</li>
  <li>调试或安全研究场景下被人工改写（如 GDB、注入库）。</li>
  <li>程序自身存在未定义行为导致误写。</li>
</ul>

<p>因此，“运行时 guard 变化”是高度可疑信号，不应视为正常现象。</p>

<h2 id="六musl-c-代码说明定义初始化线程传播失败路径">六、musl C 代码说明（定义、初始化、线程传播、失败路径）</h2>

<p>下面用你分析中对应的 musl 代码路径来说明关键点。</p>

<h3 id="61-__stack_chk_guard-定义与-__init_ssp-初始化">6.1 <code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 定义与 <code class="language-plaintext highlighter-rouge">__init_ssp</code> 初始化</h3>

<p><code class="language-plaintext highlighter-rouge">src/env/__stack_chk_fail.c</code>（简化）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uintptr_t</span> <span class="n">__stack_chk_guard</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">__init_ssp</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">entropy</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">entropy</span><span class="p">)</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&amp;</span><span class="n">__stack_chk_guard</span><span class="p">,</span> <span class="n">entropy</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">));</span>
    <span class="k">else</span> <span class="n">__stack_chk_guard</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="o">&amp;</span><span class="n">__stack_chk_guard</span> <span class="o">*</span> <span class="mi">1103515245</span><span class="p">;</span>

<span class="cp">#if UINTPTR_MAX &gt;= 0xffffffffffffffff
</span>    <span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">__stack_chk_guard</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="cp">#endif
</span>
    <span class="n">__pthread_self</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">canary</span> <span class="o">=</span> <span class="n">__stack_chk_guard</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这段代码表达了四件事：</p>

<ol>
  <li>guard 是用户态全局变量（<code class="language-plaintext highlighter-rouge">uintptr_t __stack_chk_guard;</code>）。</li>
  <li>优先使用外部熵（<code class="language-plaintext highlighter-rouge">entropy</code>，通常来自 <code class="language-plaintext highlighter-rouge">AT_RANDOM</code>）。</li>
  <li>无熵时使用兜底值（可用但强度较弱）。</li>
  <li>初始化后同步到当前线程的 <code class="language-plaintext highlighter-rouge">canary</code> 字段，供线程上下文中的检查路径使用。</li>
</ol>

<h3 id="62-进程启动阶段如何把-at_random-传给-__init_ssp">6.2 进程启动阶段如何把 <code class="language-plaintext highlighter-rouge">AT_RANDOM</code> 传给 <code class="language-plaintext highlighter-rouge">__init_ssp</code></h3>

<p><code class="language-plaintext highlighter-rouge">src/env/__libc_start_main.c</code>（简化）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">__init_libc</span><span class="p">(</span><span class="kt">char</span> <span class="o">**</span><span class="n">envp</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">pn</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">size_t</span> <span class="n">i</span><span class="p">,</span> <span class="o">*</span><span class="n">auxv</span><span class="p">,</span> <span class="n">aux</span><span class="p">[</span><span class="n">AUX_CNT</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
    <span class="p">...</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">auxv</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="n">i</span><span class="o">+=</span><span class="mi">2</span><span class="p">)</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">auxv</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&lt;</span> <span class="n">AUX_CNT</span><span class="p">)</span> <span class="n">aux</span><span class="p">[</span><span class="n">auxv</span><span class="p">[</span><span class="n">i</span><span class="p">]]</span> <span class="o">=</span> <span class="n">auxv</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">];</span>
    <span class="p">...</span>
    <span class="n">__init_tls</span><span class="p">(</span><span class="n">aux</span><span class="p">);</span>
    <span class="n">__init_ssp</span><span class="p">((</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">aux</span><span class="p">[</span><span class="n">AT_RANDOM</span><span class="p">]);</span>
    <span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这里的关键是：在 <code class="language-plaintext highlighter-rouge">main</code> 执行前，musl 已完成 guard 初始化。<br />
所以业务代码进入前，SSP 依赖的数据已就绪。</p>

<h3 id="63-线程结构与新线程-canary-继承">6.3 线程结构与新线程 canary 继承</h3>

<p>线程结构中有 canary 字段（<code class="language-plaintext highlighter-rouge">src/internal/pthread_impl.h</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">pthread</span> <span class="p">{</span>
    <span class="p">...</span>
    <span class="kt">uintptr_t</span> <span class="n">canary</span><span class="p">;</span>
    <span class="p">...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>线程创建时复制父线程 canary（<code class="language-plaintext highlighter-rouge">src/thread/pthread_create.c</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">new</span><span class="o">-&gt;</span><span class="n">canary</span> <span class="o">=</span> <span class="n">self</span><span class="o">-&gt;</span><span class="n">canary</span><span class="p">;</span>
</code></pre></div></div>

<p>这保证了多线程下 canary 数据在运行时结构里是一致可用的。</p>

<h3 id="64-校验失败后的处理__stack_chk_fail">6.4 校验失败后的处理：<code class="language-plaintext highlighter-rouge">__stack_chk_fail</code></h3>

<p><code class="language-plaintext highlighter-rouge">src/env/__stack_chk_fail.c</code>（简化）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">__stack_chk_fail</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">a_crash</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>musl 的失败路径非常短：直接崩溃退出，不尝试恢复。<br />
这是典型 fail-fast 策略，避免在“栈已损坏”状态继续执行复杂逻辑。</p>

<h3 id="65-musl-这一套实现的工程特征">6.5 musl 这一套实现的工程特征</h3>

<ul>
  <li>启动期初始化清晰：<code class="language-plaintext highlighter-rouge">__init_libc -&gt; __init_ssp</code>。</li>
  <li>线程传播路径直接：当前线程写入 + 新线程继承。</li>
  <li>失败处理最小化：<code class="language-plaintext highlighter-rouge">a_crash()</code> 终止，降低攻击面。</li>
</ul>

<h2 id="七glibc-对照同目标不同工程风格">七、glibc 对照：同目标，不同工程风格</h2>

<p>glibc 与 musl 在核心目标上一致：都通过 canary 检测栈破坏并 fail-fast。差异更多体现在工程层面：</p>

<ul>
  <li>平台适配路径更复杂；</li>
  <li>错误提示通常更显式（常见 <code class="language-plaintext highlighter-rouge">stack smashing detected</code>）；</li>
  <li>失败处理同样尽量克制，避免依赖过多复杂运行时状态。</li>
</ul>

<h2 id="八边界与局限它不是万能防护">八、边界与局限：它不是万能防护</h2>

<p><code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 很重要，但能力边界也要明确：</p>

<ul>
  <li>主要针对栈上的典型覆盖路径；</li>
  <li>对堆溢出、信息泄露、UAF、逻辑漏洞不直接提供完整防护；</li>
  <li>需要和 ASLR、NX、RELRO、FORTIFY_SOURCE 等机制组合使用；</li>
  <li>也不能替代安全编码（边界检查、避免危险 API、最小权限设计）。</li>
</ul>

<h2 id="九实践建议">九、实践建议</h2>

<ol>
  <li>在构建系统中默认启用 <code class="language-plaintext highlighter-rouge">-fstack-protector-strong</code>（或更强策略）。</li>
  <li>同时启用 PIE、RELRO、NX 和 FORTIFY_SOURCE。</li>
  <li>优先替换高风险 API（如 <code class="language-plaintext highlighter-rouge">strcpy</code>, <code class="language-plaintext highlighter-rouge">sprintf</code>, <code class="language-plaintext highlighter-rouge">gets</code>）。</li>
  <li>将 canary 视为“最后一道完整性检查”，而非唯一安全策略。</li>
</ol>

<h2 id="十结论">十、结论</h2>

<p><code class="language-plaintext highlighter-rouge">__stack_chk_guard</code> 的价值可以概括为一句话：<br />
它通过“函数级栈完整性校验”把很多本可沉默成功的栈覆盖攻击，转化为可检测、可终止的失败路径。</p>

<p>从机制到实现，无论是 musl 还是 glibc，本质都遵循同一个原则：在控制流可信度下降时，尽快停止执行，避免把漏洞升级为可利用攻击。</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[__stack_chk_guard 是 GCC/Clang 栈保护（SSP, Stack Smashing Protector）机制中的核心变量之一。很多人在反汇编里见过它，但常见误解是：它是不是内核变量、什么时候初始化、为什么能拦截栈溢出。本文基于一个具体示例和运行时实现路径，把这些问题串起来说明。]]></summary></entry><entry><title type="html">用户栈溢出与缺页：内核如何扩展栈与触发 SIGSEGV</title><link href="https://weinan.io/2026/03/18/stack-overflow-page-faults-benchmark.html" rel="alternate" type="text/html" title="用户栈溢出与缺页：内核如何扩展栈与触发 SIGSEGV" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://weinan.io/2026/03/18/stack-overflow-page-faults-benchmark</id><content type="html" xml:base="https://weinan.io/2026/03/18/stack-overflow-page-faults-benchmark.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p>用户态栈空间有限（如 <code class="language-plaintext highlighter-rouge">ulimit -s</code> 的 8MB），访问栈底之外的地址会触发缺页；内核在缺页处理中决定是<strong>扩展栈</strong>（分配新页）还是<strong>拒绝访问</strong>（发 SIGSEGV）。本文从内核视角说明：用户栈在 Linux 里如何表示、缺页时栈如何向下扩展、为何会触顶溢出，以及缺页次数、页缓存与架构（如 ARM64 页大小）对现象的影响。文内引用内核源码路径与片段均对应本地树 <code class="language-plaintext highlighter-rouge">linux/</code>（如 <code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code>），便于对照阅读。</p>

<h2 id="一现象与问题">一、现象与问题</h2>

<p>一个常见现象：用 <code class="language-plaintext highlighter-rouge">perf stat -e page-faults</code> 跑一个「不断向栈下增长直到崩溃」的程序，可能只看到<strong>几百次缺页</strong>就发生 SIGSEGV，而栈已使用数 MB。会自然产生两个问题：</p>

<ol>
  <li>为什么「这么少」的缺页就会栈溢出？</li>
  <li>缺页次数在不同运行、不同架构下为何差异很大（例如 x86-64 第二次运行明显减少，ARM64 首次就很少）？</li>
</ol>

<p>下面用内核机制统一解释，并用 <a href="https://github.com/liweinan/stack-vs-heap-benchmark">stack-vs-heap-benchmark</a> 中的 <code class="language-plaintext highlighter-rouge">stack_overflow_test crash</code> 作为可复现的样例（非论述主体）。</p>

<h2 id="二用户栈在内核中的表示">二、用户栈在内核中的表示</h2>

<p>用户栈对应一个<strong>向下增长</strong>的 VMA（<code class="language-plaintext highlighter-rouge">struct vm_area_struct</code>），由 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code> 标记。</p>

<ul>
  <li><strong>栈顶</strong>：高地址，由用户态 SP 指向；初始栈顶由 loader/内核在 exec 时设定，并受 <code class="language-plaintext highlighter-rouge">arch_pick_mmap_layout()</code> 等影响，会为栈预留空间并留出 <strong>stack guard gap</strong>。</li>
  <li><strong>栈底（当前）</strong>：即该 VMA 的 <code class="language-plaintext highlighter-rouge">vm_start</code>（低地址）；栈「向下长」即 <code class="language-plaintext highlighter-rouge">vm_start</code> 变小，VMA 向低地址扩展。</li>
  <li><strong>栈大小限制</strong>：由 <code class="language-plaintext highlighter-rouge">RLIMIT_STACK</code>（<code class="language-plaintext highlighter-rouge">ulimit -s</code>）提供，内核在<strong>扩展栈</strong>时用该限制做检查，超过则拒绝扩展并导致本次缺页处理失败，进而向用户态发 SIGSEGV。</li>
</ul>

<p>栈与其它映射之间保留的间隔由全局变量 <code class="language-plaintext highlighter-rouge">stack_guard_gap</code> 控制（默认 256 页，即 4KB 页下 1MB）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// mm/mmap.c</span>
<span class="cm">/* enforced gap between the expanding stack and other mappings. */</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">stack_guard_gap</span> <span class="o">=</span> <span class="mi">256UL</span><span class="o">&lt;&lt;</span><span class="n">PAGE_SHIFT</span><span class="p">;</span>
</code></pre></div></div>

<p>因此：<strong>栈溢出</strong>在内核侧的语义是——缺页发生在当前栈 VMA 的 <code class="language-plaintext highlighter-rouge">vm_start</code> 之下，且要么扩展会超过 <code class="language-plaintext highlighter-rouge">RLIMIT_STACK</code>，要么会侵入 <code class="language-plaintext highlighter-rouge">stack_guard_gap</code> 或其它映射，从而不允许扩展，只能返回错误并让上层发 SIGSEGV。</p>

<h2 id="三补充mm_struct--task_struct--vm_area_struct-的关系校对到当前内核">三、补充：<code class="language-plaintext highlighter-rouge">mm_struct</code> / <code class="language-plaintext highlighter-rouge">task_struct</code> / <code class="language-plaintext highlighter-rouge">vm_area_struct</code> 的关系（校对到当前内核）</h2>

<p>为避免把旧资料中的字段名带入本文，这里先给出与当前内核（<code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code>）一致的结构关系。你在阅读后续缺页与栈扩展路径时，可以把这张图当作“对象关系索引”。</p>

<pre><code class="language-mermaid">graph TB
    subgraph TASK[任务层]
      T[task_struct]
      TMM[mm]
      TAMM[active_mm]
    end

    subgraph MM[地址空间层]
      M[mm_struct]
      MMT[mm_mt\nMaple Tree of VMA]
      MPGD[pgd\npage table root]
      MLOCK[mmap_lock]
      MCOUNT[mm_users / mm_count]
      MSTAT[total_vm locked_vm stack_vm ...]
      MBOUND[start_code start_brk brk start_stack ...]
    end

    subgraph VMA[VMA层]
      V[vm_area_struct]
      VADDR[vm_start .. vm_end]
      VFLAGS[vm_flags]
      VFILE[vm_file / anon_vma]
    end

    subgraph PT[页表层 x86_64]
      PGD[PGD]
      P4D[P4D]
      PUD[PUD]
      PMD[PMD]
      PTE[PTE]
      PF[Page Frame]
    end

    T --&gt; TMM --&gt; M
    T --&gt; TAMM --&gt; M

    M --&gt; MMT --&gt; V
    M --&gt; MPGD --&gt; PGD
    M --&gt; MLOCK
    M --&gt; MCOUNT
    M --&gt; MSTAT
    M --&gt; MBOUND

    V --&gt; VADDR
    V --&gt; VFLAGS
    V --&gt; VFILE

    PGD --&gt; P4D --&gt; PUD --&gt; PMD --&gt; PTE --&gt; PF
    V -. address validity / permission .-&gt; PTE
</code></pre>

<h3 id="本图对应的关键校对点">本图对应的关键校对点</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">mm_struct</code> 当前主组织结构是 <strong><code class="language-plaintext highlighter-rouge">mm_mt</code>（Maple Tree）</strong>，不是旧口径里的 <code class="language-plaintext highlighter-rouge">mmap + mm_rb</code>。</li>
  <li><code class="language-plaintext highlighter-rouge">mm_struct</code> 的 VMA 锁字段是 <strong><code class="language-plaintext highlighter-rouge">mmap_lock</code></strong>，不是 <code class="language-plaintext highlighter-rouge">mmap_sem</code>。</li>
  <li><code class="language-plaintext highlighter-rouge">task_struct</code> 里 <code class="language-plaintext highlighter-rouge">mm</code> / <code class="language-plaintext highlighter-rouge">active_mm</code> 的关系与经典描述一致。</li>
  <li>缺页建立映射时，VMA 负责“地址区间与权限语义”，页表负责“虚拟地址到物理页”的具体映射。</li>
</ul>

<h3 id="核心结构体定义文件对照阅读">核心结构体定义文件（对照阅读）</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">struct mm_struct</code>：<code class="language-plaintext highlighter-rouge">include/linux/mm_types.h</code></li>
  <li><code class="language-plaintext highlighter-rouge">struct vm_area_struct</code>：<code class="language-plaintext highlighter-rouge">include/linux/mm_types.h</code></li>
  <li><code class="language-plaintext highlighter-rouge">struct task_struct</code>：<code class="language-plaintext highlighter-rouge">include/linux/sched.h</code></li>
  <li><code class="language-plaintext highlighter-rouge">mm_struct.mm_mt</code> 的类型 <code class="language-plaintext highlighter-rouge">struct maple_tree</code>：<code class="language-plaintext highlighter-rouge">include/linux/maple_tree.h</code>
    <ul>
      <li>字段出现位置：<code class="language-plaintext highlighter-rouge">include/linux/mm_types.h</code>（<code class="language-plaintext highlighter-rouge">struct maple_tree mm_mt;</code>）</li>
      <li>Maple Tree 实现文件：<code class="language-plaintext highlighter-rouge">lib/maple_tree.c</code></li>
    </ul>
  </li>
</ul>

<h3 id="maple-tree-与-vma-的真实绑定关系结构--流程">Maple Tree 与 VMA 的真实绑定关系（结构 + 流程）</h3>

<p>上面的关系图强调了 <code class="language-plaintext highlighter-rouge">mm_mt</code> 与 VMA 的关联，这里把“结构体层面怎么存”说清楚：</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">mm_struct</code> 里持有 <code class="language-plaintext highlighter-rouge">struct maple_tree mm_mt</code>（树容器）。</li>
  <li><code class="language-plaintext highlighter-rouge">maple_tree</code> 本体（<code class="language-plaintext highlighter-rouge">struct maple_tree</code>）只有锁、flags、<code class="language-plaintext highlighter-rouge">ma_root</code> 根指针，不直接内嵌 <code class="language-plaintext highlighter-rouge">vm_area_struct</code>。</li>
  <li><code class="language-plaintext highlighter-rouge">ma_root</code> 是编码过的 <code class="language-plaintext highlighter-rouge">void *</code> 入口：
    <ul>
      <li>常见（多条目）情况：<code class="language-plaintext highlighter-rouge">ma_root -&gt; maple_node -&gt; slot[] -&gt; vma*</code></li>
      <li>单条目优化情况：<code class="language-plaintext highlighter-rouge">ma_root</code> 可直接承载条目（编码后的 <code class="language-plaintext highlighter-rouge">vma*</code>），不经过 <code class="language-plaintext highlighter-rouge">maple_node</code></li>
    </ul>
  </li>
  <li>真正的节点是 <code class="language-plaintext highlighter-rouge">struct maple_node</code>；节点里有 <code class="language-plaintext highlighter-rouge">slot[]</code>，并通过 <code class="language-plaintext highlighter-rouge">maple_range_64</code> / <code class="language-plaintext highlighter-rouge">maple_arange_64</code> 维护 <code class="language-plaintext highlighter-rouge">pivot[]</code>（地址分界）。</li>
  <li>在 mm 场景中，<code class="language-plaintext highlighter-rouge">slot[]</code> 存放的是 <code class="language-plaintext highlighter-rouge">struct vm_area_struct *</code>（以 <code class="language-plaintext highlighter-rouge">void *</code> 形式存）。</li>
</ol>

<pre><code class="language-mermaid">graph TB
  MM["mm_struct"]
  MT["mm_mt: struct maple_tree"]
  ROOT["ma_root"]
  DIRECT["direct encoded entry&lt;br/&gt;(single-entry optimization)"]
  NODE["maple_node"]
  PIV["pivot array&lt;br/&gt;地址区间边界"]
  SLOTS["slot array&lt;br/&gt;value = vma*"]
  A["VMA_A*&lt;br/&gt;vm_start..vm_end"]
  B["VMA_B*&lt;br/&gt;vm_start..vm_end"]
  C["VMA_C*&lt;br/&gt;vm_start..vm_end"]

  MM --&gt; MT --&gt; ROOT
  ROOT --&gt; NODE
  ROOT --&gt; DIRECT --&gt; A
  NODE --&gt; PIV
  NODE --&gt; SLOTS
  SLOTS --&gt; A
  SLOTS --&gt; B
  SLOTS --&gt; C
</code></pre>

<h3 id="创建绑定过程地址区间---vma-指针">创建（绑定）过程：地址区间 -&gt; VMA 指针</h3>

<p>在 <code class="language-plaintext highlighter-rouge">mm/vma.h</code> 的 <code class="language-plaintext highlighter-rouge">vma_iter_store_gfp()</code> 里，内核会：</p>

<ul>
  <li>用 <code class="language-plaintext highlighter-rouge">__mas_set_range(&amp;vmi-&gt;mas, vma-&gt;vm_start, vma-&gt;vm_end - 1)</code> 设定 key 区间；</li>
  <li>再用 <code class="language-plaintext highlighter-rouge">mas_store_gfp(&amp;vmi-&gt;mas, vma, gfp)</code> 把 <code class="language-plaintext highlighter-rouge">vma*</code> 作为 value 存入 <code class="language-plaintext highlighter-rouge">mm_mt</code>。</li>
</ul>

<p>因此绑定关系是：</p>

<ul>
  <li><strong>key</strong> = 虚拟地址范围 <code class="language-plaintext highlighter-rouge">[vm_start, vm_end)</code>（内部以 <code class="language-plaintext highlighter-rouge">start..end-1</code> 存）</li>
  <li><strong>value</strong> = <code class="language-plaintext highlighter-rouge">struct vm_area_struct *</code></li>
</ul>

<h3 id="查找过程给定地址---命中-vma">查找过程：给定地址 -&gt; 命中 VMA</h3>

<p><code class="language-plaintext highlighter-rouge">vma_iterator</code> 通过 <code class="language-plaintext highlighter-rouge">mas_init(&amp;vmi-&gt;mas, &amp;mm-&gt;mm_mt, addr)</code> 绑定到当前进程的 <code class="language-plaintext highlighter-rouge">mm_mt</code>，
随后 <code class="language-plaintext highlighter-rouge">vma_find()</code> 调 <code class="language-plaintext highlighter-rouge">mas_find()</code> 从 <code class="language-plaintext highlighter-rouge">ma_root</code> 开始查找：</p>

<ul>
  <li>若 <code class="language-plaintext highlighter-rouge">ma_root</code> 为直接条目，直接返回对应 <code class="language-plaintext highlighter-rouge">vm_area_struct *</code>；</li>
  <li>若 <code class="language-plaintext highlighter-rouge">ma_root</code> 指向节点，则按 <code class="language-plaintext highlighter-rouge">pivot</code> 导航到 <code class="language-plaintext highlighter-rouge">slot[]</code>，再返回对应 <code class="language-plaintext highlighter-rouge">vm_area_struct *</code>。</li>
</ul>

<p>这一点也解释了为什么当前内核文档口径应该写成：</p>

<ul>
  <li>“Maple Tree (<code class="language-plaintext highlighter-rouge">mm_mt</code>) 按地址区间索引 VMA 指针”</li>
</ul>

<p>而不是旧口径的“<code class="language-plaintext highlighter-rouge">mmap</code> 链表 + <code class="language-plaintext highlighter-rouge">mm_rb</code> 红黑树”主路径。</p>

<h2 id="四缺页时栈如何扩展从查-vma-到-expand_downwards">四、缺页时栈如何扩展：从查 VMA 到 expand_downwards</h2>

<p>用户访问栈上尚未映射的地址时，CPU 触发缺页异常，进入架构相关的 fault 处理（如 x86-64 的 <code class="language-plaintext highlighter-rouge">do_user_addr_fault</code>），再通过通用层查找 VMA 并决定是否扩展栈。</p>

<h3 id="41-查找-vma-与无-vma-则尝试扩展栈">4.1 查找 VMA 与「无 VMA 则尝试扩展栈」</h3>

<p>在支持 <code class="language-plaintext highlighter-rouge">CONFIG_LOCK_MM_AND_FIND_VMA</code> 的路径上（如 x86-64），会调用 <code class="language-plaintext highlighter-rouge">lock_mm_and_find_vma()</code>（<code class="language-plaintext highlighter-rouge">mm/mmap_lock.c</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// mm/mmap_lock.c (约 251–286 行)</span>
<span class="n">vma</span> <span class="o">=</span> <span class="n">find_vma</span><span class="p">(</span><span class="n">mm</span><span class="p">,</span> <span class="n">addr</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">likely</span><span class="p">(</span><span class="n">vma</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_start</span> <span class="o">&lt;=</span> <span class="n">addr</span><span class="p">)))</span>
    <span class="k">return</span> <span class="n">vma</span><span class="p">;</span>

<span class="cm">/* 地址落在某 VMA 起始之下，仅当该 VMA 是向下扩展的栈时才允许扩展 */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">vma</span> <span class="o">||</span> <span class="o">!</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_flags</span> <span class="o">&amp;</span> <span class="n">VM_GROWSDOWN</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">mmap_read_unlock</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>
    <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>   <span class="cm">/* 上层会进入 bad_area，发 SIGSEGV */</span>
<span class="p">}</span>
<span class="c1">// ...</span>
<span class="k">if</span> <span class="p">(</span><span class="n">expand_stack_locked</span><span class="p">(</span><span class="n">vma</span><span class="p">,</span> <span class="n">addr</span><span class="p">))</span>
    <span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
</code></pre></div></div>

<p>含义：若 <code class="language-plaintext highlighter-rouge">addr</code> 不在任何已有 VMA 内（或正好在栈 VMA 的 <code class="language-plaintext highlighter-rouge">vm_start</code> 之下），则只有当前「紧邻其上的」VMA 是 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code>（用户栈）时才尝试扩展；否则返回 NULL，缺页无法解析，最终发 SIGSEGV。</p>

<h3 id="42-expand_stack_locked--expand_downwards">4.2 expand_stack_locked → expand_downwards</h3>

<p><code class="language-plaintext highlighter-rouge">expand_stack_locked()</code> 在向下扩展的配置下（常见配置）直接调用 <code class="language-plaintext highlighter-rouge">expand_downwards()</code>（<code class="language-plaintext highlighter-rouge">mm/mmap.c</code> 与 <code class="language-plaintext highlighter-rouge">mm/vma.c</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// mm/mmap.c</span>
<span class="kt">int</span> <span class="nf">expand_stack_locked</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_area_struct</span> <span class="o">*</span><span class="n">vma</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">address</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">expand_downwards</span><span class="p">(</span><span class="n">vma</span><span class="p">,</span> <span class="n">address</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">expand_downwards()</code>（<code class="language-plaintext highlighter-rouge">mm/vma.c</code> 约 3024–3102 行）主要做三件事：</p>

<ol>
  <li><strong>检查 VM_GROWSDOWN</strong>，并做地址与 <code class="language-plaintext highlighter-rouge">mmap_min_addr</code> 等校验。</li>
  <li><strong>强制 stack_guard_gap</strong>：若在 <code class="language-plaintext highlighter-rouge">addr</code> 下方存在其它可访问的 VMA，且与当前栈的间距小于 <code class="language-plaintext highlighter-rouge">stack_guard_gap</code>，则拒绝扩展，返回 <code class="language-plaintext highlighter-rouge">-ENOMEM</code>。</li>
  <li><strong>在允许扩展的前提下</strong>，调用 <code class="language-plaintext highlighter-rouge">acct_stack_growth()</code> 做栈限制检查；通过则更新 VMA 的 <code class="language-plaintext highlighter-rouge">vm_start</code>（及相关结构），完成栈的向下延伸。</li>
</ol>

<h3 id="43-栈限制检查acct_stack_growth">4.3 栈限制检查：acct_stack_growth</h3>

<p>栈能扩展的「总大小」由 <code class="language-plaintext highlighter-rouge">rlimit(RLIMIT_STACK)</code> 限制（对应 <code class="language-plaintext highlighter-rouge">ulimit -s</code>）。扩展前在 <code class="language-plaintext highlighter-rouge">acct_stack_growth()</code> 中统一检查（<code class="language-plaintext highlighter-rouge">mm/vma.c</code> 约 2898–2930 行）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// mm/vma.c</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">acct_stack_growth</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_area_struct</span> <span class="o">*</span><span class="n">vma</span><span class="p">,</span>
                             <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">size</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">grow</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">mm_struct</span> <span class="o">*</span><span class="n">mm</span> <span class="o">=</span> <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_mm</span><span class="p">;</span>
    <span class="c1">// ...</span>
    <span class="cm">/* Stack limit test */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">size</span> <span class="o">&gt;</span> <span class="n">rlimit</span><span class="p">(</span><span class="n">RLIMIT_STACK</span><span class="p">))</span>
        <span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
    <span class="c1">// ...</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这里 <code class="language-plaintext highlighter-rouge">size</code> 是扩展后的栈 VMA 总大小。一旦「当前栈已用 + 本次要扩展」超过 <code class="language-plaintext highlighter-rouge">RLIMIT_STACK</code>，就返回 <code class="language-plaintext highlighter-rouge">-ENOMEM</code>，<code class="language-plaintext highlighter-rouge">expand_downwards()</code> 失败，缺页路径无法解析该地址，上层会进入 bad_area 并给进程发 SIGSEGV。也就是说：<strong>触顶 = 扩展被 rlimit 拒绝</strong>，而不是「多缺了一次页」本身；缺页只是触发这次检查的契机。</p>

<h3 id="44-扩展成功后匿名页分配">4.4 扩展成功后：匿名页分配</h3>

<p>扩展栈 VMA 只调整了虚拟区间（<code class="language-plaintext highlighter-rouge">vm_start</code> 下移），并未立刻分配物理页。物理页在<strong>第一次访问</strong>该新区间内的地址时，由通用缺页逻辑分配：此时 VMA 已包含该地址，<code class="language-plaintext highlighter-rouge">find_vma</code> 会命中栈 VMA，进入 <code class="language-plaintext highlighter-rouge">handle_mm_fault()</code> → <code class="language-plaintext highlighter-rouge">__handle_mm_fault()</code> → <code class="language-plaintext highlighter-rouge">handle_pte_fault()</code>，对匿名、未映射的 PTE 走 <code class="language-plaintext highlighter-rouge">do_anonymous_page()</code>（<code class="language-plaintext highlighter-rouge">mm/memory.c</code> 约 5022 行），分配匿名页并建立映射。因此：<strong>每第一次接触一个新页，产生一次缺页</strong>；栈用量由 SP 下移多少决定，缺页次数则等于「新被触及的页数」，二者相关但不等价。</p>

<h2 id="五缺页与触顶的完整路径小结">五、缺页与「触顶」的完整路径小结</h2>

<ol>
  <li><strong>用户访问</strong>栈下未映射地址 → CPU #PF。</li>
  <li><strong>arch</strong>（如 <code class="language-plaintext highlighter-rouge">arch/x86/mm/fault.c</code>）→ <code class="language-plaintext highlighter-rouge">do_user_addr_fault()</code> → <code class="language-plaintext highlighter-rouge">lock_mm_and_find_vma(mm, address, regs)</code>。</li>
  <li><strong>mm/mmap_lock.c</strong>：<code class="language-plaintext highlighter-rouge">find_vma(mm, addr)</code>；若 <code class="language-plaintext highlighter-rouge">addr</code> 在栈 VMA 之下且该 VMA 为 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code>，则 <code class="language-plaintext highlighter-rouge">expand_stack_locked(vma, addr)</code>。</li>
  <li><strong>mm/mmap.c</strong>：<code class="language-plaintext highlighter-rouge">expand_stack_locked()</code> → <strong>mm/vma.c</strong>：<code class="language-plaintext highlighter-rouge">expand_downwards()</code> → 检查 <code class="language-plaintext highlighter-rouge">stack_guard_gap</code>，再 <code class="language-plaintext highlighter-rouge">acct_stack_growth()</code> 检查 <code class="language-plaintext highlighter-rouge">size &gt; rlimit(RLIMIT_STACK)</code>；若通过则扩展 <code class="language-plaintext highlighter-rouge">vma-&gt;vm_start</code>。</li>
  <li>若扩展失败（rlimit 或 guard gap），<code class="language-plaintext highlighter-rouge">lock_mm_and_find_vma</code> 返回 NULL，arch 层进入 bad_area → 向用户态发 <strong>SIGSEGV</strong>。</li>
  <li>若扩展成功，返回用户态重试指令，再次访问同一地址时已落在栈 VMA 内，走正常缺页：<strong>mm/memory.c</strong> <code class="language-plaintext highlighter-rouge">handle_mm_fault()</code> → <code class="language-plaintext highlighter-rouge">__handle_mm_fault()</code> → <code class="language-plaintext highlighter-rouge">handle_pte_fault()</code> → <code class="language-plaintext highlighter-rouge">do_anonymous_page()</code>，分配物理页并建立 PTE。</li>
</ol>

<p>因此：<strong>「224 次缺页就崩溃」</strong> 表示整次进程运行共发生 224 次缺页（包括栈、代码段、库、guard 等）；<strong>最后一次（或临界一次）</strong> 是访问到了<strong>不允许扩展的区域</strong>（超过 RLIMIT_STACK 或进入 guard gap），内核拒绝扩展并发 SIGSEGV，而不是「第 224 次缺页时多分配了一页栈」。</p>

<h2 id="六页缓存物理页与缺页次数">六、页缓存、物理页与缺页次数</h2>

<ul>
  <li><strong>物理页</strong>：进程退出后，部分物理页可能仍留在系统（匿名页回收策略、文件页的 page cache）。新进程再次运行同一程序时，可能复用这些物理页，但<strong>页表是 per-process 的</strong>，进程退出后页表销毁，新进程必须重新建立虚拟地址到物理页的映射，因此仍会触发缺页。</li>
  <li><strong>栈是匿名映射</strong>：栈不对应文件，不能像文件映射那样用 (inode, offset) 做 page cache 的键。第二次运行缺页减少，主要来自<strong>代码段、共享库等文件映射</strong>的缓存命中，以及系统中仍有可复用的物理页被新进程映射；栈区本身每个进程独立，只是若系统未立刻回收，新进程可能复用刚释放的物理页，缺页数会少一些。</li>
  <li>内核的页缓存与物理页复用是<strong>全局的</strong>，不按进程或程序名区分；多进程可共享同一物理页（如代码段、共享库），体现的是「尽量共享」的设计。</li>
</ul>

<h2 id="七arm64-与页大小thp">七、ARM64 与页大小、THP</h2>

<p>在 <strong>ARM64</strong> 上，许多配置使用 <strong>16KB</strong> 甚至 <strong>64KB</strong> 的页，且可能启用透明大页（THP），同样大小的栈所需<strong>页数</strong>远少于 x86-64 的 4KB 页，因此<strong>同一次运行</strong>的缺页次数会少很多（例如首次运行就只看到约 226 次）。这与「第二次运行因缓存而减少」是不同原因：前者是<strong>架构与页大小</strong>，后者是<strong>物理页/页表复用</strong>。</p>

<h2 id="八如何复现与对照内核">八、如何复现与对照内核</h2>

<ul>
  <li>查看栈限制与页大小：<code class="language-plaintext highlighter-rouge">ulimit -s</code>、<code class="language-plaintext highlighter-rouge">getconf PAGESIZE</code>。</li>
  <li>用 <a href="https://github.com/liweinan/stack-vs-heap-benchmark">stack-vs-heap-benchmark</a> 复现：<code class="language-plaintext highlighter-rouge">make stack_overflow_test</code>，<code class="language-plaintext highlighter-rouge">perf stat -e page-faults ./stack_overflow_test crash</code>。该程序先递归消耗约 6–7MB 栈，再在剩余空间内用汇编每次 push 8KB 直到触顶，便于观察「总栈接近 8MB 时的一次失败扩展」与缺页计数的关系。</li>
  <li>对照阅读内核（以你本机路径为准，例如 <code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code>）：
    <ul>
      <li><strong>mm/mmap_lock.c</strong>：<code class="language-plaintext highlighter-rouge">lock_mm_and_find_vma()</code> 中 <code class="language-plaintext highlighter-rouge">find_vma</code> 与 <code class="language-plaintext highlighter-rouge">expand_stack_locked()</code> 的调用；</li>
      <li><strong>mm/mmap.c</strong>：<code class="language-plaintext highlighter-rouge">stack_guard_gap</code>、<code class="language-plaintext highlighter-rouge">expand_stack_locked()</code>、<code class="language-plaintext highlighter-rouge">expand_stack()</code>；</li>
      <li><strong>mm/vma.c</strong>：<code class="language-plaintext highlighter-rouge">expand_downwards()</code>、<code class="language-plaintext highlighter-rouge">acct_stack_growth()</code> 及 <code class="language-plaintext highlighter-rouge">rlimit(RLIMIT_STACK)</code> 检查；</li>
      <li><strong>mm/memory.c</strong>：<code class="language-plaintext highlighter-rouge">handle_mm_fault()</code>、<code class="language-plaintext highlighter-rouge">__handle_mm_fault()</code>、<code class="language-plaintext highlighter-rouge">do_anonymous_page()</code>；</li>
      <li><strong>arch/x86/mm/fault.c</strong>：<code class="language-plaintext highlighter-rouge">do_user_addr_fault()</code> 及对 <code class="language-plaintext highlighter-rouge">lock_mm_and_find_vma()</code> 的调用。</li>
    </ul>
  </li>
</ul>

<h2 id="九结论">九、结论</h2>

<ul>
  <li>用户栈在内核中是一个 <strong>VM_GROWSDOWN</strong> 的 VMA；扩展时受 <strong>rlimit(RLIMIT_STACK)</strong> 和 <strong>stack_guard_gap</strong> 约束。</li>
  <li>缺页时，若地址落在栈 VMA 之下，内核通过 <strong>expand_stack_locked → expand_downwards → acct_stack_growth</strong> 决定是否扩展；超过 rlimit 或违反 guard gap 则拒绝扩展，本次缺页无法解析，进程收到 <strong>SIGSEGV</strong>。</li>
  <li>缺页次数 = 本次运行中「首次触及」的页数（栈、代码、库、guard 等合计），与「栈总用量」相关但不等同；触顶由<strong>扩展被内核拒绝</strong>决定，而非缺页计数达到某个值。</li>
  <li>第二次运行缺页减少主要来自物理页/文件映射的复用；ARM64 下首次运行缺页就较少则主要来自更大页与 THP。</li>
</ul>

<p>若你希望把栈溢出、缺页与 rlimit 的结论落实到具体可跑的程序上，可用上述 benchmark 项目配合本机内核源码一起对照。</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">OpenShift ccoctl 与 AWS STS 短期凭证：从原理到实践</title><link href="https://weinan.io/2026/03/16/openshift-ccoctl-sts-credentials.html" rel="alternate" type="text/html" title="OpenShift ccoctl 与 AWS STS 短期凭证：从原理到实践" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://weinan.io/2026/03/16/openshift-ccoctl-sts-credentials</id><content type="html" xml:base="https://weinan.io/2026/03/16/openshift-ccoctl-sts-credentials.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p><code class="language-plaintext highlighter-rouge">ccoctl</code> 是 OpenShift 的云凭据操作符 (CCO) 实用程序，其主要用途是在<strong>手动模式</strong>下，为各个集群组件在云提供商上<strong>创建和管理精细化的短期权限凭证</strong>，从而避免在集群中存储高权限的长期凭证，提升集群安全性。</p>

<p>简单来说，它允许您为 OpenShift 的每个组件（如镜像仓库、存储驱动、Ingress Controller 等）分别创建独立的、最小权限的云账号，而不是使用一个拥有全局权限的管理员账号。</p>

<h2 id="一为什么需要-ccoctl与默认模式的对比">一、为什么需要 ccoctl？与默认模式的对比</h2>

<h3 id="核心作用">核心作用</h3>

<p><code class="language-plaintext highlighter-rouge">ccoctl</code> 主要用在需要<strong>最高安全标准</strong>的场景。它将云凭证的管理从集群内部转移到集群外部，实现了更严格的权限控制：</p>

<ul>
  <li><strong>实现短期凭证</strong>：为 AWS、GCP 等云平台配置基于 OIDC 的短期凭证（如 AWS STS、GCP Workload Identity）。集群组件使用这些临时令牌来访问云 API，凭证会自动轮换，风险更低。</li>
  <li><strong>避免存储管理员凭证</strong>：在手动模式下，集群的 <code class="language-plaintext highlighter-rouge">kube-system</code> 命名空间中不会存储高权限的管理员级云凭证，大大降低了凭证泄露的风险。</li>
  <li><strong>管理长期凭证</strong>：对于 IBM Cloud 或 Nutanix 等平台，<code class="language-plaintext highlighter-rouge">ccoctl</code> 也用于在安装过程中配置由外部管理的长期凭证。</li>
  <li><strong>清理资源</strong>：在集群卸载后，可以使用 <code class="language-plaintext highlighter-rouge">ccoctl</code> 来删除它在安装时创建的云资源（如 IAM 角色、OIDC 提供商和 S3 存储桶）。</li>
</ul>

<p><strong>简单来说</strong>：不使用 <code class="language-plaintext highlighter-rouge">ccoctl</code> 的安装过程更简单快捷，但安全性较低；使用 <code class="language-plaintext highlighter-rouge">ccoctl</code> 的过程更复杂，但安全性最高，符合企业级安全最佳实践。</p>

<h3 id="两种方式对比">两种方式对比</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">对比维度</th>
      <th style="text-align: left">使用 <code class="language-plaintext highlighter-rouge">ccoctl</code> (手动模式 + 短期凭证)</th>
      <th style="text-align: left">不使用 <code class="language-plaintext highlighter-rouge">ccoctl</code> (默认 Mint 模式)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>核心机制</strong></td>
      <td style="text-align: left">基于 <strong>STS</strong> 的<strong>短期、动态令牌</strong>。集群组件通过 ServiceAccount 扮演 IAM 角色，自动获取定期刷新的临时凭证。</td>
      <td style="text-align: left">基于<strong>长期 Access Key</strong>。CCO 使用高权限的管理员凭证，为其他组件<strong>动态创建</strong>低权限的长期用户。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>安全性</strong></td>
      <td style="text-align: left"><strong>最高</strong>。集群内部不存储任何长期有效的高风险凭证。</td>
      <td style="text-align: left"><strong>较高，但存在风险</strong>。高权限的管理员凭证在安装后默认会存储在 <code class="language-plaintext highlighter-rouge">kube-system</code> 命名空间中。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>安装流程</strong></td>
      <td style="text-align: left"><strong>复杂</strong>。安装前需要手动执行 <code class="language-plaintext highlighter-rouge">ccoctl</code>，预先创建 OIDC、IAM 角色等，并将生成的清单提供给安装程序。</td>
      <td style="text-align: left"><strong>简单、自动化</strong>。只需在 <code class="language-plaintext highlighter-rouge">install-config.yaml</code> 中配置云凭证即可。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>运维负担</strong></td>
      <td style="text-align: left">升级时若权限要求未变通常无需额外操作；权限有更新时需用 <code class="language-plaintext highlighter-rouge">ccoctl</code> 更新角色。</td>
      <td style="text-align: left">升级前需检查新版本 CredentialsRequest，确保管理员凭证权限充足。</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>集群销毁</strong></td>
      <td style="text-align: left">需使用 <code class="language-plaintext highlighter-rouge">ccoctl aws delete</code> 等<strong>手动清理</strong>预先创建的 IAM 和 OIDC 资源。</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">openshift-install destroy cluster</code> 即可<strong>自动清理</strong>。</td>
    </tr>
  </tbody>
</table>

<p><strong>建议</strong>：有严格安全合规要求或希望采用最小权限原则时选用 <code class="language-plaintext highlighter-rouge">ccoctl</code>；开发测试、POC 或优先便利性时，默认 Mint 模式即可。</p>

<hr />

<h2 id="二如何使用-ccoctl">二、如何使用 ccoctl</h2>

<h3 id="获取-ccoctl-二进制">获取 ccoctl 二进制</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">RELEASE_IMAGE</span><span class="o">=</span><span class="si">$(</span>./openshift-install version | <span class="nb">awk</span> <span class="s1">'/release image/ {print $3}'</span><span class="si">)</span>
<span class="nv">CCO_IMAGE</span><span class="o">=</span><span class="si">$(</span>oc adm release info <span class="nt">--image-for</span><span class="o">=</span><span class="s1">'cloud-credential-operator'</span> <span class="nv">$RELEASE_IMAGE</span> <span class="nt">-a</span> ~/.pull-secret<span class="si">)</span>
oc image extract <span class="nv">$CCO_IMAGE</span> <span class="nt">--file</span><span class="o">=</span><span class="s2">"/usr/bin/ccoctl.rhel8"</span> <span class="nt">-a</span> ~/.pull-secret
<span class="nb">chmod </span>775 ccoctl.rhel8
./ccoctl <span class="nt">--help</span>
</code></pre></div></div>

<h3 id="主要场景为-aws-sts-集群创建资源">主要场景：为 AWS STS 集群创建资源</h3>

<ol>
  <li><strong>创建密钥对</strong>：<code class="language-plaintext highlighter-rouge">./ccoctl aws create-key-pair</code></li>
  <li><strong>创建 OIDC 身份提供商和 S3 存储桶</strong>：<code class="language-plaintext highlighter-rouge">./ccoctl aws create-identity-provider --name=&lt;cluster-name&gt; --region=&lt;aws-region&gt; --public-key-file=&lt;path-to-public-key&gt;</code></li>
  <li><strong>提取 CredentialsRequests</strong>：<code class="language-plaintext highlighter-rouge">oc adm release extract --credentials-requests --cloud=aws --to=./credrequests &lt;your-release-image&gt;</code></li>
  <li><strong>为每个组件创建 IAM 角色</strong>：<code class="language-plaintext highlighter-rouge">./ccoctl aws create-iam-roles --name=&lt;cluster-name&gt; --region=&lt;aws-region&gt; --credentials-requests-dir=./credrequests --identity-provider-arn=&lt;arn-of-oidc-provider&gt;</code></li>
</ol>

<p>完成后将生成的 manifest 复制到安装目录的 <code class="language-plaintext highlighter-rouge">manifests</code> 和 <code class="language-plaintext highlighter-rouge">tls</code> 目录。集群卸载后清理：<code class="language-plaintext highlighter-rouge">./ccoctl aws delete --name=&lt;cluster-name&gt; --region=&lt;aws-region&gt;</code>。</p>

<hr />

<h2 id="三sts-工作流程从准备到运行时">三、STS 工作流程：从准备到运行时</h2>

<p>这是云原生安全的最佳实践之一，结合 <strong>OIDC 身份联邦</strong>、<strong>Kubernetes ServiceAccount</strong> 和 <strong>云 IAM 角色</strong>，以 <strong>AWS STS</strong> 为例说明。</p>

<h3 id="核心架构概览双向信任链">核心架构概览（双向信任链）</h3>

<ol>
  <li><strong>OpenShift 集群信任 AWS</strong>：集群通过 OIDC 提供商对外宣称「我是谁」。</li>
  <li><strong>AWS 信任 OpenShift</strong>：IAM 角色配置为只信任特定的 OpenShift ServiceAccount。</li>
  <li><strong>组件自动换证</strong>：组件通过扮演角色获取临时令牌，无需人工干预。</li>
</ol>

<h3 id="前期准备ccoctl-搭建">前期准备（ccoctl 搭建）</h3>

<ol>
  <li><strong>创建 OIDC 提供商</strong>：<code class="language-plaintext highlighter-rouge">ccoctl</code> 在 AWS 上创建公钥端点（通常放在 S3），AWS 用其验证集群签发的 ServiceAccount 令牌。</li>
  <li><strong>创建 IAM 角色与信任策略</strong>：为每个需要云权限的组件创建一个 IAM 角色，信任策略只允许特定的 OpenShift ServiceAccount 扮演该角色。示例：</li>
</ol>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"Effect"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Allow"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"Principal"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"Federated"</span><span class="p">:</span><span class="w"> </span><span class="s2">"arn:aws:iam::123456789:oidc-provider/&lt;s3-bucket-name&gt;"</span><span class="w"> </span><span class="p">},</span><span class="w">
  </span><span class="nl">"Action"</span><span class="p">:</span><span class="w"> </span><span class="s2">"sts:AssumeRoleWithWebIdentity"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"Condition"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"StringEquals"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"&lt;s3-bucket-name&gt;:sub"</span><span class="p">:</span><span class="w"> </span><span class="s2">"system:serviceaccount:openshift-ingress:router"</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<h3 id="集群运行时自动获取凭证">集群运行时（自动获取凭证）</h3>

<p>以 Ingress Controller Pod 为例：</p>

<ol>
  <li><strong>Pod 挂载带注解的 ServiceAccount</strong>：例如 <code class="language-plaintext highlighter-rouge">sts.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/openshift-ingress-role"</code>，无 AWS Secret。</li>
  <li><strong>API Server 签发 JWT</strong>：Kubelet 向 API Server 请求为该 ServiceAccount 签发 JWT，签名私钥即 <code class="language-plaintext highlighter-rouge">ccoctl</code> 生成的那对密钥中的私钥。</li>
  <li><strong>Pod 向 AWS STS 发起 AssumeRoleWithWebIdentity</strong>：SDK 自动携带 JWT 与 Role ARN。</li>
  <li><strong>STS 验证并颁发临时凭证</strong>：验证 JWT 签名（用 OIDC 公钥）、校验 <code class="language-plaintext highlighter-rouge">sub</code> 与信任策略，通过后返回 <code class="language-plaintext highlighter-rouge">AccessKeyId</code>、<code class="language-plaintext highlighter-rouge">SecretAccessKey</code>、<code class="language-plaintext highlighter-rouge">SessionToken</code>（通常 1 小时有效）。</li>
  <li><strong>Pod 使用临时凭证调用 AWS API</strong>，过期前 SDK 自动用同一 JWT 换新凭证，对应用透明。</li>
</ol>

<h3 id="为什么这个模式更安全">为什么这个模式更安全？</h3>

<ul>
  <li><strong>无长期凭证</strong>：集群内没有永久有效的 AccessKey/SecretKey。</li>
  <li><strong>权限最小化</strong>：每个组件只能拿到自己 Role 的权限。</li>
  <li><strong>凭证自动轮换</strong>：泄露的临时凭证在 1 小时内失效。</li>
  <li><strong>身份绑定</strong>：凭证与特定 Pod/ServiceAccount 绑定，无法被集群外冒用。</li>
</ul>

<hr />

<h2 id="四技术细节公钥iam-role-数量与-sarole-关系">四、技术细节：公钥、IAM Role 数量与 SA/Role 关系</h2>

<h3 id="公钥端点签名验证不是数据加密">公钥端点：签名验证，不是数据加密</h3>

<p>使用的是<strong>私钥签名、公钥验签</strong>的数字签名过程：</p>

<ul>
  <li><strong>ccoctl</strong>：生成密钥对；<strong>私钥</strong>由集群 API Server 保管并用于<strong>签发</strong> JWT；<strong>公钥</strong>上传到 S3（OIDC 公钥端点）。</li>
  <li><strong>流程</strong>：API Server 用私钥对 JWT 签名 → AWS STS 从 OIDC 端点取公钥验签，确认令牌来自可信集群且未被篡改。核心是<strong>身份真实性验证</strong>，不是传输保密。</li>
</ul>

<h3 id="iam-role-数量约-1015-个">IAM Role 数量：约 10–15 个</h3>

<p><code class="language-plaintext highlighter-rouge">ccoctl</code> 为每个需要调用云 API 的组件创建一个独立 IAM Role，<strong>不使用</strong> IAM User。典型组件包括：Cluster API Provider、Image Registry、Ingress Controller、Storage (CSI)、Machine Config Operator、Cloud Network Config 等。</p>

<h3 id="serviceaccount-与-iam-role扮演与被扮演">ServiceAccount 与 IAM Role：扮演与被扮演</h3>

<ul>
  <li><strong>ServiceAccount</strong>：集群内「谁」在请求。</li>
  <li><strong>IAM Role</strong>：云上「可以做什么」的权限集合。</li>
  <li><strong>信任策略</strong>：规定只允许特定 OIDC 端点、特定 ServiceAccount（如 <code class="language-plaintext highlighter-rouge">openshift-ingress:router</code>）来扮演该 Role。SA 拿 JWT 来「敲门」，Role 验证通过后才允许暂时扮演。</li>
</ul>

<h3 id="整体关系与流程图">整体关系与流程图</h3>

<pre><code class="language-mermaid">graph TD
    subgraph pre_ccoctl["安装前 (由 ccoctl 执行)"]
        A[ccoctl] --&gt; B1(生成密钥对)
        A --&gt; B2(为每个组件创建 IAM Role)
        A --&gt; B3(上传公钥到 S3 桶)
        A --&gt; B4(在 AWS 创建 OIDC IdP&lt;br/&gt;指向 S3 桶)
    end

    subgraph aws_cloud["AWS 云平台"]
        S3["S3 桶 (公钥端点)"] --&gt; OIDC["AWS OIDC 身份提供商"]
        subgraph iam["IAM"]
            direction LR
            Role_Ingress["IAM Role Ingress&lt;br/&gt;信任策略: 限定 ServiceAccount"]
            Role_Registry["IAM Role Registry&lt;br/&gt;信任策略: 限定 ServiceAccount"]
        end
    end

    subgraph ocp["OpenShift 集群"]
        direction TB
        API["Kubernetes API Server&lt;br/&gt;持有 私钥"] --&gt; SA_Ingress["ServiceAccount: router&lt;br/&gt;Annotation: role-arn=Role_Ingress"]
        SA_Registry["ServiceAccount: registry&lt;br/&gt;Annotation: role-arn=Role_Registry"]
        Pod_Ingress["Pod: Ingress Controller"] --&gt;|挂载| SA_Ingress
        Pod_Registry["Pod: Image Registry"] --&gt;|挂载| SA_Registry
    end

    subgraph runtime["运行时 (自动流程)"]
        direction LR
        Req1["Pod 请求 AWS API"] --&gt; Req2["SDK 自动读取 JWT 令牌"]
        Req2 --&gt; Req3["SDK 向 STS 发送 AssumeRoleWithWebIdentity 请求&lt;br/&gt;携带 JWT 令牌和 IAM Role ARN"]
    end

    subgraph sts_flow["AWS STS 验证与响应"]
        Val1["STS 接收请求"] --&gt; Val2["根据 JWT 的 iss 字段&lt;br/&gt;找到 OIDC IdP"]
        Val2 --&gt; Val3["OIDC IdP 从 S3 获取公钥&lt;br/&gt;验证 JWT 签名"]
        Val3 --&gt; Val4{验证通过?}
        Val4 --&gt;|是| Val5["校验 JWT 的 sub 字段&lt;br/&gt;是否匹配 IAM Role 信任策略"]
        Val5 --&gt;|是| Val6["STS 生成临时凭证&lt;br/&gt;(AK/SK/Token) 返回给 Pod"]
    end

    Pod_Ingress -.-&gt; Req1
    Val6 -.-&gt; Pod_Ingress
</code></pre>

<p>要点：私钥在集群内签名，公钥在 S3 供 AWS 验签；每个组件一个 Role；SA 通过 Annotation 与 Role 信任策略建立绑定；运行时 Pod 用 JWT 向 STS 申请扮演 Role 并获得临时凭证。</p>

<hr />

<h2 id="五jwt-令牌-vs-sts-临时凭证">五、JWT 令牌 vs STS 临时凭证</h2>

<p><strong>关系概括</strong>：JWT 是「身份证」，STS 临时凭证是「通行证」。Pod 先亮出身份证证明「我是谁」，再换取能真正调用云 API 的通行证。</p>

<h3 id="对比表">对比表</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">维度</th>
      <th style="text-align: left">JWT 令牌</th>
      <th style="text-align: left">STS 临时凭证</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>颁发者</strong></td>
      <td style="text-align: left">Kubernetes API Server</td>
      <td style="text-align: left">AWS STS</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>用途</strong></td>
      <td style="text-align: left">向 AWS 证明身份（某某 ServiceAccount）</td>
      <td style="text-align: left">向 AWS 服务证明权限（有权调用哪些 API）</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>包含内容</strong></td>
      <td style="text-align: left">集群身份、Namespace、SA 名称、过期时间</td>
      <td style="text-align: left">临时 AccessKey、SecretKey、SessionToken、过期时间</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>有效期</strong></td>
      <td style="text-align: left">通常 1 小时（可配置）</td>
      <td style="text-align: left">通常 1 小时</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>是否直接调用 AWS API</strong></td>
      <td style="text-align: left">❌ 不能</td>
      <td style="text-align: left">✅ 能</td>
    </tr>
  </tbody>
</table>

<h3 id="类比机场安检">类比：机场安检</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">现实场景</th>
      <th style="text-align: left">OpenShift + AWS</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">身份证</td>
      <td style="text-align: left"><strong>JWT 令牌</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">公安局</td>
      <td style="text-align: left">Kubernetes API Server</td>
    </tr>
    <tr>
      <td style="text-align: left">登机牌</td>
      <td style="text-align: left"><strong>STS 临时凭证</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">用登机牌登机</td>
      <td style="text-align: left">用临时凭证调用 AWS API</td>
    </tr>
  </tbody>
</table>

<h3 id="jwt-与-sts-关系mermaid">JWT 与 STS 关系（Mermaid）</h3>

<pre><code class="language-mermaid">graph TB
    subgraph "OpenShift 集群内部"
        Pod[Pod/容器]
        SA[ServiceAccount&lt;br/&gt;router]
        JWT[JWT 令牌&lt;br/&gt;身份证明]
        API[Kubernetes API Server&lt;br/&gt;持有私钥]
        Pod --&gt; SA
        SA -.-&gt;|挂载| JWT
        API --&gt;|用私钥签发| JWT
    end

    subgraph "AWS STS 验证过程"
        direction TB
        Step1[接收 AssumeRoleWithWebIdentity 请求&lt;br/&gt;携带 JWT 令牌 + IAM Role ARN]
        Step2[从 OIDC 端点获取公钥]
        Step3[用公钥验证 JWT 签名]
        Step4[检查 JWT.sub 是否匹配&lt;br/&gt;IAM Role 信任策略]
        Step5[验证通过 → 生成临时凭证]
        Step1 --&gt; Step2 --&gt; Step3 --&gt; Step4 --&gt; Step5
    end

    subgraph "STS 临时凭证"
        Temp_Cred[AccessKeyId + SecretAccessKey + SessionToken + Expiration]
    end

    JWT -.-&gt;|提交身份证明| Step1
    Step5 --&gt;|返回| Temp_Cred
    Temp_Cred --&gt;|SDK 缓存| Pod
</code></pre>

<h3 id="时序流程">时序流程</h3>

<pre><code class="language-mermaid">sequenceDiagram
    participant P as Pod
    participant K as K8s API Server
    participant STS as AWS STS
    participant S3 as S3/其他AWS服务

    K-&gt;&gt;P: 用私钥签发 JWT 并挂载到 Pod
    P-&gt;&gt;STS: AssumeRoleWithWebIdentity(JWT + Role ARN)
    STS-&gt;&gt;STS: 从 OIDC 获取公钥、验签、检查 sub
    STS--&gt;&gt;P: 返回临时凭证
    P-&gt;&gt;S3: 调用 API（带临时凭证）
    Note over P: 1 小时后 SDK 用同一 JWT 换新凭证
</code></pre>

<h3 id="三者的层级关系">三者的层级关系</h3>

<pre><code class="language-mermaid">graph RL
    subgraph "第4层：云资源操作"
        API_Calls[AWS API 调用&lt;br/&gt;S3/EC2/ELB]
    end
    subgraph "第3层：临时权限凭证"
        Temp_Cred[STS临时凭证&lt;br/&gt;1小时有效期]
    end
    subgraph "第2层：集群内身份"
        JWT[JWT令牌&lt;br/&gt;K8s API Server 签发]
    end
    subgraph "第1层：底层基础设施"
        KeyPair[密钥对&lt;br/&gt;私钥: K8s / 公钥: S3]
        IAM_Role[IAM Role&lt;br/&gt;权限策略+信任策略]
    end
    KeyPair --&gt;|私钥签发| JWT
    KeyPair --&gt;|公钥验证| Temp_Cred
    IAM_Role --&gt;|信任策略允许| Temp_Cred
    JWT --&gt;|证明身份换取| Temp_Cred
    Temp_Cred --&gt;|授权执行| API_Calls
</code></pre>

<h3 id="核心关系总结图示">核心关系总结（图示）</h3>

<pre><code class="language-mermaid">graph TD
    subgraph "JWT令牌 vs STS临时凭证"
        A[JWT令牌] --&gt;|作用| A1["证明身份&lt;br/&gt;'我是openshift-ingress:router'"]
        A --&gt;|颁发者| A2["Kubernetes API Server"]
        A --&gt;|验证者| A3["AWS STS"]
        A --&gt;|生命周期| A4["Pod生命周期&lt;br/&gt;只要Pod在就有效"]
        B[STS临时凭证] --&gt;|作用| B1["授权操作&lt;br/&gt;'我可以创建负载均衡器'"]
        B --&gt;|颁发者| B2["AWS STS"]
        B --&gt;|验证者| B3["AWS各服务&lt;br/&gt;(S3/EC2/ELB等)"]
        B --&gt;|生命周期| B4["1小时&lt;br/&gt;自动轮换"]
        A -.-&gt;|换取| B
    end
</code></pre>

<p><strong>一句话</strong>：JWT 管「你是谁」，临时凭证管「你能做什么」；两者职责分明，流程自动化。</p>

<hr />

<h2 id="六iam-role-的两种核心策略与权限来源">六、IAM Role 的两种核心策略与权限来源</h2>

<p><strong>重要</strong>：这些 IAM Role 的权限<strong>与 IAM User 无关</strong>，来自<strong>角色自身附加的权限策略</strong>。</p>

<h3 id="双重策略结构">双重策略结构</h3>

<p>每个 IAM Role 包含两个独立策略：</p>

<pre><code class="language-mermaid">graph TB
    subgraph IAM角色 [IAM Role]
        direction TB
        TP[信任策略 Trust Policy&lt;br/&gt;定义：谁可以扮演这个角色]
        PP[权限策略 Permission Policy&lt;br/&gt;定义：扮演后可以做什么]
    end
    TP --&gt; 验证[验证：你是谁？]
    PP --&gt; 授权[授权：你能做什么？]
</code></pre>

<ul>
  <li><strong>信任策略</strong>：只允许来自特定 OIDC 提供商、且 JWT 的 <code class="language-plaintext highlighter-rouge">sub</code> 为特定 ServiceAccount 的请求者扮演该角色。</li>
  <li><strong>权限策略</strong>：定义扮演后能执行哪些 AWS 操作（如 Image Registry 的 S3 操作）。</li>
</ul>

<p>在 ccoctl 模式下<strong>完全不使用 IAM User</strong>：ccoctl 为每个组件创建 IAM Role，直接附加权限策略与信任策略；Pod 通过 JWT 扮演角色，获得的是<strong>角色自身的权限</strong>。</p>

<h3 id="信任策略的完整工作机制">信任策略的完整工作机制</h3>

<pre><code class="language-mermaid">sequenceDiagram
    participant Pod as Pod (Ingress)
    participant STS as AWS STS
    participant Role as IAM Role (Ingress)

    Pod-&gt;&gt;STS: AssumeRoleWithWebIdentity(JWT + Role ARN)
    Note over STS: STS 验证 JWT 签名（用 OIDC 公钥）
    STS-&gt;&gt;Role: 读取 Role 的信任策略
    Note over STS: 检查 JWT 身份是否匹配信任策略
    alt 验证通过
        STS--&gt;&gt;Pod: 返回临时凭证
        Pod-&gt;&gt;AWS API: 用临时凭证调用 API
    else 验证失败
        STS--&gt;&gt;Pod: 拒绝请求
    end
</code></pre>

<p>信任策略在扮演阶段由 STS 验证；权限策略在实际调用各 AWS 服务时验证；两者独立且缺一不可。</p>

<h3 id="ccoctl-的权力从哪来">ccoctl 的权力从哪来？</h3>

<p><code class="language-plaintext highlighter-rouge">ccoctl</code> 的权力是<strong>被赋予</strong>的：运行 <code class="language-plaintext highlighter-rouge">ccoctl</code> 的人（或系统）必须提供一个具有足够 AWS 权限的 IAM 用户或角色（例如能调用 <code class="language-plaintext highlighter-rouge">iam:CreateRole</code>、<code class="language-plaintext highlighter-rouge">iam:CreateOpenIDConnectProvider</code>、<code class="language-plaintext highlighter-rouge">s3:PutObject</code> 等）。有了该凭证后，<code class="language-plaintext highlighter-rouge">ccoctl</code> 按 OpenShift 的 CredentialsRequest 为每个组件创建 IAM Role，并附加权限策略与信任策略。</p>

<pre><code class="language-mermaid">graph LR
    A[运行 ccoctl] --&gt; B[读取 CredentialsRequest]
    B --&gt; C[为每个组件准备 IAM Role]
    C --&gt; D[创建 IAM Role]
    D --&gt; E[附加权限策略]
    D --&gt; F[配置信任策略]
</code></pre>

<hr />

<h2 id="七运行态集群的权限闭环与安全实践">七、运行态集群的权限闭环与安全实践</h2>

<p>用于运行 <code class="language-plaintext highlighter-rouge">ccoctl</code> 的高权限用户，在集群安装完成后<strong>可以彻底退出</strong>：集群运行时<strong>不需要、也不会用到</strong>该用户的凭证。</p>

<h3 id="运行态权限闭环">运行态权限闭环</h3>

<pre><code class="language-mermaid">graph TB
    subgraph "安装阶段 (ccoctl 执行)"
        Admin[管理员持有&lt;br/&gt;高权限 IAM User]
        Admin --&gt;|执行 ccoctl| Create[创建所有 IAM Role&lt;br/&gt;和 OIDC 提供商]
        Create --&gt;|完成后| Done[高权限用户凭证&lt;br/&gt;可以安全删除或封存]
    end
    subgraph "运行阶段 (集群运行时)"
        Pod[Pod] --&gt;|JWT 令牌| STS[AWS STS]
        STS --&gt;|验证通过后颁发| Temp[临时凭证]
        Temp --&gt;|权限来自| Role[IAM Role 自身的权限策略]
        Role --&gt;|不涉及| AdminUser[安装时的高权限用户]
    end
    Done -.-&gt;|不再使用| AdminUser
</code></pre>

<p>原因简要说明：权限已固化在各 IAM Role 的权限策略与信任策略中；集群组件通过 JWT → STS → 扮演 Role → 获得临时凭证 → 调用 API，整条链无需安装时的高权限用户；临时凭证的权限来自被扮演的 Role，而非创建 Role 的用户。</p>

<h3 id="安全最佳实践安装后清理">安全最佳实践：安装后清理</h3>

<pre><code class="language-mermaid">graph LR
    subgraph "安装完成后"
        A[高权限 IAM User] --&gt; B{选择处理方式}
        B --&gt; C[彻底删除该用户]
        B --&gt; D[禁用 Access Key]
        B --&gt; E[轮换并封存凭证&lt;br/&gt;仅用于灾难恢复]
    end
    subgraph "运行态集群"
        F[所有组件] --&gt; G[只使用临时凭证]
        H[管理员日常操作] --&gt; I[使用更低权限的&lt;br/&gt;只读或审计账号]
    end
</code></pre>

<p>建议：安装成功后立即删除或禁用该高权限用户的 Access Key；日常管理使用只读或审计账号；必要时可用 STS 临时凭证来运行 <code class="language-plaintext highlighter-rouge">ccoctl</code>，避免长期高权限用户。</p>

<h3 id="与-mint-模式对比">与 Mint 模式对比</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">对比项</th>
      <th style="text-align: left">Mint 模式</th>
      <th style="text-align: left">ccoctl + STS 模式</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">安装时高权限用户</td>
      <td style="text-align: left">安装后默认<strong>保留</strong>在 <code class="language-plaintext highlighter-rouge">kube-system</code> Secret 中</td>
      <td style="text-align: left">安装后<strong>可安全删除</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">凭证类型</td>
      <td style="text-align: left">长期 AccessKey/SecretKey</td>
      <td style="text-align: left">1 小时自动轮换的临时凭证</td>
    </tr>
    <tr>
      <td style="text-align: left">泄露风险</td>
      <td style="text-align: left">攻击者可提取长期凭证</td>
      <td style="text-align: left">仅能获得短期凭证，且无法提取长期凭证</td>
    </tr>
    <tr>
      <td style="text-align: left">权限范围</td>
      <td style="text-align: left">常为全局管理员权限</td>
      <td style="text-align: left">每组件最小必要权限</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="八kube-apiserver-与-oidc">八、kube-apiserver 与 OIDC</h2>

<p>JWT 的签发与公钥提供由 <strong>kube-apiserver</strong> 完成：</p>

<ol>
  <li><strong>持有私钥</strong>：加载 <code class="language-plaintext highlighter-rouge">ccoctl</code> 生成并交给集群的私钥（如 <code class="language-plaintext highlighter-rouge">/etc/kubernetes/pki/sa.key</code>）。</li>
  <li><strong>签发 JWT</strong>：当 Pod 挂载 ServiceAccount 并请求令牌时，使用 TokenRequest API 签发<strong>绑定服务账户令牌</strong>（Bound Service Account Token）。</li>
  <li><strong>提供公钥端点</strong>：通过 <code class="language-plaintext highlighter-rouge">/.well-known/openid-configuration</code>、<code class="language-plaintext highlighter-rouge">/openid/v1/jwks</code> 等对外提供公钥，供 AWS STS 验签。</li>
</ol>

<pre><code class="language-mermaid">graph LR
    subgraph OpenShift 集群
        API[kube-apiserver]
        KeyPair[密钥对&lt;br/&gt;私钥: sa.key&lt;br/&gt;公钥: sa.pub]
        SA[ServiceAccount]
        Pod[Pod]
        Token[JWT 令牌]
        API -- 持有 --&gt; KeyPair
        SA -- 请求令牌 --&gt; API
        API -- 使用私钥签发 --&gt; Token
        Token -- 挂载到 --&gt; Pod
    end
    subgraph 外部
        STS[AWS STS]
        JWKS_Endpoint[JWKS 公钥端点]
    end
    API -- 提供公钥 --&gt; JWKS_Endpoint
    Pod -- 发送 JWT 令牌 --&gt; STS
    STS -- 从端点获取公钥、验签 --&gt; STS
</code></pre>

<p>kubelet 会监控挂载的短期令牌有效期，在过期前通过 TokenRequest API 向 apiserver 请求新令牌，实现无缝轮换。</p>

<hr />

<h2 id="九aws-对-oidc-的支持与标准化">九、AWS 对 OIDC 的支持与标准化</h2>

<p>AWS <strong>主动实现了 OIDC 开放标准</strong>，才能与 Kubernetes/OpenShift 无缝集成。</p>

<h3 id="aws-对-oidc-的支持">AWS 对 OIDC 的支持</h3>

<pre><code class="language-mermaid">graph TB
    subgraph oidc_open_std["开放标准 OIDC"]
        OIDC_Core["OIDC 核心规范"]
        OIDC_Core --&gt;|定义| JWKS[JWKS 公钥格式]
        OIDC_Core --&gt;|定义| JWT[JWT 令牌结构]
    end
    subgraph k8s_ocp["Kubernetes/OpenShift"]
        K8s[实现 OIDC 身份提供商]
        K8s --&gt;|提供| K8s_JWKS[公钥端点]
        K8s --&gt;|签发| K8s_JWT[JWT 令牌]
    end
    subgraph aws_fed["AWS"]
        AWS_Federation["AWS 实现 OIDC 身份联邦"]
        AWS_Federation --&gt;|支持| IAM_OIDC[IAM OIDC 身份提供商]
        AWS_Federation --&gt;|支持| STS_OIDC[STS AssumeRoleWithWebIdentity]
        AWS_Federation --&gt;|验证| AWS_Validation[OIDC 令牌验证]
    end
    K8s_JWKS --&gt; IAM_OIDC
    K8s_JWT --&gt; STS_OIDC
    STS_OIDC --&gt; AWS_Validation
</code></pre>

<p>AWS 的三件关键实现：① IAM OIDC 身份提供商（ccoctl 在 AWS 上创建）；② STS <code class="language-plaintext highlighter-rouge">AssumeRoleWithWebIdentity</code> API；③ IAM Role 信任策略中的 OIDC 条件键（如 <code class="language-plaintext highlighter-rouge">sub</code>、<code class="language-plaintext highlighter-rouge">aud</code>、<code class="language-plaintext highlighter-rouge">iss</code>）。GCP、Azure、阿里云等也支持类似 OIDC 联邦，逻辑一致：集群提供 OIDC 端点 → 云厂商创建 OIDC IdP → 角色配置信任策略 → Pod 用 JWT 换临时凭证。</p>

<h3 id="标准化的价值">标准化的价值</h3>

<pre><code class="language-mermaid">graph TD
    subgraph k8s_clusters["Kubernetes 集群"]
        Cluster1[OpenShift 集群]
        Cluster2[原生 K8s]
        Cluster3[其他发行版]
    end
    subgraph oidc_norm["OIDC 标准"]
        OIDC_Spec["OIDC 核心规范"]
    end
    subgraph cloud_vendors["云厂商"]
        Cloud_AWS[AWS]
        Cloud_GCP[GCP]
        Cloud_Azure[Azure]
    end
    Cluster1 --&gt; OIDC_Spec
    Cluster2 --&gt; OIDC_Spec
    Cluster3 --&gt; OIDC_Spec
    OIDC_Spec --&gt; Cloud_AWS
    OIDC_Spec --&gt; Cloud_GCP
    OIDC_Spec --&gt; Cloud_Azure
</code></pre>

<hr />

<h2 id="十短期-vs-长期凭证与数据格式">十、短期 vs 长期凭证与数据格式</h2>

<h3 id="短期与长期凭证对比">短期与长期凭证对比</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">维度</th>
      <th style="text-align: left">普通版本 (Mint/手动)</th>
      <th style="text-align: left">STS 版本 (ccoctl)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left"><strong>凭证类型</strong></td>
      <td style="text-align: left"><strong>长期</strong> AccessKey/SecretKey</td>
      <td style="text-align: left"><strong>短期</strong> STS 临时凭证</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>凭证来源</strong></td>
      <td style="text-align: left">CCO 用管理员凭证创建 IAM <strong>User</strong></td>
      <td style="text-align: left">Pod 用 JWT <strong>扮演</strong> IAM <strong>Role</strong></td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>有效期</strong></td>
      <td style="text-align: left"><strong>永久有效</strong>（除非手动轮转）</td>
      <td style="text-align: left"><strong>1 小时</strong>，自动轮换</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>集群内凭证</strong></td>
      <td style="text-align: left">存在于 <code class="language-plaintext highlighter-rouge">kube-system</code> Secret</td>
      <td style="text-align: left"><strong>零</strong>长期凭证</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>凭证数量</strong></td>
      <td style="text-align: left">11+ 个（1 个高权限 + 约 10 个组件用户）</td>
      <td style="text-align: left">0 个用户，约 10 个可扮演的 Role</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>泄露影响</strong></td>
      <td style="text-align: left">严重且持久</td>
      <td style="text-align: left">有限且短暂（1 小时内失效）</td>
    </tr>
  </tbody>
</table>

<p>安装阶段 STS 版本需要更多权限（创建 OIDC、IAM Role 等）是一次性「建设成本」；运行阶段普通版本长期存在的多凭证才是安全命门。STS 用安装时短暂的「多」，换运行时永久的「少」和「短」。</p>

<h3 id="数据格式对比">数据格式对比</h3>

<ul>
  <li><strong>长期凭证</strong>：2 个字段 — <code class="language-plaintext highlighter-rouge">AccessKeyId</code>（以 <code class="language-plaintext highlighter-rouge">AKIA</code> 开头）、<code class="language-plaintext highlighter-rouge">SecretAccessKey</code>；无过期时间。</li>
  <li><strong>临时凭证</strong>：3 个字段 — <code class="language-plaintext highlighter-rouge">AccessKeyId</code>（以 <code class="language-plaintext highlighter-rouge">ASIA</code> 开头）、<code class="language-plaintext highlighter-rouge">SecretAccessKey</code>、<strong><code class="language-plaintext highlighter-rouge">SessionToken</code></strong>，以及 <code class="language-plaintext highlighter-rouge">Expiration</code>。调用 AWS API 时<strong>必须同时携带</strong>三者；缺少 <code class="language-plaintext highlighter-rouge">SessionToken</code> 会被拒绝。</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">SessionToken</code> 是 STS 临时凭证的关键：证明凭证由 STS 合法颁发、在有效期内且未超出权限范围。</p>

<h3 id="sessiontoken-的验证">SessionToken 的验证</h3>

<p>验证是「接力」的：<strong>签发</strong>由 STS 在 AssumeRoleWithWebIdentity 时完成；<strong>每次 API 调用</strong>时，目标服务（如 S3、EC2）将凭证转交 AWS 统一认证系统，检查 SessionToken 是否合法、未过期、未吊销，并评估权限与条件。这样既能及时吊销，又保证审计与动态条件评估有效。</p>

<p>每次 API 调用时，AWS 内部的验证流程可概括为：</p>

<pre><code class="language-mermaid">graph TD
    A[收到API请求] --&gt; B[提取凭证 AccessKey + SessionToken]
    B --&gt; C{SessionToken存在?}
    C --&gt;|否| D[当作长期凭证处理]
    C --&gt;|是| E[查询STS服务端状态]
    E --&gt; F{凭证状态}
    F --&gt;|已过期| G[拒绝请求 ExpiredToken]
    F --&gt;|已吊销| H[拒绝请求 AccessDenied]
    F --&gt;|有效| I[继续验证]
    I --&gt; J[验证请求签名]
    J --&gt; K{签名正确?}
    K --&gt;|否| L[拒绝请求]
    K --&gt;|是| M[评估IAM权限]
    M --&gt; N{操作允许?}
    N --&gt;|否| O[拒绝请求 AccessDenied]
    N --&gt;|是| P[执行操作]
</code></pre>

<h3 id="sessiontoken-验证时序">SessionToken 验证时序</h3>

<pre><code class="language-mermaid">sequenceDiagram
    participant P as Pod
    participant STS as AWS STS
    participant S3 as AWS S3
    participant Auth as AWS 统一认证系统

    P-&gt;&gt;STS: AssumeRoleWithWebIdentity(JWT)
    STS--&gt;&gt;P: 返回临时凭证(含SessionToken)
    STS-&gt;&gt;Auth: 同步凭证状态(有效期/权限)

    P-&gt;&gt;S3: PutObject(带临时凭证)
    S3-&gt;&gt;Auth: 请求验证凭证
    Auth-&gt;&gt;Auth: 检查SessionToken有效性及权限
    Auth--&gt;&gt;S3: 验证结果
    S3--&gt;&gt;P: 操作结果
</code></pre>

<hr />

<h2 id="十一缓存与验证不会每次调用-assumerole">十一、缓存与验证：不会每次调用 AssumeRole</h2>

<p>集群<strong>不会每次访问都调用 AssumeRole</strong>。凭证使用是<strong>一次 AssumeRole，多次使用</strong>，并有过期前自动刷新。</p>

<h3 id="缓存机制">缓存机制</h3>

<pre><code class="language-mermaid">graph TD
    subgraph "第1层：Pod 内 SDK 缓存"
        A[Pod 首次调用 AWS API] --&gt; B[SDK 检查内存缓存]
        B --&gt;|无有效凭证| C[调用 AssumeRoleWithWebIdentity]
        C --&gt; D[STS 返回临时凭证，有效期 1 小时]
        D --&gt; E[SDK 将凭证缓存到内存]
        E --&gt; F[使用凭证调用目标 API]
    end
    subgraph "第2层：后续 API 调用"
        G[后续 API 请求] --&gt; H[SDK 检查内存缓存]
        H --&gt;|有有效凭证| I[直接使用缓存凭证]
        I --&gt; J[调用目标 API]
    end
    subgraph "第3层：过期前自动刷新"
        K[凭证剩余约 5 分钟] --&gt; L[SDK 异步刷新]
        L --&gt; M[后台 AssumeRoleWithWebIdentity]
        M --&gt; N[更新内存缓存]
        N --&gt; O[对应用完全透明]
    end
</code></pre>

<p>因此：<strong>是否每次访问都 AssumeRole？</strong> 否，只有首次（或过期后）才调用。<strong>凭证用多久？</strong> 1 小时，SDK 在过期前约 5 分钟自动刷新。<strong>性能影响？</strong> 与长期凭证无异，绝大多数请求命中缓存。</p>

<h3 id="何时会重新-assumerole">何时会重新 AssumeRole？</h3>

<pre><code class="language-mermaid">graph LR
    A[触发重新 AssumeRole 的场景] --&gt; B[首次启动]
    A --&gt; C[凭证自然过期]
    A --&gt; D[Pod 重建]
    A --&gt; E[SDK 刷新失败后重试]
    A --&gt; F[强制刷新配置]
    B --&gt; G[需要新凭证]
    C --&gt; G
    D --&gt; G
    E --&gt; G
    F --&gt; G
    G --&gt; H[调用 AssumeRole]
</code></pre>

<p>即：首次启动、凭证自然过期、Pod 重建、SDK 刷新失败后重试、或显式配置强制刷新时。</p>

<h3 id="获取-vs-验证">获取 vs 验证</h3>

<p><strong>凭证获取</strong>有缓存（约 1 小时一次，由 SDK 管理）；<strong>凭证验证</strong>每次 API 调用都会进行（目标服务向 AWS 认证系统验证 SessionToken、签名与权限）。这样既能及时吊销、满足动态策略与审计，又通过服务端缓存和边缘节点将单次验证延迟控制在很低水平。</p>

<pre><code class="language-mermaid">graph TB
    subgraph "凭证生命周期"
        Get[获取凭证 AssumeRole] --&gt; Cache[缓存凭证 Pod 内存]
        Cache --&gt; Use1[第1次调用]
        Cache --&gt; Use2[第2次调用]
        Cache --&gt; Use3[第N次调用]
    end
    subgraph "每次调用时的验证"
        Use1 --&gt; Validate1[AWS 服务验证 SessionToken]
        Use2 --&gt; Validate2[AWS 服务验证 SessionToken]
        Use3 --&gt; ValidateN[AWS 服务验证 SessionToken]
    end
</code></pre>

<hr />

<h2 id="十二完整链条总览数据与端点">十二、完整链条总览：数据与端点</h2>

<h3 id="阶段概览">阶段概览</h3>

<ul>
  <li><strong>阶段 1（ccoctl）</strong>：提取 CredentialsRequest、生成密钥对、向 S3 上传 OIDC 配置与公钥、创建 OIDC 提供商、为每个组件创建 IAM Role（含信任策略与权限策略）、输出 Secret YAML。</li>
  <li><strong>阶段 2（运行时）</strong>：Pod 挂载 SA → apiserver 签发 JWT → Pod 首次调用时 SDK 向 STS 发送 AssumeRoleWithWebIdentity(JWT + Role ARN) → STS 从 OIDC 取公钥验签、检查 sub → 返回临时凭证 → Pod 用临时凭证调用各 AWS 服务。</li>
  <li><strong>阶段 3（每次调用）</strong>：目标服务将凭证交 AWS 统一认证验证 SessionToken、权限与条件。</li>
</ul>

<h3 id="完整时序图">完整时序图</h3>

<pre><code class="language-mermaid">sequenceDiagram
    participant Admin as 管理员
    participant CCO as ccoctl
    participant S3 as AWS S3
    participant IAM as AWS IAM
    participant OIDC as AWS OIDC Provider
    participant API as kube-apiserver
    participant Pod as Pod (Ingress)
    participant STS as AWS STS
    participant Service as AWS Service (ELB)

    Note over Admin,Service: 阶段1：安装准备
    Admin-&gt;&gt;CCO: 提取 CredentialsRequest、生成密钥对
    CCO-&gt;&gt;S3: PUT OIDC 配置与公钥
    CCO-&gt;&gt;IAM: CreateOpenIDConnectProvider、CreateRole、PutRolePolicy
    CCO-&gt;&gt;Admin: 输出 Secret YAML
    Admin-&gt;&gt;API: oc apply manifests

    Note over Admin,Service: 阶段2：集群运行时
    Pod-&gt;&gt;API: 挂载 ServiceAccount
    API--&gt;&gt;Pod: JWT 令牌
    Pod-&gt;&gt;STS: AssumeRoleWithWebIdentity(JWT + Role ARN)
    STS-&gt;&gt;OIDC: 获取公钥
    STS-&gt;&gt;STS: 验签、检查 sub
    STS--&gt;&gt;Pod: 临时凭证
    Pod-&gt;&gt;Service: 调用 AWS API(带临时凭证)
    Service-&gt;&gt;STS: 验证 SessionToken
    Service--&gt;&gt;Pod: 返回结果

    Note over Pod,Service: 过期前 SDK 后台用同一 JWT 换新凭证
</code></pre>

<h3 id="关键端点">关键端点</h3>

<table>
  <thead>
    <tr>
      <th style="text-align: left">阶段</th>
      <th style="text-align: left">端点类型</th>
      <th style="text-align: left">用途</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">安装</td>
      <td style="text-align: left">S3</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">/.well-known/openid-configuration</code>、公钥（如 keys.json）</td>
    </tr>
    <tr>
      <td style="text-align: left">安装</td>
      <td style="text-align: left">IAM</td>
      <td style="text-align: left">创建 OIDC 提供商、IAM Role</td>
    </tr>
    <tr>
      <td style="text-align: left">运行时</td>
      <td style="text-align: left">kube-apiserver</td>
      <td style="text-align: left">签发 JWT（TokenRequest API）</td>
    </tr>
    <tr>
      <td style="text-align: left">运行时</td>
      <td style="text-align: left">STS</td>
      <td style="text-align: left"><code class="language-plaintext highlighter-rouge">AssumeRoleWithWebIdentity</code></td>
    </tr>
    <tr>
      <td style="text-align: left">运行时</td>
      <td style="text-align: left">S3/EC2/ELB 等</td>
      <td style="text-align: left">业务 API，每次请求验证 SessionToken</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="总结">总结</h2>

<ul>
  <li><strong>ccoctl</strong> 在手动模式下为各组件在云上创建并管理精细化、短期权限凭证，避免在集群内存储高权限长期凭证。</li>
  <li><strong>STS 流程</strong>：OIDC + ServiceAccount + IAM Role 形成双向信任；Pod 用 JWT 向 STS 证明身份，换取 1 小时有效的临时凭证；公钥用于验签，不涉及数据加密。</li>
  <li><strong>IAM Role</strong> 的权限来自角色自身的<strong>权限策略</strong>与<strong>信任策略</strong>，与 IAM User 无关；ccoctl 的执行权限来自运行它的管理员所持凭证。</li>
  <li><strong>运行时</strong>不需要安装时的高权限用户，建议安装后删除或禁用该用户，日常使用低权限账号。</li>
  <li><strong>JWT</strong> 管身份，<strong>STS 临时凭证</strong>管权限；临时凭证含 <strong>SessionToken</strong>，调用 API 必须携带；<strong>获取</strong>有 SDK 缓存（约 1 小时一次），<strong>验证</strong>每次请求都会进行。</li>
  <li><strong>kube-apiserver</strong> 签发 JWT 并暴露公钥；<strong>AWS</strong> 通过 OIDC 标准与 Kubernetes 对接，实现跨系统信任传递。</li>
</ul>

<p>整体可概括为：<strong>一次创建（ccoctl），多次使用（SDK 缓存），每次验证（SessionToken）</strong>，是 OpenShift 在公有云上实现最小权限与无长期凭证的典型生产级方式。</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">perf 与 eBPF：关系与「埋点」思路的演进</title><link href="https://weinan.io/2026/03/13/perf-ebpf-relationship-and-probing.html" rel="alternate" type="text/html" title="perf 与 eBPF：关系与「埋点」思路的演进" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://weinan.io/2026/03/13/perf-ebpf-relationship-and-probing</id><content type="html" xml:base="https://weinan.io/2026/03/13/perf-ebpf-relationship-and-probing.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p>perf 子系统和 eBPF 并非两个孤立的子系统，而是<strong>共享基础设施、互相协作</strong>的伙伴。本文从两者在内核中的协作关系出发，再对比它们在「埋点」与数据处理思路上的本质区别，并对照主线内核代码做简要核对。</p>

<h2 id="一perf-与-ebpf-的紧密关系">一、perf 与 eBPF 的紧密关系</h2>

<h3 id="1-共享内核基础设施perf_events-是基石">1. 共享内核基础设施：perf_events 是基石</h3>

<p>eBPF 的很多核心功能都建立在 <code class="language-plaintext highlighter-rouge">perf_events</code> 子系统提供的机制之上；<code class="language-plaintext highlighter-rouge">perf_events</code> 为 eBPF 的高效数据输出和硬件性能计数读取提供了通道。</p>

<h4 id="数据输出通道bpf_map_type_perf_event_array">数据输出通道：BPF_MAP_TYPE_PERF_EVENT_ARRAY</h4>

<p>当 eBPF 程序需要向用户空间发送大量数据时（例如追踪每次系统调用的参数），通常不直接操作文件或网络，而是通过一类特殊的 eBPF Map——<code class="language-plaintext highlighter-rouge">BPF_MAP_TYPE_PERF_EVENT_ARRAY</code>。</p>

<ul>
  <li><strong>工作原理</strong>：该 Map 的每个元素对应一个 <code class="language-plaintext highlighter-rouge">perf_event</code> 的文件描述符。eBPF 程序通过辅助函数 <code class="language-plaintext highlighter-rouge">bpf_perf_event_output()</code> 把数据写入该 Map，内核会将这些数据写入对应 perf 事件的<strong>环形缓冲区（ring buffer）</strong>。</li>
  <li><strong>优势</strong>：复用 perf 子系统的内核-用户空间数据传输机制，实现无锁、高性能的数据通路，无需为 eBPF 再实现一套类似设施。</li>
</ul>

<p>内核中该 Map 类型的实现位于 <code class="language-plaintext highlighter-rouge">kernel/bpf/arraymap.c</code>，例如 <code class="language-plaintext highlighter-rouge">perf_event_array_map_ops</code>（<code class="language-plaintext highlighter-rouge">perf_event_fd_array_get_ptr</code> 等）负责将 Map 中的 fd 解析为 <code class="language-plaintext highlighter-rouge">perf_event</code> 指针并与 ring buffer 关联；<code class="language-plaintext highlighter-rouge">kernel/bpf/verifier.c</code> 与 <code class="language-plaintext highlighter-rouge">kernel/bpf/syscall.c</code> 中对 <code class="language-plaintext highlighter-rouge">BPF_MAP_TYPE_PERF_EVENT_ARRAY</code> 的校验与更新逻辑也与之对应。</p>

<h4 id="读取性能计数器bpf_perf_event_read-系列">读取性能计数器：bpf_perf_event_read 系列</h4>

<p>eBPF 程序还可以通过 perf 子系统读取性能数据。辅助函数 <code class="language-plaintext highlighter-rouge">bpf_perf_event_read()</code> 和 <code class="language-plaintext highlighter-rouge">bpf_perf_event_read_value()</code> 用于读取由 <code class="language-plaintext highlighter-rouge">perf_events</code> 管理的硬件性能计数器（如 CPU 周期、缓存未命中等）的值，从而在 eBPF 中把自定义追踪逻辑与底层硬件性能数据结合。例如 <code class="language-plaintext highlighter-rouge">tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c</code>、<code class="language-plaintext highlighter-rouge">bperf_leader.bpf.c</code> 中就有对 <code class="language-plaintext highlighter-rouge">bpf_perf_event_read_value()</code> 的典型用法。</p>

<h3 id="2-程序类型协作bpf_prog_type_perf_event">2. 程序类型协作：BPF_PROG_TYPE_PERF_EVENT</h3>

<p>内核定义了专门的 eBPF 程序类型 <code class="language-plaintext highlighter-rouge">BPF_PROG_TYPE_PERF_EVENT</code>，允许将 eBPF 程序直接附加到某个 perf 事件上。</p>

<ul>
  <li><strong>工作方式</strong>：通过 <code class="language-plaintext highlighter-rouge">perf_event_open()</code> 创建 perf 事件时，可以指定一个 eBPF 程序作为该事件的<strong>溢出处理函数（overflow handler）</strong>。当事件触发（例如性能计数器达到采样周期，或 tracepoint 被命中）时，内核会调用该 eBPF 程序。相关逻辑见 <code class="language-plaintext highlighter-rouge">kernel/events/core.c</code> 中的 <code class="language-plaintext highlighter-rouge">bpf_overflow_handler</code> 以及 <code class="language-plaintext highlighter-rouge">perf_event_attach_bpf_prog()</code>；<code class="language-plaintext highlighter-rouge">kernel/trace/bpf_trace.c</code> 中则实现了 <code class="language-plaintext highlighter-rouge">perf_event_attach_bpf_prog()</code> 的具体附加流程。</li>
  <li><strong>应用场景</strong>：可用于自定义、低开销的采样与分析，例如按 CPU 周期采样时在 eBPF 中记录调用栈或做过滤聚合，比传统 perf 采样更灵活。</li>
</ul>

<h3 id="3-用户空间工具整合从bpf-事件到bpf-脚手架">3. 用户空间工具整合：从「BPF 事件」到「BPF 脚手架」</h3>

<p>在用户空间工具 <code class="language-plaintext highlighter-rouge">perf</code> 中，与 eBPF 的集成方式也在演进。</p>

<ul>
  <li><strong>过去</strong>：<code class="language-plaintext highlighter-rouge">perf</code> 曾提供「BPF 事件」机制，允许将编译好的 eBPF 对象文件作为事件加载，但使用和维护成本较高。</li>
  <li><strong>现在</strong>：<code class="language-plaintext highlighter-rouge">perf</code> 更多采用 <strong>BPF skeleton</strong>（libbpf 生成的脚手架）来加载和附加 eBPF 程序。例如 <code class="language-plaintext highlighter-rouge">perf trace</code> 使用 <code class="language-plaintext highlighter-rouge">tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c</code> 等实现系统调用参数增强；<code class="language-plaintext highlighter-rouge">off_cpu.bpf.c</code>、<code class="language-plaintext highlighter-rouge">bpf_prog_profiler.bpf.c</code>、<code class="language-plaintext highlighter-rouge">bperf_leader.bpf.c</code> 等均使用 <code class="language-plaintext highlighter-rouge">BPF_MAP_TYPE_PERF_EVENT_ARRAY</code> 与 <code class="language-plaintext highlighter-rouge">bpf_perf_event_output()</code> / <code class="language-plaintext highlighter-rouge">bpf_perf_event_read_value()</code>，与内核实现一致。</li>
</ul>

<h3 id="4-安全与权限统一cap_perfmon">4. 安全与权限统一：CAP_PERFMON</h3>

<p>从权限模型看，perf 与 eBPF 的追踪能力由同一 capability 约束。内核在 <code class="language-plaintext highlighter-rouge">include/uapi/linux/capability.h</code> 中定义：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
 * Allow system performance and observability privileged operations
 * using perf_events, i915_perf and other kernel subsystems
 */</span>
<span class="cp">#define CAP_PERFMON    38
</span></code></pre></div></div>

<p>同一文件中注释说明：<strong>CAP_PERFMON</strong> 与 <strong>CAP_BPF</strong> 共同用于放宽对追踪类 BPF 程序的限制（如指针转整数、部分 speculation 加固的绕过、<code class="language-plaintext highlighter-rouge">bpf_probe_read</code> / <code class="language-plaintext highlighter-rouge">bpf_trace_printk</code> 等），且「CAP_PERFMON and CAP_BPF are required to load tracing programs」。因此，拥有 <code class="language-plaintext highlighter-rouge">CAP_PERFMON</code> 的进程既可以做 perf 采样，也可以在具备 CAP_BPF 等条件下加载用于追踪的 eBPF 程序，两者在权限上统一。</p>

<h3 id="小结关系总览">小结：关系总览</h3>

<table>
  <thead>
    <tr>
      <th>关系层面</th>
      <th>描述</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>基础设施共享</strong></td>
      <td>eBPF 依赖 perf 的<strong>环形缓冲区</strong>和<strong>硬件计数器</strong>，通过 <code class="language-plaintext highlighter-rouge">BPF_MAP_TYPE_PERF_EVENT_ARRAY</code> 和 <code class="language-plaintext highlighter-rouge">bpf_perf_event_read</code> 等实现高效数据交互。</td>
    </tr>
    <tr>
      <td><strong>程序类型协作</strong></td>
      <td><code class="language-plaintext highlighter-rouge">BPF_PROG_TYPE_PERF_EVENT</code> 允许将 eBPF 程序作为 perf 事件的溢出处理器，实现自定义采样逻辑。</td>
    </tr>
    <tr>
      <td><strong>工具整合</strong></td>
      <td><code class="language-plaintext highlighter-rouge">perf</code> 从早期的「BPF 事件」演进为使用 libbpf 的 <strong>BPF skeleton</strong> 加载 eBPF 程序。</td>
    </tr>
    <tr>
      <td><strong>安全模型</strong></td>
      <td><code class="language-plaintext highlighter-rouge">CAP_PERFMON</code> 与 <code class="language-plaintext highlighter-rouge">CAP_BPF</code> 共同控制对 perf_events 与 eBPF 追踪能力的访问。</td>
    </tr>
  </tbody>
</table>

<p>下图概括 eBPF 与 perf 在内核中的架构关系、挂载点及数据通道（用户空间工具、系统调用、eBPF 核心与 Map、动态/静态探针、perf_events 子系统及与硬件的交互）：</p>

<pre><code class="language-mermaid">graph TB
    subgraph Userspace["用户空间 (Userspace)"]
        Tools["perf CLI / bpftrace / BCC"]
        Libs["libbpf / libbcc"]
    end

    subgraph Kernel["内核空间 (Kernel Space)"]
        subgraph Syscall["系统调用层"]
            BPF_Syscall["bpf() 系统调用"]
            Perf_Syscall["perf_event_open() 系统调用"]
        end

        subgraph BPF_Core["eBPF核心虚拟机"]
            Verifier["验证器 (Verifier)"]
            JIT["JIT编译器"]
            Helper["辅助函数 (Helper Funcs)"]
        end

        subgraph BPF_Maps["eBPF Map存储系统"]
            Hash_Map["Hash Map"]
            Array_Map["Array Map"]
            Perf_Array["Perf Event Array"]
            Ring_Buffer["Ring Buffer Map"]
        end

        subgraph BPF_Hooks["eBPF程序挂载点"]
            subgraph Dynamic["动态探针"]
                Kprobe["kprobe (内核函数)"]
                Uprobe["uprobe (用户函数)"]
            end

            subgraph Static["静态探针"]
                Tracepoint["tracepoint (内核静态点)"]
                USDT["USDT (用户静态点)"]
            end

            subgraph Network["网络钩子"]
                XDP["XDP (网卡驱动层)"]
                TC["TC (协议栈)"]
                Socket["Socket Filter"]
            end

            subgraph Perf_Collab["Perf协作层"]
                Perf_Event_Prog["BPF_PROG_TYPE_PERF_EVENT"]
            end
        end

        subgraph Perf_Subsystem["perf_events子系统"]
            Perf_RingBuffer["环形缓冲区 (Ring Buffer)"]
            Perf_PMU["硬件PMU计数器"]
            Perf_Events["软件事件计数"]
            Perf_Tracepoint["tracepoint管理"]
        end
    end

    subgraph Hardware["硬件层"]
        CPU["CPU (含PMU)"]
        NIC["网卡"]
        Memory["内存"]
    end

    Tools --&gt; Libs
    Libs --&gt; BPF_Syscall
    Libs --&gt; Perf_Syscall

    BPF_Syscall --&gt; BPF_Core
    Perf_Syscall --&gt; Perf_Subsystem

    BPF_Core --&gt; BPF_Maps
    BPF_Core --&gt; BPF_Hooks

    BPF_Hooks -.-&gt; Perf_Event_Prog
    Perf_Event_Prog --&gt; Perf_Subsystem

    Perf_Array -.-&gt; Perf_RingBuffer
    Ring_Buffer -.-&gt; Perf_RingBuffer

    Kprobe -.-&gt; |"动态插桩"| Kernel_Funcs["内核任意函数"]
    Uprobe -.-&gt; |"动态插桩"| User_Funcs["用户态任意函数"]
    Tracepoint -.-&gt; |"静态预埋"| Kernel_Points["内核预定义点"]
    USDT -.-&gt; |"静态预埋"| User_Points["用户态预定义点"]

    XDP -.-&gt; |"最早阶段"| NIC
    TC -.-&gt; |"协议栈入口"| Network_Stack["内核协议栈"]

    Perf_PMU --&gt; CPU
    Perf_PMU --&gt; Memory

    BPF_Maps --&gt; |"数据输出"| Libs
    Perf_RingBuffer --&gt; |"性能数据"| Libs
</code></pre>

<p>（若站点支持 Mermaid 渲染，上图会显示为流程图；否则会显示为代码块。）</p>

<p>下图从<strong>用户空间工具生态</strong>视角概括 perf、BCC、bpftrace、libbpf 如何通过 <code class="language-plaintext highlighter-rouge">bpf()</code> 系统调用进入内核 eBPF 子系统并最终作用到硬件：</p>

<pre><code class="language-mermaid">flowchart TD
    subgraph Userspace [用户空间工具生态]
        direction TB
        Tools["perf / 系统工具"] --&gt; |"直接调用"| Syscall
        BCC["BCC工具集&lt;br/&gt;(BPF Compiler Collection)"] --&gt; |"封装复杂逻辑&lt;br/&gt;提供70+现成工具"| Syscall
        bpftrace["bpftrace&lt;br/&gt;(高阶层级语言)"] --&gt; |"基于BCC/libbpf&lt;br/&gt;提供脚本语言"| Syscall
        Libbpf["libbpf&lt;br/&gt;(C库，支持CO-RE)"] --&gt; |"轻量级库&lt;br/&gt;直接控制"| Syscall
    end

    subgraph Kernel [内核空间]
        Syscall["bpf() 系统调用"]
        Syscall --&gt; BPFSubsys["eBPF子系统&lt;br/&gt;(验证器、JIT、辅助函数)"]
        BPFSubsys --&gt; Hooks["挂载点&lt;br/&gt;(kprobe/uprobe/tracepoint/...)"]
    end

    subgraph Hardware [硬件]
        CPU["CPU (含PMU)"]
        Mem["内存"]
        Dev["设备"]
    end

    Hooks --&gt; Hardware

    style BCC fill:#e1f5fe,stroke:#01579b
    style bpftrace fill:#fff3e0,stroke:#e65100
    style Libbpf fill:#f3e5f5,stroke:#4a148c
    style Syscall fill:#e8e8e8,stroke:#666
</code></pre>

<hr />

<h2 id="二埋点思路的演进预制传感器-vs-可编程探头">二、「埋点」思路的演进：预制传感器 vs 可编程探头</h2>

<p>「埋点」是两者工作的基础，但<strong>埋点方式与后续数据处理思路有本质区别</strong>。</p>

<ul>
  <li><strong>传统方式（含 perf 的多数功能）</strong>：像在内核里预先装好一批<strong>固定的、功能单一的传感器</strong>，需要什么数据就去读对应传感器的读数。</li>
  <li><strong>eBPF 方式</strong>：像提供一种可<strong>安全、动态挂载并可编程的探头</strong>，可以自己决定测什么、怎么测、以及在内核里做哪些初步处理。</li>
</ul>

<h3 id="1-什么是埋点">1. 什么是「埋点」？</h3>

<p>无论是 perf 还是 eBPF，核心都是在<strong>内核（及用户态）关键路径上放置探测点</strong>，在事件发生时（系统调用、网络包、函数调用等）采集信息。这些探测点是可观测性的数据源。</p>

<h3 id="2-perf-的埋点思路预制传感器">2. perf 的埋点思路：预制传感器</h3>

<p>perf 主要利用<strong>已有</strong>的事件源与埋点：</p>

<ul>
  <li><strong>硬件事件</strong>：利用 CPU 的 <strong>PMU（Performance Monitoring Unit）</strong> 等硬件计数器，统计周期、缓存未命中、分支预测失败等；perf 负责配置与读取。</li>
  <li><strong>软件事件</strong>：内核维护的统计（如上下文切换、缺页等），perf 直接读取。</li>
  <li><strong>Tracepoints（静态埋点）</strong>：内核在关键路径上预先放置的静态探测点（系统调用入口/出口、调度、文件系统等），位置和格式在编译期确定；perf 通过启用这些 tracepoint 采集数据。</li>
</ul>

<p>perf 的角色更接近「仪表盘操作员」：知道所有预制传感器在哪里、如何读，并以较低开销（尤其是采样）汇总成报告。</p>

<h3 id="3-kprobe-与-uprobe机制与内核支持">3. kprobe 与 uprobe：机制与内核支持</h3>

<p>eBPF 的「动态埋点」能力建立在内核的 <strong>kprobe</strong> 与 <strong>uprobe</strong> 机制之上。二者允许在<strong>不重新编译内核或目标程序</strong>的前提下，在运行时把探测点挂在任意内核函数或用户态地址上，下面结合主线内核代码说明其含义与实现要点。</p>

<h4 id="kprobekernel-probe">kprobe（Kernel Probe）</h4>

<p><strong>kprobe</strong> 用于在内核任意函数（或指定偏移）处插入探测。调用方只需提供<strong>符号名</strong>（如 <code class="language-plaintext highlighter-rouge">do_sys_open</code>）或「模块 + 偏移」；内核在<strong>注册时</strong>通过 <code class="language-plaintext highlighter-rouge">kallsyms_lookup_name()</code>（见 <code class="language-plaintext highlighter-rouge">kernel/kprobes.c</code>）解析出该符号的地址，无需在编译期固定探测位置。</p>

<ul>
  <li><strong>为何是「动态」</strong>：探测地址在 <strong>register_kprobe()</strong> 时才确定。内核维护符号表（kallsyms），可加载模块的符号在模块加载后也可解析；因此可以在不改源码、不重启的前提下，对当前运行内核的任意已导出或可见符号下 probe。</li>
  <li><strong>内核做了哪些支持</strong>：
    <ul>
      <li><strong>插桩方式</strong>：在探测地址处把<strong>第一条指令</strong>替换为架构相关的断点指令（如 x86 的 <strong>INT3</strong>，arm64 的 <strong>BRK</strong>）。<code class="language-plaintext highlighter-rouge">arch_arm_kprobe()</code> / <code class="language-plaintext highlighter-rouge">arch_disarm_kprobe()</code> 负责写入/恢复（见 <code class="language-plaintext highlighter-rouge">arch/x86/kernel/kprobes/core.c</code>：<code class="language-plaintext highlighter-rouge">text_poke(p-&gt;addr, &amp;int3, 1)</code> 与恢复 <code class="language-plaintext highlighter-rouge">p-&gt;opcode</code>）。</li>
      <li><strong>原始指令执行</strong>：断点命中后，先执行注册的 handler（如 eBPF 程序），再<strong>单步执行被替换掉的那条指令</strong>。内核在可执行内存中为每条 kprobe 分配「指令槽」（<code class="language-plaintext highlighter-rouge">struct kprobe_insn_page</code>，见 <code class="language-plaintext highlighter-rouge">kernel/kprobes.c</code>），把原始指令拷贝到槽中执行，避免在运行时代码上直接执行可能受限于可执行页、相对寻址等约束。</li>
      <li><strong>优化路径（CONFIG_OPTPROBES）</strong>：部分架构还可将「断点 + 单步」优化为「跳转指令」，减少单步与 cache 失效的开销。</li>
    </ul>
  </li>
</ul>

<p>相关定义与流程集中在 <code class="language-plaintext highlighter-rouge">kernel/kprobes.c</code>（通用逻辑、哈希表 <code class="language-plaintext highlighter-rouge">kprobe_table</code>、注册/卸载）、<code class="language-plaintext highlighter-rouge">include/linux/kprobes.h</code>（<code class="language-plaintext highlighter-rouge">struct kprobe</code>：<code class="language-plaintext highlighter-rouge">addr</code>、<code class="language-plaintext highlighter-rouge">symbol_name</code>、<code class="language-plaintext highlighter-rouge">offset</code>、<code class="language-plaintext highlighter-rouge">opcode</code>、<code class="language-plaintext highlighter-rouge">ainsn</code> 等），以及各架构的 <code class="language-plaintext highlighter-rouge">arch/*/kernel/kprobes/</code>（如 <code class="language-plaintext highlighter-rouge">arch_arm_kprobe</code>、指令槽与单步）。</p>

<h4 id="uprobeuser-space-probe">uprobe（User-space Probe）</h4>

<p><strong>uprobe</strong> 用于在用户态程序的指定<strong>虚拟地址</strong>处插入探测。通常用「可执行文件 inode + 文件内偏移」或「path + offset」描述位置；同一偏移可对应多个已映射该文件的进程，内核会按 <strong>mmap</strong> 在各自地址空间写入断点。</p>

<ul>
  <li><strong>为何是「动态」</strong>：探测的「文件 + 偏移」在<strong>注册 uprobe 时</strong>指定，无需重新编译或替换用户程序。只要目标进程已将该文件映射为可执行，内核会在其对应 VMA 的虚拟地址上安装断点；新 fork 的进程若映射同一文件，也会在首次访问时通过 <strong>MMU notifier</strong> 等路径被插入断点（见 <code class="language-plaintext highlighter-rouge">kernel/events/uprobes.c</code> 中的 <code class="language-plaintext highlighter-rouge">install_breakpoint</code>、<code class="language-plaintext highlighter-rouge">set_swbp</code>）。</li>
  <li><strong>内核做了哪些支持</strong>：
    <ul>
      <li><strong>插桩方式</strong>：在用户空间对应页上写入架构的<strong>软断点</strong>（如 x86 的 INT3）。<code class="language-plaintext highlighter-rouge">set_swbp()</code> 通过 <code class="language-plaintext highlighter-rouge">uprobe_write_opcode()</code> 把断点写进目标 VMA；卸载时 <code class="language-plaintext highlighter-rouge">set_orig_insn()</code> 恢复原指令（<code class="language-plaintext highlighter-rouge">kernel/events/uprobes.c</code>）。</li>
      <li><strong>原始指令执行（XOL）</strong>：用户态不能像内核那样随意在任意可执行页单步「一条指令」而不影响相邻指令，因此 uprobe 使用 <strong>XOL（Execute Out of Line）</strong>：为每个被探测的进程维护一块<strong>专用可执行映射</strong>（<code class="language-plaintext highlighter-rouge">struct xol_area</code>，名如 <code class="language-plaintext highlighter-rouge">[uprobes]</code>），把「被替换掉的那条指令」拷贝到 XOL 槽中执行，执行完再回到原流程。见 <code class="language-plaintext highlighter-rouge">kernel/events/uprobes.c</code> 中的 <code class="language-plaintext highlighter-rouge">xol_area</code>、<code class="language-plaintext highlighter-rouge">xol_fault</code>、<code class="language-plaintext highlighter-rouge">xol_add_vma</code> 以及 <code class="language-plaintext highlighter-rouge">arch_uprobe_analyze_insn()</code> 对指令的分析与 ixol 的生成。</li>
    </ul>
  </li>
</ul>

<p>uprobe 的消费者通过 <code class="language-plaintext highlighter-rouge">struct uprobe_consumer</code>（<code class="language-plaintext highlighter-rouge">handler</code>、<code class="language-plaintext highlighter-rouge">ret_handler</code>、<code class="language-plaintext highlighter-rouge">filter</code>）挂到 <code class="language-plaintext highlighter-rouge">struct uprobe</code> 上；eBPF 等会复用这套基础设施，把 BPF 程序作为 consumer 挂到同一 uprobe。</p>

<h4 id="小结动态的含义与依赖">小结：动态的含义与依赖</h4>

<table>
  <thead>
    <tr>
      <th>机制</th>
      <th>探测对象</th>
      <th>「动态」体现</th>
      <th>内核关键支持</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>kprobe</strong></td>
      <td>内核函数（符号或地址）</td>
      <td>地址在 <strong>register_kprobe</strong> 时由 kallsyms 等解析，无需编译期埋点</td>
      <td>断点替换（arch_arm/disarm）、指令槽单步、可选跳转优化</td>
    </tr>
    <tr>
      <td><strong>uprobe</strong></td>
      <td>用户态（文件 + 偏移 → 各进程 VMA）</td>
      <td>在 <strong>register_uprobe</strong> 时指定 offset，按 mmap 在运行时插入断点</td>
      <td>用户态页写断点（set_swbp/set_orig_insn）、XOL 执行原指令</td>
    </tr>
  </tbody>
</table>

<p>eBPF 的 kprobe/uprobe 程序类型（如 <code class="language-plaintext highlighter-rouge">BPF_PROG_TYPE_KPROBE</code>）即是在上述机制之上，把「断点命中后的处理」换成经 verifier 校验的 BPF 字节码，从而在保持动态性的同时提供可编程、安全的内核/用户态探测能力<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">1</a></sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">2</a></sup><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">3</a></sup>。</p>

<h3 id="4-ebpf-的埋点思路可编程探头">4. eBPF 的埋点思路：可编程探头</h3>

<p>eBPF 在「埋点」上的不同在于<strong>动态与可编程</strong>：</p>

<ul>
  <li><strong>动态埋点（kprobe / uprobe）</strong>：若内核或应用没有现成探测点，eBPF 可以在<strong>任意内核函数（kprobe）或用户态函数（uprobe）</strong> 入口/出口动态挂载探测逻辑，无需改内核源码或重新部署固定 tracepoint（其机制见上一小节）。</li>
  <li><strong>复用现有埋点</strong>：eBPF 也可挂到现有 tracepoint 上；与 perf 不同的是，触发时不仅可以读预定义数据，还可以<strong>执行自定义逻辑</strong>做过滤、聚合、计算。</li>
  <li><strong>处理下放</strong>：perf 通常把原始或轻度聚合数据经 ring buffer 传到用户空间再由 <code class="language-plaintext highlighter-rouge">perf</code> 分析；eBPF 则允许<strong>把一部分处理逻辑放在内核</strong>（例如只统计延迟 &gt; 100ms 的请求、或在内核里算好直方图），仅把结果或关键数据交给用户空间，减少数据拷贝与上下文切换。</li>
</ul>

<h3 id="对比总结">对比总结</h3>

<table>
  <thead>
    <tr>
      <th>特性</th>
      <th>perf</th>
      <th>eBPF</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>埋点类型</strong></td>
      <td>主要依赖<strong>预制</strong>的硬件事件、软件事件和静态 tracepoint。</td>
      <td>既可用预制 tracepoint，更核心的是<strong>动态</strong> kprobe/uprobe。</td>
    </tr>
    <tr>
      <td><strong>数据处理</strong></td>
      <td>主要在<strong>用户空间</strong>；内核负责采集和输出原始/轻度聚合数据。</td>
      <td><strong>内核与用户空间协同</strong>；可在内核执行聚合、过滤、统计，只下发结果或关键数据。</td>
    </tr>
    <tr>
      <td><strong>灵活性</strong></td>
      <td>相对固定，只能获取预设格式的数据。</td>
      <td>高；可访问函数上下文、参数、返回值，并按需编写处理逻辑。</td>
    </tr>
    <tr>
      <td><strong>编程模型</strong></td>
      <td>通过命令行参数与预定义事件配置。</td>
      <td>用 C 等编写小程序，经内核验证后执行。</td>
    </tr>
  </tbody>
</table>

<p>因此，两者都建立在「埋点」之上，但 <strong>eBPF 的突破在于：在埋点之上增加了动态创建探测点、以及在内核中安全执行自定义处理逻辑的能力</strong>，从「读仪表」演进到「可编程探头」。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kprobes.c">kernel/kprobes.c</a> - kprobe 通用逻辑：注册/卸载、kallsyms 解析、指令槽与 arm/disarm <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/kprobes/core.c">arch/x86/kernel/kprobes/core.c - arch_arm_kprobe / arch_disarm_kprobe</a> - x86 上 kprobe 断点写入与恢复（INT3 / text_poke） <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/events/uprobes.c">kernel/events/uprobes.c</a> - uprobe 实现：set_swbp/set_orig_insn、XOL（xol_area）、install_breakpoint <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Linux 内核 Rust 代码中 unsafe 使用场景统计分析</title><link href="https://weinan.io/2026/03/04/kernel-rust-unsafe-usage-analysis.html" rel="alternate" type="text/html" title="Linux 内核 Rust 代码中 unsafe 使用场景统计分析" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://weinan.io/2026/03/04/kernel-rust-unsafe-usage-analysis</id><content type="html" xml:base="https://weinan.io/2026/03/04/kernel-rust-unsafe-usage-analysis.html"><![CDATA[<p>与「只有调用 C 才需要 unsafe」的常见误解不同，但凡涉及硬件或与内核/硬件边界交互（如驱动、MMIO、DMA），在 Rust 里几乎必然要使用 <code class="language-plaintext highlighter-rouge">unsafe</code>，这与是否通过 FFI 调 C 无必然关系——例如 Embassy 等纯 Rust 裸机/驱动生态里，硬件相关操作同样大量集中在 <code class="language-plaintext highlighter-rouge">unsafe</code> 中。本文基于对主线 Linux 内核 <code class="language-plaintext highlighter-rouge">rust/</code> 目录的统计与代码抽样，归纳当前内核 Rust 中 <code class="language-plaintext highlighter-rouge">unsafe</code> 的实际使用场景，并辅以真实内核代码说明。</p>

<h2 id="统计概览">统计概览</h2>

<p>对主线内核 <code class="language-plaintext highlighter-rouge">rust/</code> 树使用 <code class="language-plaintext highlighter-rouge">cloc</code>、<code class="language-plaintext highlighter-rouge">ripgrep</code>（rg）统计的结果如下。</p>

<table>
  <thead>
    <tr>
      <th>项目</th>
      <th>数量</th>
      <th>说明</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Rust 源文件数</strong></td>
      <td>130</td>
      <td><code class="language-plaintext highlighter-rouge">find . -name '*.rs' \| wc -l</code></td>
    </tr>
    <tr>
      <td><strong>Rust 代码行数</strong></td>
      <td>16 987</td>
      <td>cloc 统计的 code 行（不含 3 471 blank、17 039 comment）</td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe</code> 出现总次数</strong></td>
      <td>1 891</td>
      <td><code class="language-plaintext highlighter-rouge">rg -c '\bunsafe\b'</code> 各文件计数之和</td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe { ... }</code> 块</strong></td>
      <td>1 252</td>
      <td><code class="language-plaintext highlighter-rouge">rg -c 'unsafe\s*\{'</code></td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe fn</code></strong></td>
      <td>268</td>
      <td>含 <code class="language-plaintext highlighter-rouge">unsafe fn</code> 声明与 trait 中的 <code class="language-plaintext highlighter-rouge">unsafe fn</code></td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe impl</code></strong></td>
      <td>90</td>
      <td> </td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe trait</code></strong></td>
      <td>30</td>
      <td> </td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">unsafe fn</code> / <code class="language-plaintext highlighter-rouge">unsafe impl</code> / <code class="language-plaintext highlighter-rouge">unsafe trait</code> 合计</strong></td>
      <td>388</td>
      <td>268 + 90 + 30</td>
    </tr>
    <tr>
      <td><strong><code class="language-plaintext highlighter-rouge">// SAFETY:</code> 注释</strong></td>
      <td>1 413</td>
      <td><code class="language-plaintext highlighter-rouge">rg -c '// SAFETY:'</code></td>
    </tr>
  </tbody>
</table>

<p>约 75% 的 <code class="language-plaintext highlighter-rouge">unsafe</code> 使用配有 <code class="language-plaintext highlighter-rouge">// SAFETY:</code> 说明（1 413 / 1 891 ≈ 74.7%），便于审查与维护。</p>

<h2 id="使用场景分类">使用场景分类</h2>

<h3 id="1-调用-c-内核-apiffi--bindings">1. 调用 C 内核 API（FFI / bindings）</h3>

<p>通过 bindgen 生成的 C 内核 API 在 Rust 侧一律通过 <code class="language-plaintext highlighter-rouge">bindings::</code> 调用，且这些调用均出现在 <code class="language-plaintext highlighter-rouge">unsafe</code> 块或 <code class="language-plaintext highlighter-rouge">unsafe fn</code> 内。统计显示 <strong><code class="language-plaintext highlighter-rouge">bindings::</code> 出现约 1062 次</strong>，是 <code class="language-plaintext highlighter-rouge">unsafe</code> 的一大来源。</p>

<p>典型用法：取得 C 结构体指针、解引用其字段作为参数，再调用 C 函数。例如 PHY 寄存器读写的纯「FFI + 裸指针解引用」：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/net/phy/reg.rs（节选）</span>
<span class="k">impl</span> <span class="n">Register</span> <span class="k">for</span> <span class="n">C22</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">read</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">u16</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">phydev</span> <span class="o">=</span> <span class="n">dev</span><span class="na">.0</span><span class="nf">.get</span><span class="p">();</span>
        <span class="c1">// SAFETY: `phydev` is pointing to a valid object by the type invariant of `Device`.</span>
        <span class="c1">// So it's just an FFI call, open code of `phy_read()` with a valid `phy_device` pointer</span>
        <span class="k">let</span> <span class="n">ret</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
            <span class="nn">bindings</span><span class="p">::</span><span class="nf">mdiobus_read</span><span class="p">((</span><span class="o">*</span><span class="n">phydev</span><span class="p">)</span><span class="py">.mdio.bus</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">phydev</span><span class="p">)</span><span class="py">.mdio.addr</span><span class="p">,</span> <span class="k">self</span><span class="na">.0</span><span class="nf">.into</span><span class="p">())</span>
        <span class="p">};</span>
        <span class="nf">to_result</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">ret</span> <span class="k">as</span> <span class="nb">u16</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">write</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">,</span> <span class="n">val</span><span class="p">:</span> <span class="nb">u16</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">phydev</span> <span class="o">=</span> <span class="n">dev</span><span class="na">.0</span><span class="nf">.get</span><span class="p">();</span>
        <span class="c1">// SAFETY: ... (同上)</span>
        <span class="nf">to_result</span><span class="p">(</span><span class="k">unsafe</span> <span class="p">{</span>
            <span class="nn">bindings</span><span class="p">::</span><span class="nf">mdiobus_write</span><span class="p">((</span><span class="o">*</span><span class="n">phydev</span><span class="p">)</span><span class="py">.mdio.bus</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">phydev</span><span class="p">)</span><span class="py">.mdio.addr</span><span class="p">,</span> <span class="k">self</span><span class="na">.0</span><span class="nf">.into</span><span class="p">(),</span> <span class="n">val</span><span class="p">)</span>
        <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这里 <code class="language-plaintext highlighter-rouge">unsafe</code> 同时覆盖：<strong>对 C 指针的解引用</strong>（<code class="language-plaintext highlighter-rouge">(*phydev).mdio.bus</code>）和 <strong>FFI 调用</strong>（<code class="language-plaintext highlighter-rouge">bindings::mdiobus_read</code> / <code class="language-plaintext highlighter-rouge">mdiobus_write</code>）。也就是说，与硬件打交道的驱动路径上，即便逻辑是「读/写寄存器」，在 Rust 侧也会体现为「裸指针 + C API」，因而必然落在 <code class="language-plaintext highlighter-rouge">unsafe</code> 内。</p>

<h3 id="2-硬件与并发语义volatile-与-read_once--write_once">2. 硬件与并发语义：volatile 与 READ_ONCE / WRITE_ONCE</h3>

<p>与「硬件或外部可写内存」的交互常需要 volatile 或与内核 READ_ONCE/WRITE_ONCE 等价的语义；这类操作在 Rust 中同样必须放在 <code class="language-plaintext highlighter-rouge">unsafe</code> 里，且<strong>与是否调用 C 无关</strong>——纯 Rust 的 MMIO/寄存器访问（如 Embassy 中的实现）也是如此。</p>

<p><strong>（1）文件描述符标志：对应 READ_ONCE</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/fs/file.rs（节选）</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">flags</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="c1">// This `read_volatile` is intended to correspond to a READ_ONCE call.</span>
    <span class="c1">//</span>
    <span class="c1">// SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.</span>
    <span class="c1">//</span>
    <span class="c1">// FIXME(read_once): Replace with `read_once` when available on the Rust side.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nd">addr_of!</span><span class="p">((</span><span class="o">*</span><span class="k">self</span><span class="nf">.as_ptr</span><span class="p">())</span><span class="py">.f_flags</span><span class="p">)</span><span class="nf">.read_volatile</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>此处用 <code class="language-plaintext highlighter-rouge">read_volatile</code> 表达「可能与其他执行上下文共享的字段」的读，避免编译器优化导致的数据竞争未定义行为，语义上对应 C 侧的 <code class="language-plaintext highlighter-rouge">READ_ONCE</code>。</p>

<p><strong>（2）DMA 一致内存：与硬件/用户态竞态</strong></p>

<p>DMA 或与设备/用户态共享的内存，读写同样需要「单次访问不拆、不优化掉」的语义。内核在 <code class="language-plaintext highlighter-rouge">dma.rs</code> 中通过 <code class="language-plaintext highlighter-rouge">read_volatile</code> / <code class="language-plaintext highlighter-rouge">write_volatile</code> 实现，并明确注释其与 READ_ONCE/WRITE_ONCE 的对应关系及适用范围：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/dma.rs（节选）</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="n">field_read</span><span class="o">&lt;</span><span class="n">F</span><span class="p">:</span> <span class="n">FromBytes</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">field</span><span class="p">:</span> <span class="o">*</span><span class="k">const</span> <span class="n">F</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">F</span> <span class="p">{</span>
    <span class="c1">// SAFETY:</span>
    <span class="c1">// - By the safety requirements field is valid.</span>
    <span class="c1">// - Using read_volatile() here is not sound as per the usual rules, the usage here is</span>
    <span class="c1">// a special exception with the following notes in place. When dealing with a potential</span>
    <span class="c1">// race from a hardware or code outside kernel (e.g. user-space program), we need that</span>
    <span class="c1">// read on a valid memory is not UB. Currently read_volatile() is used for this, and the</span>
    <span class="c1">// rationale behind is that it should generate the same code as READ_ONCE() which the</span>
    <span class="c1">// kernel already relies on to avoid UB on data races. Note that the usage of</span>
    <span class="c1">// read_volatile() is limited to this particular case, it cannot be used to prevent</span>
    <span class="c1">// the UB caused by racing between two kernel functions nor do they provide atomicity.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="n">field</span><span class="nf">.read_volatile</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="n">field_write</span><span class="o">&lt;</span><span class="n">F</span><span class="p">:</span> <span class="n">AsBytes</span><span class="o">&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">field</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">F</span><span class="p">,</span> <span class="n">val</span><span class="p">:</span> <span class="n">F</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// SAFETY: ... (与 READ_ONCE 对应地，此处对应 WRITE_ONCE)</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="n">field</span><span class="nf">.write_volatile</span><span class="p">(</span><span class="n">val</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>可见：<strong>只要涉及「硬件或内核外部的竞态」，就需要这类 volatile 访问，并因此使用 <code class="language-plaintext highlighter-rouge">unsafe</code></strong>，与是否经过 C 代码无关。</p>

<h3 id="3-mmio--ioremap资源映射与释放">3. MMIO / ioremap：资源映射与释放</h3>

<p>内存映射 I/O（MMIO）是驱动访问设备寄存器的常见方式。内核 Rust 侧对 <code class="language-plaintext highlighter-rouge">ioremap</code> / <code class="language-plaintext highlighter-rouge">iounmap</code> 的封装同样在 <code class="language-plaintext highlighter-rouge">unsafe</code> 中完成，并配有 SAFETY 注释说明前置条件：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/io/mem.rs（节选）</span>
<span class="k">fn</span> <span class="nf">ioremap</span><span class="p">(</span><span class="n">resource</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Resource</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="k">let</span> <span class="n">addr</span> <span class="o">=</span> <span class="k">if</span> <span class="n">resource</span><span class="nf">.flags</span><span class="p">()</span><span class="nf">.contains</span><span class="p">(</span><span class="nn">io</span><span class="p">::</span><span class="nn">resource</span><span class="p">::</span><span class="nn">Flags</span><span class="p">::</span><span class="n">IORESOURCE_MEM_NONPOSTED</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// SAFETY:</span>
        <span class="c1">// - `res_start` and `size` are read from a presumably valid `struct resource`.</span>
        <span class="c1">// - `size` is known not to be zero at this point.</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">ioremap_np</span><span class="p">(</span><span class="n">res_start</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">ioremap</span><span class="p">(</span><span class="n">res_start</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">};</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">SIZE</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Drop</span> <span class="k">for</span> <span class="n">IoMem</span><span class="o">&lt;</span><span class="n">SIZE</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// SAFETY: Safe as by the invariant of `Io`.</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">iounmap</span><span class="p">(</span><span class="k">self</span><span class="py">.io</span><span class="nf">.addr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这里既有 <strong>FFI（调用 C 的 ioremap/iounmap）</strong>，也有 <strong>对「映射得到的地址」所代表的 I/O 内存的访问约定</strong>，二者都属于与硬件打交道的边界，因此用 <code class="language-plaintext highlighter-rouge">unsafe</code> 是必然的。</p>

<h3 id="4-裸指针与内存操作">4. 裸指针与内存操作</h3>

<p>除上述 FFI 与 volatile 外，内核 Rust 中还有大量「裸指针解引用、<code class="language-plaintext highlighter-rouge">ptr::read</code>/<code class="language-plaintext highlighter-rouge">ptr::write</code>、<code class="language-plaintext highlighter-rouge">drop_in_place</code>、<code class="language-plaintext highlighter-rouge">addr_of!</code>」等用法，分布在：</p>

<ul>
  <li><strong>pin-init</strong>：未初始化/固定内存的初始化与析构；</li>
  <li><strong>kernel/alloc</strong>：自定义分配器、KBox、kvec 等；</li>
  <li><strong>kernel/sync/arc</strong>、<strong>kernel/list</strong>、<strong>kernel/rbtree</strong> 等：与 C 结构或内核生命周期绑定的共享/链表/树。</li>
</ul>

<p>这些同样不依赖「是否调 C」：只要涉及未初始化内存、自管理指针或与 C 结构布局的互操作，就需要在 <code class="language-plaintext highlighter-rouge">unsafe</code> 中手动维护不变式。</p>

<h3 id="5-其他pintransmutesendsync">5. 其他：Pin、transmute、Send/Sync</h3>

<ul>
  <li><strong>Pin::new_unchecked</strong>、<strong>NonNull::new_unchecked</strong>、pin-init 的闭包初始化等：用于在保证不移动或初始化顺序的前提下构造对象，约 30+ 处。</li>
  <li><strong>transmute / transmute_copy</strong>：与 C 类型或 ABI 的互转、内部表示转换，约 35 处。</li>
  <li><strong>unsafe impl Send / Sync</strong>：为内部含裸指针或 FFI 句柄的类型标注可跨线程传递或共享，约 90+ 处。</li>
</ul>

<p>它们都与「和硬件或 C 边界交互」时的生命周期、布局、并发约定直接相关，是内核 Rust 中 <code class="language-plaintext highlighter-rouge">unsafe</code> 的组成部分，而不是「可选的风格问题」。</p>

<h2 id="按子系统的分布约">按子系统的分布（约）</h2>

<table>
  <thead>
    <tr>
      <th>子系统</th>
      <th><code class="language-plaintext highlighter-rouge">unsafe</code> 次数</th>
      <th>说明</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>kernel/</strong>（整体）</td>
      <td>1644</td>
      <td>含下列子目录</td>
    </tr>
    <tr>
      <td>kernel/sync</td>
      <td>142</td>
      <td>锁、Arc、RCU、completion 等</td>
    </tr>
    <tr>
      <td>kernel/alloc</td>
      <td>109</td>
      <td>分配器、KBox、kvec 等</td>
    </tr>
    <tr>
      <td>kernel/drm</td>
      <td>72</td>
      <td>DRM 驱动、GEM、ioctl 等</td>
    </tr>
    <tr>
      <td>kernel/net</td>
      <td>56</td>
      <td>网络、PHY 寄存器等</td>
    </tr>
    <tr>
      <td>kernel/block</td>
      <td>47</td>
      <td>块层、request、gen_disk 等</td>
    </tr>
    <tr>
      <td>kernel/device</td>
      <td>31</td>
      <td>设备模型、property 等</td>
    </tr>
    <tr>
      <td>kernel/io</td>
      <td>19</td>
      <td>ioremap、I/O 资源、mem 等</td>
    </tr>
  </tbody>
</table>

<p>驱动与硬件相关模块（net、block、drm、io、device 等）中 <code class="language-plaintext highlighter-rouge">unsafe</code> 密集，与「但凡和硬件扯上关系就需要 unsafe」的直观一致；sync/alloc 则多为并发与内存管理抽象本身的边界。</p>

<h2 id="小结">小结</h2>

<ul>
  <li><strong>「和硬件扯上关系就要 unsafe」</strong>：内核 Rust 的现状与之相符。MMIO（io/mem）、PHY 寄存器（net/phy）、DMA 读写（dma.rs）、以及大量通过 <code class="language-plaintext highlighter-rouge">bindings::</code> 调用的 C 驱动 API，都位于 <code class="language-plaintext highlighter-rouge">unsafe</code> 中；驱动/硬件路径几乎必然触及 <code class="language-plaintext highlighter-rouge">unsafe</code>。</li>
  <li><strong>「和是否调用 C 无关」</strong>：
    <ul>
      <li>调用 C（<code class="language-plaintext highlighter-rouge">bindings::</code>）约 1062 处，占 <code class="language-plaintext highlighter-rouge">unsafe</code> 比例很高。</li>
      <li>但 <strong>volatile 访问</strong>（file.rs、dma.rs）、<strong>裸指针解引用</strong>、<strong>Pin/初始化</strong>、<strong>Send/Sync</strong>、<strong>transmute</strong> 等，很多并不依赖「调 C」，而是<strong>内核的硬件与内存模型</strong>本身就需要在 Rust 中通过 <code class="language-plaintext highlighter-rouge">unsafe</code> 表达。<br />
因此：既有大量「因调 C 而 unsafe」，也有大量「因硬件/并发/内存边界而 unsafe」；与 Embassy 等纯 Rust 驱动/裸机生态一致——<strong>与硬件或底层边界打交道的代码，即使用纯 Rust 写，unsafe 仍会集中在这些边界上</strong>。</li>
    </ul>
  </li>
</ul>

<p>统计基于主线内核 <code class="language-plaintext highlighter-rouge">rust/</code> 树，代码片段取自同一树中的实际文件（见文中路径注释）<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://docs.kernel.org/rust/general-information.html">Linux Kernel - Rust support</a> - 内核 Rust 支持与目录结构说明 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://rust-for-linux.com/">Rust for Linux</a> - 内核内 Rust 支持项目与文档 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[与「只有调用 C 才需要 unsafe」的常见误解不同，但凡涉及硬件或与内核/硬件边界交互（如驱动、MMIO、DMA），在 Rust 里几乎必然要使用 unsafe，这与是否通过 FFI 调 C 无必然关系——例如 Embassy 等纯 Rust 裸机/驱动生态里，硬件相关操作同样大量集中在 unsafe 中。本文基于对主线 Linux 内核 rust/ 目录的统计与代码抽样，归纳当前内核 Rust 中 unsafe 的实际使用场景，并辅以真实内核代码说明。]]></summary></entry><entry><title type="html">用户态锁与内核：谁在管理「等待」与 futex</title><link href="https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex.html" rel="alternate" type="text/html" title="用户态锁与内核：谁在管理「等待」与 futex" /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex</id><content type="html" xml:base="https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex.html"><![CDATA[<p>从底层实现看，<strong>用户态（userspace）的锁机制，其核心的阻塞与唤醒功能，最终依赖于内核提供的同步原语</strong>。可以用一个比喻理解：用户态的锁像大楼里每个房间的门锁（轻便、快速），内核的同步则像大楼的主门与安防（全局、负责调度）。多数时候大家只用房间门锁（用户态原子操作或自旋），但当线程需要「离开大楼」或「被叫醒」时，必须经过主门——即通过系统调用进入内核。本文说明这一依赖关系、<strong>futex（Fast Userspace Mutex）</strong> 如何作为桥梁，并辅以 Linux 内核源码与参考文献；关于锁的误用如何导致性能问题，见本博客<a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">《为什么「语言速度」是伪命题》</a>中的「锁的误用与性能」一节<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。</p>

<h2 id="1-谁在管理等待">1. 谁在管理「等待」？</h2>

<p>用户态程序<strong>无法直接控制 CPU 的调度</strong>，只有内核才有权暂停一个线程（让出 CPU）并在未来某刻恢复它。内核能获得这一能力，依赖两类入口：<strong>系统调用</strong>（线程主动进入内核，例如调用 <code class="language-plaintext highlighter-rouge">futex</code> 后在内核里执行 <code class="language-plaintext highlighter-rouge">schedule()</code> 让出 CPU）与<strong>定时中断</strong>（周期性的时钟中断让内核有机会更新运行时间、设置「需要调度」标志，从而在返回用户态前或下次进入内核时执行 <code class="language-plaintext highlighter-rouge">schedule()</code>，实现抢占或时间片轮转）。定时中断路径在 Linux 上的实现大致为：时钟事件驱动 <strong><code class="language-plaintext highlighter-rouge">tick_periodic()</code></strong>（传统周期 tick）或 <strong><code class="language-plaintext highlighter-rouge">tick_nohz_handler()</code></strong>（高分辨率/动态 tick）→ <strong><code class="language-plaintext highlighter-rouge">update_process_times()</code></strong>（<code class="language-plaintext highlighter-rouge">kernel/time/timer.c</code>）→ <strong><code class="language-plaintext highlighter-rouge">sched_tick()</code></strong>（<code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code>）；<code class="language-plaintext highlighter-rouge">sched_tick()</code> 的注释写明 “This function gets called by the timer code, with HZ frequency”，在其中更新 runqueue 时钟、调用当前任务所属调度类的 <strong><code class="language-plaintext highlighter-rouge">task_tick</code></strong>，并可能调用 <strong><code class="language-plaintext highlighter-rouge">resched_curr()</code></strong> 标记需要重新调度，从而在适当时机触发 <strong><code class="language-plaintext highlighter-rouge">__schedule()</code></strong> 切换任务<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">2</a></sup>。</p>

<ul>
  <li><strong>若锁被占用且等待时间可能较长</strong>：线程需要<strong>阻塞</strong>——主动放弃 CPU、进入睡眠，直到锁被释放。这个「让出 CPU 并睡眠」的动作必须通过内核提供的系统调用来完成，在 Linux 上即 <strong><code class="language-plaintext highlighter-rouge">futex</code></strong> 等<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>。</li>
  <li><strong>若锁只被短暂占用</strong>：线程可以选择<strong>自旋</strong>，即原地循环检查锁状态，不进入内核；线程一直占着 CPU。这仅适用于多核且持锁时间极短的场景，否则会浪费 CPU。</li>
</ul>

<p>因此：<strong>能「睡下去」和「被唤醒」的锁，一定依赖内核。</strong></p>

<h2 id="2-关键桥梁futex-fast-userspace-mutex">2. 关键桥梁：futex (Fast Userspace Mutex)</h2>

<p>在现代 Linux 上，几乎所有高性能用户态锁（如 NPTL 的 <code class="language-plaintext highlighter-rouge">pthread_mutex</code>、<code class="language-plaintext highlighter-rouge">pthread_cond</code>）底层都依赖 <strong>futex</strong>。其设计哲学正是「大部分时间在用户态解决，必要时才进内核」<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup><sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>。</p>

<h3 id="21-无竞争时fast-path">2.1 无竞争时（Fast Path）</h3>

<p>线程尝试加锁时，若锁空闲，只需在<strong>用户态</strong>用一条原子指令（如 CAS）把锁变量从 0 改为 1。<strong>全程无系统调用，极快。</strong></p>

<h3 id="22-有竞争时slow-path">2.2 有竞争时（Slow Path）</h3>

<ol>
  <li><strong>用户态</strong>：尝试加锁的线程发现锁已被占用，将自身标记为「等待」，然后调用 <strong><code class="language-plaintext highlighter-rouge">futex</code> 系统调用</strong>进入内核。</li>
  <li><strong>内核态</strong>：内核把该线程放入与该 futex 对应的<strong>等待队列</strong>，并调度其他线程运行，当前线程阻塞。</li>
  <li><strong>释放与唤醒</strong>：持锁线程释放时，在用户态用原子指令把锁变量改回 0，并检查是否有等待者；若有，再调用 <strong><code class="language-plaintext highlighter-rouge">futex</code></strong> 通知内核唤醒。</li>
  <li><strong>内核响应</strong>：内核从等待队列中唤醒被阻塞的线程，该线程得以继续运行并再次尝试获取锁。</li>
</ol>

<p>因此，<strong>futex 本质上是内核提供的「等待队列管理器」</strong>，锁的值（0/1）由用户态维护，阻塞与唤醒由内核完成。内核实现见 <strong><code class="language-plaintext highlighter-rouge">kernel/futex/</code></strong>：系统调用入口为 <strong><code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE6(futex, ...)</code></strong>，根据 <code class="language-plaintext highlighter-rouge">op</code> 分发到 <strong><code class="language-plaintext highlighter-rouge">futex_wait</code></strong> / <strong><code class="language-plaintext highlighter-rouge">futex_wake</code></strong> 等<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup>。</p>

<h2 id="3-cpu-层面的锁机制原子指令与内存序">3. CPU 层面的锁机制：原子指令与内存序</h2>

<p>用户态「无竞争时一条原子指令加锁」依赖 <strong>CPU 提供的原子读-改-写（RMW）与内存序保证</strong>；否则多核下既无法保证互斥，也无法保证临界区内的写对其他核可见。以下为两种常见架构的要点与权威出处。</p>

<h3 id="31-x86lock-前缀与原子性">3.1 x86：LOCK 前缀与原子性</h3>

<p>在 x86 上，<strong>LOCK</strong> 前缀（opcode F0）可使特定指令在多核下<strong>原子</strong>执行：目标为内存操作数时，会断言 LOCK# 信号（或等价机制），使该次读-改-写不可被其他 CPU 打断。可加 LOCK 的指令包括 <strong>CMPXCHG</strong>（比较并交换）、<strong>XCHG</strong>（与内存交换）、<strong>ADD/SUB/INC/DEC</strong> 等；<strong>XCHG</strong> 在目标为内存时即使不加前缀也会具有锁语义。现代 x86（P6 及以后）对已缓存的地址通常采用 <strong>cache locking</strong>（依赖 MESI 等缓存一致性协议），而非锁总线，从而减少延迟<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup>。</p>

<p>LOCK 前缀还带来<strong>内存序</strong>效果：带 LOCK 的指令与其它 LOCK 指令之间存在全序；普通 load/store 不能与 LOCK 指令重排。因此「加锁」可用带 acquire 语义的原子操作（如 CMPXCHG 成功后相当于 acquire），「解锁」用带 release 语义的写（如原子 store 0），能保证临界区内的修改在解锁后对其它核可见、且其它核的修改在加锁后对本核可见。详见 Intel SDM Vol 3A 第 8 章（Multiple-Processor Management）及 Vol 2A 对 LOCK 的说明<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup>。</p>

<h3 id="32-arm独占加载存储ldxrstxr与-exclusive-monitor">3.2 ARM：独占加载/存储（LDXR/STXR）与 Exclusive Monitor</h3>

<p>ARM 没有像 x86 那样的「单条指令原子 RMW」，而是用 <strong>Load-Exclusive / Store-Exclusive</strong> 实现：<strong>LDXR</strong>（Load Exclusive Register）从某地址加载并让该地址被本核的 <strong>exclusive monitor</strong> 标记；<strong>STXR</strong>（Store Exclusive Register）仅在该地址仍被本核独占时写入并返回 0，否则写入失败、返回非 0，由软件重试。这样一对 LDXR + STXR 可实现「读-改-写」的原子性，是用户态自旋锁、CAS 等的基础。ARMv8 还提供 <strong>LDAXR/STLXR</strong> 等带 <strong>acquire/release</strong> 语义的变种，在实现 mutex 时保证临界区前后的可见性<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup>。</p>

<p>Exclusive monitor 是硬件状态：若其它 CPU 在该地址上产生了 store 或其它使独占失效的访问，当前核的 STXR 会失败，从而避免多核同时写。软件需保证在 LDXR 与 STXR 之间不插入会破坏独占性的操作（如显式访问该地址、某些系统寄存器或 cache 维护指令）。详见 ARM 架构参考手册中「Load-Exclusive and Store-Exclusive」与「Synchronization and semaphores」<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup>。</p>

<h3 id="33-与-futex-的关系">3.3 与 futex 的关系</h3>

<ul>
  <li><strong>无竞争</strong>：用户态用上述原子指令（x86 的 CMPXCHG/XCHG，ARM 的 LDXR/STXR 或 LDAXR/STLXR）完成「尝试加锁 / 解锁」，<strong>不进入内核</strong>，因此极快。</li>
  <li><strong>有竞争</strong>：原子尝试失败后，若选择阻塞，再通过 <strong>futex</strong> 系统调用进入内核、挂入等待队列。</li>
</ul>

<p>内核自身在实现 futex 的哈希桶、等待队列时，同样依赖各架构的原子与内存屏障；Linux 内核文档 <strong>atomic_t.txt</strong>、<strong>memory-barriers.txt</strong> 对原子 RMW、acquire/release 变种及与锁的配合有统一说明<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup>。</p>

<h2 id="4-为什么不能完全在用户态实现阻塞锁">4. 为什么不能完全在用户态实现「阻塞」锁？</h2>

<p>若完全在用户态实现，当线程拿不到锁时只有两种选择：</p>

<ol>
  <li><strong>自旋（忙等）</strong>：一直循环检查。持锁时间一长就会白占 CPU，浪费严重。</li>
  <li><strong>sleep + 轮询</strong>：调用 <code class="language-plaintext highlighter-rouge">sleep()</code> 睡一会儿再起来看。延迟不可控（可能刚睡下锁就释放了），且无法做到「锁一释放就立刻被唤醒」。</li>
</ol>

<p>要实现「锁释放时立刻唤醒」的语义，<strong>必须有一个全局的调度者管理线程状态</strong>，这个角色只能是操作系统内核。</p>

<h2 id="5-完全在用户态的锁">5. 完全在用户态的锁</h2>

<p>有，但适用场景受限：</p>

<ul>
  <li><strong>自旋锁</strong>：基于原子操作，预期持锁时间仅几条指令时可用。<strong>完全不依赖内核</strong>，代价是：若锁被长时间持有，CPU 会空转。内核与用户态都常用；用户态自旋锁不涉及 futex。</li>
  <li><strong>序列锁（seqlock）</strong> 等乐观并发：主要在用户态通过内存序与版本号完成，但冲突激烈时可能需重试或退化为等待，仍可能依赖内核。</li>
</ul>

<p>关于<strong>何时用自旋、何时用可睡眠的锁</strong>，以及<strong>粗粒度锁、持锁做 I/O</strong> 对性能的影响，见本博客<a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">《为什么「语言速度」是伪命题》</a>#锁的误用与性能<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。</p>

<h2 id="6-总结">6. 总结</h2>

<ul>
  <li><strong>上层（用户态）</strong>：用原子指令快速尝试获取锁，无竞争时避免任何内核开销。</li>
  <li><strong>下层（内核）</strong>：通过 <strong>futex</strong> 等原语提供「等待队列 + 调度」，处理阻塞与唤醒。</li>
</ul>

<p><strong>用户态锁的「快」，是因为无竞争时绕过了内核；它之所以能成为通用的、可阻塞的锁，是因为有竞争时有内核的兜底。</strong></p>

<hr />

<h2 id="补充阅读自旋睡眠与-sleep-时间准确度">补充阅读：自旋、睡眠与 sleep 时间准确度</h2>

<h3 id="自旋就是在浪费-cpu-的循环">自旋就是在浪费 CPU 的循环</h3>

<p><strong>自旋（spin）</strong>即拿不到锁时不放弃 CPU，在用户态（或内核态）反复执行「读锁变量 → 判断是否可用 → 再读再判断」的循环，直到锁被释放。这段时间里 CPU 一直在跑这条循环，没有做业务逻辑，从系统角度看就是<strong>空转、浪费该核的算力</strong>。因此自旋只适合「预计很快就能拿到锁」的场景（例如持锁只有几条指令）；否则会长时间白占 CPU。自旋时常配合 <strong>PAUSE</strong>（x86）或 <strong>WFE</strong>（ARM）等指令减轻总线竞争，但本质仍是循环等待。</p>

<h3 id="cpu-如何实现sleep没有-sleep-指令靠调度与上下文切换">CPU 如何「实现」sleep：没有 sleep 指令，靠调度与上下文切换</h3>

<p>CPU <strong>没有</strong>「让某条线程 sleep」的指令。「Sleep」是操作系统用<strong>调度 + 上下文切换</strong>实现的效果：CPU 只是在执行当前被调度到的指令流。</p>

<ul>
  <li><strong>线程如何睡过去</strong>：线程在用户态执行会阻塞的操作（如 <code class="language-plaintext highlighter-rouge">futex(FUTEX_WAIT)</code>、<code class="language-plaintext highlighter-rouge">read()</code> 阻塞 fd）时发生<strong>系统调用</strong>，陷入内核。内核把对应 <strong>task</strong> 挂到<strong>等待队列</strong>，状态改为 <strong><code class="language-plaintext highlighter-rouge">TASK_INTERRUPTIBLE</code></strong> 等，<strong>不再放在 runqueue 上</strong>；随后内核调用 <strong><code class="language-plaintext highlighter-rouge">schedule()</code></strong>，做<strong>上下文切换</strong>——把当前线程的寄存器、PC、栈等存到内存，从 runqueue 选另一 task 加载回 CPU 并执行。从这一刻起，「睡着」的线程的指令不再被 CPU 执行。futex 路径上可见 <strong><code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE)</code></strong> 与 <strong><code class="language-plaintext highlighter-rouge">futex_do_wait()</code></strong> 内的 <strong><code class="language-plaintext highlighter-rouge">schedule()</code></strong><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</li>
  <li><strong>CPU 在做什么</strong>：当线程 A sleep 时，A 的上下文保存在内存里，CPU 去执行线程 B 或 idle。没有「sleep」这条指令，只是内核不再把 CPU 分给该线程。</li>
  <li><strong>易混淆的指令</strong>：<strong>HLT</strong>（x86）/ <strong>WFI</strong>（ARM）是 idle 任务在「完全没活可干」时用的，让整核等中断，不是「某条线程 sleep」。<strong>PAUSE</strong>（x86）是自旋等锁时用的，不是 sleep。</li>
</ul>

<h3 id="sleep-的时间准确度定时器到期由时钟定时器中断触发唤醒">Sleep 的时间准确度：定时器到期，由时钟/定时器中断触发唤醒</h3>

<p>「睡多久」由内核<strong>定时器（timer）</strong>到期保证；到期由<strong>时钟/定时器中断</strong>（或高精度 timer 回调）触发。</p>

<ul>
  <li><strong>带时间的 sleep 在内核里</strong>：例如 <code class="language-plaintext highlighter-rouge">nanosleep(2s)</code>、<code class="language-plaintext highlighter-rouge">futex_wait(..., timeout)</code> 时，内核把线程挂到等待队列，并依「当前时间 + 时长」登记一个<strong>高精度定时器（hrtimer）</strong>，到期时间即目标唤醒时间。futex 带超时等待使用 <strong><code class="language-plaintext highlighter-rouge">struct hrtimer_sleeper</code></strong>，在 <strong><code class="language-plaintext highlighter-rouge">futex_do_wait()</code></strong> 中若传入 timeout 会调用 <strong><code class="language-plaintext highlighter-rouge">hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS)</code></strong>，到期后 hrtimer 回调会间接使该 task 被唤醒<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</li>
  <li><strong>时间到了怎么醒</strong>：定时器子系统（如 hrtimer）按到期时间排序，到点由<strong>时钟/定时器中断</strong>或高精度 timer 中断（及后续 softirq）执行回调；对「sleep 到期」的 timer，回调里通过 <strong>wake_up</strong> 等把线程从等待队列移回 <strong>runqueue</strong>，设为可运行。</li>
  <li><strong>准确度</strong>：<strong>何时被唤醒（变为 runnable）</strong>由 timer 到期与中断路径保证；<strong>何时真正再次得到 CPU</strong> 还受调度延迟影响（通常为微秒到毫秒级）。高精度定时器（hrtimer）可提供微秒级分辨率；若仅用低分辨率 jiffies，到期检查受 tick 间隔限制。</li>
</ul>

<p>可参见 <strong><code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code></strong>（<code class="language-plaintext highlighter-rouge">futex_do_wait</code>、<code class="language-plaintext highlighter-rouge">hrtimer_sleeper_start_expires</code>、<code class="language-plaintext highlighter-rouge">set_current_state(TASK_INTERRUPTIBLE)</code>、<code class="language-plaintext highlighter-rouge">schedule()</code>）及 <strong><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></strong>（<code class="language-plaintext highlighter-rouge">schedule()</code>/<code class="language-plaintext highlighter-rouge">__schedule()</code> 的上下文切换）<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup><sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</p>

<hr />

<h2 id="扩展阅读内核与接口">扩展阅读（内核与接口）</h2>

<ul>
  <li><strong>futex 系统调用</strong>：<strong><code class="language-plaintext highlighter-rouge">kernel/futex/syscalls.c</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE6(futex, ...)</code></strong> 与 <strong><code class="language-plaintext highlighter-rouge">do_futex()</code></strong>，根据 <code class="language-plaintext highlighter-rouge">op</code>（如 <code class="language-plaintext highlighter-rouge">FUTEX_WAIT</code>、<code class="language-plaintext highlighter-rouge">FUTEX_WAKE</code>）分发到 <strong><code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">futex_wait()</code></strong>、<strong><code class="language-plaintext highlighter-rouge">futex_wake()</code></strong><sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup><sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup>。</li>
  <li><strong>等待与唤醒逻辑</strong>：<strong><code class="language-plaintext highlighter-rouge">waitwake.c</code></strong> 中 <code class="language-plaintext highlighter-rouge">futex_wait_setup()</code> 将当前任务入队，<code class="language-plaintext highlighter-rouge">__futex_wait()</code> 调用 <code class="language-plaintext highlighter-rouge">futex_do_wait()</code> 进入调度；<code class="language-plaintext highlighter-rouge">futex_wake()</code> 在哈希桶中查找等待者并 <code class="language-plaintext highlighter-rouge">wake_up_q()</code><sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">9</a></sup>。</li>
  <li><strong>futex 设计</strong>：<strong><code class="language-plaintext highlighter-rouge">kernel/futex/core.c</code></strong> 文件头注释（Rusty Russell 等）对 Fast Userspace Mutex 的由来与设计有简要说明；LWN 多篇文章介绍其演进与优化<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">10</a></sup>。</li>
  <li><strong>CPU 原子与内存序</strong>：x86 LOCK 前缀与多核原子见 Intel SDM Vol 2A/Vol 3A；ARM 独占加载/存储见 ARM ARM；Linux 内核 <strong>atomic_t.txt</strong>、<strong>memory-barriers.txt</strong> 对原子 RMW 与 acquire/release 的说明<sup id="fnref:7:2" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">6</a></sup><sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup>。</li>
  <li><strong>定时中断与调度</strong>：<strong><code class="language-plaintext highlighter-rouge">kernel/time/timer.c</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">update_process_times()</code></strong> 由时钟中断路径调用，内部调用 <strong><code class="language-plaintext highlighter-rouge">sched_tick()</code></strong>；<strong><code class="language-plaintext highlighter-rouge">kernel/time/tick-common.c</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">tick_periodic()</code></strong>、<strong><code class="language-plaintext highlighter-rouge">kernel/time/tick-sched.c</code></strong> 的 <strong><code class="language-plaintext highlighter-rouge">tick_nohz_handler()</code></strong> → <strong><code class="language-plaintext highlighter-rouge">tick_sched_handle()</code></strong> 均会调用 <strong><code class="language-plaintext highlighter-rouge">update_process_times()</code></strong>；<strong><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">sched_tick()</code></strong> 以 HZ 频率被 timer 代码调用，负责更新 rq 时钟与 <strong><code class="language-plaintext highlighter-rouge">task_tick</code></strong>、必要时 <strong><code class="language-plaintext highlighter-rouge">resched_curr()</code></strong><sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">2</a></sup>。</li>
  <li><strong>自旋、睡眠与 sleep 时间</strong>：自旋即占 CPU 的循环等待；sleep 由内核等待队列 + <strong><code class="language-plaintext highlighter-rouge">schedule()</code></strong> 实现，无专用 CPU 指令。带超时的 sleep 依赖 <strong>hrtimer</strong> 到期，由时钟/定时器中断触发唤醒。见 <strong><code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code></strong>（<code class="language-plaintext highlighter-rouge">futex_do_wait</code>、<code class="language-plaintext highlighter-rouge">hrtimer_sleeper_start_expires</code>、<code class="language-plaintext highlighter-rouge">TASK_INTERRUPTIBLE</code>、<code class="language-plaintext highlighter-rouge">schedule()</code>）<sup id="fnref:10:3" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</li>
</ul>

<hr />

<h2 id="内核代码片段与正文对应">内核代码片段（与正文对应）</h2>

<p><strong>1. futex 系统调用入口与分发</strong>（<code class="language-plaintext highlighter-rouge">kernel/futex/syscalls.c</code>）</p>

<p>用户态调用 <code class="language-plaintext highlighter-rouge">futex(uaddr, op, ...)</code> 时，内核根据 <code class="language-plaintext highlighter-rouge">op &amp; FUTEX_CMD_MASK</code> 分发到 <code class="language-plaintext highlighter-rouge">futex_wait</code> 或 <code class="language-plaintext highlighter-rouge">futex_wake</code> 等；<code class="language-plaintext highlighter-rouge">FUTEX_WAIT</code> / <code class="language-plaintext highlighter-rouge">FUTEX_WAKE</code> 走 <code class="language-plaintext highlighter-rouge">do_futex()</code><sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup><sup id="fnref:4:5" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 kernel/futex/syscalls.c（约 84–106 行、160 行）</span>
<span class="kt">long</span> <span class="nf">do_futex</span><span class="p">(</span><span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="n">uaddr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">op</span><span class="p">,</span> <span class="n">u32</span> <span class="n">val</span><span class="p">,</span> <span class="n">ktime_t</span> <span class="o">*</span><span class="n">timeout</span><span class="p">,</span>
              <span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="n">uaddr2</span><span class="p">,</span> <span class="n">u32</span> <span class="n">val2</span><span class="p">,</span> <span class="n">u32</span> <span class="n">val3</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">futex_to_flags</span><span class="p">(</span><span class="n">op</span><span class="p">);</span>
    <span class="kt">int</span> <span class="n">cmd</span> <span class="o">=</span> <span class="n">op</span> <span class="o">&amp;</span> <span class="n">FUTEX_CMD_MASK</span><span class="p">;</span>
    <span class="c1">// ...</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">cmd</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">FUTEX_WAIT</span><span class="p">:</span>
    <span class="k">case</span> <span class="n">FUTEX_WAIT_BITSET</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">futex_wait</span><span class="p">(</span><span class="n">uaddr</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">timeout</span><span class="p">,</span> <span class="n">val3</span><span class="p">);</span>
    <span class="k">case</span> <span class="n">FUTEX_WAKE</span><span class="p">:</span>
    <span class="k">case</span> <span class="n">FUTEX_WAKE_BITSET</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">futex_wake</span><span class="p">(</span><span class="n">uaddr</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">val3</span><span class="p">);</span>
    <span class="c1">// ...</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="n">SYSCALL_DEFINE6</span><span class="p">(</span><span class="n">futex</span><span class="p">,</span> <span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">uaddr</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">u32</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span>
                <span class="k">const</span> <span class="k">struct</span> <span class="n">__kernel_timespec</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">utime</span><span class="p">,</span>
                <span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">uaddr2</span><span class="p">,</span> <span class="n">u32</span><span class="p">,</span> <span class="n">val3</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// 超时处理等 ...</span>
    <span class="k">return</span> <span class="n">do_futex</span><span class="p">(</span><span class="n">uaddr</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">tp</span><span class="p">,</span> <span class="n">uaddr2</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">utime</span><span class="p">,</span> <span class="n">val3</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>2. 等待与唤醒：入队与 schedule</strong>（<code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code>）</p>

<p><code class="language-plaintext highlighter-rouge">__futex_wait()</code> 通过 <code class="language-plaintext highlighter-rouge">futex_wait_setup()</code> 准备并入队，再调用 <code class="language-plaintext highlighter-rouge">futex_do_wait()</code> 进入睡眠；<code class="language-plaintext highlighter-rouge">futex_wake()</code> 根据 uaddr 算哈希桶，在桶内链表中找到匹配的等待者并唤醒<sup id="fnref:4:6" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">5</a></sup><sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">9</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 kernel/futex/waitwake.c</span>
<span class="c1">// __futex_wait()（约 666–687 行）：准备等待、入队、进入 schedule</span>
<span class="kt">int</span> <span class="nf">__futex_wait</span><span class="p">(</span><span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="n">uaddr</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">,</span> <span class="n">u32</span> <span class="n">val</span><span class="p">,</span>
                 <span class="k">struct</span> <span class="n">hrtimer_sleeper</span> <span class="o">*</span><span class="n">to</span><span class="p">,</span> <span class="n">u32</span> <span class="n">bitset</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">futex_q</span> <span class="n">q</span> <span class="o">=</span> <span class="n">futex_q_init</span><span class="p">;</span>
    <span class="c1">// ...</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="n">futex_wait_setup</span><span class="p">(</span><span class="n">uaddr</span><span class="p">,</span> <span class="n">val</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">q</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">current</span><span class="p">);</span>  <span class="cm">/* 入队等 */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
    <span class="n">futex_do_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">q</span><span class="p">,</span> <span class="n">to</span><span class="p">);</span>   <span class="cm">/* 在此 schedule，让出 CPU */</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="c1">// futex_wake()（约 155–199 行）：查哈希桶、唤醒 nr_wake 个等待者</span>
<span class="kt">int</span> <span class="nf">futex_wake</span><span class="p">(</span><span class="n">u32</span> <span class="n">__user</span> <span class="o">*</span><span class="n">uaddr</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">,</span> <span class="kt">int</span> <span class="n">nr_wake</span><span class="p">,</span> <span class="n">u32</span> <span class="n">bitset</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// get_futex_key, futex_hash 得到 hb (hash bucket)</span>
    <span class="n">spin_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">hb</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">);</span>
    <span class="n">plist_for_each_entry_safe</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">next</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">hb</span><span class="o">-&gt;</span><span class="n">chain</span><span class="p">,</span> <span class="n">list</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">futex_match</span><span class="p">(</span><span class="o">&amp;</span><span class="n">this</span><span class="o">-&gt;</span><span class="n">key</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">key</span><span class="p">))</span> <span class="p">{</span>
            <span class="n">this</span><span class="o">-&gt;</span><span class="n">wake</span><span class="p">(</span><span class="o">&amp;</span><span class="n">wake_q</span><span class="p">,</span> <span class="n">this</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">ret</span> <span class="o">&gt;=</span> <span class="n">nr_wake</span><span class="p">)</span>
                <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">hb</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">);</span>
    <span class="n">wake_up_q</span><span class="p">(</span><span class="o">&amp;</span><span class="n">wake_q</span><span class="p">);</span>   <span class="cm">/* 真正唤醒等待线程 */</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>3. core.c 中的设计说明</strong>（<code class="language-plaintext highlighter-rouge">kernel/futex/core.c</code>）</p>

<p>文件头注释说明 futex 的由来（Rusty Russell 等）、「hashed waitqueues」等设计，与正文「内核管理等待队列」对应<sup id="fnref:3:5" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">4</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// kernel/futex/core.c 文件头（约 1–32 行）</span>
<span class="cm">/*
 *  Fast Userspace Mutexes (which I call "Futexes!").
 *  (C) Rusty Russell, IBM 2002
 *  ...
 *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly enough at me...
 */</span>
</code></pre></div></div>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>本博客 <a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">为什么「语言速度」是伪命题：I/O、并发、内存与内核</a> - §1.5 锁的误用与性能：细粒度锁、持锁时间、自旋与睡眠取舍 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><strong>定时中断与调度</strong>：时钟中断路径调用 <strong><code class="language-plaintext highlighter-rouge">update_process_times()</code></strong>（<code class="language-plaintext highlighter-rouge">kernel/time/timer.c</code>），其内调用 <strong><code class="language-plaintext highlighter-rouge">sched_tick()</code></strong>；<code class="language-plaintext highlighter-rouge">sched_tick()</code> 在 <strong><code class="language-plaintext highlighter-rouge">kernel/sched/core.c</code></strong> 中实现，注释写明 “gets called by the timer code, with HZ frequency”，内部执行 <code class="language-plaintext highlighter-rouge">update_rq_clock(rq)</code>、<code class="language-plaintext highlighter-rouge">donor-&gt;sched_class-&gt;task_tick(rq, donor, 0)</code> 及条件性的 <code class="language-plaintext highlighter-rouge">resched_curr(rq)</code>，从而在定时中断上下文中为抢占/时间片提供入口。Tick 入口见 <strong><code class="language-plaintext highlighter-rouge">kernel/time/tick-common.c</code></strong>（<code class="language-plaintext highlighter-rouge">tick_periodic</code>）与 <strong><code class="language-plaintext highlighter-rouge">kernel/time/tick-sched.c</code></strong>（<code class="language-plaintext highlighter-rouge">tick_nohz_handler</code> → <code class="language-plaintext highlighter-rouge">tick_sched_handle</code> → <code class="language-plaintext highlighter-rouge">update_process_times</code>）。<a href="https://elixir.bootlin.com/linux/latest/source/kernel/time/timer.c">Bootlin - timer.c</a>、<a href="https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c">Bootlin - core.c</a>（搜索 sched_tick） <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/360699/">A futex overview and update</a> - LWN，futex 概述与无竞争 fast path、有竞争时进内核 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Linux 内核 <strong>kernel/futex/core.c</strong>（futex 设计与 hashed waitqueues）、<strong>kernel/futex/syscalls.c</strong>（<code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE6(futex,...)</code>、<code class="language-plaintext highlighter-rouge">do_futex</code>）。<a href="https://elixir.bootlin.com/linux/latest/source/kernel/futex/core.c">Bootlin - core.c</a>、<a href="https://elixir.bootlin.com/linux/latest/source/kernel/futex/syscalls.c">Bootlin - syscalls.c</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:3:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Linux 内核 <strong>kernel/futex/syscalls.c</strong>（<code class="language-plaintext highlighter-rouge">do_futex</code> 中 <code class="language-plaintext highlighter-rouge">FUTEX_WAIT</code>→<code class="language-plaintext highlighter-rouge">futex_wait</code>、<code class="language-plaintext highlighter-rouge">FUTEX_WAKE</code>→<code class="language-plaintext highlighter-rouge">futex_wake</code>）、<strong>kernel/futex/waitwake.c</strong>（<code class="language-plaintext highlighter-rouge">futex_wait</code>、<code class="language-plaintext highlighter-rouge">__futex_wait</code>、<code class="language-plaintext highlighter-rouge">futex_wake</code>、入队与 <code class="language-plaintext highlighter-rouge">wake_up_q</code>）。<a href="https://elixir.bootlin.com/linux/latest/source/kernel/futex/waitwake.c">Bootlin - waitwake.c</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:4:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:4:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><strong>Intel® 64 and IA-32 Architectures Software Developer’s Manual</strong>：Vol 2A 中 <strong>LOCK</strong>（Instruction set reference）说明 LOCK 前缀可施加的指令及多核原子性；Vol 3A 第 8 章 <strong>Multiple-Processor Management</strong> 涉及 LOCK#、总线与缓存锁定及内存序。可查 <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html">Intel SDM 索引</a> 或 <a href="https://www.felixcloutier.com/x86/lock">felixcloutier x86 LOCK</a>。 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:7:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><strong>ARM</strong>：架构参考手册中 <strong>Load-Exclusive and Store-Exclusive</strong>（如 LDXR/STXR、LDAXR/STLXR）与 <strong>Synchronization and semaphores</strong> 说明独占监视器与原子 RMW。<a href="https://developer.arm.com/documentation/ddi0487/latest">ARM Architecture Reference Manual</a>。<strong>Linux 内核</strong>：<strong>Documentation/atomic_t.txt</strong> 描述 atomic RMW API 与 acquire/release 变种；<strong>Documentation/memory-barriers.txt</strong> 描述内存屏障与锁的配对。<a href="https://www.kernel.org/doc/html/latest/core-api/atomic_t.html">atomic_t.txt</a>、<a href="https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html">memory-barriers.txt</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:8:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><strong>自旋、睡眠与 sleep 时间</strong>：<strong><code class="language-plaintext highlighter-rouge">kernel/futex/waitwake.c</code></strong> 中 <strong><code class="language-plaintext highlighter-rouge">futex_do_wait()</code></strong> 在传入 timeout 时调用 <strong><code class="language-plaintext highlighter-rouge">hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS)</code></strong> 启动高精度定时器，随后在 <code class="language-plaintext highlighter-rouge">plist_node_empty</code> 检查通过时调用 <strong><code class="language-plaintext highlighter-rouge">schedule()</code></strong> 让出 CPU；入队前通过 <strong><code class="language-plaintext highlighter-rouge">set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE)</code></strong> 将当前任务设为可中断睡眠（见同文件约 441、659 行及 341–360 行）。定时器到期由时钟/高精度 timer 中断路径触发回调，从而唤醒该 task。<a href="https://elixir.bootlin.com/linux/latest/source/kernel/futex/waitwake.c">Bootlin - waitwake.c</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:10:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><strong>kernel/futex/waitwake.c</strong> 文件头注释：waiter 读用户态 futex 值、调用 <code class="language-plaintext highlighter-rouge">futex_wait()</code> 后入队并 <code class="language-plaintext highlighter-rouge">schedule()</code>；waker 改用户态值后调用 <code class="language-plaintext highlighter-rouge">futex_wake()</code> 在哈希桶中查找并唤醒。说明了用户态「锁变量」与内核「等待队列」的协作。 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/685769/">In pursuit of faster futexes</a> - LWN，futex 性能与竞争路径优化；<a href="https://docs.kernel.org/locking/robust-futexes.html">Robust futexes - The Linux Kernel documentation</a> - 健壮 futex 与进程退出时的清理 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[从底层实现看，用户态（userspace）的锁机制，其核心的阻塞与唤醒功能，最终依赖于内核提供的同步原语。可以用一个比喻理解：用户态的锁像大楼里每个房间的门锁（轻便、快速），内核的同步则像大楼的主门与安防（全局、负责调度）。多数时候大家只用房间门锁（用户态原子操作或自旋），但当线程需要「离开大楼」或「被叫醒」时，必须经过主门——即通过系统调用进入内核。本文说明这一依赖关系、futex（Fast Userspace Mutex） 如何作为桥梁，并辅以 Linux 内核源码与参考文献；关于锁的误用如何导致性能问题，见本博客《为什么「语言速度」是伪命题》中的「锁的误用与性能」一节1。 本博客 为什么「语言速度」是伪命题：I/O、并发、内存与内核 - §1.5 锁的误用与性能：细粒度锁、持锁时间、自旋与睡眠取舍 &#8617;]]></summary></entry><entry><title type="html">栈为什么比堆快：从分配方式到「批发-零售」链条</title><link href="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html" rel="alternate" type="text/html" title="栈为什么比堆快：从分配方式到「批发-零售」链条" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster</id><content type="html" xml:base="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html"><![CDATA[<p>在同一个进程内，栈和堆使用相同的内存硬件，访问速度本身没有区别。真正的性能差异来自内核在分配和管理内存时为两者采取的不同策略。本文从分配方式、物理内存管理、缓存友好性三个角度说明原因，并借 sbrk、Slab、malloc 梳理从内核到用户态的内存「批发-零售」链条；最后讨论「栈比堆快」这一经验法则的适用边界。</p>

<h2 id="1-内存分配方式">1. 内存分配方式</h2>

<h3 id="栈近乎零成本">栈：近乎零成本</h3>

<p>栈上分配只需修改<strong>栈指针寄存器</strong>。在 x86-64 上，函数序言用 <code class="language-plaintext highlighter-rouge">sub rsp, N</code> 预留空间（如 <code class="language-plaintext highlighter-rouge">sub rsp, 0x10</code> 即 16 字节），一条 CPU 指令、不涉及内核，成本极低<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。需要澄清：<strong><code class="language-plaintext highlighter-rouge">sub rsp, N</code> 本身不会触发任何异常</strong>，只是寄存器算术；触发缺页的是<strong>后续对该新栈空间的首次访问</strong>（见下节）。</p>

<pre><code class="language-asm">; x86-64 函数序言示例：分配 0x20 字节栈帧
    push    rbp
    mov     rbp, rsp
    sub     rsp, 0x20
</code></pre>

<h3 id="堆系统调用的开销">堆：系统调用的开销</h3>

<p>通过 <code class="language-plaintext highlighter-rouge">malloc</code> 申请内存时，若分配器内部池子不足，会通过 <strong><code class="language-plaintext highlighter-rouge">brk</code></strong> 或 <strong><code class="language-plaintext highlighter-rouge">mmap</code></strong> 等<strong>系统调用</strong>向内核申请。用户态/内核态切换带来微秒级开销，相比一条 <code class="language-plaintext highlighter-rouge">sub rsp</code> 可高出数百倍甚至更多<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。</p>

<h2 id="2-物理内存管理">2. 物理内存管理</h2>

<h3 id="栈缺页异常与按需映射">栈：缺页异常与按需映射</h3>

<p><strong>修改指针（sub rsp, N）：只是“账面上的分配”</strong>。执行 <code class="language-plaintext highlighter-rouge">sub rsp, 0x1000</code> 时，CPU 只做寄存器运算，内核对此一无所知。进程虚拟地址空间中这段新栈区在页表里尚未映射到物理页（或映射到只读零页），只是被“预留”出一个地址范围，成本就是一条指令。</p>

<p><strong>首次访问（例如 <code class="language-plaintext highlighter-rouge">mov [rsp-8], rax</code>）：才是真正的“物理分配”</strong>。当第一次使用这片新栈空间时：(1) CPU 尝试写入该虚拟地址；(2) MMU 查页表发现该页无有效物理页框，无法完成转换；(3) MMU 触发<strong>缺页异常</strong>（#PF，x86-64 上为中断 14），CPU 转去执行内核的缺页处理（如 <code class="language-plaintext highlighter-rouge">do_page_fault()</code>）；(4) 内核从 CR2 读出故障地址，检查是否在进程合法栈区内（如 <code class="language-plaintext highlighter-rouge">mm_struct</code> 的 <code class="language-plaintext highlighter-rouge">start_stack</code> 及 <code class="language-plaintext highlighter-rouge">ulimit -s</code> 限制），若合法则分配物理页、在页表中建立映射并标为可读写；(5) 返回用户态后，原指令重试，此时已有映射，写入成功。这一过程对开发者透明，且<strong>每个页只在首次触及该页时发生一次</strong>，即<strong>惰性分配（Lazy Allocation）</strong>：只为实际使用的栈页分配物理内存，若函数分配了大数组但从未访问，就不会占用物理页<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。</p>

<p><strong>读与零页</strong>：若首次操作是<strong>读</strong>，内核可先将该虚拟页映射到全局只读的<strong>零页</strong>；只有后续发生<strong>写</strong>时才触发写时拷贝（COW），分配真正的物理页并清零，与匿名堆区的零页机制一致。</p>

<p><strong>主线程栈与线程栈</strong>：主线程栈可在合法范围内按需增长（访问新区域触发合法缺页即可）；通过 <code class="language-plaintext highlighter-rouge">pthread_create</code> 创建的线程，其栈通常在创建时用 <code class="language-plaintext highlighter-rouge">mmap</code> 一次性映射固定大小（如 8MB），虚拟范围固定，不会像主线程那样向低地址方向动态增长，访问未映射区域仍会触发缺页并分配物理页。</p>

<p>需要强调的是：<strong>在发生缺页的那一刻</strong>，栈和堆走的是同一条内核路径（#PF → 分配物理页 → 建立映射，必要时清零），单看这一次缺页本身，<strong>栈并不比堆快</strong>。栈的「快」体现在：分配虚拟空间无需系统调用（§1）；缺页通常只在首次触及该页时发生一次，成本被摊薄；一旦物理页已常驻，栈与堆的访问就是普通内存访问，没有差别。</p>

<p><strong>类比</strong>：<code class="language-plaintext highlighter-rouge">sub rsp, N</code> 像在借书卡（页表）上登记一个新书名（虚拟地址），只是记录；<strong>首次访问</strong>像第一次去书架上取书——管理员发现书（物理页）还在仓库，于是取书、上架、更新借书卡，你才能拿到；若该地址不在进程合法地址空间内，则相当于”查无此书”，会引发 SIGSEGV 等错误。</p>

<h3 id="内核视角栈与堆的本质区别是-vma-生命周期">内核视角：栈与堆的本质区别是 VMA 生命周期</h3>

<p>从内核角度看，<strong>并不区分”栈”与”堆”</strong>，只区分<strong>虚拟内存区域（VMA）的类型和生命周期</strong>。理解这一点是理解性能差异的关键。</p>

<h4 id="栈-vma进程级生命周期">栈 VMA：进程级生命周期</h4>

<p>栈在进程启动时由内核创建（<code class="language-plaintext highlighter-rouge">fs/exec.c:setup_arg_pages()</code>），设置 <strong><code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code></strong> 标志，表明这是一个”向下增长”的区域：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 fs/exec.c:778</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">setup_arg_pages</span><span class="p">(</span><span class="k">struct</span> <span class="n">linux_binprm</span> <span class="o">*</span><span class="n">bprm</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="n">vma</span> <span class="o">=</span> <span class="n">vm_area_alloc</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_start</span> <span class="o">=</span> <span class="n">stack_top</span> <span class="o">-</span> <span class="n">STACK_TOP_MAX</span><span class="p">;</span>  <span class="c1">// 通常 8MB</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_end</span> <span class="o">=</span> <span class="n">stack_top</span><span class="p">;</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_flags</span> <span class="o">=</span> <span class="n">VM_STACK_FLAGS</span> <span class="o">|</span> <span class="n">VM_GROWSDOWN</span><span class="p">;</span>  <span class="c1">// 唯一特殊标志</span>
    <span class="n">insert_vm_struct</span><span class="p">(</span><span class="n">mm</span><span class="p">,</span> <span class="n">vma</span><span class="p">);</span>
    <span class="c1">// 关键：只创建 VMA，不分配物理页</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>关键点</strong>：</p>
<ol>
  <li>VMA 在<strong>进程启动时</strong>创建，<strong>进程退出时</strong>销毁（生命周期 = 进程）</li>
  <li><code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code> 只是一个标志位，告诉内核这个 VMA 可以向低地址扩展</li>
  <li>创建时<strong>不分配任何物理页</strong>，物理页在首次访问时按需分配</li>
  <li><strong>函数调用期间，VMA 始终存在</strong>——这就是为什么栈分配不需要系统调用</li>
</ol>

<h4 id="堆-vma两种生命周期">堆 VMA：两种生命周期</h4>

<p><strong>brk 堆</strong>（小块分配）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/mmap.c:115</span>
<span class="n">SYSCALL_DEFINE1</span><span class="p">(</span><span class="n">brk</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span><span class="p">,</span> <span class="n">brk</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 扩展堆顶，可能扩展已有 VMA 或创建新 VMA</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">do_brk_flags</span><span class="p">(</span><span class="o">&amp;</span><span class="n">vmi</span><span class="p">,</span> <span class="n">brkvma</span><span class="p">,</span> <span class="n">oldbrk</span><span class="p">,</span> <span class="n">newbrk</span> <span class="o">-</span> <span class="n">oldbrk</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
        <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
    <span class="n">mm</span><span class="o">-&gt;</span><span class="n">brk</span> <span class="o">=</span> <span class="n">brk</span><span class="p">;</span>  <span class="c1">// 更新堆顶指针</span>
    <span class="c1">// 关键：也只修改 VMA，不分配物理页（除非 VM_LOCKED）</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li><strong>生命周期</strong>：首次 <code class="language-plaintext highlighter-rouge">brk()</code> 时创建，进程退出时销毁（类似栈）</li>
  <li><strong>无特殊标志</strong>：没有 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code>，但 VMA 同样持久</li>
</ul>

<p><strong>mmap 堆</strong>（大块分配，通常 ≥128KB）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/mmap.c:337</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="nf">do_mmap</span><span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">addr</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="n">vma</span> <span class="o">=</span> <span class="n">vm_area_alloc</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_start</span> <span class="o">=</span> <span class="n">addr</span><span class="p">;</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_end</span> <span class="o">=</span> <span class="n">addr</span> <span class="o">+</span> <span class="n">len</span><span class="p">;</span>
    <span class="n">vma_link</span><span class="p">(</span><span class="n">mm</span><span class="p">,</span> <span class="n">vma</span><span class="p">,</span> <span class="p">...);</span>  <span class="c1">// 插入红黑树</span>
    <span class="k">return</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// munmap 销毁</span>
<span class="kt">int</span> <span class="nf">do_munmap</span><span class="p">(...)</span> <span class="p">{</span>
    <span class="n">unmap_page_range</span><span class="p">(</span><span class="n">vma</span><span class="p">,</span> <span class="p">...);</span>   <span class="c1">// 删除页表项</span>
    <span class="n">free_pgtables</span><span class="p">(...);</span>            <span class="c1">// 释放页表</span>
    <span class="n">remove_vma</span><span class="p">(</span><span class="n">vma</span><span class="p">);</span>               <span class="c1">// 删除 VMA</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li><strong>生命周期</strong>：每次 <code class="language-plaintext highlighter-rouge">mmap()</code> 创建，每次 <code class="language-plaintext highlighter-rouge">munmap()</code> 销毁（临时性）</li>
  <li><strong>无特殊标志</strong>：普通匿名映射</li>
  <li><strong>关键差异</strong>：每次 <code class="language-plaintext highlighter-rouge">malloc</code>/<code class="language-plaintext highlighter-rouge">free</code> 大块时都要创建/销毁 VMA，触发系统调用</li>
</ul>

<h4 id="缺页处理栈与堆完全相同">缺页处理：栈与堆完全相同</h4>

<p>无论是栈、brk 堆还是 mmap 堆，首次访问时都走同一条缺页路径：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// mm/memory.c:5022</span>
<span class="k">static</span> <span class="n">vm_fault_t</span> <span class="nf">do_anonymous_page</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_fault</span> <span class="o">*</span><span class="n">vmf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">folio</span> <span class="o">=</span> <span class="n">alloc_anon_folio</span><span class="p">(</span><span class="n">vmf</span><span class="p">);</span>        <span class="c1">// 分配物理页</span>
    <span class="n">__folio_mark_uptodate</span><span class="p">(</span><span class="n">folio</span><span class="p">);</span>         <span class="c1">// 清零</span>
    <span class="n">entry</span> <span class="o">=</span> <span class="n">folio_mk_pte</span><span class="p">(</span><span class="n">folio</span><span class="p">,</span> <span class="p">...);</span>
    <span class="n">set_ptes</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_mm</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="p">...);</span>      <span class="c1">// 建立页表映射</span>
    <span class="c1">// 内核不关心这是栈还是堆！处理流程完全相同</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>结论</strong>：在缺页处理层面，栈和堆<strong>没有任何区别</strong>。单次缺页的成本相同（~20-50μs），都需要分配物理页、清零、建立页表。</p>

<h4 id="性能差异的真正来源">性能差异的真正来源</h4>

<table>
  <thead>
    <tr>
      <th>维度</th>
      <th>栈 VMA</th>
      <th>brk 堆 VMA</th>
      <th>mmap 堆 VMA</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>VMA 标志</strong></td>
      <td>VM_GROWSDOWN</td>
      <td>无</td>
      <td>无</td>
    </tr>
    <tr>
      <td><strong>生命周期</strong></td>
      <td>进程级别</td>
      <td>进程级别</td>
      <td>malloc/free 级别</td>
    </tr>
    <tr>
      <td><strong>创建/销毁</strong></td>
      <td>进程启动/退出</td>
      <td>首次 brk/进程退出</td>
      <td>每次 mmap/munmap</td>
    </tr>
    <tr>
      <td><strong>页表持久性</strong></td>
      <td>持久（扩展时保留）</td>
      <td>持久（扩展时保留）</td>
      <td>临时（munmap 删除）</td>
    </tr>
    <tr>
      <td><strong>缺页处理</strong></td>
      <td>do_anonymous_page</td>
      <td>do_anonymous_page</td>
      <td>do_anonymous_page</td>
    </tr>
    <tr>
      <td><strong>运行时系统调用</strong></td>
      <td>0 次</td>
      <td>0 次（扩展后）</td>
      <td>每次分配/释放 2 次</td>
    </tr>
  </tbody>
</table>

<p><strong>性能差异不是因为内核对栈和堆的”处理方式”不同</strong>，而是：</p>
<ol>
  <li><strong>VMA 生命周期不同</strong>：栈的 VMA 在进程启动时创建，持续到进程结束；mmap 堆的 VMA 每次 malloc/free 都要创建/销毁</li>
  <li><strong>系统调用频率不同</strong>：栈分配只需改栈指针（CPU 指令），mmap 堆每次都要 <code class="language-plaintext highlighter-rouge">mmap()/munmap()</code> 系统调用</li>
  <li><strong>页表持久性不同</strong>：栈扩展（<code class="language-plaintext highlighter-rouge">expand_stack_locked</code>）只修改 VMA 范围，页表映射保留；<code class="language-plaintext highlighter-rouge">munmap</code> 会删除页表，下次 <code class="language-plaintext highlighter-rouge">mmap</code> 必须重建</li>
</ol>

<h4 id="栈的只增不减特性与物理页缓存">栈的”只增不减”特性与物理页缓存</h4>

<p><strong>VMA 层面</strong>：内核没有 <code class="language-plaintext highlighter-rouge">shrink_stack</code> 函数，栈的虚拟地址范围（<code class="language-plaintext highlighter-rouge">vma-&gt;vm_start</code> - <code class="language-plaintext highlighter-rouge">vma-&gt;vm_end</code>）在进程运行期间<strong>只增不减</strong>，永远保持历史最大值：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// 深度递归扩展栈后
VMA: [0x7FEF7000, 0x7FFFFFFF]  // 16MB

// 递归返回，rsp 上移，但 VMA 不缩小
VMA: [0x7FEF7000, 0x7FFFFFFF]  // 仍是 16MB
</code></pre></div></div>

<p><strong>物理页层面</strong>：更关键的是，函数返回后<strong>物理页默认不释放</strong>，页表映射保持不变。这是栈性能的核心优势。需要区分两种场景：</p>

<p><strong>场景 1：持续访问新页</strong>（栈也会缺页）</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>深度递归访问新栈区域：
    第 1 层 → 访问虚拟页 A → 缺页 #1
    第 2 层 → 访问虚拟页 B（新页）→ 缺页 #2
    ...
    第 100 层 → 访问虚拟页 Z（新页）→ 缺页 #100

持续访问新页时，栈也会持续缺页
</code></pre></div></div>

<p><strong>场景 2：重复访问已访问页</strong>（栈的优势）</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>第 1 次深度递归（100 层）：
    触发缺页 × 100 → 分配 100 个物理页 → 建立页表映射
    递归返回 → rsp 上移 → 但页表映射保留
    成本：100 × 30μs = 3ms

第 2-1000 次相同深度递归（100 层）：
    rsp 下移到相同虚拟地址 → 页表已有映射 → 0 次缺页
    成本：0μs  ← 物理页”缓存”在页表中

对比 mmap 堆（相同大小的重复 malloc/free）：
    第 1 次：mmap() → 缺页 × 32 → munmap() 删除页表
    第 2 次：mmap() → 缺页 × 32 → munmap() 删除页表
    ...
    1000 次迭代：1000 × (32 × 30μs) = 960ms
</code></pre></div></div>

<p>实际应用中，大部分函数调用是<strong>相同深度的重复</strong>，因此栈表现出显著的性能优势。</p>

<p>内核允许用户态通过 <code class="language-plaintext highlighter-rouge">madvise(MADV_DONTNEED)</code> 显式释放栈的物理页（保留 VMA），但<strong>默认行为是保留</strong>以优化性能。进程退出时，<code class="language-plaintext highlighter-rouge">exit_mmap()</code> 才释放所有 VMA 和物理页。</p>

<table>
  <thead>
    <tr>
      <th>维度</th>
      <th>栈</th>
      <th>mmap 堆</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>VMA 创建</td>
      <td>进程启动 1 次</td>
      <td>每次 malloc</td>
    </tr>
    <tr>
      <td>物理页分配</td>
      <td>首次访问该页</td>
      <td>每次访问</td>
    </tr>
    <tr>
      <td>物理页释放</td>
      <td><strong>默认不释放</strong></td>
      <td><strong>每次 free 都释放</strong></td>
    </tr>
    <tr>
      <td>再次访问同一页</td>
      <td><strong>无缺页</strong>（页表复用）</td>
      <td><strong>重新缺页</strong>（页表已删除）</td>
    </tr>
    <tr>
      <td>访问新页</td>
      <td><strong>缺页</strong>（首次访问）</td>
      <td><strong>缺页</strong>（首次访问）</td>
    </tr>
  </tbody>
</table>

<p>这种”懒惰”策略（VMA 不缩小、物理页不释放）正是栈性能优势的根本来源：对于<strong>重复访问的栈区域</strong>，首次缺页后物理页常驻在页表中，避免反复的分配-释放-再分配循环；但访问新的更深栈区域时，栈也会缺页。实际应用中函数调用多是相同深度的重复，因此栈表现出显著优势。</p>

<h3 id="堆mmap-与安全清零">堆：mmap 与安全清零</h3>

<p>通过 <code class="language-plaintext highlighter-rouge">mmap</code> 获取匿名内存时，内核会保证进程看到的是「零填充」：要么在缺页时分配并清零，要么先映射到全局零页，写时再分配（copy-on-write），避免读到其他进程残留数据<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup><sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>。Gorman《Understanding the Linux Virtual Memory Manager》Ch4 对用户态区段的描述<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>：</p>

<blockquote>
  <p>With a process, space is simply reserved in the linear address space by pointing a page table entry to a read-only globally visible page filled with zeros. On writing, a page fault is triggered which results in a new page being allocated, filled with zeros, placed in the page table entry and marked writable.</p>
</blockquote>

<p>无论哪种方式都会在首次写时产生分配/清零或 COW 开销。<code class="language-plaintext highlighter-rouge">malloc</code> 往往通过 <code class="language-plaintext highlighter-rouge">mmap</code> 或 <code class="language-plaintext highlighter-rouge">sbrk</code> 拿到大块后再在用户态切分、复用，以摊薄这类成本。</p>

<h2 id="3-缓存友好性">3. 缓存友好性</h2>

<h3 id="栈局部性更好">栈：局部性更好</h3>

<p>栈的访问模式是典型的 LIFO，当前活跃的局部变量多集中在栈顶附近，容易落在 CPU 的 L1/L2 缓存中，命中率高。</p>

<h3 id="堆访问模式更分散">堆：访问模式更分散</h3>

<p>堆上对象由程序显式管理，链表、树等结构容易在地址空间内分散，导致缓存行利用率低、更多访问主存。</p>

<hr />

<h2 id="4-从内核到用户态批发-零售链条">4. 从内核到用户态：「批发-零售」链条</h2>

<p>结合 <strong>sbrk</strong>、<strong>Slab</strong> 和 <strong>malloc</strong>，可以把内存分配看成一条从内核到 CPU 的链条；栈之所以「快」，是因为它处在链条末端，几乎不经中间层。</p>

<h3 id="41-一级批发内核-buddy伙伴系统">4.1 一级批发：内核 Buddy（伙伴系统）</h3>

<p>物理内存以<strong>页</strong>（通常 4KB）为最小单位管理，由伙伴系统负责分配和回收：按 2^order 页块管理，不足时分裂大块、释放时与伙伴合并。粒度较粗，不适合直接满足「几十字节」的小请求<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">6</a></sup><sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>。</p>

<h3 id="42-二级批发内核-slab">4.2 二级批发：内核 Slab</h3>

<p><strong>Slab 分配器</strong>从伙伴系统拿到整页，再切成固定大小的对象并缓存，主要服务内核自身（如 <code class="language-plaintext highlighter-rouge">task_struct</code>、<code class="language-plaintext highlighter-rouge">inode</code> 等）。对象用完后可留在 Slab 中复用，减少对伙伴系统的调用，并缓解内碎片、提高缓存利用率<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">6</a></sup><sup id="fnref:9:3" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>。</p>

<h3 id="43-用户态代理malloc-与-sbrk">4.3 用户态代理：malloc 与 sbrk</h3>

<p>用户程序通过 <strong><code class="language-plaintext highlighter-rouge">malloc</code></strong> 获取堆内存。当内部池不足时，<code class="language-plaintext highlighter-rouge">malloc</code> 会调用 <strong><code class="language-plaintext highlighter-rouge">sbrk</code></strong> 或 <strong><code class="language-plaintext highlighter-rouge">mmap</code></strong>：</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">sbrk</code></strong> 调整 program break，向内核「圈」出一块新的虚拟地址空间，本身是一次系统调用，成本较高；内核用 <code class="language-plaintext highlighter-rouge">mm-&gt;brk</code> 与 VMA 管理堆顶<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup><sup id="fnref:9:4" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">malloc</code></strong> 把拿到的大块在用户态切分、合并、复用，承担「零售」角色，带来管理开销和可能的碎片。内存池、arena 等做法正是通过减少对 <code class="language-plaintext highlighter-rouge">brk</code>/<code class="language-plaintext highlighter-rouge">mmap</code> 的调用次数来降低与内核的交互成本；从系统视角看，这与「用户态与内核态壁垒」、减少系统调用的思路一致，可参见本博客<a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">《为什么「语言速度」是伪命题》</a><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</li>
</ul>

<h3 id="44-栈无中间商的自家后院">4.4 栈：无中间商的「自家后院」</h3>

<p>栈不经过上述任何一层：分配就是改栈指针，无需系统调用；物理页在首次访问时按需分配（§2），LIFO 访问模式又利于缓存。因此处在链条最末端，面向 CPU，成本最低。</p>

<h3 id="45-开销大致顺序从慢到快">4.5 开销大致顺序（从慢到快）</h3>

<table>
  <thead>
    <tr>
      <th>层级</th>
      <th>机制</th>
      <th>特点</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>最慢</td>
      <td>系统调用（sbrk/mmap）</td>
      <td>用户态/内核态切换，微秒级</td>
    </tr>
    <tr>
      <td>中等</td>
      <td>用户态堆管理（malloc/free）</td>
      <td>无模式切换，但有锁与查找</td>
    </tr>
    <tr>
      <td>较快</td>
      <td>内核 Slab（kmem_cache_alloc）</td>
      <td>内核内复用，无系统调用</td>
    </tr>
    <tr>
      <td>最快</td>
      <td>栈指针调整（sub rsp）</td>
      <td>纯用户态指令，纳秒级</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="5-栈比堆快的边界">5. 「栈比堆快」的边界</h2>

<p>单纯比较「栈和堆谁快」容易误导，因为两者不在同一维度：栈更多是「使用已就绪内存」，堆还涉及「获取」和「管理」。</p>

<h3 id="51-分配模式才是关键">5.1 分配模式才是关键</h3>

<p>若<strong>事先在堆上分配好一块内存，再反复读写</strong>，其访问速度与栈上同规模数据可以非常接近——此时差异主要在「分配方式」，而非「存储介质」。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 栈：分配 + 使用</span>
<span class="kt">void</span> <span class="nf">stack_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">arr</span><span class="p">[</span><span class="mi">1000</span><span class="p">];</span>   <span class="c1">// 分配：改栈指针</span>
    <span class="n">arr</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>     <span class="c1">// 使用：普通内存访问</span>
<span class="p">}</span>

<span class="c1">// 堆：一次性分配，反复使用</span>
<span class="k">static</span> <span class="kt">int</span> <span class="o">*</span><span class="n">heap_arr</span><span class="p">;</span>

<span class="kt">void</span> <span class="nf">heap_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">heap_arr</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">1000</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>  <span class="c1">// 仅此一次有系统调用/分配器开销</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">heap_func</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">heap_arr</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>   <span class="c1">// 使用：与栈上访问同属「已就绪内存」</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="52-堆可以模拟栈的分配模式">5.2 堆可以模拟栈的分配模式</h3>

<p>Arena、pool 等分配器本质是在堆上<strong>模拟栈</strong>：一次性向系统要一大块，用指针顺序分配，最后整体释放。在这种模式下，堆上的「分配」成本可以接近栈。</p>

<h3 id="53-值得关注的维度">5.3 值得关注的维度</h3>

<table>
  <thead>
    <tr>
      <th>维度</th>
      <th>栈</th>
      <th>堆</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>分配速度</td>
      <td>固定、极快</td>
      <td>视是否命中缓存、是否触发系统调用而定</td>
    </tr>
    <tr>
      <td>可预测性</td>
      <td>高</td>
      <td>可能受碎片、锁竞争影响</td>
    </tr>
    <tr>
      <td>适用场景</td>
      <td>小数据、生命周期与调用栈一致</td>
      <td>大数据、生命周期动态</td>
    </tr>
  </tbody>
</table>

<p>栈的「快」是用<strong>约束</strong>换来的：大小有限、生命周期必须 LIFO。堆的灵活则伴随分配与管理开销。工程上更值得关心的是：在给定场景下，应优先用栈、对象池还是堆。</p>

<h3 id="54-缺页路径上栈与堆等价但缺页频率不同">5.4 缺页路径上栈与堆等价，但缺页频率不同</h3>

<p>若只比较「第一次访问某页、触发缺页」的那条路径，栈和堆没有区别：都是 #PF → 内核分配物理页 → 映射（堆上匿名区还可能多一步清零或 COW）。因此<strong>在单次缺页场景下，栈并不比堆快</strong>（单次成本都是 ~20-50μs）。</p>

<p>「栈比堆快」指的是：</p>
<ol>
  <li><strong>分配虚拟空间的成本</strong>：栈几乎为零（改栈指针），堆可能涉及系统调用</li>
  <li><strong>缺页发生频率</strong>（典型场景）：栈访问新页时也会缺页，但实际应用中多是相同深度的重复调用，物理页默认不释放、页表映射持久保留，因此重复访问 0 次缺页；mmap 堆每次 <code class="language-plaintext highlighter-rouge">munmap</code> 删除页表，相同大小的重复分配每次都要重建页表并重新缺页</li>
  <li><strong>物理页”缓存”机制</strong>：栈在首次访问某深度后，该范围内的物理页常驻页表（除非显式释放）；堆每次 malloc/free 都要释放物理页并删除页表</li>
</ol>

<p>用数字说明典型差异：1000 次相同深度的栈调用可能只触发 1 次缺页（首次访问该深度），而 1000 次相同大小的 mmap 堆分配会触发 1000 次缺页循环。但若持续访问更深的栈区域（新页），栈也会持续缺页。</p>

<h3 id="55-用户态申请堆内存是否一定触发缺页">5.5 用户态申请堆内存是否一定触发缺页？</h3>

<p><strong>不一定。</strong> 内核源码可以验证两点：</p>

<ol>
  <li>
    <p><strong>默认情况</strong>：用户态通过 <code class="language-plaintext highlighter-rouge">brk</code>/<code class="language-plaintext highlighter-rouge">sbrk</code> 或 <code class="language-plaintext highlighter-rouge">mmap(MAP_ANONYMOUS)</code>「申请」堆内存时，内核<strong>只建立或扩展 VMA</strong>（虚拟区间），并不立刻分配物理页。<code class="language-plaintext highlighter-rouge">mm/vma.c</code> 中的 <strong><code class="language-plaintext highlighter-rouge">do_brk_flags()</code></strong> 仅做 <code class="language-plaintext highlighter-rouge">vm_area_alloc</code>、设置区间与 flags、挂入红黑树，没有任何 <code class="language-plaintext highlighter-rouge">alloc_pages</code> 或 <code class="language-plaintext highlighter-rouge">mm_populate</code>。因此物理页要等到<strong>首次访问</strong>该区间时由缺页处理程序分配，那时才会触发一次 #PF。</p>
  </li>
  <li>
    <p><strong>会预填页、从而首次访问不触发缺页的情况</strong>：</p>
    <ul>
      <li><strong><code class="language-plaintext highlighter-rouge">mmap(..., MAP_POPULATE)</code></strong>：<code class="language-plaintext highlighter-rouge">mm/mmap.c</code> 里 <code class="language-plaintext highlighter-rouge">do_mmap</code> 在成功建立映射后，若 flags 含 <code class="language-plaintext highlighter-rouge">MAP_POPULATE</code>（且非 <code class="language-plaintext highlighter-rouge">MAP_NONBLOCK</code>），会设置 <code class="language-plaintext highlighter-rouge">*populate = len</code>，返回用户态前由 <code class="language-plaintext highlighter-rouge">mm_populate(ret, populate)</code> 在内核里把页 fault in，所以用户第一次访问时页已在，不会 #PF。</li>
      <li><strong>扩展 brk 且进程曾 <code class="language-plaintext highlighter-rouge">mlockall</code>（<code class="language-plaintext highlighter-rouge">mm-&gt;def_flags &amp; VM_LOCKED</code>）</strong>：<code class="language-plaintext highlighter-rouge">mm/mmap.c</code> 中 <strong><code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE1(brk, ...)</code></strong> 在 <code class="language-plaintext highlighter-rouge">do_brk_flags()</code> 成功后若 <code class="language-plaintext highlighter-rouge">mm-&gt;def_flags &amp; VM_LOCKED</code>，会调用 <code class="language-plaintext highlighter-rouge">mm_populate(oldbrk, newbrk - oldbrk)</code>，在 brk 返回前就预填新堆区间的页，用户首次访问同样不会触发缺页。</li>
    </ul>
  </li>
</ol>

<p>因此：<strong>「申请」堆内存本身通常不触发缺页；缺页发生在首次访问新区间时。</strong> 只有在使用 <code class="language-plaintext highlighter-rouge">MAP_POPULATE</code> 或 <code class="language-plaintext highlighter-rouge">VM_LOCKED</code> 时，内核会在申请路径上预填页，此时首次访问不再触发缺页（代价是 brk/mmap 变慢、可能失败）。</p>

<h3 id="56-实验验证栈增长模式对比">5.6 实验验证：栈增长模式对比</h3>

<p>为验证「持续访问新栈页会持续缺页」这一关键观察，在 <a href="https://github.com/liweinan/stack-vs-heap-benchmark">stack-vs-heap-benchmark</a> 项目中实现了对比实验（<code class="language-plaintext highlighter-rouge">src/stack_growth_comparison.c</code>），测试两种栈使用模式的缺页行为。</p>

<p><strong>实验配置</strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 关键：使用 -O0 编译，禁用优化以确保真实的栈分配</span>
<span class="n">gcc</span> <span class="o">-</span><span class="n">O0</span> <span class="o">-</span><span class="n">Wall</span> <span class="o">-</span><span class="n">Wextra</span> <span class="o">-</span><span class="n">g</span> <span class="o">-</span><span class="n">o</span> <span class="n">stack_growth_comparison</span> <span class="n">src</span><span class="o">/</span><span class="n">stack_growth_comparison</span><span class="p">.</span><span class="n">c</span>

<span class="cp">#define PAGES_PER_CALL 4  // 每次调用占用 4 页（16KB）
#define ITERATIONS 100
</span>
<span class="c1">// 场景 1：固定深度重复调用（页表复用）</span>
<span class="kt">void</span> <span class="nf">fixed_depth_call</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">16384</span><span class="p">];</span>  <span class="c1">// 4 页</span>
    <span class="c1">// 访问每个页的首尾字节，确保触发缺页...</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">fixed_depth_call</span><span class="p">();</span>  <span class="c1">// 每次调用相同栈位置</span>
<span class="p">}</span>

<span class="c1">// 场景 2：持续增长递归深度（持续缺页）</span>
<span class="kt">void</span> <span class="nf">growing_depth_call</span><span class="p">(</span><span class="kt">int</span> <span class="n">depth</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">16384</span><span class="p">];</span>  <span class="c1">// 每层 4 页</span>
    <span class="c1">// 访问每个页...</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">depth</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="n">growing_depth_call</span><span class="p">(</span><span class="n">depth</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>  <span class="c1">// 递归到更深</span>
<span class="p">}</span>
<span class="n">growing_depth_call</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>  <span class="c1">// 100 层递归</span>
</code></pre></div></div>

<p><strong>实验结果</strong>（在 Docker Alpine Linux 环境中，使用 perf 统计）：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>perf <span class="nb">stat</span> <span class="nt">-e</span> page-faults ./stack_growth_comparison

<span class="o">===</span> 场景 1: 固定深度重复调用 <span class="o">===</span>
配置: 100 次调用，每次 4 页（16KB）
预期: 第 1 次缺页 4 次，后续 99 次无缺页（页表保留）
执行时间: 0.012 ms
平均每次: 118 ns

<span class="o">===</span> 场景 2: 持续增长递归深度 <span class="o">===</span>
配置: 100 层递归，每层 4 页（16KB）
预期: 持续缺页 400 次（每层访问新页）
执行时间: 0.272 ms        ← 慢 23 倍！
平均每层: 2715 ns

Performance counter stats <span class="k">for</span> <span class="s1">'./stack_growth_comparison'</span>:

               424      page-faults    ← 接近预期 400 次（100 层 × 4 页）

       0.000999083 seconds <span class="nb">time </span>elapsed
</code></pre></div></div>

<p><strong>关键发现</strong>：</p>

<table>
  <thead>
    <tr>
      <th>场景</th>
      <th>缺页次数</th>
      <th>执行时间</th>
      <th>平均每次</th>
      <th>差异倍数</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>场景 1（固定深度）</td>
      <td>~4 次</td>
      <td>0.012 ms</td>
      <td>118 ns</td>
      <td>基准</td>
    </tr>
    <tr>
      <td>场景 2（持续增长）</td>
      <td>~400 次</td>
      <td>0.272 ms</td>
      <td>2715 ns</td>
      <td><strong>23 倍</strong></td>
    </tr>
  </tbody>
</table>

<p><strong>实验验证的核心观察</strong>：</p>

<ol>
  <li>
    <p>✅ <strong>持续访问新栈页会持续缺页</strong>：场景 2 产生 424 次缺页，接近理论预期 400 次（100 层 × 4 页/层）。多出的 ~24 次来自程序启动、库初始化及场景 1 的栈分配。</p>
  </li>
  <li>
    <p>✅ <strong>重复访问已访问区域几乎不缺页</strong>：场景 1 重复调用 100 次相同深度函数，仅首次触发约 4 次缺页，后续 99 次调用 0 次缺页。</p>
  </li>
  <li>
    <p>✅ <strong>性能差异显著</strong>：持续缺页（场景 2）比页表复用（场景 1）慢 <strong>23 倍</strong>（0.272 ms vs 0.012 ms），平均每次 2715 ns vs 118 ns。</p>
  </li>
  <li>
    <p>✅ <strong>实测单次缺页成本</strong>：从时间差计算，单次缺页成本约 (2715 - 118) ns ≈ <strong>2.6 μs</strong>，低于内核文档中提到的理论值 20-50 μs，得益于现代内核的优化（TLB 缓存、页预取、批量操作等）。</p>
  </li>
</ol>

<p><strong>结论</strong>：这个实验<strong>完美验证</strong>了「栈的快不是因为永远不缺页」这一关键观察：</p>
<ul>
  <li>访问新栈页时，栈也会持续缺页（如深度递归）</li>
  <li>栈的真正优势在于<strong>页表持久性</strong>：重复访问的区域，页表映射保留，避免像 mmap 堆那样每次 <code class="language-plaintext highlighter-rouge">munmap</code> 删除、<code class="language-plaintext highlighter-rouge">mmap</code> 重建</li>
  <li>实际应用中多是相同深度的重复调用（如场景 1），因此栈表现”快”；若应用场景是深度递归（如场景 2），栈的缺页行为与分配相同大小的堆区别不大，性能优势主要体现在无需系统调用（VMA 持久）</li>
</ul>

<hr />

<h2 id="总结">总结</h2>

<ol>
  <li><strong>同一进程内，栈和堆的「访问」速度无本质差别</strong>；差异主要来自<strong>分配方式</strong>与<strong>物理页的建立方式</strong>（栈按需缺页，堆常伴随清零或 COW）。</li>
  <li><strong>内核不区分”栈”与”堆”，只区分 VMA 的类型和生命周期</strong>：栈的特殊性仅是 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code> 标志；真正的性能差异来自 VMA 生命周期——栈 VMA 在进程启动时创建、进程退出时销毁（0 次运行时系统调用），mmap 堆 VMA 每次 malloc/free 都要创建/销毁（频繁系统调用）。</li>
  <li><strong>栈的”只增不减”与物理页缓存机制</strong>：VMA 范围在运行期间只增不减（没有 <code class="language-plaintext highlighter-rouge">shrink_stack</code>），更关键的是<strong>物理页默认不释放</strong>——函数返回后页表映射保持不变，这是性能核心：相同栈深度的重复访问首次缺页后，物理页”缓存”在页表中，后续访问 0 次缺页（但持续访问新的更深栈区域仍会缺页）；mmap 堆每次 <code class="language-plaintext highlighter-rouge">munmap</code> 删除页表，相同大小的重复分配每次都要重建页表并重新缺页。<strong>实验验证</strong>（§5.6）：固定深度重复调用（100 次）vs 持续增长递归（100 层），缺页次数 ~4 vs ~400，性能差异 23 倍（0.012 ms vs 0.272 ms）。</li>
  <li><strong>在缺页发生的那一刻</strong>，栈与堆走同一条内核路径（<code class="language-plaintext highlighter-rouge">do_anonymous_page</code>），单次成本相同（~20-50μs）；<strong>栈的快</strong>体现在：分配虚拟空间零成本（改栈指针）、VMA 持久（无系统调用）、页表持久（<code class="language-plaintext highlighter-rouge">expand_stack</code> 保留映射，避免反复缺页）、LIFO 带来的缓存局部性。</li>
  <li>从内核 Buddy → Slab → sbrk/mmap → malloc 到栈，是一条「批发-零售」链；栈在末端、无中间层，分配成本最低。</li>
  <li><strong>「栈比堆快」</strong>是有用的经验法则，但不是普适真理；工程上更值得关心的是「为什么快」和「在什么情况下快」，再按场景选择栈、池或堆。从选型与系统视角看，「谁快」往往不是唯一维度，I/O、并发与内存同内核的交互方式同样关键，可参见本博客<a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">《为什么「语言速度」是伪命题》</a><sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">8</a></sup>。</li>
</ol>

<h2 id="扩展阅读">扩展阅读</h2>

<h3 id="intel-sdm-vol3a-第-6-章">Intel SDM Vol.3A 第 6 章<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></h3>

<p>§6.14.2「64-Bit Mode Stack Frame」原文：</p>

<blockquote>
  <p>In IA-32e mode, the RSP is aligned to a 16-byte boundary before pushing the stack frame. The stack frame itself is aligned on a 16-byte boundary when the interrupt handler is called.</p>
</blockquote>

<p>§6.15「Exception and Interrupt Reference」中 <strong>Interrupt 14—Page-Fault Exception (#PF)</strong>：Exception Class 为 <strong>Fault</strong>；P=0、权限/写/保留位等触发。SDM 原文：</p>

<blockquote>
  <p>The exception handler can recover from page-not-present conditions and restart the program or task without any loss of program continuity.</p>
</blockquote>

<h3 id="mel-gormanunderstanding-the-linux-virtual-memory-manager">Mel Gorman《Understanding the Linux Virtual Memory Manager》<sup id="fnref:9:5" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup></h3>

<ul>
  <li><strong>Ch4 Process Address Space</strong>：<code class="language-plaintext highlighter-rouge">mm_struct</code> 中堆与栈的字段（见下内核代码）；用户态零页与写时缺页见正文 §2 引用。</li>
  <li><strong>Ch6 Physical Page Allocation</strong>：Binary Buddy、<code class="language-plaintext highlighter-rouge">free_area_t</code>（Gorman 书为 2.4/2.6 的 <code class="language-plaintext highlighter-rouge">free_list</code>+<code class="language-plaintext highlighter-rouge">map</code>）、order 分裂/合并。</li>
  <li><strong>Ch8 Slab Allocator</strong>：三目标（硬件缓存、对象缓存、内碎片）、slab coloring、<code class="language-plaintext highlighter-rouge">kmem_cache_alloc</code>、slabs_full/partial/free、per-CPU 缓存。</li>
</ul>

<h3 id="linux-内核源码代码片段与文件说明">Linux 内核源码（代码片段与文件说明）</h3>

<p><strong>1. 栈 VMA 的创建：setup_arg_pages</strong>（<code class="language-plaintext highlighter-rouge">fs/exec.c</code>）</p>

<p>进程启动时创建栈 VMA，设置 <code class="language-plaintext highlighter-rouge">VM_GROWSDOWN</code> 标志，生命周期 = 进程。关键：只创建 VMA，不分配物理页。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 fs/exec.c:778</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">setup_arg_pages</span><span class="p">(</span><span class="k">struct</span> <span class="n">linux_binprm</span> <span class="o">*</span><span class="n">bprm</span><span class="p">,</span> <span class="p">...)</span> <span class="p">{</span>
    <span class="n">vma</span> <span class="o">=</span> <span class="n">vm_area_alloc</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_start</span> <span class="o">=</span> <span class="n">stack_top</span> <span class="o">-</span> <span class="n">STACK_TOP_MAX</span><span class="p">;</span>  <span class="c1">// 通常 8MB</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_end</span> <span class="o">=</span> <span class="n">stack_top</span><span class="p">;</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_flags</span> <span class="o">=</span> <span class="n">VM_STACK_FLAGS</span> <span class="o">|</span> <span class="n">VM_GROWSDOWN</span><span class="p">;</span>  <span class="c1">// 栈的唯一特殊标志</span>
    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_page_prot</span> <span class="o">=</span> <span class="n">vm_get_page_prot</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_flags</span><span class="p">);</span>
    <span class="n">insert_vm_struct</span><span class="p">(</span><span class="n">mm</span><span class="p">,</span> <span class="n">vma</span><span class="p">);</span>
    <span class="n">mm</span><span class="o">-&gt;</span><span class="n">stack_vm</span> <span class="o">+=</span> <span class="n">vma_pages</span><span class="p">(</span><span class="n">vma</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>2. 栈扩展：expand_stack_locked</strong>（<code class="language-plaintext highlighter-rouge">mm/mmap.c</code>）</p>

<p>栈向下增长时只修改 VMA 范围，<strong>不删除页表</strong>，物理页映射保留。这是栈分配快的关键：页表持久，避免反复缺页。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/mmap.c:961</span>
<span class="kt">int</span> <span class="nf">expand_stack_locked</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_area_struct</span> <span class="o">*</span><span class="n">vma</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">address</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_flags</span> <span class="o">&amp;</span> <span class="n">VM_GROWSDOWN</span><span class="p">))</span>
        <span class="k">return</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>  <span class="c1">// 检查是否是栈 VMA</span>

    <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_start</span> <span class="o">=</span> <span class="n">address</span><span class="p">;</span>  <span class="c1">// 只修改 VMA 起始地址</span>
    <span class="n">mm</span><span class="o">-&gt;</span><span class="n">stack_vm</span> <span class="o">+=</span> <span class="n">grow</span><span class="p">;</span>
    <span class="c1">// 关键：不删除页表！已分配的物理页映射保留</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>3. 缺页处理：do_anonymous_page</strong>（<code class="language-plaintext highlighter-rouge">mm/memory.c</code>）</p>

<p>栈、brk 堆、mmap 堆首次访问时都调用此函数，处理流程<strong>完全相同</strong>。单次缺页成本相同（~20-50μs），差异在于缺页<strong>频率</strong>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/memory.c:5022</span>
<span class="k">static</span> <span class="n">vm_fault_t</span> <span class="nf">do_anonymous_page</span><span class="p">(</span><span class="k">struct</span> <span class="n">vm_fault</span> <span class="o">*</span><span class="n">vmf</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">folio</span> <span class="o">=</span> <span class="n">alloc_anon_folio</span><span class="p">(</span><span class="n">vmf</span><span class="p">);</span>        <span class="c1">// 分配物理页（栈、堆相同）</span>
    <span class="n">__folio_mark_uptodate</span><span class="p">(</span><span class="n">folio</span><span class="p">);</span>         <span class="c1">// 清零（栈、堆相同）</span>
    <span class="n">entry</span> <span class="o">=</span> <span class="n">folio_mk_pte</span><span class="p">(</span><span class="n">folio</span><span class="p">,</span> <span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_page_prot</span><span class="p">);</span>
    <span class="n">set_ptes</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_mm</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="n">vmf</span><span class="o">-&gt;</span><span class="n">pte</span><span class="p">,</span> <span class="n">entry</span><span class="p">,</span> <span class="n">nr_pages</span><span class="p">);</span>  <span class="c1">// 建立页表</span>
    <span class="n">add_mm_counter</span><span class="p">(</span><span class="n">vma</span><span class="o">-&gt;</span><span class="n">vm_mm</span><span class="p">,</span> <span class="n">MM_ANONPAGES</span><span class="p">,</span> <span class="n">nr_pages</span><span class="p">);</span>
    <span class="c1">// 内核不关心这是栈还是堆！</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>4. 进程地址空间：堆与栈的起止</strong>（<code class="language-plaintext highlighter-rouge">include/linux/mm_types.h</code>）</p>

<p><code class="language-plaintext highlighter-rouge">mm_struct</code> 中描述堆与栈的字段；<code class="language-plaintext highlighter-rouge">sys_brk</code> 通过 <code class="language-plaintext highlighter-rouge">mm-&gt;brk</code>、<code class="language-plaintext highlighter-rouge">mm-&gt;start_brk</code> 管理堆顶<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup><sup id="fnref:9:6" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">5</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 include/linux/mm_types.h（约 1100 行起）</span>
<span class="k">struct</span> <span class="n">mm_struct</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">start_code</span><span class="p">,</span> <span class="n">end_code</span><span class="p">,</span> <span class="n">start_data</span><span class="p">,</span> <span class="n">end_data</span><span class="p">;</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">start_brk</span><span class="p">,</span> <span class="n">brk</span><span class="p">,</span> <span class="n">start_stack</span><span class="p">;</span>   <span class="cm">/* 堆起止、栈底 */</span>
    <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">arg_start</span><span class="p">,</span> <span class="n">arg_end</span><span class="p">,</span> <span class="n">env_start</span><span class="p">,</span> <span class="n">env_end</span><span class="p">;</span>
    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>2. Buddy：zone 与 free_area</strong>（<code class="language-plaintext highlighter-rouge">include/linux/mmzone.h</code>、<code class="language-plaintext highlighter-rouge">mm/page_alloc.c</code>）</p>

<p>每 zone 有 <code class="language-plaintext highlighter-rouge">free_area[NR_PAGE_ORDERS]</code>，按 2^order 页块管理；分配入口为 <code class="language-plaintext highlighter-rouge">__alloc_pages()</code><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">9</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 include/linux/mmzone.h（约 133 行）</span>
<span class="k">struct</span> <span class="n">free_area</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">list_head</span> <span class="n">free_list</span><span class="p">[</span><span class="n">MIGRATE_TYPES</span><span class="p">];</span>
    <span class="kt">unsigned</span> <span class="kt">long</span>    <span class="n">nr_free</span><span class="p">;</span>
<span class="p">};</span>

<span class="c1">// 每个 zone 含（同文件约 980 行）：</span>
<span class="c1">// struct free_area free_area[NR_PAGE_ORDERS];</span>
</code></pre></div></div>

<p><strong>3. sys_brk 系统调用</strong>（<code class="language-plaintext highlighter-rouge">mm/mmap.c</code>）</p>

<p>用户态 <code class="language-plaintext highlighter-rouge">brk</code>/<code class="language-plaintext highlighter-rouge">sbrk</code> 的内核入口；通过 <code class="language-plaintext highlighter-rouge">mm-&gt;brk</code>、<code class="language-plaintext highlighter-rouge">mm-&gt;start_brk</code> 与 VMA 扩展堆。<strong>默认只调 <code class="language-plaintext highlighter-rouge">do_brk_flags()</code> 扩展 VMA，不分配物理页</strong>；仅当 <code class="language-plaintext highlighter-rouge">mm-&gt;def_flags &amp; VM_LOCKED</code>（如进程曾 <code class="language-plaintext highlighter-rouge">mlockall</code>）时才在返回前调用 <code class="language-plaintext highlighter-rouge">mm_populate(oldbrk, newbrk - oldbrk)</code> 预填页，此时用户首次访问新区间不会触发缺页<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">7</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/mmap.c（约 115 行起）</span>
<span class="n">SYSCALL_DEFINE1</span><span class="p">(</span><span class="n">brk</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span><span class="p">,</span> <span class="n">brk</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">mm_struct</span> <span class="o">*</span><span class="n">mm</span> <span class="o">=</span> <span class="n">current</span><span class="o">-&gt;</span><span class="n">mm</span><span class="p">;</span>
    <span class="n">bool</span> <span class="n">populate</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="c1">// ...</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">do_brk_flags</span><span class="p">(</span><span class="o">&amp;</span><span class="n">vmi</span><span class="p">,</span> <span class="n">brkvma</span><span class="p">,</span> <span class="n">oldbrk</span><span class="p">,</span> <span class="n">newbrk</span> <span class="o">-</span> <span class="n">oldbrk</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
        <span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
    <span class="n">mm</span><span class="o">-&gt;</span><span class="n">brk</span> <span class="o">=</span> <span class="n">brk</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mm</span><span class="o">-&gt;</span><span class="n">def_flags</span> <span class="o">&amp;</span> <span class="n">VM_LOCKED</span><span class="p">)</span>
        <span class="n">populate</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="nl">success:</span>
    <span class="n">mmap_write_unlock</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">populate</span><span class="p">)</span>
        <span class="n">mm_populate</span><span class="p">(</span><span class="n">oldbrk</span><span class="p">,</span> <span class="n">newbrk</span> <span class="o">-</span> <span class="n">oldbrk</span><span class="p">);</span>   <span class="cm">/* 仅 VM_LOCKED 时预填页 */</span>
    <span class="k">return</span> <span class="n">brk</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>4. do_brk_flags 只建 VMA</strong>（<code class="language-plaintext highlighter-rouge">mm/vma.c</code>）</p>

<p>扩展堆时仅创建/扩展匿名 VMA，不分配物理页；物理页在首次访问时由缺页处理分配。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/vma.c（约 2714 行）— do_brk_flags 仅做 VMA 分配与合并，无 alloc_pages/mm_populate</span>
<span class="kt">int</span> <span class="nf">do_brk_flags</span><span class="p">(</span><span class="k">struct</span> <span class="n">vma_iterator</span> <span class="o">*</span><span class="n">vmi</span><span class="p">,</span> <span class="k">struct</span> <span class="n">vm_area_struct</span> <span class="o">*</span><span class="n">vma</span><span class="p">,</span>
                 <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">len</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ... may_expand_vm, security_vm_enough_memory_mm ...</span>
    <span class="n">vma</span> <span class="o">=</span> <span class="n">vm_area_alloc</span><span class="p">(</span><span class="n">mm</span><span class="p">);</span>   <span class="cm">/* 只分配 VMA 结构 */</span>
    <span class="n">vma_set_anonymous</span><span class="p">(</span><span class="n">vma</span><span class="p">);</span>
    <span class="n">vma_set_range</span><span class="p">(</span><span class="n">vma</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="n">addr</span> <span class="o">+</span> <span class="n">len</span><span class="p">,</span> <span class="p">...);</span>
    <span class="n">vm_flags_init</span><span class="p">(</span><span class="n">vma</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
    <span class="c1">// ... vma_iter_store_gfp, vma_link ... 无 mm_populate</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>5. Slab 分配接口</strong>（<code class="language-plaintext highlighter-rouge">mm/slub.c</code>）</p>

<p>当前默认 Slab 实现；<code class="language-plaintext highlighter-rouge">kmem_cache_alloc</code> 从指定 cache 取对象（如 <code class="language-plaintext highlighter-rouge">task_struct</code>、<code class="language-plaintext highlighter-rouge">vm_area_struct</code> 等）<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">10</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 mm/slub.c（约 4202 行）</span>
<span class="kt">void</span> <span class="o">*</span><span class="nf">kmem_cache_alloc_noprof</span><span class="p">(</span><span class="k">struct</span> <span class="n">kmem_cache</span> <span class="o">*</span><span class="n">s</span><span class="p">,</span> <span class="n">gfp_t</span> <span class="n">gfpflags</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">slab_alloc_node</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">gfpflags</span><span class="p">,</span> <span class="n">NUMA_NO_NODE</span><span class="p">,</span> <span class="n">_RET_IP_</span><span class="p">,</span>
                                <span class="n">s</span><span class="o">-&gt;</span><span class="n">object_size</span><span class="p">);</span>
    <span class="n">trace_kmem_cache_alloc</span><span class="p">(</span><span class="n">_RET_IP_</span><span class="p">,</span> <span class="n">ret</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">gfpflags</span><span class="p">,</span> <span class="n">NUMA_NO_NODE</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">EXPORT_SYMBOL</span><span class="p">(</span><span class="n">kmem_cache_alloc_noprof</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>6. mmap 与 MAP_POPULATE</strong>（<code class="language-plaintext highlighter-rouge">mm/mmap.c</code>）</p>

<p>默认 <code class="language-plaintext highlighter-rouge">mmap(MAP_ANONYMOUS)</code> 只建立 VMA，不预填页；若带 <strong><code class="language-plaintext highlighter-rouge">MAP_POPULATE</code></strong>，<code class="language-plaintext highlighter-rouge">do_mmap</code> 成功后会设 <code class="language-plaintext highlighter-rouge">*populate = len</code>（约 562–565 行：<code class="language-plaintext highlighter-rouge">(flags &amp; (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE</code>），返回前在 <code class="language-plaintext highlighter-rouge">vm_mmap_pgoff</code> 里调 <code class="language-plaintext highlighter-rouge">mm_populate(ret, populate)</code>，在内核内把页 fault in，用户首次访问不再触发缺页。</p>

<p>本文引用已用 pdftotext 与本地 kernel 源码校对。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://www.sra.uni-hannover.de/Lehre/SS25/V_BSB/doc/x86-abi.html">System V ABI - AMD64 - Register and Stack Layout</a> - x86-64 调用约定与栈布局（RSP、red zone、16 字节对齐） <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.intel.com/content/www/us/en/content-details/868146/intel-64-and-ia-32-architectures-software-developer-s-manual-volume-3a-system-programming-guide-part-1.html">Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 3A</a> - 第 6 章 Interrupt and Exception Handling、§6.14.2/§6.15 #PF <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://man7.org/linux/man-pages/man2/mmap.2.html">mmap(2) - Linux manual page</a> - mmap 系统调用；<a href="https://man7.org/linux/man-pages/man2/brk.2.html">brk(2)</a> - 堆顶与 sbrk/brk <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://stackoverflow.com/questions/34042915/what-is-the-purpose-of-map-anonymous-flag-in-mmap-system-call">What is the purpose of MAP_ANONYMOUS in mmap?</a> - 匿名映射与零填充语义；匿名区采用 demand paging，读时映射零页或分配并清零，写时 COW/分配 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>Mel Gorman, <strong>Understanding the Linux® Virtual Memory Manager</strong>。<a href="https://www.kernel.org/doc/gorman/pdf/understand.pdf">kernel.org PDF</a>、<a href="https://www.kernel.org/doc/gorman/html/understand/">HTML 目录</a>。Ch4/6/8 见扩展阅读 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:9:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:9:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:9:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:9:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://www.kernel.org/doc/html/latest/mm/slab.html">Memory Management - The Linux Kernel documentation</a> - Slab 分配器；<a href="https://www.kernel.org/doc/gorman/html/understand/understand025.html">Understanding the Linux Virtual Memory Manager - Slab 附录</a> - Buddy 与 Slab 概述 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>Linux 内核 <strong>mm/mmap.c</strong>（<code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE1(brk,...)</code>、<code class="language-plaintext highlighter-rouge">mm-&gt;brk</code>/<code class="language-plaintext highlighter-rouge">mm-&gt;start_brk</code>、<code class="language-plaintext highlighter-rouge">expand_stack_locked</code>）、<strong>fs/exec.c</strong>（<code class="language-plaintext highlighter-rouge">setup_arg_pages</code> 创建栈 VMA）、<strong>mm/memory.c</strong>（<code class="language-plaintext highlighter-rouge">do_anonymous_page</code> 缺页处理）。<a href="https://elixir.bootlin.com/linux/latest/source/mm/mmap.c">Bootlin - mmap.c</a>、<a href="https://elixir.bootlin.com/linux/latest/source/fs/exec.c">exec.c</a>、<a href="https://elixir.bootlin.com/linux/latest/source/mm/memory.c">memory.c</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>本博客 <a href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html">为什么「语言速度」是伪命题：I/O、并发、内存与内核</a> - 系统调用成本、内存池与 I/O 对实际性能的影响 <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Linux 内核 <strong>mm/page_alloc.c</strong>（<code class="language-plaintext highlighter-rouge">__alloc_pages</code>、<code class="language-plaintext highlighter-rouge">zone-&gt;free_area</code>）、<strong>include/linux/mmzone.h</strong>（<code class="language-plaintext highlighter-rouge">struct free_area</code>）。<a href="https://elixir.bootlin.com/linux/latest/source/mm/page_alloc.c">Bootlin - page_alloc.c</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Linux 内核 <strong>mm/slub.c</strong>（<code class="language-plaintext highlighter-rouge">kmem_cache_alloc</code>）、<strong>mm/slab.c</strong>、<strong>include/linux/sched.h</strong>。<a href="https://elixir.bootlin.com/linux/latest/source/mm/slub.c">Bootlin - slub.c</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[在同一个进程内，栈和堆使用相同的内存硬件，访问速度本身没有区别。真正的性能差异来自内核在分配和管理内存时为两者采取的不同策略。本文从分配方式、物理内存管理、缓存友好性三个角度说明原因，并借 sbrk、Slab、malloc 梳理从内核到用户态的内存「批发-零售」链条；最后讨论「栈比堆快」这一经验法则的适用边界。]]></summary></entry><entry><title type="html">为什么「语言速度」是伪命题：I/O、并发、内存与内核</title><link href="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html" rel="alternate" type="text/html" title="为什么「语言速度」是伪命题：I/O、并发、内存与内核" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://weinan.io/2026/03/01/why-language-speed-is-misleading</id><content type="html" xml:base="https://weinan.io/2026/03/01/why-language-speed-is-misleading.html"><![CDATA[<p>在现代环境中，单纯比较语言的“执行速度”远远不够。一方面，<strong>现代 CPU 执行指令已经极快</strong>，各语言在“单纯执行同一条指令”的层面差异很小（纳秒级），难以成为系统瓶颈。另一方面，就像在拥挤的城市街道上比较两辆赛车的极速，意义有限——真正决定系统表现的是 <strong>I/O 如何被处理、并发如何利用多核、内存如何与内核交互</strong>，以及运行时与生态的取舍。本文从技术内因（I/O、并发、内存与系统调用）、运行时成本（VM 与 AOT）以及非技术因素三方面梳理，并辅以 Linux 内核与用户态代码示例。</p>

<h2 id="1-为什么语言速度是伪命题技术内因">1. 为什么「语言速度」是伪命题？（技术内因）</h2>

<h3 id="11-io-是天花板">1.1 I/O 是天花板</h3>

<p>绝大多数时间 CPU 在等 I/O：网络往返或磁盘读写是毫秒级，而一条加法指令是纳秒级，语言层面的“谁更快”会被 I/O 等待完全淹没。<strong>真正的差异在于：语言/框架如何做 I/O</strong>——阻塞还是非阻塞？是否用好操作系统提供的异步接口（如 <strong>epoll</strong>、<strong>io_uring</strong>）？</p>

<p><strong>epoll</strong>：一次系统调用可监听大量 fd，就绪时再处理，避免“每个连接问一次”的轮询。Linux 内核实现见 <strong><code class="language-plaintext highlighter-rouge">fs/eventpoll.c</code></strong>，入口为 <code class="language-plaintext highlighter-rouge">epoll_create1</code>、<code class="language-plaintext highlighter-rouge">epoll_ctl</code>、<code class="language-plaintext highlighter-rouge">epoll_wait</code> 等系统调用<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">2</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 用户态：epoll 一次 wait 可返回多个就绪 fd，减少 syscall 次数</span>
<span class="kt">int</span> <span class="n">epfd</span> <span class="o">=</span> <span class="n">epoll_create1</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">epoll_event</span> <span class="n">ev</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">events</span> <span class="o">=</span> <span class="n">EPOLLIN</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">fd</span> <span class="o">=</span> <span class="n">sockfd</span> <span class="p">};</span>
<span class="n">epoll_ctl</span><span class="p">(</span><span class="n">epfd</span><span class="p">,</span> <span class="n">EPOLL_CTL_ADD</span><span class="p">,</span> <span class="n">sockfd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">ev</span><span class="p">);</span>

<span class="cp">#define MAX_EVENTS 64
</span><span class="k">struct</span> <span class="n">epoll_event</span> <span class="n">events</span><span class="p">[</span><span class="n">MAX_EVENTS</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">n</span> <span class="o">=</span> <span class="n">epoll_wait</span><span class="p">(</span><span class="n">epfd</span><span class="p">,</span> <span class="n">events</span><span class="p">,</span> <span class="n">MAX_EVENTS</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">);</span>  <span class="cm">/* 一次 syscall，多 fd */</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
        <span class="n">handle</span><span class="p">(</span><span class="n">events</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">data</span><span class="p">.</span><span class="n">fd</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>io_uring</strong>：更现代的异步 I/O 接口，提交与完成通过共享 ring buffer 与内核交互，进一步减少系统调用与拷贝。内核实现见 <strong><code class="language-plaintext highlighter-rouge">io_uring/io_uring.c</code></strong>，如 <code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE2(io_uring_setup, ...)</code><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">4</a></sup>。</p>

<h3 id="12-并发模型与多核利用">1.2 并发模型与多核利用</h3>

<p>多核时代，<strong>并发模型</strong>决定能多“轻松”地压榨多核与掩盖 I/O 等待：</p>

<ul>
  <li><strong>Go</strong>：<strong>Goroutine</strong> 是极轻量的并发单位（栈起小、调度在用户态），便于写出高并发程序，从而更好利用多核并应对 I/O。</li>
  <li><strong>Java</strong>：<strong>虚拟线程</strong>（Project Loom）意在解决“每请求一线程”带来的内存与上下文切换成本。</li>
</ul>

<p>差别不在“单线程谁更快”，而在于<strong>能否用低成本抽象把并发写出来</strong>。</p>

<h3 id="13-内存管理与内核的博弈">1.3 内存管理与内核的博弈</h3>

<p>语言如何从内核要内存、何时释放，对延迟和常驻内存影响很大：</p>

<ul>
  <li><strong>有 GC 的语言</strong>（Java、Go）：向内核申请大块堆，自行管理。优点是开发效率高，缺点包括：<strong>Stop-The-World（STW）</strong>——GC 时暂停所有业务线程，导致延迟尖刺，对延迟敏感场景（如游戏、实时系统）是实打实的问题<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">5</a></sup>；回收不及时或长期不把内存还给 OS 会导致常驻内存偏高，与内核的交互也变得不可预测。</li>
  <li><strong>无 GC 的语言</strong>（Rust、C++）：可精细控制何时释放回 OS。例如 glibc 下可用 <strong><code class="language-plaintext highlighter-rouge">malloc_trim(0)</code></strong> 把空闲页归还内核，降低进程 RSS；Rust 的所有权在编译期约束生命周期，减少运行时开销<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">6</a></sup>。</li>
</ul>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 释放堆上未用内存回内核，降低 RSS（glibc）</span>
<span class="cp">#include</span> <span class="cpf">&lt;malloc.h&gt;</span><span class="cp">
</span><span class="kt">void</span> <span class="nf">release_unused_heap</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">malloc_trim</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>   <span class="cm">/* 将 free list 中的空闲页归还内核 */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>内核侧：用户态堆扩展通过 <strong><code class="language-plaintext highlighter-rouge">brk</code></strong>/ <strong><code class="language-plaintext highlighter-rouge">mmap</code></strong> 与 VMA 管理，物理页按需分配（缺页时再给）。本博客在<a href="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html">《栈为什么比堆快》</a>中已有梳理<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>。</p>

<h3 id="14-用户态与内核态的壁垒">1.4 用户态与内核态的壁垒</h3>

<p>每次<strong>系统调用</strong>都是一次模式切换，成本远高于用户态几条指令。因此：</p>

<ul>
  <li><strong>内存池</strong>：在用户态维护一块已申请的内存，反复复用，减少频繁 <code class="language-plaintext highlighter-rouge">brk</code>/<code class="language-plaintext highlighter-rouge">mmap</code>。这本质上是在减少「从内核到用户态」的申请次数，与本博客<a href="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html">《栈为什么比堆快》</a>里说的「批发-零售」链条一致：少向内核要、多在用户态复用，摊薄单次分配成本<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>。</li>
  <li><strong>批量 I/O</strong>：如 epoll 一次 <code class="language-plaintext highlighter-rouge">epoll_wait</code> 返回多个就绪 fd；io_uring 一次 submit 可提交多个 I/O。</li>
</ul>

<p>语言“跑得快”若伴随大量 syscall，实际表现可能反而不如“跑得慢一点但少进内核”的实现。</p>

<h3 id="15-锁的误用与性能">1.5 锁的误用与性能</h3>

<p><strong>错误地使用锁</strong>（粗粒度锁、持锁做慢操作、锁竞争）同样是导致代码性能差的核心原因，与语言本身关系不大。一把大锁包住整段逻辑会把多核压成“串行执行”；在持锁期间做 I/O 或复杂计算会极大拉长其他线程的等待时间，造成延迟尖刺与吞吐下降。内核与用户态都依赖<strong>细粒度锁</strong>（只锁最小临界区）、<strong>缩短持锁时间</strong>（持锁内不做 I/O）、以及合理选择锁类型（自旋与睡眠的取舍）来降低竞争<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">8</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 反例：一把大锁包住查找 + 处理，持锁期间可能做 I/O，多线程被串行化</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">global_lock</span><span class="p">);</span>
<span class="n">item</span> <span class="o">=</span> <span class="n">lookup</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>           <span class="cm">/* 临界区内做查找 */</span>
<span class="n">process</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>                <span class="cm">/* 若 process() 含网络/磁盘 I/O，其他线程长时间阻塞 */</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">global_lock</span><span class="p">);</span>

<span class="c1">// 正例：细粒度锁，持锁只做最小临界区（查表 + 取引用），慢操作在锁外</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bucket_lock</span><span class="p">[</span><span class="n">key</span> <span class="o">%</span> <span class="n">NBUCKET</span><span class="p">]);</span>
<span class="n">item</span> <span class="o">=</span> <span class="n">lookup_in_bucket</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">item</span><span class="p">)</span> <span class="n">ref_inc</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">bucket_lock</span><span class="p">[</span><span class="n">key</span> <span class="o">%</span> <span class="n">NBUCKET</span><span class="p">]);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">item</span><span class="p">)</span> <span class="n">process</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>      <span class="cm">/* I/O 与重逻辑在锁外，不阻塞其他桶 */</span>
</code></pre></div></div>

<p>可参见：Linux 内核 <a href="https://docs.kernel.org/locking/mutex-design.html">Generic Mutex Subsystem</a>（mutex 设计、自旋与睡眠的取舍）、<a href="https://lwn.net/Articles/314512/">LWN — mutex: implement adaptive spinning</a>（竞争下的自适应行为），以及 <a href="https://www.intel.com/content/www/us/en/docs/advisor/user-guide/2025-0/reduce-lock-contention-001.html">Intel Advisor — Reduce Lock Contention</a>（用户态锁竞争分析与优化思路）<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">8</a></sup>。用户态锁的阻塞与唤醒如何依赖内核（futex），见本博客<a href="https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex.html">《用户态锁与内核：谁在管理「等待」与 futex》</a><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">9</a></sup>。</p>

<hr />

<h2 id="2-运行时的隐藏成本vm-与-aot">2. 运行时的“隐藏成本”：VM 与 AOT</h2>

<ul>
  <li><strong>有 VM 的语言</strong>（Java、C#、Erlang）：带来跨平台和 JIT 等优化，但冷启动慢、VM 自身占内存，在 Serverless 或短生命周期任务中可能成为瓶颈。</li>
  <li><strong>AOT 编译、无传统 VM</strong>（Go、Rust、C++）：直接生成二进制，启动快、内存占用小。Go 的运行时（GC、调度）是链接进二进制的一部分，而非独立 VM。</li>
</ul>

<p>因此“谁更快”还要看<strong>启动与常驻成本</strong>是否在你的场景里被放大。</p>

<p><strong>语言有适用场景</strong>，某些场景下某类语言根本不可用。例如<strong>带 VM 的语言</strong>（Java、C# 等）无法用于<strong>内核开发</strong>：内核是跑在裸机上的第一层软件，没有“操作系统”为其提供进程、虚拟内存或系统调用；VM 依赖的运行时、GC、线程调度等都假设已有内核，内核自身不能依赖这些。因此内核必须用 C、Rust（no_std）等无传统 VM、可直接控制内存与硬件的语言。反之，内核、嵌入式、实时系统等会排除 VM 语言；企业后端、CRUD、大数据等则常首选 VM 语言以换取生态与开发效率。本博客<a href="https://weinan.io/2026/02/26/kernel-c-cpp-rust-runtime-stdlib.html">《内核开发中的语言选择：C、C++ 与 Rust》</a>对内核场景下各语言的约束有专门讨论<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">10</a></sup>。</p>

<hr />

<h2 id="3-非技术因素的一票否决权">3. 非技术因素的“一票否决权”</h2>

<p>在工程选型中，非技术因素往往权重更高：</p>

<ul>
  <li><strong>市场与招聘</strong>：企业级后端仍以 Java/C# 为主流，Rust 等虽优但人力与梯队成本高。</li>
  <li><strong>生态与投资</strong>：大厂与社区投入决定库的成熟度；“开箱即用”的组件是否覆盖你的业务，比单语言性能更关键。</li>
  <li><strong>历史债务</strong>：很多系统沿用 Java/PHP 等，只因存量代码如此。除非有颠覆性收益，否则“稳定可用”常优于“换语言重构”。</li>
</ul>

<hr />

<h2 id="总结">总结</h2>

<p>选语言不是在选“谁跑得快”，而是在选<strong>谁的运行时哲学和生态，最匹配你的业务场景和团队能力</strong>。</p>

<ul>
  <li><strong>技术收益</strong>：在 I/O 密集或 CPU 密集场景下，能否通过并发模型和内存控制，把硬件与内核的潜力发挥出来。</li>
  <li><strong>业务成本</strong>：招聘难度、开发效率、生态成熟度与长期维护的可控性。</li>
</ul>

<p><strong>语言速度只是众多维度之一；I/O、并发、内存与内核的交互方式，以及 VM/AOT 与生态，往往更能决定实际表现与可维护性。</strong> 反过来看：<strong>一门语言在某个领域取得成功，一定是因为它解决了该领域的实际需求</strong>（性能、生态、开发效率、团队能力等），而不是技术品味或“谁更优雅”的问题。</p>

<hr />

<h2 id="扩展阅读内核与接口">扩展阅读（内核与接口）</h2>

<ul>
  <li><strong>epoll</strong>：Linux 内核 <strong><code class="language-plaintext highlighter-rouge">fs/eventpoll.c</code></strong>，<code class="language-plaintext highlighter-rouge">epoll_create1</code>、<code class="language-plaintext highlighter-rouge">epoll_ctl</code>、<code class="language-plaintext highlighter-rouge">epoll_wait</code> 等<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>。一次 <code class="language-plaintext highlighter-rouge">epoll_wait</code> 可返回多个就绪 fd，减少系统调用次数。</li>
  <li><strong>io_uring</strong>：<strong><code class="language-plaintext highlighter-rouge">io_uring/io_uring.c</code></strong>，<code class="language-plaintext highlighter-rouge">io_uring_setup</code>、提交与完成队列；适合高 IOPS、低 syscall 场景<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup>。</li>
  <li><strong>用户态堆与内核</strong>：<code class="language-plaintext highlighter-rouge">brk</code>/<code class="language-plaintext highlighter-rouge">mmap</code>、VMA、缺页与零页见本博客<a href="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html">《栈为什么比堆快》</a><sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>。内核 <code class="language-plaintext highlighter-rouge">mm/mmap.c</code>（<code class="language-plaintext highlighter-rouge">sys_brk</code>）、<code class="language-plaintext highlighter-rouge">mm/vma.c</code>（<code class="language-plaintext highlighter-rouge">do_brk_flags</code>）。</li>
  <li><strong>锁与性能</strong>：细粒度锁、持锁时间最小化、自旋与睡眠取舍见内核 <a href="https://docs.kernel.org/locking/mutex-design.html">mutex-design</a>、LWN mutex 自适应自旋<sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">8</a></sup>，以及 Intel Advisor 锁竞争分析。用户态锁如何依赖内核（futex）见本博客<a href="https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex.html">《用户态锁与内核》</a><sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">9</a></sup>。</li>
</ul>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Linux 内核 <strong>fs/eventpoll.c</strong>：epoll 实现，<code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE1(epoll_create1,...)</code>、<code class="language-plaintext highlighter-rouge">epoll_ctl</code>、<code class="language-plaintext highlighter-rouge">epoll_wait</code> 等。<a href="https://elixir.bootlin.com/linux/latest/source/fs/eventpoll.c">Bootlin - eventpoll.c</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://man7.org/linux/man-pages/man7/epoll.7.html">epoll(7)</a> - Linux 手册：epoll 概述与 API <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Linux 内核 <strong>io_uring/io_uring.c</strong>：io_uring 实现，<code class="language-plaintext highlighter-rouge">SYSCALL_DEFINE2(io_uring_setup,...)</code> 等。<a href="https://elixir.bootlin.com/linux/latest/source/io_uring/io_uring.c">Bootlin - io_uring.c</a>、<a href="https://kernel.dk/io_uring.pdf">io_uring 文档</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://kernel.dk/io_uring.pdf">Efficient IO with io_uring</a> - Jens Axboe，io_uring 设计说明（PDF） <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><strong>Stop-The-World（STW）</strong>：GC 暂停所有应用线程以独占堆访问，导致延迟尖刺。<a href="https://docs.oracle.com/en/java/javase/21/gctuning/introduction-garbage-collection-tuning.html">Oracle Java GC Tuning - Introduction</a> 介绍各 GC 与停顿；<a href="https://go.dev/doc/gc-guide">A Guide to the Go Garbage Collector</a> 说明 Go 的并发 GC 与 STW 阶段 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://man7.org/linux/man-pages/man3/malloc_trim.3.html">malloc_trim(3)</a> - 将 free 列表中的空闲页归还内核 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>本博客 <a href="https://weinan.io/2026/03/01/stack-vs-heap-why-stack-faster.html">栈为什么比堆快：从分配方式到「批发-零售」链条</a> - brk/mmap、VMA、缺页与零页 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><strong>锁与性能</strong>：粗粒度锁与持锁做 I/O 会串行化多线程并拉高延迟。<a href="https://docs.kernel.org/locking/mutex-design.html">Generic Mutex Subsystem — The Linux Kernel documentation</a> 介绍内核 mutex 设计与自旋/睡眠取舍；<a href="https://lwn.net/Articles/314512/">LWN — mutex: implement adaptive spinning</a> 讨论竞争下的自适应自旋；<a href="https://www.intel.com/content/www/us/en/docs/advisor/user-guide/2025-0/reduce-lock-contention-001.html">Intel Advisor — Reduce Lock Contention</a> 提供用户态锁竞争分析与优化思路 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>本博客 <a href="https://weinan.io/2026/03/02/userspace-locks-and-kernel-futex.html">用户态锁与内核：谁在管理「等待」与 futex</a> - futex 无竞争 fast path、有竞争时进内核阻塞/唤醒，及内核代码说明 <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>本博客 <a href="https://weinan.io/2026/02/26/kernel-c-cpp-rust-runtime-stdlib.html">内核开发中的语言选择：C、C++ 与 Rust 的运行时与标准库</a> - 内核为何不能用 VM、C++/Rust 的约束与取舍 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[在现代环境中，单纯比较语言的“执行速度”远远不够。一方面，现代 CPU 执行指令已经极快，各语言在“单纯执行同一条指令”的层面差异很小（纳秒级），难以成为系统瓶颈。另一方面，就像在拥挤的城市街道上比较两辆赛车的极速，意义有限——真正决定系统表现的是 I/O 如何被处理、并发如何利用多核、内存如何与内核交互，以及运行时与生态的取舍。本文从技术内因（I/O、并发、内存与系统调用）、运行时成本（VM 与 AOT）以及非技术因素三方面梳理，并辅以 Linux 内核与用户态代码示例。]]></summary></entry><entry><title type="html">内核开发中的语言选择：C、C++ 与 Rust 的运行时与标准库</title><link href="https://weinan.io/2026/02/26/kernel-c-cpp-rust-runtime-stdlib.html" rel="alternate" type="text/html" title="内核开发中的语言选择：C、C++ 与 Rust 的运行时与标准库" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://weinan.io/2026/02/26/kernel-c-cpp-rust-runtime-stdlib</id><content type="html" xml:base="https://weinan.io/2026/02/26/kernel-c-cpp-rust-runtime-stdlib.html"><![CDATA[<p>操作系统内核开发与应用程序开发的核心区别之一，在于运行时与内存管理模型的约束。本文从运行时大小、内存管理模型和标准库依赖三个方面，分析 C、C++、Rust 在内核开发中的差异。</p>

<h2 id="运行时大小问题">运行时大小问题</h2>

<h3 id="c-运行时">C 运行时</h3>
<p>C 的运行时几乎可以忽略不计：</p>
<ul>
  <li><strong>最小运行时</strong>：C 语言被设计为「接近硬件」，运行时仅提供最基本的启动代码（crt0）和库函数</li>
  <li><strong>可控性</strong>：内核开发者可以完全避免使用标准库，直接使用系统调用和硬件指令</li>
  <li><strong>典型例子</strong>：Linux 内核几乎完全用 C 编写，运行时开销极小<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">1</a></sup></li>
</ul>

<h3 id="c-运行时-1">C++ 运行时</h3>
<p>C++ 的运行时较大，原因是：</p>
<ul>
  <li><strong>异常处理</strong>：需要 unwind 表和 RTTI（运行时类型信息）</li>
  <li><strong>标准库</strong>：STL 容器、算法等需要大量初始化代码</li>
  <li><strong>构造函数</strong>：静态对象的构造需要运行时支持</li>
  <li><strong>内存管理</strong>：operator new/delete 的默认实现</li>
  <li><strong>例子</strong>：即使在嵌入式环境中，完整的 C++ 运行时可能增加数百 KB 到数 MB 的开销</li>
</ul>

<h3 id="rust-运行时">Rust 运行时</h3>
<p>Rust 介于两者之间：</p>
<ul>
  <li><strong>零成本抽象</strong>：大部分抽象在编译时展开，不增加运行时开销</li>
  <li><strong>最小运行时</strong>：只需要基本的 panic 处理、内存分配器（若使用）</li>
  <li><strong>no_std 模式</strong>：可以完全禁用标准库，只使用 core 库，运行时开销与 C 相当<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">3</a></sup></li>
  <li><strong>例子</strong>：Redox OS 内核完全用 Rust 编写，使用 no_std 模式<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup></li>
</ul>

<h2 id="内存管理的核心区别">内存管理的核心区别</h2>

<p>内存管理模型的差异是另一关键因素。</p>

<h3 id="c-的内存管理问题">C++ 的内存管理问题</h3>

<ol>
  <li><strong>构造函数和析构函数</strong>
    <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Device</span> <span class="p">{</span>
 <span class="n">Resource</span><span class="o">*</span> <span class="n">res</span><span class="p">;</span>
<span class="nl">public:</span>
 <span class="n">Device</span><span class="p">()</span> <span class="p">{</span> <span class="n">res</span> <span class="o">=</span> <span class="n">allocate_resource</span><span class="p">();</span> <span class="p">}</span>  <span class="c1">// 可能失败</span>
 <span class="o">~</span><span class="n">Device</span><span class="p">()</span> <span class="p">{</span> <span class="n">release_resource</span><span class="p">();</span> <span class="p">}</span>        <span class="c1">// 异常可能发生</span>
<span class="p">};</span>
</code></pre></div>    </div>
    <ul>
      <li>构造函数无法返回错误码（只能用异常）</li>
      <li>析构函数中不能抛出异常</li>
      <li>对象生命周期由编译器自动管理，但在内核中这往往是不可预测的</li>
    </ul>
  </li>
  <li><strong>异常处理</strong>
    <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">driver_function</span><span class="p">()</span> <span class="p">{</span>
 <span class="n">Device</span> <span class="n">d</span><span class="p">;</span>  <span class="c1">// 构造</span>
 <span class="c1">// 如果这里发生异常，d 的析构函数会自动调用</span>
 <span class="c1">// 但在内核中，这种隐式控制流是危险的</span>
<span class="p">}</span>
</code></pre></div>    </div>
    <ul>
      <li>异常展开需要复杂的栈回溯</li>
      <li>增加了二进制文件大小</li>
      <li>实时性无法保证</li>
    </ul>
  </li>
  <li><strong>RAII 的局限性与运行时依赖</strong></li>
</ol>

<p>RAII（Resource Acquisition Is Initialization）的核心是：资源在对象构造时获取，在对象析构时释放。这一机制在内核中受限，且其实现本身依赖运行时支持。</p>

<p><strong>为何 RAII 需要运行时支持：</strong></p>

<ul>
  <li><strong>构造与析构的自动调用</strong>：编译器需在正确位置插入构造/析构调用，对象生命周期的管理（何时创建、何时销毁）依赖运行时机制。例如：</li>
</ul>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">FileHandler</span> <span class="p">{</span>
    <span class="kt">FILE</span><span class="o">*</span> <span class="n">file</span><span class="p">;</span>
<span class="nl">public:</span>
    <span class="n">FileHandler</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">filename</span><span class="p">)</span> <span class="p">{</span> <span class="n">file</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">"r"</span><span class="p">);</span> <span class="p">}</span>
    <span class="o">~</span><span class="n">FileHandler</span><span class="p">()</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="n">fclose</span><span class="p">(</span><span class="n">file</span><span class="p">);</span> <span class="p">}</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="n">processFile</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">FileHandler</span> <span class="n">fh</span><span class="p">(</span><span class="s">"data.txt"</span><span class="p">);</span>  <span class="c1">// 构造时获取资源</span>
    <span class="c1">// 使用文件...</span>
<span class="p">}</span>  <span class="c1">// 离开作用域时析构被自动调用</span>
</code></pre></div></div>

<ul>
  <li><strong>栈展开（Stack Unwinding）</strong>：异常发生时，需要按与构造相反的顺序自动调用所有已构造局部对象的析构函数，并维护调用栈信息。内核通常禁用异常，因此无法依赖这套机制。</li>
</ul>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">FileHandler</span> <span class="n">fh1</span><span class="p">(</span><span class="s">"a.txt"</span><span class="p">);</span>
    <span class="n">FileHandler</span> <span class="n">fh2</span><span class="p">(</span><span class="s">"b.txt"</span><span class="p">);</span>
    <span class="k">throw</span> <span class="n">std</span><span class="o">::</span><span class="n">runtime_error</span><span class="p">(</span><span class="s">"error"</span><span class="p">);</span>  <span class="c1">// 异常时 fh2、fh1 的析构须被调用</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li>
    <p><strong>动态内存与智能指针</strong>：<code class="language-plaintext highlighter-rouge">std::vector</code>、<code class="language-plaintext highlighter-rouge">std::unique_ptr</code>/<code class="language-plaintext highlighter-rouge">std::shared_ptr</code> 等依赖堆分配与引用计数，需要在运行时跟踪资源。</p>
  </li>
  <li>
    <p><strong>多态对象的析构</strong>：通过基类指针删除派生类对象时，必须通过虚函数表（vtable）在运行时找到正确的析构函数，同样依赖运行时类型信息。</p>
  </li>
</ul>

<p>若纯靠编译时实现，无法处理异常路径下的释放、多态析构和动态资源的引用计数等，因此 RAII 既是 C++ 的核心特性，又离不开运行时支持，这与内核需要的确定性、无异常、显式控制相冲突。</p>

<p><strong>运行时实现简述</strong>：局部对象的构造/析构由编译器在固定位置插入调用；全局或静态对象由启动代码遍历 <code class="language-plaintext highlighter-rouge">.init_array</code>（或 <code class="language-plaintext highlighter-rouge">.ctors</code>）在进程启动时调用构造，退出时按逆序调用析构。异常时的栈展开则依赖 <strong>unwinder</strong>：编译器为每个函数生成 unwind 元数据（如 DWARF 的 <code class="language-plaintext highlighter-rouge">.eh_frame</code>），描述栈帧与需析构的对象；异常抛出时，运行时库按栈回溯，调用每帧的 personality 函数，按表调用析构并查找 catch。多态析构通过对象的 vtable 在运行时查表得到正确析构函数。这些机制多在编译器运行时（如 libgcc、libstdc++ 的一部分）中实现，与「标准库 STL」不是同一层，但都属 C++ 运行时。</p>

<ul>
  <li>RAII 假设资源释放是确定性的、无错的</li>
  <li>内核中可能需要延迟释放、异步释放</li>
  <li>硬件资源的释放可能非常复杂</li>
</ul>

<ol>
  <li><strong>模板元编程</strong>
    <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">class</span> <span class="nc">RingBuffer</span> <span class="p">{</span>
 <span class="n">T</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>  <span class="c1">// 类型在编译时确定</span>
 <span class="c1">// 但在内核中，可能需要根据硬件配置动态选择类型</span>
<span class="p">};</span>
</code></pre></div>    </div>
    <ul>
      <li>过度依赖模板会导致代码膨胀</li>
      <li>难以处理动态硬件配置</li>
    </ul>
  </li>
</ol>

<h3 id="c-的内存管理优势">C 的内存管理优势</h3>

<ol>
  <li><strong>显式控制</strong>
    <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">device</span> <span class="o">*</span><span class="n">dev</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">dev</span><span class="p">),</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">dev</span><span class="p">)</span>
 <span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
<span class="n">dev</span><span class="o">-&gt;</span><span class="n">ops</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">device_ops</span><span class="p">;</span>
<span class="c1">// 所有操作都是显式的，没有隐藏的控制流</span>
</code></pre></div>    </div>
  </li>
  <li><strong>错误处理直接</strong>
    <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">init_device</span><span class="p">(</span><span class="k">struct</span> <span class="n">device</span> <span class="o">*</span><span class="n">dev</span><span class="p">)</span> <span class="p">{</span>
 <span class="kt">int</span> <span class="n">ret</span><span class="p">;</span>
 <span class="n">ret</span> <span class="o">=</span> <span class="n">init_resource_a</span><span class="p">(</span><span class="n">dev</span><span class="p">);</span>
 <span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
     <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
 <span class="n">ret</span> <span class="o">=</span> <span class="n">init_resource_b</span><span class="p">(</span><span class="n">dev</span><span class="p">);</span>
 <span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="p">{</span>
     <span class="n">cleanup_resource_a</span><span class="p">(</span><span class="n">dev</span><span class="p">);</span>
     <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
 <span class="p">}</span>
 <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>    </div>
    <ul>
      <li>所有错误路径都清晰可见</li>
      <li>没有隐式的资源释放</li>
    </ul>
  </li>
  <li><strong>内存布局可预测</strong>
    <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">packet</span> <span class="p">{</span>
 <span class="kt">uint32_t</span> <span class="n">len</span><span class="p">;</span>
 <span class="kt">char</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>  <span class="c1">// 灵活数组成员</span>
<span class="p">};</span>  <span class="c1">// 内存布局完全由程序员控制</span>
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="内核-c-的面向对象风格">内核 C 的面向对象风格</h3>

<p>内核虽然用 C 编写，但大量采用<strong>面向对象式</strong>的写法：用结构体承载「状态」，用函数指针表承载「行为」，多态通过查表调用实现，无需 C++ 的虚函数或异常<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">5</a></sup><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">6</a></sup>。</p>

<p><strong>1. 函数指针表（类似 vtable）</strong></p>

<p>例如 VFS 层的 <code class="language-plaintext highlighter-rouge">struct file_operations</code>（<code class="language-plaintext highlighter-rouge">include/linux/fs.h</code>）：每个字段是一类操作，由具体驱动/文件系统填不同实现，通用代码通过 <code class="language-plaintext highlighter-rouge">file-&gt;f_op-&gt;read(...)</code> 等形式调用，实现多态。<code class="language-plaintext highlighter-rouge">file_operations</code> 与 inode 等结构的定义与用法可参考本博客<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">7</a></sup>。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化自 linux/fs.h</span>
<span class="k">struct</span> <span class="n">file_operations</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">owner</span><span class="p">;</span>
    <span class="kt">ssize_t</span> <span class="p">(</span><span class="o">*</span><span class="n">read</span><span class="p">)</span> <span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="p">,</span> <span class="kt">char</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">loff_t</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">ssize_t</span> <span class="p">(</span><span class="o">*</span><span class="n">write</span><span class="p">)</span> <span class="p">(</span><span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">loff_t</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">open</span><span class="p">)</span> <span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="p">);</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">release</span><span class="p">)</span> <span class="p">(</span><span class="k">struct</span> <span class="n">inode</span> <span class="o">*</span><span class="p">,</span> <span class="k">struct</span> <span class="n">file</span> <span class="o">*</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">};</span>

<span class="c1">// 驱动侧：实现“类”并挂到 file 上</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">file_operations</span> <span class="n">my_fops</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">THIS_MODULE</span><span class="p">,</span>
    <span class="p">.</span><span class="n">read</span>  <span class="o">=</span> <span class="n">my_read</span><span class="p">,</span>
    <span class="p">.</span><span class="n">write</span> <span class="o">=</span> <span class="n">my_write</span><span class="p">,</span>
    <span class="p">.</span><span class="n">open</span>  <span class="o">=</span> <span class="n">my_open</span><span class="p">,</span>
    <span class="p">.</span><span class="n">release</span> <span class="o">=</span> <span class="n">my_release</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>

<p>同类结构还有 <code class="language-plaintext highlighter-rouge">inode_operations</code>、<code class="language-plaintext highlighter-rouge">dentry_operations</code>、<code class="language-plaintext highlighter-rouge">super_operations</code>、各类 <code class="language-plaintext highlighter-rouge">*_ops</code> 等，内核中有大量这种「操作表」<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">1</a></sup>。</p>

<p><strong>2. “继承”通过结构体嵌入</strong></p>

<p>子类型通过<strong>在结构体里嵌入父类型</strong>复用共同字段，并可用 <code class="language-plaintext highlighter-rouge">container_of</code> 从父指针反推子指针。例如设备模型里 <code class="language-plaintext highlighter-rouge">struct device</code> 内嵌 <code class="language-plaintext highlighter-rouge">struct kobject</code>，子设备再内嵌 <code class="language-plaintext highlighter-rouge">struct device</code>，形成层次与共同生命周期管理。</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 概念上：子结构体包含“基类”</span>
<span class="k">struct</span> <span class="n">my_device</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">device</span> <span class="n">dev</span><span class="p">;</span>   <span class="c1">// 内嵌，相当于“继承” device 的字段</span>
    <span class="kt">int</span> <span class="n">my_private_data</span><span class="p">;</span>
<span class="p">};</span>

<span class="c1">// 从通用 device* 得到 my_device*</span>
<span class="k">struct</span> <span class="n">my_device</span> <span class="o">*</span><span class="n">mdev</span> <span class="o">=</span> <span class="n">container_of</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="k">struct</span> <span class="n">my_device</span><span class="p">,</span> <span class="n">dev</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>3. “方法”约定：首参为对象指针</strong></p>

<p>很多内核 API 的「方法」形态是：第一个参数为操作对象，例如 <code class="language-plaintext highlighter-rouge">int (*open)(struct inode *, struct file *)</code>。调用方持有 <code class="language-plaintext highlighter-rouge">struct file *</code>，通过 <code class="language-plaintext highlighter-rouge">f_op-&gt;open(inode, filp)</code> 调用，等价于「对 file 做 open」，与 OO 的 <code class="language-plaintext highlighter-rouge">obj-&gt;method(args)</code> 对应。</p>

<p>综上，内核 C 用「结构体 + 函数指针表 + 嵌入 + 显式首参」实现接口抽象和多态，无需 C++ 的运行时（异常、vtable 展开、构造/析构顺序），仍能保持清晰的层次与可扩展性。</p>

<h3 id="rust-的创新解决方案">Rust 的创新解决方案</h3>

<p>Rust 通过所有权系统和生命周期来平衡安全性和控制力：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="n">resource</span><span class="p">:</span> <span class="n">Resource</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">res</span> <span class="o">=</span> <span class="nn">Resource</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 显式错误处理</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">Device</span> <span class="p">{</span> <span class="n">resource</span><span class="p">:</span> <span class="n">res</span> <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>  <span class="c1">// Drop trait 提供确定性析构，但比 C++ 更可控</span>

<span class="c1">// 所有权确保资源只有一个所有者</span>
<span class="k">fn</span> <span class="nf">use_device</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="n">Device</span><span class="p">)</span> <span class="p">{</span>  <span class="c1">// 获得所有权</span>
    <span class="c1">// 使用设备</span>
<span class="p">}</span>  <span class="c1">// 这里自动释放，但行为是确定的</span>
</code></pre></div></div>

<p>Rust 解决了 C++ 的几个关键问题：</p>
<ol>
  <li><strong>无异常</strong>：使用 Result 类型进行显式错误处理</li>
  <li><strong>所有权系统</strong>：资源释放是确定性的</li>
  <li><strong>零成本抽象</strong>：无运行时开销</li>
  <li><strong>内存安全</strong>：编译时检查，无 GC 开销</li>
</ol>

<p><strong>Rust 的错误处理（与 C++ 异常对比）</strong></p>

<p>Rust 没有异常，错误通过类型在类型系统中显式表达，调用方必须处理，适合内核等不能依赖 unwinder 的环境。</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">Result&lt;T, E&gt;</code></strong>：表示可能失败的操作，成功为 <code class="language-plaintext highlighter-rouge">Ok(t)</code>，失败为 <code class="language-plaintext highlighter-rouge">Err(e)</code>。<code class="language-plaintext highlighter-rouge">Option&lt;T&gt;</code> 表示可选值（<code class="language-plaintext highlighter-rouge">Some(t)</code> / <code class="language-plaintext highlighter-rouge">None</code>），二者均在 <code class="language-plaintext highlighter-rouge">core</code> 中，no_std 可用。</li>
  <li><strong>构造/初始化可返回错误</strong>：类似上面 <code class="language-plaintext highlighter-rouge">Device::new() -&gt; Result&lt;Self, Error&gt;</code>，失败时返回 <code class="language-plaintext highlighter-rouge">Err(...)</code>，无需两阶段 init。</li>
  <li><strong><code class="language-plaintext highlighter-rouge">?</code> 操作符</strong>：在返回 <code class="language-plaintext highlighter-rouge">Result</code> 的函数内，<code class="language-plaintext highlighter-rouge">expr?</code> 表示若 <code class="language-plaintext highlighter-rouge">expr</code> 为 <code class="language-plaintext highlighter-rouge">Err(e)</code> 则当前函数立即返回 <code class="language-plaintext highlighter-rouge">Err(e)</code>，否则解出 <code class="language-plaintext highlighter-rouge">Ok</code> 中的值继续执行，错误沿调用栈「向上传播」但<strong>无栈展开</strong>，仅是一次返回。</li>
  <li><strong>调用方必须处理</strong>：用 <code class="language-plaintext highlighter-rouge">match</code>、<code class="language-plaintext highlighter-rouge">if let</code>、<code class="language-plaintext highlighter-rouge">.map_err()</code> 或继续 <code class="language-plaintext highlighter-rouge">?</code>，编译器要求覆盖 <code class="language-plaintext highlighter-rouge">Ok</code>/<code class="language-plaintext highlighter-rouge">Err</code> 分支，不会「忘记」检查错误。</li>
</ul>

<p>示例（no_std 下常见写法）：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 内核/裸机中常用 &amp;'static str 或自定义 enum 作为错误类型</span>
<span class="k">fn</span> <span class="nf">init_hw</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="nb">str</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nf">enable_clock</span><span class="p">()</span><span class="nf">.ok_or</span><span class="p">(</span><span class="s">"clock init failed"</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">cfg</span> <span class="o">=</span> <span class="nf">read_cfg</span><span class="p">()</span><span class="nf">.ok_or</span><span class="p">(</span><span class="s">"bad config"</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">apply_config</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 若返回 Err，本函数直接 return Err(...)</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">driver_init</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="nb">str</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nf">init_hw</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">register_irq</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
<span class="c1">// 调用方：match driver_init() { Ok(()) =&gt; {}, Err(e) =&gt; { ... } }</span>
</code></pre></div></div>

<p>与 C++ 对比：C++ 构造不能返回错误，只能抛异常或两阶段 init；Rust 用 <code class="language-plaintext highlighter-rouge">Result</code> 让「可能失败」成为类型的一部分，无运行时开销，也不依赖异常展开，因此更适合内核。</p>

<h2 id="为什么内核不能使用标准库">为什么内核不能使用标准库</h2>

<h3 id="1-标准库依赖操作系统服务">1. 标准库依赖操作系统服务</h3>

<p>标准库本质上是操作系统功能的封装：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 标准库的实现依赖系统调用</span>
<span class="c1">// std::fs::File::open("test.txt") 最终会调用：</span>
<span class="c1">// Linux: openat() 系统调用</span>
<span class="c1">// Windows: NtCreateFile() 系统调用</span>

<span class="c1">// 但在内核中：</span>
<span class="c1">// 1. 没有文件系统（或文件系统实现不同）</span>
<span class="c1">// 2. 没有当前工作目录的概念</span>
<span class="c1">// 3. 没有用户态/内核态的转换机制</span>
</code></pre></div></div>

<h3 id="2-内核需要裸机环境">2. 内核需要裸机环境</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 用户态程序可以这样：</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hello</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>  <span class="c1">// 依赖操作系统的标准输出</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="c1">// 内核只能这样：</span>
<span class="kt">void</span> <span class="nf">kernel_entry</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 没有 main 函数，没有标准库</span>
    <span class="c1">// 需要直接操作硬件</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">video_memory</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="mh">0xb8000</span><span class="p">;</span>
    <span class="o">*</span><span class="n">video_memory</span> <span class="o">=</span> <span class="sc">'H'</span><span class="p">;</span>  <span class="c1">// 直接写入显存</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="各语言在没有标准库时的表现">各语言在没有标准库时的表现</h2>

<h3 id="c-语言裸机编程的典范">C 语言：裸机编程的典范</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 内核中常见的 C 代码</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">serial_putc</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 直接操作硬件寄存器</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">inb</span><span class="p">(</span><span class="n">COM1</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x20</span><span class="p">));</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">COM1</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// 自己实现需要的功能</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">memcpy</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">dest</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">src</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">d</span> <span class="o">=</span> <span class="n">dest</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">s</span> <span class="o">=</span> <span class="n">src</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">n</span><span class="o">--</span><span class="p">)</span> <span class="o">*</span><span class="n">d</span><span class="o">++</span> <span class="o">=</span> <span class="o">*</span><span class="n">s</span><span class="o">++</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">dest</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>C 语言的特点：</p>
<ul>
  <li><strong>语言本身与运行时分离</strong>：语法不依赖标准库</li>
  <li><strong>freestanding environment</strong>：C 标准明确支持无标准库环境</li>
  <li><strong>最小依赖</strong>：甚至连 <code class="language-plaintext highlighter-rouge">memcpy</code> 都可以自己实现</li>
</ul>

<h3 id="c标准库依赖严重">C++：标准库依赖严重</h3>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 不能用的 C++ 特性：</span>
<span class="cp">#include</span> <span class="cpf">&lt;vector&gt;</span><span class="c1">      // 需要动态内存分配和异常</span><span class="cp">
#include</span> <span class="cpf">&lt;string&gt;</span><span class="c1">      // 需要内存分配和字符处理</span><span class="cp">
#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="c1">    // 需要操作系统支持</span><span class="cp">
#include</span> <span class="cpf">&lt;thread&gt;</span><span class="c1">      // 需要线程库支持</span><span class="cp">
#include</span> <span class="cpf">&lt;mutex&gt;</span><span class="c1">       // 需要同步原语</span><span class="cp">
</span>
<span class="c1">// 即使不用标准库，语言特性本身也有问题：</span>
<span class="k">class</span> <span class="nc">Device</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">name</span><span class="p">;</span>  <span class="c1">// 错误：string 需要标准库</span>
<span class="nl">public:</span>
    <span class="n">Device</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* 构造函数不能失败？ */</span> <span class="p">}</span>
    <span class="o">~</span><span class="n">Device</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* 析构函数不能抛异常？ */</span> <span class="p">}</span>
<span class="p">};</span>

<span class="c1">// 尝试不用标准库：</span>
<span class="k">class</span> <span class="nc">Device</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">name</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>  <span class="c1">// 固定大小，但不够灵活</span>
    <span class="kt">int</span> <span class="n">fd</span><span class="p">;</span>
<span class="nl">public:</span>
    <span class="n">Device</span><span class="p">()</span> <span class="o">:</span> <span class="n">fd</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{}</span>  <span class="c1">// 两阶段构造（anti-pattern）</span>
    <span class="kt">bool</span> <span class="n">init</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* 真正的初始化 */</span> <span class="p">}</span>
    <span class="kt">void</span> <span class="n">deinit</span><span class="p">()</span> <span class="p">{</span> <span class="cm">/* 手动释放 */</span> <span class="p">}</span>
<span class="p">};</span>
<span class="c1">// 但这违背了 RAII 原则</span>
</code></pre></div></div>

<p>C++ 的问题：</p>
<ul>
  <li><strong>语言特性隐含依赖</strong>：即使不用标准库，异常、RTTI 等也需要运行时支持</li>
  <li><strong>STL 无法移植</strong>：容器都假设有堆内存管理和操作系统服务</li>
  <li><strong>构造函数限制</strong>：无法优雅处理初始化失败</li>
</ul>

<p><strong>澄清</strong>：离开标准库并不等于「所有 C++ 特性都用不了」。RAII（自己的类）、虚函数、vtable、重载 <code class="language-plaintext highlighter-rouge">operator new/delete</code> 都是<strong>语言特性</strong>，不依赖标准库；异常则依赖 unwinder 等<strong>运行时</strong>（多在编译器运行时库里），与 STL 是不同层。内核里通常还禁用异常（<code class="language-plaintext highlighter-rouge">-fno-exceptions</code>）和 RTTI（<code class="language-plaintext highlighter-rouge">-fno-rtti</code>），因此异常和 <code class="language-plaintext highlighter-rouge">dynamic_cast</code>/<code class="language-plaintext highlighter-rouge">typeid</code> 不可用，RAII 在异常路径上的保障也随之消失。</p>

<p><strong>假设内核用 C++：去掉标准库并加上常见限制（如 -fno-exceptions、-fno-rtti、禁止复杂全局构造）后，功能退化可概括为：</strong></p>

<table>
  <thead>
    <tr>
      <th>情况</th>
      <th>功能</th>
      <th>说明</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>完全不可用</strong></td>
      <td>STL 容器/算法、std::string、标准智能指针、iostream</td>
      <td>依赖标准库，内核不链接</td>
    </tr>
    <tr>
      <td> </td>
      <td>异常 (throw/catch)</td>
      <td>通常 -fno-exceptions，且不愿携带 unwinder</td>
    </tr>
    <tr>
      <td> </td>
      <td>RTTI (dynamic_cast, typeid)</td>
      <td>通常 -fno-rtti</td>
    </tr>
    <tr>
      <td><strong>语义退化</strong></td>
      <td>RAII</td>
      <td>构造不能返回错误 → 退化为两阶段 init；无异常则「任意路径都析构」的保证弱化；析构常被要求只做简单、确定性释放</td>
    </tr>
    <tr>
      <td> </td>
      <td>全局/静态对象（非平凡构造）</td>
      <td>依赖 .init_array 与启动顺序，内核中多禁止或极简使用</td>
    </tr>
    <tr>
      <td><strong>仍可用但受限</strong></td>
      <td>new/delete</td>
      <td>可重载到 kmalloc/kfree；有的规范禁止全局 new，仅允许 placement new + 内核分配器</td>
    </tr>
    <tr>
      <td> </td>
      <td>虚函数 / vtable、模板、类与继承</td>
      <td>不依赖标准库；风格上常限制深继承与过度模板</td>
    </tr>
    <tr>
      <td> </td>
      <td>const、引用、重载、命名空间</td>
      <td>纯语言特性，无退化</td>
    </tr>
  </tbody>
</table>

<p>整体上 C++ 会退化成「带类、模板和虚函数的 C」：语法和类型系统仍在，错误处理回到返回码，资源管理更显式，不能依赖异常与标准库。</p>

<h3 id="rustno_std-模式">Rust：no_std 模式<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup></h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 指定不使用标准库</span>
<span class="nd">#![no_std]</span>

<span class="c1">// 只能使用 core 库（无操作系统依赖）</span>
<span class="k">use</span> <span class="nn">core</span><span class="p">::</span><span class="nn">panic</span><span class="p">::</span><span class="n">PanicInfo</span><span class="p">;</span>

<span class="c1">// 需要自己处理 panic</span>
<span class="nd">#[panic_handler]</span>
<span class="k">fn</span> <span class="nf">panic</span><span class="p">(</span><span class="n">_info</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">PanicInfo</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">!</span> <span class="p">{</span>
    <span class="k">loop</span> <span class="p">{}</span>
<span class="p">}</span>

<span class="c1">// 需要自己实现内存分配（如果需要）</span>
<span class="nd">#[global_allocator]</span>
<span class="k">static</span> <span class="n">ALLOCATOR</span><span class="p">:</span> <span class="n">MyAllocator</span> <span class="o">=</span> <span class="n">MyAllocator</span><span class="p">;</span>

<span class="c1">// 可以安全地使用大部分语言特性</span>
<span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="n">base_addr</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">irq</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="k">const</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="k">Self</span> <span class="p">{</span>  <span class="c1">// const fn 可以在编译时执行</span>
        <span class="n">Device</span> <span class="p">{</span> <span class="n">base_addr</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">irq</span><span class="p">:</span> <span class="mi">0</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">read_reg</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">offset</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
        <span class="c1">// 直接操作内存映射 IO</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="p">(</span><span class="k">self</span><span class="py">.base_addr</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u32</span><span class="p">)</span><span class="nf">.add</span><span class="p">(</span><span class="n">offset</span><span class="p">)</span><span class="nf">.read_volatile</span><span class="p">()</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Rust 的优势<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup>：</p>
<ul>
  <li><strong>core 库</strong>：提供语言核心功能，无操作系统依赖。core 中<strong>不包含与操作系统相关的 I/O 能力</strong>：文件、标准输入/输出（stdin/stdout）、网络（TcpStream 等）均在 <code class="language-plaintext highlighter-rouge">std</code> 中；core 里仅有极少的 I/O 相关 trait/类型定义（如 <code class="language-plaintext highlighter-rouge">BorrowedBuf</code>），不提供实际读写，因此 <code class="language-plaintext highlighter-rouge">#![no_std]</code> 下无法使用 <code class="language-plaintext highlighter-rouge">println!</code>、<code class="language-plaintext highlighter-rouge">File</code>、<code class="language-plaintext highlighter-rouge">std::net</code> 等，需自行实现或依赖其他库。</li>
  <li><strong>语言特性零成本</strong>：所有权、借用检查都在编译期</li>
  <li><strong>明确的 unsafe</strong>：硬件操作需要显式标记</li>
  <li><strong>const fn</strong>：可以在编译时执行函数</li>
</ul>

<h2 id="实际代码对比">实际代码对比</h2>

<h3 id="实现一个简单的串口驱动">实现一个简单的串口驱动</h3>

<p><strong>C 版本</strong>：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// serial.h</span>
<span class="k">struct</span> <span class="n">serial_port</span> <span class="p">{</span>
    <span class="kt">uint16_t</span> <span class="n">port</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">initialized</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="nf">serial_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">serial_port</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">port</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">serial_putc</span><span class="p">(</span><span class="k">struct</span> <span class="n">serial_port</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="kt">char</span> <span class="n">c</span><span class="p">);</span>

<span class="c1">// serial.c</span>
<span class="kt">void</span> <span class="nf">serial_init</span><span class="p">(</span><span class="k">struct</span> <span class="n">serial_port</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">port</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">sp</span><span class="o">-&gt;</span><span class="n">port</span> <span class="o">=</span> <span class="n">port</span><span class="p">;</span>
    <span class="n">sp</span><span class="o">-&gt;</span><span class="n">initialized</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">);</span>  <span class="c1">// 关闭中断</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x80</span><span class="p">);</span>  <span class="c1">// 设置波特率</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">0</span><span class="p">,</span> <span class="mh">0x03</span><span class="p">);</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">);</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mh">0x03</span><span class="p">);</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="mh">0xC7</span><span class="p">);</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">4</span><span class="p">,</span> <span class="mh">0x0B</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">serial_putc</span><span class="p">(</span><span class="k">struct</span> <span class="n">serial_port</span> <span class="o">*</span><span class="n">sp</span><span class="p">,</span> <span class="kt">char</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">while</span> <span class="p">((</span><span class="n">inb</span><span class="p">(</span><span class="n">sp</span><span class="o">-&gt;</span><span class="n">port</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x20</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">outb</span><span class="p">(</span><span class="n">sp</span><span class="o">-&gt;</span><span class="n">port</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>C++ 版本（有问题）</strong>：</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 尝试用 C++ 风格</span>
<span class="k">class</span> <span class="nc">SerialPort</span> <span class="p">{</span>
<span class="nl">private:</span>
    <span class="kt">uint16_t</span> <span class="n">port</span><span class="p">;</span>
    <span class="kt">bool</span> <span class="n">initialized</span><span class="p">;</span>

<span class="nl">public:</span>
    <span class="n">SerialPort</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">port</span><span class="p">)</span> <span class="o">:</span> <span class="n">port</span><span class="p">(</span><span class="n">port</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// 构造函数中初始化，但如果失败？</span>
        <span class="n">init</span><span class="p">();</span>  <span class="c1">// 不能返回错误码</span>
    <span class="p">}</span>

    <span class="o">~</span><span class="n">SerialPort</span><span class="p">()</span> <span class="p">{</span>
        <span class="c1">// 析构函数中清理</span>
    <span class="p">}</span>

    <span class="kt">void</span> <span class="n">putc</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">while</span> <span class="p">((</span><span class="n">inb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x20</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
        <span class="n">outb</span><span class="p">(</span><span class="n">port</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
    <span class="p">}</span>

<span class="nl">private:</span>
    <span class="kt">void</span> <span class="n">init</span><span class="p">()</span> <span class="p">{</span>
        <span class="c1">// 如果这里失败，只能抛异常</span>
        <span class="c1">// 但内核中不能使用异常</span>
        <span class="n">outb</span><span class="p">(</span><span class="n">port</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">);</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Rust 版本</strong>（内存映射 I/O 风格）：</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#![no_std]</span>

<span class="k">use</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::{</span><span class="n">read_volatile</span><span class="p">,</span> <span class="n">write_volatile</span><span class="p">};</span>

<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SerialPort</span> <span class="p">{</span>
    <span class="n">port</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="n">initialized</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">SerialPort</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">port</span><span class="p">:</span> <span class="nb">u16</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="nb">str</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">sp</span> <span class="o">=</span> <span class="n">SerialPort</span> <span class="p">{</span>
            <span class="n">port</span><span class="p">,</span>
            <span class="n">initialized</span><span class="p">:</span> <span class="k">false</span><span class="p">,</span>
        <span class="p">};</span>
        <span class="n">sp</span><span class="nf">.init</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">sp</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">init</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="nb">str</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x00</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">3</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x80</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">0</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x03</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x00</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">3</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x03</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">2</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0xC7</span><span class="p">);</span>
            <span class="nf">write_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">4</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="mi">0x0B</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="k">self</span><span class="py">.initialized</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>

    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">putc</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">c</span><span class="p">:</span> <span class="nb">u8</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span>
            <span class="k">while</span> <span class="p">(</span><span class="nf">read_volatile</span><span class="p">((</span><span class="k">self</span><span class="py">.port</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u8</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">0x20</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{}</span>
            <span class="nf">write_volatile</span><span class="p">(</span><span class="k">self</span><span class="py">.port</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>上述 Rust 示例为<strong>内存映射 I/O</strong> 风格（例如常见于 ARM 等平台）；在 x86 上 COM 口为<strong>端口 I/O</strong>，需使用 inb/outb 或 <code class="language-plaintext highlighter-rouge">x86_64::instructions::port::Port</code> 等封装。</p>

<h2 id="标准库-vs-no_std-的生态差异">标准库 vs no_std 的生态差异</h2>

<h3 id="可用功能对比">可用功能对比</h3>

<table>
  <thead>
    <tr>
      <th>功能</th>
      <th>标准库</th>
      <th>no_std</th>
      <th>说明</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Vec/String</td>
      <td>✅</td>
      <td>❌</td>
      <td>需要内存分配器</td>
    </tr>
    <tr>
      <td>Box/Rc/Arc</td>
      <td>✅</td>
      <td>⚠️</td>
      <td>需要内存分配器</td>
    </tr>
    <tr>
      <td>HashMap</td>
      <td>✅</td>
      <td>❌</td>
      <td>需要随机数源</td>
    </tr>
    <tr>
      <td>println!</td>
      <td>✅</td>
      <td>❌</td>
      <td>需要 IO（core 无具体 I/O 实现）</td>
    </tr>
    <tr>
      <td>文件操作</td>
      <td>✅</td>
      <td>❌</td>
      <td>需要文件系统</td>
    </tr>
    <tr>
      <td>线程</td>
      <td>✅</td>
      <td>❌</td>
      <td>需要调度器</td>
    </tr>
    <tr>
      <td>Mutex</td>
      <td>✅</td>
      <td>⚠️</td>
      <td>需要原子操作支持</td>
    </tr>
    <tr>
      <td>迭代器</td>
      <td>✅</td>
      <td>✅</td>
      <td>纯语言特性</td>
    </tr>
    <tr>
      <td>match</td>
      <td>✅</td>
      <td>✅</td>
      <td>语言特性</td>
    </tr>
    <tr>
      <td>trait</td>
      <td>✅</td>
      <td>✅</td>
      <td>语言特性</td>
    </tr>
    <tr>
      <td>闭包</td>
      <td>✅</td>
      <td>✅</td>
      <td>语言特性</td>
    </tr>
  </tbody>
</table>

<h3 id="实际影响">实际影响</h3>

<p>在裸机环境中：</p>
<ul>
  <li><strong>C</strong>：完全掌控，需要什么写什么</li>
  <li><strong>C++</strong>：大量特性受限，变成「更好的 C」</li>
  <li><strong>Rust</strong>：通过 no_std + core 保留大部分语言能力<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">2</a></sup></li>
</ul>

<h2 id="实际内核开发的选择">实际内核开发的选择</h2>

<ul>
  <li><strong>Linux</strong>：C 语言，完全掌控内存和运行时；近年来开始接纳 Rust 编写的子系统<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">8</a></sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">9</a></sup></li>
  <li><strong>Windows</strong>：混合，内核主要用 C，部分驱动用 C++</li>
  <li><strong>Redox OS</strong>：Rust，展示现代语言也能做内核<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup></li>
  <li><strong>鸿蒙</strong>：混合，内核用 C，上层用 C++/Rust</li>
</ul>

<h2 id="总结">总结</h2>

<p>从运行时与内存管理看，C++ 不适合内核开发的主要原因在于<strong>内存管理模型的差异</strong>：异常处理、隐式构造/析构、RAII 等与内核需要的确定性和显式控制相冲突；Rust 则用所有权系统在零成本抽象与内存安全之间取得折中。从标准库看，内核不能使用标准库：<strong>C</strong> 失去的很少（语言本身不依赖库），<strong>C++</strong> 失去核心优势（STL、异常、部分 RAII），<strong>Rust</strong> 失去便利性（集合类型、格式化输出）但保留安全性。因此 Linux 选择 C（简单、可控、最小依赖，Rust 作为补充逐步引入<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">9</a></sup>），Windows 内核主要用 C、部分驱动用 C++ 且限制特性，Redox 选择 Rust（no_std 提供安全性与表达能力的最佳平衡<sup id="fnref:5:2" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup>）。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git">Linux Kernel Source (torvalds/linux)</a> - 官方内核源码（C 为主，含 Rust 子系统） <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://docs.rust-embedded.org/book/intro/no-std.html">The Embedded Rust Book - no_std</a> - Rust 裸机/内核开发中的 no_std 与 core 库说明 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://rust-lang.github.io/rfcs/1184-stabilize-no_std.html">Rust RFC 1184: Stabilize no_std</a> - no_std 稳定化与 libcore 范围 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://www.redox-os.org/">Redox OS</a> - 使用 Rust no_std 编写的操作系统 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:5:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/444910/">Object-oriented design patterns in the kernel, part 1</a> - LWN，方法分派与 vtable（file_operations、inode_operations 等）模式 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/446317/">Object-oriented design patterns in the kernel, part 2</a> - LWN，数据继承与结构体内嵌（container_of）模式 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://weinan.io/2017/12/17/linux-driver.html">Linux驱动开发入门（四）</a> - 本博客，file_operations / inode 等内核数据结构与驱动示例 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://docs.kernel.org/rust/general-information.html">Linux Kernel - Rust support</a> - 内核 Rust 支持说明（仅链接 libcore，无 std） <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://rust-for-linux.com/">Rust for Linux</a> - 内核内 Rust 支持项目与文档 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[操作系统内核开发与应用程序开发的核心区别之一，在于运行时与内存管理模型的约束。本文从运行时大小、内存管理模型和标准库依赖三个方面，分析 C、C++、Rust 在内核开发中的差异。]]></summary></entry><entry><title type="html">How C Calls Rust in Linux Kernel: Module Lifecycle Deep Dive</title><link href="https://weinan.io/2026/02/18/how-c-calls-rust-in-linux-kernel.html" rel="alternate" type="text/html" title="How C Calls Rust in Linux Kernel: Module Lifecycle Deep Dive" /><published>2026-02-18T00:00:00+00:00</published><updated>2026-02-18T00:00:00+00:00</updated><id>https://weinan.io/2026/02/18/how-c-calls-rust-in-linux-kernel</id><content type="html" xml:base="https://weinan.io/2026/02/18/how-c-calls-rust-in-linux-kernel.html"><![CDATA[<p>A comprehensive technical analysis of how C kernel code calls Rust functions through the module loading mechanism. Using actual Linux kernel source code (6.x), this article reveals the complete evidence chain: from Rust’s #[no_mangle] attribute to C’s function pointer invocation, from ELF symbol binding to the actual call flow. We demonstrate that C→Rust calls are not theoretical but a production reality implemented through standard module lifecycle management.</p>

<h2 id="introduction-the-question">Introduction: The Question</h2>

<p>In discussions about Rust in the Linux kernel, a fundamental architectural question often arises:</p>

<p><strong>“Can C kernel code call Rust functions?”</strong></p>

<p>This isn’t just an academic question. Understanding the call direction between C and Rust is crucial for grasping:</p>
<ul>
  <li>The integration architecture</li>
  <li>ABI stability requirements</li>
  <li>Future evolution possibilities</li>
  <li>Security and safety boundaries</li>
</ul>

<p>Many assume that Rust only wraps C APIs (unidirectional), making Rust purely a “consumer” of C services. However, <strong>actual kernel source code reveals a different reality</strong>: C does call Rust functions, specifically for module lifecycle management.</p>

<p>This article provides a complete evidence chain based on Linux kernel 6.x source code.</p>

<h2 id="the-answer-yes-through-module-lifecycle">The Answer: Yes, Through Module Lifecycle</h2>

<p><strong>C kernel code DOES call Rust functions</strong> for:</p>
<ul>
  <li>✅ Module initialization (<code class="language-plaintext highlighter-rouge">init_module()</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init()</code>)</li>
  <li>✅ Module cleanup (<code class="language-plaintext highlighter-rouge">cleanup_module()</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit()</code>)</li>
</ul>

<p><strong>C kernel code does NOT call Rust for</strong>:</p>
<ul>
  <li>❌ Data processing or utility functions</li>
  <li>❌ Core subsystem services</li>
  <li>❌ General-purpose APIs</li>
</ul>

<p>The scope is <strong>strictly limited to module lifecycle management</strong>, but this is a critical integration point that enables all Rust drivers to work.</p>

<h2 id="evidence-1-rust-generates-c-compatible-symbols">Evidence 1: Rust Generates C-Compatible Symbols</h2>

<p>Every Rust module automatically generates C-callable functions via the <code class="language-plaintext highlighter-rouge">module!</code> macro family. Here’s the actual code from <code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/macros/module.rs (lines 260-290)</span>

<span class="c1">// For loadable modules (.ko files)</span>
<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".init.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// SAFETY: This function is inaccessible to the outside due to the double</span>
    <span class="c1">// module wrapping it. It is called exactly once by the C side via its</span>
    <span class="c1">// unique name.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".exit.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// SAFETY:</span>
    <span class="c1">// - This function is inaccessible to the outside due to the double</span>
    <span class="c1">//   module wrapping it. It is called exactly once by the C side via its</span>
    <span class="c1">//   unique name,</span>
    <span class="c1">// - furthermore it is only called after `init_module` has returned `0`</span>
    <span class="c1">//   (which delegates to `__init`).</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// For built-in modules (compiled into kernel)</span>
<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">ident</span><span class="o">&gt;</span><span class="nf">_init</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// SAFETY: This function is inaccessible to the outside due to the double</span>
    <span class="c1">// module wrapping it. It is called exactly once by the C side via its</span>
    <span class="c1">// placement above in the initcall section.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">ident</span><span class="o">&gt;</span><span class="nf">_exit</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="key-mechanisms-explained">Key Mechanisms Explained</h3>

<p><strong>1. <code class="language-plaintext highlighter-rouge">#[no_mangle]</code> Attribute</strong></p>

<p>Without this attribute, Rust applies name mangling:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_module → _ZN7mymodule11init_module17h&lt;hash&gt;E
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">#[no_mangle]</code>, the symbol name remains:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_module → init_module
</code></pre></div></div>

<p>This allows C code to find the function by its expected standard name.</p>

<p><strong>2. <code class="language-plaintext highlighter-rouge">extern "C"</code> Calling Convention</strong></p>

<p>This ensures:</p>
<ul>
  <li>Parameters passed according to C ABI (System V on x86_64)</li>
  <li>Stack frame layout matches C expectations</li>
  <li>Register usage follows C calling convention</li>
  <li>No Rust-specific calling overhead</li>
</ul>

<p><strong>3. <code class="language-plaintext highlighter-rouge">#[link_section = ".init.text"]</code></strong></p>

<p>Places the function in the ELF <code class="language-plaintext highlighter-rouge">.init.text</code> section, where the C kernel expects to find initialization code. This section can be freed after initialization completes.</p>

<h2 id="evidence-2-c-kernels-module-structure">Evidence 2: C Kernel’s Module Structure</h2>

<p>The C kernel defines a standard module structure that holds a function pointer to the init function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// include/linux/module.h (line 470)</span>
<span class="k">struct</span> <span class="n">module</span> <span class="p">{</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">name</span><span class="p">;</span>

    <span class="c1">// ... many fields omitted ...</span>

    <span class="cm">/* Startup function. */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>  <span class="c1">// ← Function pointer to init_module</span>

    <span class="k">struct</span> <span class="n">module_memory</span> <span class="n">mem</span><span class="p">[</span><span class="n">MOD_MEM_NUM_TYPES</span><span class="p">]</span> <span class="n">__module_memory_align</span><span class="p">;</span>

    <span class="c1">// ... more fields ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">init</code> field is a <strong>function pointer</strong> that will be invoked during module loading.</p>

<h2 id="evidence-3-c-kernel-calls-the-function-pointer">Evidence 3: C Kernel Calls the Function Pointer</h2>

<p>When loading a module, the C kernel explicitly calls <code class="language-plaintext highlighter-rouge">mod-&gt;init</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// kernel/module/main.c (lines 2989-3020)</span>
<span class="k">static</span> <span class="n">noinline</span> <span class="kt">int</span> <span class="nf">do_init_module</span><span class="p">(</span><span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">mod_initfree</span> <span class="o">*</span><span class="n">freeinit</span><span class="p">;</span>

    <span class="c1">// ... setup code omitted ...</span>

    <span class="n">freeinit</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">freeinit</span><span class="p">),</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">freeinit</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
        <span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_text</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_TEXT</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>
    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_data</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_DATA</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>
    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_rodata</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_RODATA</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>

    <span class="n">do_mod_ctors</span><span class="p">(</span><span class="n">mod</span><span class="p">);</span>

    <span class="cm">/* Start the module */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">do_one_initcall</span><span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">);</span>  <span class="c1">// ← CALLS THE FUNCTION POINTER</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">goto</span> <span class="n">fail_free_freeinit</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// ... post-init code ...</span>

    <span class="n">mod</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">=</span> <span class="n">MODULE_STATE_LIVE</span><span class="p">;</span>

    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Key observation</strong>: <code class="language-plaintext highlighter-rouge">do_one_initcall(mod-&gt;init)</code> invokes the function pointer, which points to Rust’s <code class="language-plaintext highlighter-rouge">init_module()</code> for Rust modules.</p>

<h2 id="evidence-4-how-mod-init-gets-set">Evidence 4: How mod-&gt;init Gets Set</h2>

<p><strong>Critical question</strong>: How does <code class="language-plaintext highlighter-rouge">mod-&gt;init</code> point to the Rust function?</p>

<p><strong>Answer</strong>: Through ELF symbol binding at link time, not runtime lookup.</p>

<h3 id="the-elf-module-structure-layout">The ELF Module Structure Layout</h3>

<p>When compiling a kernel module (C or Rust), the linker creates a special section:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.gnu.linkonce.this_module
</code></pre></div></div>

<p>This section contains the <strong>complete binary layout</strong> of <code class="language-plaintext highlighter-rouge">struct module</code>, including:</p>
<ul>
  <li>Module name</li>
  <li>Module version</li>
  <li><strong>Init function pointer</strong> (already resolved to <code class="language-plaintext highlighter-rouge">init_module</code> address)</li>
  <li>Cleanup function pointer</li>
  <li>Other metadata</li>
</ul>

<h3 id="module-loading-process">Module Loading Process</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// kernel/module/main.c (line 2901)</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="nf">layout_and_allocate</span><span class="p">(</span><span class="k">struct</span> <span class="n">load_info</span> <span class="o">*</span><span class="n">info</span><span class="p">,</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">;</span>
    <span class="c1">// ... layout calculation ...</span>

    <span class="cm">/* Module has been copied to its final place now: return it. */</span>
    <span class="n">mod</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="o">-&gt;</span><span class="n">sechdrs</span><span class="p">[</span><span class="n">info</span><span class="o">-&gt;</span><span class="n">index</span><span class="p">.</span><span class="n">mod</span><span class="p">].</span><span class="n">sh_addr</span><span class="p">;</span>
    <span class="c1">// ↑ Direct memory mapping - the module struct is already complete!</span>

    <span class="n">kmemleak_load_module</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">info</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">mod</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The kernel <strong>does NOT</strong> manually assign each field. Instead:</p>
<ol>
  <li>The <code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code> section is mapped into memory</li>
  <li>This section IS the <code class="language-plaintext highlighter-rouge">struct module</code></li>
  <li>All fields, including <code class="language-plaintext highlighter-rouge">init</code>, are <strong>already set by the linker</strong></li>
</ol>

<h3 id="symbol-resolution-at-link-time">Symbol Resolution at Link Time</h3>

<p>When linking a Rust module:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Simplified linking process</span>
ld <span class="nt">-r</span> <span class="se">\</span>
  <span class="nt">-o</span> rcpufreq_dt.ko <span class="se">\</span>
  rcpufreq_dt.o <span class="se">\</span>
  <span class="nt">--build-id</span>
</code></pre></div></div>

<p>The linker:</p>
<ol>
  <li>Finds the <code class="language-plaintext highlighter-rouge">init_module</code> symbol (address 0xXXXX)</li>
  <li>Writes this address into <code class="language-plaintext highlighter-rouge">module.init</code> field</li>
  <li>Embeds the complete struct in <code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code> section</li>
  <li>Writes everything to the <code class="language-plaintext highlighter-rouge">.ko</code> file</li>
</ol>

<h2 id="evidence-5-real-rust-driver-example">Evidence 5: Real Rust Driver Example</h2>

<p>Every Rust driver uses a macro that generates these functions. For example:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/cpufreq/rcpufreq_dt.rs (lines 215-221)</span>
<span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">CPUFreqDTDriver</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"cpufreq-dt"</span><span class="p">,</span>
    <span class="n">author</span><span class="p">:</span> <span class="s">"Viresh Kumar &lt;viresh.kumar@linaro.org&gt;"</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Generic CPUFreq DT driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL v2"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This macro expands to:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Generated code (conceptual)</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">i32</span> <span class="p">{</span>
    <span class="c1">// Register CPUFreqDTDriver as platform driver</span>
    <span class="nn">cpufreq</span><span class="p">::</span><span class="nn">Registration</span><span class="p">::</span><span class="o">&lt;</span><span class="n">CPUFreqDTDriver</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">new_foreign_owned</span><span class="p">(</span><span class="cm">/*...*/</span><span class="p">)</span>
<span class="p">}</span>

<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Unregister driver</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="complete-call-flow">Complete Call Flow</h2>

<p>Let’s trace what happens when loading a Rust module:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. User executes:
   $ insmod rcpufreq_dt.ko

2. Kernel syscall:
   SYSCALL_DEFINE3(init_module, void __user *, umod, ...)
   ↓

3. Copy module to kernel memory:
   copy_module_from_user(umod, len, &amp;info)
   ↓

4. Parse ELF and allocate:
   mod = layout_and_allocate(&amp;info, flags)
   ↓ (maps .gnu.linkonce.this_module section)

5. mod struct is now complete:
   mod-&gt;init = &amp;init_module  // ← Already set by linker
   mod-&gt;name = "cpufreq-dt"
   // ... all fields populated ...
   ↓

6. Call initialization:
   do_init_module(mod)
   ↓

7. Invoke the function pointer:
   ret = do_one_initcall(mod-&gt;init)
   ↓ (Calls through function pointer)

8. EXECUTION TRANSFERS TO RUST:
   init_module() in Rust code executes
   ↓

9. Rust driver initializes:
   CPUFreqDTDriver::probe() registers driver
   ↓

10. Module is live:
    mod-&gt;state = MODULE_STATE_LIVE
</code></pre></div></div>

<p><strong>Critical insight</strong>: The C→Rust call at step 7 is a <strong>standard indirect function call</strong> through a function pointer, exactly the same as calling a C module’s init function.</p>

<h2 id="symbol-naming-convention">Symbol Naming Convention</h2>

<p>The kernel expects specific symbol names:</p>

<table>
  <thead>
    <tr>
      <th>Module Type</th>
      <th>Init Symbol</th>
      <th>Cleanup Symbol</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Loadable (.ko)</td>
      <td><code class="language-plaintext highlighter-rouge">init_module</code></td>
      <td><code class="language-plaintext highlighter-rouge">cleanup_module</code></td>
    </tr>
    <tr>
      <td>Built-in</td>
      <td><code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code></td>
      <td><code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit</code></td>
    </tr>
  </tbody>
</table>

<p>Both C and Rust modules must follow this convention. Example:</p>

<p><strong>C module</strong>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/example/example_c.c</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">__init</span> <span class="nf">my_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span> <span class="n">__exit</span> <span class="nf">my_exit</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// ...</span>
<span class="p">}</span>

<span class="n">module_init</span><span class="p">(</span><span class="n">my_init</span><span class="p">);</span>  <span class="c1">// Expands to create init_module</span>
<span class="n">module_exit</span><span class="p">(</span><span class="n">my_exit</span><span class="p">);</span>  <span class="c1">// Expands to create cleanup_module</span>
</code></pre></div></div>

<p><strong>Rust module</strong>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/example/example_rust.rs</span>
<span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">MyDriver</span><span class="p">,</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
<span class="c1">// Macro generates init_module and cleanup_module</span>
</code></pre></div></div>

<p>Both produce the <strong>same ELF symbols</strong> that the kernel expects.</p>

<h2 id="verification-methods">Verification Methods</h2>

<p>If you have a compiled Rust kernel module, you can verify this mechanism directly:</p>

<h3 id="1-check-symbol-table">1. Check Symbol Table</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nm drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep </span>init_module
0000000000000000 T init_module
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">T</code> indicates a symbol in the <code class="language-plaintext highlighter-rouge">.text</code> section (code). Address <code class="language-plaintext highlighter-rouge">0000000000000000</code> is relative to the module’s base.</p>

<h3 id="2-examine-elf-sections">2. Examine ELF Sections</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>readelf <span class="nt">-S</span> drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"</span><span class="se">\.</span><span class="s2">init</span><span class="se">\.</span><span class="s2">text|</span><span class="se">\.</span><span class="s2">gnu</span><span class="se">\.</span><span class="s2">linkonce"</span>
  <span class="o">[</span>12] .init.text        PROGBITS         0000000000000000  00001000
  <span class="o">[</span>23] .gnu.linkonce.th  PROGBITS         0000000000000000  00003400
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code> section contains the <code class="language-plaintext highlighter-rouge">struct module</code>.</p>

<h3 id="3-disassemble-init-function">3. Disassemble Init Function</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>objdump <span class="nt">-d</span> drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep</span> <span class="nt">-A20</span> <span class="s2">"&lt;init_module&gt;:"</span>
0000000000000000 &lt;init_module&gt;:
   0:   push   %rbx
   1:   mov    %rsp,%rbx
   4:   sub    <span class="nv">$0x10</span>,%rsp
   <span class="c"># ... actual Rust code ...</span>
</code></pre></div></div>

<p>This shows the compiled Rust code at the <code class="language-plaintext highlighter-rouge">init_module</code> symbol.</p>

<h3 id="4-verify-module-structure">4. Verify Module Structure</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>readelf <span class="nt">-x</span> .gnu.linkonce.this_module drivers/cpufreq/rcpufreq_dt.ko
<span class="c"># Displays hex dump of the struct module</span>
<span class="c"># Bytes 0x470-0x478 (on 64-bit) contain the init function pointer</span>
</code></pre></div></div>

<h2 id="counter-proof-what-if-c-didnt-call-rust">Counter-Proof: What If C Didn’t Call Rust?</h2>

<p>If the C kernel did NOT call Rust’s <code class="language-plaintext highlighter-rouge">init_module()</code>, then:</p>

<p><strong>Expected failures</strong>:</p>
<ul>
  <li>❌ <code class="language-plaintext highlighter-rouge">insmod rcpufreq_dt.ko</code> would fail</li>
  <li>❌ Module would not initialize</li>
  <li>❌ Driver would not register with the subsystem</li>
  <li>❌ Device would not be managed by the driver</li>
  <li>❌ <code class="language-plaintext highlighter-rouge">lsmod</code> would not show the module as loaded</li>
</ul>

<p><strong>Actual reality</strong>:</p>
<ul>
  <li>✅ Rust modules load successfully</li>
  <li>✅ Drivers initialize and register</li>
  <li>✅ Devices are managed correctly</li>
  <li>✅ <code class="language-plaintext highlighter-rouge">lsmod</code> shows the module</li>
</ul>

<p><strong>Conclusion</strong>: C must be calling Rust’s <code class="language-plaintext highlighter-rouge">init_module()</code>, otherwise none of this would work.</p>

<h2 id="why-limited-to-module-lifecycle">Why Limited to Module Lifecycle?</h2>

<p>The current design restricts C→Rust calls to module initialization and cleanup because:</p>

<h3 id="1-well-defined-interface">1. Well-Defined Interface</h3>

<p>Module lifecycle has a simple, stable signature:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>     <span class="c1">// No parameters, returns error code</span>
<span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">exit</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>    <span class="c1">// No parameters, no return value</span>
</code></pre></div></div>

<p>This simplicity means:</p>
<ul>
  <li>No complex ABI negotiations</li>
  <li>No data structure marshaling</li>
  <li>No lifetime management across boundary</li>
  <li>Clear success/failure semantics</li>
</ul>

<h3 id="2-abi-stability">2. ABI Stability</h3>

<p>Only the <strong>entry points</strong> need stable ABI:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">init_module</code> signature: fixed forever</li>
  <li>Internal Rust code: can evolve freely</li>
  <li>No internal Rust APIs exposed to C</li>
</ul>

<p>If C depended on internal Rust APIs, those APIs would need eternal ABI stability.</p>

<h3 id="3-minimal-coupling">3. Minimal Coupling</h3>

<p>The C kernel core does NOT depend on Rust for functionality:</p>
<ul>
  <li>C kernel can load C modules without Rust support</li>
  <li>Rust support is purely additive</li>
  <li>Disabling Rust doesn’t break core kernel</li>
</ul>

<p>This keeps the dependency graph clean:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C kernel core (independent)
    ↓ (can load)
C modules (independent)
    ↓ (can load)
Rust modules (depend on C kernel APIs)
</code></pre></div></div>

<h3 id="4-standard-module-pattern">4. Standard Module Pattern</h3>

<p>Both C and Rust modules follow the <strong>same loading mechanism</strong>:</p>
<ul>
  <li>Parse ELF</li>
  <li>Map sections</li>
  <li>Resolve relocations</li>
  <li>Call <code class="language-plaintext highlighter-rouge">mod-&gt;init()</code></li>
</ul>

<p>This uniformity means:</p>
<ul>
  <li>No special-case code for Rust</li>
  <li>Same security checks apply</li>
  <li>Same debugging tools work</li>
  <li>Same performance characteristics</li>
</ul>

<h2 id="future-expansion-possibilities">Future Expansion Possibilities</h2>

<p>While currently limited to module lifecycle, C→Rust calls could expand:</p>

<h3 id="1-callback-registration-2027-2028">1. Callback Registration (2027-2028)</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Future possibility</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">rust_timer_callback</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Safe Rust timer handler</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C code registers Rust callback</span>
<span class="n">setup_timer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">timer</span><span class="p">,</span> <span class="n">rust_timer_callback</span><span class="p">,</span> <span class="n">data</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Challenges</strong>:</p>
<ul>
  <li>Lifetime management (who owns the data?)</li>
  <li>Error propagation (panic handling)</li>
  <li>ABI stability (callback signatures must be stable)</li>
</ul>

<h3 id="2-subsystem-interfaces-2028-2030">2. Subsystem Interfaces (2028-2030)</h3>

<p>If a core subsystem is rewritten in Rust:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Future: Rust scheduler interface</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">sched_yield_to</span><span class="p">(</span><span class="n">task</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">task_struct</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// Safe scheduler implementation</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C code calls Rust scheduler</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">sched_yield_to</span><span class="p">(</span><span class="n">next_task</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Requirements</strong>:</p>
<ul>
  <li>Proven stability in production</li>
  <li>Performance validation</li>
  <li>Gradual migration path</li>
  <li>Fallback to C implementation</li>
</ul>

<h3 id="3-utility-functions-2026-2027">3. Utility Functions (2026-2027)</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Future: Safe allocator</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">rust_safe_kmalloc</span><span class="p">(</span>
    <span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">flags</span><span class="p">:</span> <span class="n">gfp_t</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span> <span class="p">{</span>
    <span class="c1">// Memory-safe allocation with compile-time checks</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Benefits</strong>:</p>
<ul>
  <li>Gradual safety improvements</li>
  <li>No need to rewrite entire subsystems</li>
  <li>Easy to benchmark and validate</li>
</ul>

<h2 id="current-production-reality-2026">Current Production Reality (2026)</h2>

<p>As of Linux kernel 6.x, C→Rust calls are <strong>production reality</strong>:</p>

<p><strong>Active Rust drivers</strong>:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">drivers/net/phy/ax88796b_rust.ko</code> - Network PHY driver</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/net/phy/qt2025.ko</code> - Marvell PHY driver</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/cpufreq/rcpufreq_dt.ko</code> - CPU frequency driver</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/block/rnull.ko</code> - Null block device</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/*.ko</code> - NVIDIA GPU driver (13 modules)</li>
</ul>

<p><strong>Every one of these is loaded by C calling Rust’s <code class="language-plaintext highlighter-rouge">init_module()</code>.</strong></p>

<p>You can verify this on a running system:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lsmod | <span class="nb">grep </span>_rust
ax88796b_rust          16384  0
<span class="nv">$ </span>modinfo ax88796b_rust
filename:       /lib/modules/.../ax88796b_rust.ko
license:        GPL
description:    Rust Asix PHYs driver
author:         FUJITA Tomonori
<span class="c"># This module's init_module() was called by C kernel</span>
</code></pre></div></div>

<h2 id="architectural-significance">Architectural Significance</h2>

<p>Understanding that C calls Rust reveals important architectural truths:</p>

<h3 id="1-bidirectional-integration">1. Bidirectional Integration</h3>

<p>The integration is not purely “Rust wraps C”:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Rust → C: For kernel services (most common)
C → Rust: For module lifecycle (critical integration point)
</code></pre></div></div>

<h3 id="2-standard-abi-compliance">2. Standard ABI Compliance</h3>

<p>Rust doesn’t require a special loader or runtime. It complies with:</p>
<ul>
  <li>Standard ELF module format</li>
  <li>Standard System V ABI</li>
  <li>Standard symbol conventions</li>
  <li>Standard linking process</li>
</ul>

<h3 id="3-production-grade-engineering">3. Production-Grade Engineering</h3>

<p>The <code class="language-plaintext highlighter-rouge">#[no_mangle]</code> + <code class="language-plaintext highlighter-rouge">extern "C"</code> pattern shows:</p>
<ul>
  <li>Careful ABI design</li>
  <li>Clear separation of concerns</li>
  <li>Pragmatic integration approach</li>
  <li>No magic or special-casing</li>
</ul>

<h3 id="4-evolution-path">4. Evolution Path</h3>

<p>The module lifecycle integration establishes:</p>
<ul>
  <li>Proven mechanism for C→Rust calls</li>
  <li>Template for future expansion</li>
  <li>Trust in production environment</li>
  <li>Foundation for deeper integration</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p><strong>Yes, C kernel code calls Rust functions</strong> - this is not theoretical but a production reality.</p>

<p><strong>Mechanism</strong>: Standard ELF symbol binding and function pointers</p>
<ul>
  <li>Rust generates C-compatible symbols via <code class="language-plaintext highlighter-rouge">#[no_mangle]</code> and <code class="language-plaintext highlighter-rouge">extern "C"</code></li>
  <li>Linker resolves symbols and populates <code class="language-plaintext highlighter-rouge">struct module</code></li>
  <li>C kernel calls through function pointers</li>
  <li>No runtime lookup, no special handling</li>
</ul>

<p><strong>Scope</strong>: Currently limited to module lifecycle</p>
<ul>
  <li>✅ Module initialization (<code class="language-plaintext highlighter-rouge">init_module</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code>)</li>
  <li>✅ Module cleanup (<code class="language-plaintext highlighter-rouge">cleanup_module</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit</code>)</li>
  <li>❌ Not used for data processing or core services (yet)</li>
</ul>

<p><strong>Evidence</strong>:</p>
<ul>
  <li>Source code in <code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code> generates the functions</li>
  <li>C code in <code class="language-plaintext highlighter-rouge">kernel/module/main.c</code> calls the functions</li>
  <li>Real drivers (<code class="language-plaintext highlighter-rouge">rcpufreq_dt.ko</code>, <code class="language-plaintext highlighter-rouge">ax88796b_rust.ko</code>) rely on this mechanism</li>
  <li>Working Rust modules prove C must be calling Rust</li>
</ul>

<p><strong>Future</strong>: The infrastructure exists for expansion</p>
<ul>
  <li>Callback registration</li>
  <li>Subsystem interfaces</li>
  <li>Utility functions</li>
</ul>

<p>But for now (2022-2026 phase), the focus is on proving Rust’s reliability in controlled scenarios before expanding the C→Rust interface.</p>

<p><strong>The key insight</strong>: Rust in Linux is not just a consumer of C APIs - it’s a cooperative participant where both languages call each other through well-defined, standard mechanisms.</p>

<hr />

<h1 id="c如何调用rustlinux内核模块生命周期深度剖析">C如何调用Rust：Linux内核模块生命周期深度剖析</h1>

<p><strong>摘要</strong>：本文对C内核代码如何通过模块加载机制调用Rust函数进行全面技术分析。基于Linux内核6.x的实际源代码，本文揭示了完整的证据链：从Rust的#[no_mangle]属性到C的函数指针调用，从ELF符号绑定到实际调用流程。我们证明C→Rust调用不是理论而是通过标准模块生命周期管理实现的生产现实。</p>

<h2 id="引言问题">引言：问题</h2>

<p>在关于Rust在Linux内核中的讨论中，经常出现一个基本的架构问题：</p>

<p><strong>“C内核代码能调用Rust函数吗？”</strong></p>

<p>这不仅仅是学术问题。理解C和Rust之间的调用方向对于理解以下内容至关重要：</p>
<ul>
  <li>集成架构</li>
  <li>ABI稳定性要求</li>
  <li>未来演进可能性</li>
  <li>安全和安全边界</li>
</ul>

<p>许多人认为Rust只是封装C API（单向），使Rust纯粹是C服务的”消费者”。然而，<strong>实际内核源代码揭示了不同的现实</strong>：C确实会调用Rust函数，特别是用于模块生命周期管理。</p>

<p>本文基于Linux内核6.x源代码提供完整的证据链。</p>

<h2 id="答案是的通过模块生命周期">答案：是的，通过模块生命周期</h2>

<p><strong>C内核代码确实调用Rust函数</strong>用于：</p>
<ul>
  <li>✅ 模块初始化（<code class="language-plaintext highlighter-rouge">init_module()</code>、<code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init()</code>）</li>
  <li>✅ 模块清理（<code class="language-plaintext highlighter-rouge">cleanup_module()</code>、<code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit()</code>）</li>
</ul>

<p><strong>C内核代码不调用Rust用于</strong>：</p>
<ul>
  <li>❌ 数据处理或工具函数</li>
  <li>❌ 核心子系统服务</li>
  <li>❌ 通用API</li>
</ul>

<p>范围<strong>严格限制于模块生命周期管理</strong>，但这是使所有Rust驱动工作的关键集成点。</p>

<h2 id="证据1rust生成c兼容符号">证据1：Rust生成C兼容符号</h2>

<p>每个Rust模块通过<code class="language-plaintext highlighter-rouge">module!</code>宏系列自动生成C可调用函数。这是<code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code>中的实际代码：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/macros/module.rs (260-290行)</span>

<span class="c1">// 对于可加载模块（.ko文件）</span>
<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".init.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// 安全性：由于双层模块包装，此函数对外部不可访问。</span>
    <span class="c1">// C侧通过其唯一名称恰好调用一次。</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".exit.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 安全性：</span>
    <span class="c1">// - 由于双层模块包装，此函数对外部不可访问。</span>
    <span class="c1">//   C侧通过其唯一名称恰好调用一次，</span>
    <span class="c1">// - 而且仅在`init_module`返回`0`后调用（委托给`__init`）。</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// 对于内置模块（编译到内核中）</span>
<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">ident</span><span class="o">&gt;</span><span class="nf">_init</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// 安全性：由于双层模块包装，此函数对外部不可访问。</span>
    <span class="c1">// C侧通过其在上述initcall段中的位置恰好调用一次。</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">ident</span><span class="o">&gt;</span><span class="nf">_exit</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="关键机制解释">关键机制解释</h3>

<p><strong>1. <code class="language-plaintext highlighter-rouge">#[no_mangle]</code> 属性</strong></p>

<p>没有此属性，Rust会应用名称改编：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_module → _ZN7mymodule11init_module17h&lt;hash&gt;E
</code></pre></div></div>

<p>使用<code class="language-plaintext highlighter-rouge">#[no_mangle]</code>，符号名保持为：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_module → init_module
</code></pre></div></div>

<p>这使C代码能够通过其预期的标准名称找到函数。</p>

<p><strong>2. <code class="language-plaintext highlighter-rouge">extern "C"</code> 调用约定</strong></p>

<p>这确保：</p>
<ul>
  <li>参数按照C ABI传递（x86_64上的System V）</li>
  <li>栈帧布局符合C预期</li>
  <li>寄存器使用遵循C调用约定</li>
  <li>没有Rust特定的调用开销</li>
</ul>

<p><strong>3. <code class="language-plaintext highlighter-rouge">#[link_section = ".init.text"]</code></strong></p>

<p>将函数放在ELF <code class="language-plaintext highlighter-rouge">.init.text</code>段中，C内核期望在此找到初始化代码。此段可在初始化完成后释放。</p>

<h2 id="证据2c内核的模块结构">证据2：C内核的模块结构</h2>

<p>C内核定义了一个标准模块结构，持有指向init函数的函数指针：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// include/linux/module.h (第470行)</span>
<span class="k">struct</span> <span class="n">module</span> <span class="p">{</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">name</span><span class="p">;</span>

    <span class="c1">// ... 省略许多字段 ...</span>

    <span class="cm">/* Startup function. */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>  <span class="c1">// ← 指向init_module的函数指针</span>

    <span class="k">struct</span> <span class="n">module_memory</span> <span class="n">mem</span><span class="p">[</span><span class="n">MOD_MEM_NUM_TYPES</span><span class="p">]</span> <span class="n">__module_memory_align</span><span class="p">;</span>

    <span class="c1">// ... 更多字段 ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">init</code>字段是一个<strong>函数指针</strong>，将在模块加载期间被调用。</p>

<h2 id="证据3c内核调用函数指针">证据3：C内核调用函数指针</h2>

<p>加载模块时，C内核显式调用<code class="language-plaintext highlighter-rouge">mod-&gt;init</code>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// kernel/module/main.c (2989-3020行)</span>
<span class="k">static</span> <span class="n">noinline</span> <span class="kt">int</span> <span class="nf">do_init_module</span><span class="p">(</span><span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">mod_initfree</span> <span class="o">*</span><span class="n">freeinit</span><span class="p">;</span>

    <span class="c1">// ... 省略设置代码 ...</span>

    <span class="n">freeinit</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">freeinit</span><span class="p">),</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">freeinit</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
        <span class="k">goto</span> <span class="n">fail</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_text</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_TEXT</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>
    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_data</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_DATA</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>
    <span class="n">freeinit</span><span class="o">-&gt;</span><span class="n">init_rodata</span> <span class="o">=</span> <span class="n">mod</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">MOD_INIT_RODATA</span><span class="p">].</span><span class="n">base</span><span class="p">;</span>

    <span class="n">do_mod_ctors</span><span class="p">(</span><span class="n">mod</span><span class="p">);</span>

    <span class="cm">/* Start the module */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">do_one_initcall</span><span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">);</span>  <span class="c1">// ← 调用函数指针</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">goto</span> <span class="n">fail_free_freeinit</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// ... 初始化后代码 ...</span>

    <span class="n">mod</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">=</span> <span class="n">MODULE_STATE_LIVE</span><span class="p">;</span>

    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>关键观察</strong>：<code class="language-plaintext highlighter-rouge">do_one_initcall(mod-&gt;init)</code>调用函数指针，对于Rust模块，它指向Rust的<code class="language-plaintext highlighter-rouge">init_module()</code>。</p>

<h2 id="证据4mod-init如何被设置">证据4：mod-&gt;init如何被设置</h2>

<p><strong>关键问题</strong>：<code class="language-plaintext highlighter-rouge">mod-&gt;init</code>如何指向Rust函数？</p>

<p><strong>答案</strong>：通过链接时的ELF符号绑定，而非运行时查找。</p>

<h3 id="elf模块结构布局">ELF模块结构布局</h3>

<p>编译内核模块（C或Rust）时，链接器创建一个特殊段：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.gnu.linkonce.this_module
</code></pre></div></div>

<p>此段包含<code class="language-plaintext highlighter-rouge">struct module</code>的<strong>完整二进制布局</strong>，包括：</p>
<ul>
  <li>模块名</li>
  <li>模块版本</li>
  <li><strong>Init函数指针</strong>（已解析为<code class="language-plaintext highlighter-rouge">init_module</code>地址）</li>
  <li>清理函数指针</li>
  <li>其他元数据</li>
</ul>

<h3 id="模块加载过程">模块加载过程</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// kernel/module/main.c (第2901行)</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="nf">layout_and_allocate</span><span class="p">(</span><span class="k">struct</span> <span class="n">load_info</span> <span class="o">*</span><span class="n">info</span><span class="p">,</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">;</span>
    <span class="c1">// ... 布局计算 ...</span>

    <span class="cm">/* Module has been copied to its final place now: return it. */</span>
    <span class="n">mod</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="o">-&gt;</span><span class="n">sechdrs</span><span class="p">[</span><span class="n">info</span><span class="o">-&gt;</span><span class="n">index</span><span class="p">.</span><span class="n">mod</span><span class="p">].</span><span class="n">sh_addr</span><span class="p">;</span>
    <span class="c1">// ↑ 直接内存映射 - 模块结构体已经完整！</span>

    <span class="n">kmemleak_load_module</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">info</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">mod</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>内核<strong>不会</strong>手动分配每个字段。相反：</p>
<ol>
  <li><code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code>段被映射到内存</li>
  <li>此段<strong>就是</strong><code class="language-plaintext highlighter-rouge">struct module</code></li>
  <li>所有字段，包括<code class="language-plaintext highlighter-rouge">init</code>，<strong>已由链接器设置</strong></li>
</ol>

<h3 id="链接时符号解析">链接时符号解析</h3>

<p>链接Rust模块时：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 简化的链接过程</span>
ld <span class="nt">-r</span> <span class="se">\</span>
  <span class="nt">-o</span> rcpufreq_dt.ko <span class="se">\</span>
  rcpufreq_dt.o <span class="se">\</span>
  <span class="nt">--build-id</span>
</code></pre></div></div>

<p>链接器：</p>
<ol>
  <li>找到<code class="language-plaintext highlighter-rouge">init_module</code>符号（地址0xXXXX）</li>
  <li>将此地址写入<code class="language-plaintext highlighter-rouge">module.init</code>字段</li>
  <li>将完整结构体嵌入<code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code>段</li>
  <li>将所有内容写入<code class="language-plaintext highlighter-rouge">.ko</code>文件</li>
</ol>

<h2 id="证据5真实rust驱动示例">证据5：真实Rust驱动示例</h2>

<p>每个Rust驱动都使用生成这些函数的宏。例如：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/cpufreq/rcpufreq_dt.rs (215-221行)</span>
<span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">CPUFreqDTDriver</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"cpufreq-dt"</span><span class="p">,</span>
    <span class="n">author</span><span class="p">:</span> <span class="s">"Viresh Kumar &lt;viresh.kumar@linaro.org&gt;"</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Generic CPUFreq DT driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL v2"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>此宏展开为：</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 生成的代码（概念）</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">i32</span> <span class="p">{</span>
    <span class="c1">// 注册CPUFreqDTDriver为平台驱动</span>
    <span class="nn">cpufreq</span><span class="p">::</span><span class="nn">Registration</span><span class="p">::</span><span class="o">&lt;</span><span class="n">CPUFreqDTDriver</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">new_foreign_owned</span><span class="p">(</span><span class="cm">/*...*/</span><span class="p">)</span>
<span class="p">}</span>

<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 注销驱动</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="完整调用流程">完整调用流程</h2>

<p>让我们追踪加载Rust模块时发生的事情：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. 用户执行：
   $ insmod rcpufreq_dt.ko

2. 内核系统调用：
   SYSCALL_DEFINE3(init_module, void __user *, umod, ...)
   ↓

3. 复制模块到内核内存：
   copy_module_from_user(umod, len, &amp;info)
   ↓

4. 解析ELF并分配：
   mod = layout_and_allocate(&amp;info, flags)
   ↓ (映射.gnu.linkonce.this_module段)

5. mod结构体现在完整：
   mod-&gt;init = &amp;init_module  // ← 已由链接器设置
   mod-&gt;name = "cpufreq-dt"
   // ... 所有字段已填充 ...
   ↓

6. 调用初始化：
   do_init_module(mod)
   ↓

7. 调用函数指针：
   ret = do_one_initcall(mod-&gt;init)
   ↓ (通过函数指针调用)

8. 执行转移到RUST：
   Rust代码中的init_module()执行
   ↓

9. Rust驱动初始化：
   CPUFreqDTDriver::probe()注册驱动
   ↓

10. 模块已激活：
    mod-&gt;state = MODULE_STATE_LIVE
</code></pre></div></div>

<p><strong>关键洞察</strong>：步骤7的C→Rust调用是通过函数指针的<strong>标准间接函数调用</strong>，与调用C模块的init函数完全相同。</p>

<h2 id="符号命名约定">符号命名约定</h2>

<p>内核期望特定的符号名：</p>

<table>
  <thead>
    <tr>
      <th>模块类型</th>
      <th>Init符号</th>
      <th>清理符号</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>可加载（.ko）</td>
      <td><code class="language-plaintext highlighter-rouge">init_module</code></td>
      <td><code class="language-plaintext highlighter-rouge">cleanup_module</code></td>
    </tr>
    <tr>
      <td>内置</td>
      <td><code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code></td>
      <td><code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit</code></td>
    </tr>
  </tbody>
</table>

<p>C和Rust模块都必须遵循此约定。</p>

<h2 id="验证方法">验证方法</h2>

<p>如果您有已编译的Rust内核模块，可以直接验证此机制：</p>

<h3 id="1-检查符号表">1. 检查符号表</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nm drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep </span>init_module
0000000000000000 T init_module
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">T</code>表示<code class="language-plaintext highlighter-rouge">.text</code>段（代码）中的符号。地址<code class="language-plaintext highlighter-rouge">0000000000000000</code>相对于模块基址。</p>

<h3 id="2-检查elf段">2. 检查ELF段</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>readelf <span class="nt">-S</span> drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"</span><span class="se">\.</span><span class="s2">init</span><span class="se">\.</span><span class="s2">text|</span><span class="se">\.</span><span class="s2">gnu</span><span class="se">\.</span><span class="s2">linkonce"</span>
  <span class="o">[</span>12] .init.text        PROGBITS         0000000000000000  00001000
  <span class="o">[</span>23] .gnu.linkonce.th  PROGBITS         0000000000000000  00003400
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">.gnu.linkonce.this_module</code>段包含<code class="language-plaintext highlighter-rouge">struct module</code>。</p>

<h3 id="3-反汇编init函数">3. 反汇编Init函数</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>objdump <span class="nt">-d</span> drivers/cpufreq/rcpufreq_dt.ko | <span class="nb">grep</span> <span class="nt">-A20</span> <span class="s2">"&lt;init_module&gt;:"</span>
0000000000000000 &lt;init_module&gt;:
   0:   push   %rbx
   1:   mov    %rsp,%rbx
   4:   sub    <span class="nv">$0x10</span>,%rsp
   <span class="c"># ... 实际Rust代码 ...</span>
</code></pre></div></div>

<p>这显示了<code class="language-plaintext highlighter-rouge">init_module</code>符号处编译的Rust代码。</p>

<h2 id="反证如果c不调用rust会怎样">反证：如果C不调用Rust会怎样？</h2>

<p>如果C内核不调用Rust的<code class="language-plaintext highlighter-rouge">init_module()</code>，那么：</p>

<p><strong>预期失败</strong>：</p>
<ul>
  <li>❌ <code class="language-plaintext highlighter-rouge">insmod rcpufreq_dt.ko</code>会失败</li>
  <li>❌ 模块不会初始化</li>
  <li>❌ 驱动不会向子系统注册</li>
  <li>❌ 设备不会由驱动管理</li>
  <li>❌ <code class="language-plaintext highlighter-rouge">lsmod</code>不会显示已加载的模块</li>
</ul>

<p><strong>实际现实</strong>：</p>
<ul>
  <li>✅ Rust模块成功加载</li>
  <li>✅ 驱动初始化并注册</li>
  <li>✅ 设备被正确管理</li>
  <li>✅ <code class="language-plaintext highlighter-rouge">lsmod</code>显示模块</li>
</ul>

<p><strong>结论</strong>：C必定调用了Rust的<code class="language-plaintext highlighter-rouge">init_module()</code>，否则这些都不会工作。</p>

<h2 id="为何限于模块生命周期">为何限于模块生命周期？</h2>

<p>当前设计将C→Rust调用限制于模块初始化和清理，因为：</p>

<h3 id="1-良好定义的接口">1. 良好定义的接口</h3>

<p>模块生命周期具有简单、稳定的签名：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>     <span class="c1">// 无参数，返回错误码</span>
<span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="n">exit</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>    <span class="c1">// 无参数，无返回值</span>
</code></pre></div></div>

<p>这种简单性意味着：</p>
<ul>
  <li>无需复杂的ABI协商</li>
  <li>无需数据结构编组</li>
  <li>无需跨边界生命周期管理</li>
  <li>清晰的成功/失败语义</li>
</ul>

<h3 id="2-abi稳定性">2. ABI稳定性</h3>

<p>只有<strong>入口点</strong>需要稳定的ABI：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">init_module</code>签名：永远固定</li>
  <li>内部Rust代码：可以自由演进</li>
  <li>无内部Rust API暴露给C</li>
</ul>

<p>如果C依赖内部Rust API，这些API将需要永久的ABI稳定性。</p>

<h3 id="3-最小耦合">3. 最小耦合</h3>

<p>C内核核心不依赖Rust的功能：</p>
<ul>
  <li>C内核可以加载C模块而无需Rust支持</li>
  <li>Rust支持纯粹是增量的</li>
  <li>禁用Rust不会破坏核心内核</li>
</ul>

<p>这保持了依赖图的清晰：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C内核核心（独立）
    ↓ (可以加载)
C模块（独立）
    ↓ (可以加载)
Rust模块（依赖C内核API）
</code></pre></div></div>

<h3 id="4-标准模块模式">4. 标准模块模式</h3>

<p>C和Rust模块遵循<strong>相同的加载机制</strong>：</p>
<ul>
  <li>解析ELF</li>
  <li>映射段</li>
  <li>解析重定位</li>
  <li>调用<code class="language-plaintext highlighter-rouge">mod-&gt;init()</code></li>
</ul>

<p>这种统一性意味着：</p>
<ul>
  <li>Rust无需特殊处理代码</li>
  <li>应用相同的安全检查</li>
  <li>相同的调试工具有效</li>
  <li>相同的性能特性</li>
</ul>

<h2 id="未来扩展可能性">未来扩展可能性</h2>

<p>虽然目前限于模块生命周期，C→Rust调用可能扩展：</p>

<h3 id="1-回调注册2027-2028">1. 回调注册（2027-2028）</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 未来可能性</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">rust_timer_callback</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 安全的Rust定时器处理程序</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C代码注册Rust回调</span>
<span class="n">setup_timer</span><span class="p">(</span><span class="o">&amp;</span><span class="n">timer</span><span class="p">,</span> <span class="n">rust_timer_callback</span><span class="p">,</span> <span class="n">data</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>挑战</strong>：</p>
<ul>
  <li>生命周期管理（谁拥有数据？）</li>
  <li>错误传播（panic处理）</li>
  <li>ABI稳定性（回调签名必须稳定）</li>
</ul>

<h3 id="2-子系统接口2028-2030">2. 子系统接口（2028-2030）</h3>

<p>如果核心子系统用Rust重写：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 未来：Rust调度器接口</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">sched_yield_to</span><span class="p">(</span><span class="n">task</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">task_struct</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// 安全的调度器实现</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C代码调用Rust调度器</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">sched_yield_to</span><span class="p">(</span><span class="n">next_task</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>要求</strong>：</p>
<ul>
  <li>在生产中证明稳定性</li>
  <li>性能验证</li>
  <li>渐进式迁移路径</li>
  <li>回退到C实现</li>
</ul>

<h3 id="3-工具函数2026-2027">3. 工具函数（2026-2027）</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 未来：安全分配器</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">rust_safe_kmalloc</span><span class="p">(</span>
    <span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">flags</span><span class="p">:</span> <span class="n">gfp_t</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span> <span class="p">{</span>
    <span class="c1">// 具有编译时检查的内存安全分配</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>好处</strong>：</p>
<ul>
  <li>渐进式安全改进</li>
  <li>无需重写整个子系统</li>
  <li>易于基准测试和验证</li>
</ul>

<h2 id="当前生产现实2026">当前生产现实（2026）</h2>

<p>截至Linux内核6.x，C→Rust调用是<strong>生产现实</strong>：</p>

<p><strong>活跃的Rust驱动</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">drivers/net/phy/ax88796b_rust.ko</code> - 网络PHY驱动</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/net/phy/qt2025.ko</code> - Marvell PHY驱动</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/cpufreq/rcpufreq_dt.ko</code> - CPU频率驱动</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/block/rnull.ko</code> - Null块设备</li>
  <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/*.ko</code> - NVIDIA GPU驱动（13个模块）</li>
</ul>

<p><strong>这些都是通过C调用Rust的<code class="language-plaintext highlighter-rouge">init_module()</code>加载的。</strong></p>

<p>您可以在运行的系统上验证：</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lsmod | <span class="nb">grep </span>_rust
ax88796b_rust          16384  0
<span class="nv">$ </span>modinfo ax88796b_rust
filename:       /lib/modules/.../ax88796b_rust.ko
license:        GPL
description:    Rust Asix PHYs driver
author:         FUJITA Tomonori
<span class="c"># 此模块的init_module()由C内核调用</span>
</code></pre></div></div>

<h2 id="架构意义">架构意义</h2>

<p>理解C调用Rust揭示了重要的架构真相：</p>

<h3 id="1-双向集成">1. 双向集成</h3>

<p>集成不是纯粹的”Rust封装C”：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Rust → C：用于内核服务（最常见）
C → Rust：用于模块生命周期（关键集成点）
</code></pre></div></div>

<h3 id="2-标准abi合规">2. 标准ABI合规</h3>

<p>Rust不需要特殊加载器或运行时。它符合：</p>
<ul>
  <li>标准ELF模块格式</li>
  <li>标准System V ABI</li>
  <li>标准符号约定</li>
  <li>标准链接过程</li>
</ul>

<h3 id="3-生产级工程">3. 生产级工程</h3>

<p><code class="language-plaintext highlighter-rouge">#[no_mangle]</code> + <code class="language-plaintext highlighter-rouge">extern "C"</code>模式显示：</p>
<ul>
  <li>精心的ABI设计</li>
  <li>清晰的关注点分离</li>
  <li>务实的集成方法</li>
  <li>无魔法或特殊处理</li>
</ul>

<h3 id="4-演进路径">4. 演进路径</h3>

<p>模块生命周期集成建立了：</p>
<ul>
  <li>经过验证的C→Rust调用机制</li>
  <li>未来扩展的模板</li>
  <li>在生产环境中的信任</li>
  <li>更深入集成的基础</li>
</ul>

<h2 id="结论">结论</h2>

<p><strong>是的，C内核代码调用Rust函数</strong> - 这不是理论而是生产现实。</p>

<p><strong>机制</strong>：标准ELF符号绑定和函数指针</p>
<ul>
  <li>Rust通过<code class="language-plaintext highlighter-rouge">#[no_mangle]</code>和<code class="language-plaintext highlighter-rouge">extern "C"</code>生成C兼容符号</li>
  <li>链接器解析符号并填充<code class="language-plaintext highlighter-rouge">struct module</code></li>
  <li>C内核通过函数指针调用</li>
  <li>无运行时查找，无特殊处理</li>
</ul>

<p><strong>范围</strong>：目前限于模块生命周期</p>
<ul>
  <li>✅ 模块初始化（<code class="language-plaintext highlighter-rouge">init_module</code>、<code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code>）</li>
  <li>✅ 模块清理（<code class="language-plaintext highlighter-rouge">cleanup_module</code>、<code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit</code>）</li>
  <li>❌ 尚未用于数据处理或核心服务</li>
</ul>

<p><strong>证据</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code>中的源代码生成函数</li>
  <li><code class="language-plaintext highlighter-rouge">kernel/module/main.c</code>中的C代码调用函数</li>
  <li>真实驱动（<code class="language-plaintext highlighter-rouge">rcpufreq_dt.ko</code>、<code class="language-plaintext highlighter-rouge">ax88796b_rust.ko</code>）依赖此机制</li>
  <li>工作的Rust模块证明C必定调用Rust</li>
</ul>

<p><strong>未来</strong>：扩展基础设施已存在</p>
<ul>
  <li>回调注册</li>
  <li>子系统接口</li>
  <li>工具函数</li>
</ul>

<p>但目前（2022-2026阶段），重点是在扩展C→Rust接口之前，在受控场景中证明Rust的可靠性。</p>

<p><strong>关键洞察</strong>：Linux中的Rust不仅仅是C API的消费者 - 它是一个合作参与者，两种语言通过良好定义的标准机制相互调用。</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[A comprehensive technical analysis of how C kernel code calls Rust functions through the module loading mechanism. Using actual Linux kernel source code (6.x), this article reveals the complete evidence chain: from Rust’s #[no_mangle] attribute to C’s function pointer invocation, from ELF symbol binding to the actual call flow. We demonstrate that C→Rust calls are not theoretical but a production reality implemented through standard module lifecycle management.]]></summary></entry><entry><title type="html">TcpRest: Reviving a 2012 RPC Framework with AI-Assisted Development</title><link href="https://weinan.io/2026/02/18/tcprest-revival-with-ai.html" rel="alternate" type="text/html" title="TcpRest: Reviving a 2012 RPC Framework with AI-Assisted Development" /><published>2026-02-18T00:00:00+00:00</published><updated>2026-02-18T00:00:00+00:00</updated><id>https://weinan.io/2026/02/18/tcprest-revival-with-ai</id><content type="html" xml:base="https://weinan.io/2026/02/18/tcprest-revival-with-ai.html"><![CDATA[<p>A 14-year journey from experimental project to production-ready framework. How AI tools transformed legacy code into a modern, modular, zero-dependency RPC solution.</p>

<h2 id="english-version">English Version</h2>

<h3 id="the-journey-from-2012-to-2026">The Journey: From 2012 to 2026</h3>

<p>In 2012, I created TcpRest as an experimental RPC (Remote Procedure Call) framework. The concept was simple but powerful: transform Plain Old Java Objects (POJOs) into network-accessible services over TCP, without the overhead of HTTP. At the time, it was a learning exercise exploring how to build lightweight RPC mechanisms in Java.</p>

<p>For over a decade, the project sat unmaintained - a time capsule of 2012-era Java development practices. Then, in 2024-2026, something changed: the emergence of AI-powered development tools like GitHub Copilot and Claude made it possible to revive and modernize this codebase in ways that would have taken months of manual work.</p>

<p><strong>Project Link:</strong> <a href="https://github.com/liweinan/tcprest">https://github.com/liweinan/tcprest</a></p>

<h3 id="what-changed-the-ai-assisted-renaissance">What Changed: The AI-Assisted Renaissance</h3>

<h4 id="1-bug-fixes-and-code-quality">1. <strong>Bug Fixes and Code Quality</strong></h4>

<p>The first phase involved systematically identifying and fixing bugs that had accumulated over the years. AI tools accelerated this process by:</p>

<ul>
  <li><strong>Pattern detection</strong>: Identifying similar bugs across the codebase</li>
  <li><strong>Test generation</strong>: Creating comprehensive test cases to catch edge cases</li>
  <li><strong>Refactoring suggestions</strong>: Proposing cleaner implementations for problematic code</li>
</ul>

<p>Example improvements:</p>
<ul>
  <li>Fixed null pointer handling in protocol parsing</li>
  <li>Resolved thread safety issues in the original server implementation</li>
  <li>Corrected resource cleanup in connection handling</li>
</ul>

<h4 id="2-modular-architecture-refactoring">2. <strong>Modular Architecture Refactoring</strong></h4>

<p>The original monolithic structure was split into focused Maven modules, each with a clear purpose:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tcprest-parent/
├── tcprest-commons/      # Zero-dependency core (protocol, client, mappers)
├── tcprest-singlethread/ # Simple blocking I/O server with SSL
├── tcprest-nio/          # Non-blocking I/O server (no SSL)
└── tcprest-netty/        # High-performance Netty server with SSL
</code></pre></div></div>

<p><strong>Key principle:</strong> The <code class="language-plaintext highlighter-rouge">tcprest-commons</code> module has <strong>zero runtime dependencies</strong> - only JDK built-in APIs. This minimizes dependency conflicts and security vulnerabilities.</p>

<p>This modular design allows developers to choose exactly what they need:</p>
<ul>
  <li><strong>Client-only applications</strong>: Just include <code class="language-plaintext highlighter-rouge">tcprest-commons</code> (zero deps)</li>
  <li><strong>Low-concurrency server</strong>: Add <code class="language-plaintext highlighter-rouge">tcprest-singlethread</code> with SSL support</li>
  <li><strong>High-concurrency production</strong>: Use <code class="language-plaintext highlighter-rouge">tcprest-netty</code> for thousands of concurrent connections</li>
</ul>

<h4 id="3-protocol-v2-with-modern-features">3. <strong>Protocol v2 with Modern Features</strong></h4>

<p>The original protocol was extended to support modern Java development needs:</p>

<p><strong>Method Overloading Support:</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">Calculator</span> <span class="o">{</span>
    <span class="kt">int</span> <span class="nf">add</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">,</span> <span class="kt">int</span> <span class="n">b</span><span class="o">);</span>           <span class="c1">// Integer addition</span>
    <span class="kt">double</span> <span class="nf">add</span><span class="o">(</span><span class="kt">double</span> <span class="n">a</span><span class="o">,</span> <span class="kt">double</span> <span class="n">b</span><span class="o">);</span>   <span class="c1">// Double addition</span>
    <span class="nc">String</span> <span class="nf">add</span><span class="o">(</span><span class="nc">String</span> <span class="n">a</span><span class="o">,</span> <span class="nc">String</span> <span class="n">b</span><span class="o">);</span>   <span class="c1">// String concatenation</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Proper Exception Propagation:</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Server throws exception</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">validateAge</span><span class="o">(</span><span class="kt">int</span> <span class="n">age</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">age</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">ValidationException</span><span class="o">(</span><span class="s">"Age must be non-negative"</span><span class="o">);</span>
<span class="o">}</span>

<span class="c1">// Client receives it</span>
<span class="k">try</span> <span class="o">{</span>
    <span class="n">service</span><span class="o">.</span><span class="na">validateAge</span><span class="o">(-</span><span class="mi">1</span><span class="o">);</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">RuntimeException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="c1">// Exception message preserved across the wire</span>
<span class="o">}</span>
</code></pre></div></div>

<h4 id="4-data-compression">4. <strong>Data Compression</strong></h4>

<p>GZIP compression was added to reduce bandwidth usage, with smart threshold-based activation:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span><span class="o">.</span><span class="na">enableCompression</span><span class="o">();</span>  <span class="c1">// Auto-compress messages &gt; 512 bytes</span>

<span class="c1">// Or customize</span>
<span class="nc">CompressionConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">CompressionConfig</span><span class="o">(</span>
    <span class="kc">true</span><span class="o">,</span>   <span class="c1">// enabled</span>
    <span class="mi">1024</span><span class="o">,</span>   <span class="c1">// only compress if message &gt; 1KB</span>
    <span class="mi">9</span>       <span class="c1">// compression level (1=fastest, 9=best)</span>
<span class="o">);</span>
</code></pre></div></div>

<p>Benchmark results show 85-96% reduction for text-heavy payloads.</p>

<h4 id="5-ssltls-security">5. <strong>SSL/TLS Security</strong></h4>

<p>Production-grade security was added:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Server with mutual TLS</span>
<span class="nc">SSLParam</span> <span class="n">serverSSL</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SSLParam</span><span class="o">();</span>
<span class="n">serverSSL</span><span class="o">.</span><span class="na">setKeyStorePath</span><span class="o">(</span><span class="s">"classpath:server_ks"</span><span class="o">);</span>
<span class="n">serverSSL</span><span class="o">.</span><span class="na">setNeedClientAuth</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>  <span class="c1">// Require client certificate</span>

<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8443</span><span class="o">,</span> <span class="n">sslParam</span><span class="o">);</span>
</code></pre></div></div>

<h4 id="6-comprehensive-documentation">6. <strong>Comprehensive Documentation</strong></h4>

<p>AI tools helped generate three detailed documentation files:</p>
<ul>
  <li><strong>PROTOCOL.md</strong>: Wire protocol specification and compatibility</li>
  <li><strong>ARCHITECTURE.md</strong>: Design decisions and implementation details</li>
  <li><strong>CLAUDE.md</strong>: Development guidelines and coding standards</li>
</ul>

<h4 id="7-dependency-updates">7. <strong>Dependency Updates</strong></h4>

<p>All dependencies were updated to their latest stable versions:</p>
<ul>
  <li>Java 11+ (from Java 1.7)</li>
  <li>Netty 4.1.131.Final (high-performance networking)</li>
  <li>TestNG 7.12.0 (modern testing framework)</li>
  <li>SLF4J 2.0.16 (logging facade)</li>
</ul>

<h3 id="performance-characteristics">Performance Characteristics</h3>

<p>TcpRest offers significant advantages over traditional HTTP REST:</p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>HTTP REST</th>
      <th>TcpRest (Netty)</th>
      <th>Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Protocol Overhead</strong></td>
      <td>200-300 bytes</td>
      <td>50-100 bytes</td>
      <td>60-80% reduction</td>
    </tr>
    <tr>
      <td><strong>Serialization</strong></td>
      <td>JSON text</td>
      <td>Binary/Custom</td>
      <td>50-70% smaller</td>
    </tr>
    <tr>
      <td><strong>Compression</strong></td>
      <td>Usually disabled</td>
      <td>Optional GZIP</td>
      <td>80-95% reduction</td>
    </tr>
    <tr>
      <td><strong>Latency</strong></td>
      <td>3-6ms</td>
      <td>0.6-0.9ms</td>
      <td>3-10x faster</td>
    </tr>
    <tr>
      <td><strong>Concurrency</strong></td>
      <td>~1000 threads</td>
      <td>~10-20 threads</td>
      <td>10-50x better</td>
    </tr>
  </tbody>
</table>

<p><strong>Best for</strong>: Microservice internal communication, high-concurrency scenarios (10k+ connections), low-latency requirements (&lt;5ms).</p>

<h3 id="technical-highlights">Technical Highlights</h3>

<h4 id="zero-copy-serialization">Zero-Copy Serialization</h4>

<p>Classes implementing <code class="language-plaintext highlighter-rouge">Serializable</code> work automatically without custom mappers:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">User</span> <span class="kd">implements</span> <span class="nc">Serializable</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="kt">int</span> <span class="n">id</span><span class="o">;</span>
    <span class="kd">private</span> <span class="nc">String</span> <span class="n">name</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">transient</span> <span class="nc">String</span> <span class="n">password</span><span class="o">;</span>  <span class="c1">// Auto-excluded</span>
<span class="o">}</span>

<span class="c1">// No mapper needed!</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">UserService</span> <span class="o">{</span>
    <span class="nc">User</span> <span class="nf">getUser</span><span class="o">(</span><span class="kt">int</span> <span class="n">id</span><span class="o">);</span>
    <span class="nc">List</span><span class="o">&lt;</span><span class="nc">User</span><span class="o">&gt;</span> <span class="nf">getAllUsers</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<h4 id="network-binding-for-security">Network Binding for Security</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Production: Bind to specific IP (not 0.0.0.0)</span>
<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8443</span><span class="o">,</span> <span class="s">"127.0.0.1"</span><span class="o">,</span> <span class="n">sslParam</span><span class="o">);</span>
</code></pre></div></div>

<h4 id="backward-compatibility">Backward Compatibility</h4>

<p>The server can accept both Protocol v1 and v2 clients simultaneously:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span><span class="o">.</span><span class="na">setProtocolVersion</span><span class="o">(</span><span class="nc">ProtocolVersion</span><span class="o">.</span><span class="na">AUTO</span><span class="o">);</span>  <span class="c1">// Default</span>
</code></pre></div></div>

<h3 id="the-role-of-ai-in-this-revival">The Role of AI in This Revival</h3>

<p>AI tools didn’t just “write code” - they acted as:</p>

<ol>
  <li><strong>Architectural consultants</strong>: Suggesting modular structures and design patterns</li>
  <li><strong>Test engineers</strong>: Generating comprehensive test suites with edge cases</li>
  <li><strong>Documentation writers</strong>: Creating clear, detailed technical documentation</li>
  <li><strong>Code reviewers</strong>: Identifying anti-patterns and suggesting improvements</li>
  <li><strong>Migration assistants</strong>: Helping upgrade dependencies and APIs</li>
</ol>

<p><strong>Key insight</strong>: The human role shifted from “writing code” to “architectural design, requirement analysis, and quality control.” I defined <strong>what needed to be done</strong>, and AI accelerated <strong>how it got done</strong>.</p>

<h3 id="what-this-demonstrates">What This Demonstrates</h3>

<p>This project is a case study in how AI tools are reshaping software development:</p>

<ul>
  <li><strong>Legacy code revival</strong>: Projects that would have been abandoned can be modernized</li>
  <li><strong>Documentation debt payoff</strong>: Comprehensive docs become feasible</li>
  <li><strong>Testing coverage</strong>: Achieving thorough test coverage becomes practical</li>
  <li><strong>Refactoring confidence</strong>: Large-scale restructuring becomes less risky</li>
</ul>

<p><strong>The future</strong>: Developers become “AI conductors” - focusing on architecture, requirements, and quality while delegating implementation details to AI collaborators.</p>

<h3 id="try-it-yourself">Try It Yourself</h3>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- Maven dependency --&gt;</span>
<span class="nt">&lt;dependency&gt;</span>
    <span class="nt">&lt;groupId&gt;</span>cn.huiwings<span class="nt">&lt;/groupId&gt;</span>
    <span class="nt">&lt;artifactId&gt;</span>tcprest-netty<span class="nt">&lt;/artifactId&gt;</span>
    <span class="nt">&lt;version&gt;</span>1.0-SNAPSHOT<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Server</span>
<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8001</span><span class="o">);</span>
<span class="n">server</span><span class="o">.</span><span class="na">addSingletonResource</span><span class="o">(</span><span class="k">new</span> <span class="nc">MyServiceImpl</span><span class="o">());</span>
<span class="n">server</span><span class="o">.</span><span class="na">up</span><span class="o">();</span>

<span class="c1">// Client</span>
<span class="nc">TcpRestClientFactory</span> <span class="n">factory</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">TcpRestClientFactory</span><span class="o">(</span>
    <span class="nc">MyService</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="s">"localhost"</span><span class="o">,</span> <span class="mi">8001</span>
<span class="o">);</span>
<span class="nc">MyService</span> <span class="n">client</span> <span class="o">=</span> <span class="n">factory</span><span class="o">.</span><span class="na">getClient</span><span class="o">();</span>
<span class="n">client</span><span class="o">.</span><span class="na">myMethod</span><span class="o">();</span>  <span class="c1">// Transparent RPC!</span>
</code></pre></div></div>

<h3 id="conclusion">Conclusion</h3>

<p>TcpRest’s journey from a 2012 experiment to a 2026 production-ready framework demonstrates the transformative power of AI-assisted development. What would have required months of tedious refactoring, testing, and documentation work was accomplished in weeks through human-AI collaboration.</p>

<p>The result is not just a modernized codebase, but a genuinely useful framework for high-performance RPC scenarios where HTTP overhead is unacceptable.</p>

<p><strong>The lesson</strong>: Good ideas don’t have to die. With AI tools, legacy projects can find new life.</p>

<hr />

<h2 id="中文版本">中文版本</h2>

<h3 id="旅程从2012到2026">旅程：从2012到2026</h3>

<p>2012年，我创建了TcpRest作为一个实验性的RPC（远程过程调用）框架。这个想法简单但强大：将普通的Java对象（POJOs）转换为通过TCP网络访问的服务，无需HTTP的开销。当时，这只是一个探索如何在Java中构建轻量级RPC机制的学习练习。</p>

<p>十多年来，这个项目一直没有维护——成为了2012年时代Java开发实践的时间胶囊。然后，在2024-2026年，情况发生了变化：GitHub Copilot和Claude等AI驱动的开发工具的出现，使得以一种原本需要数月手动工作才能完成的方式来复兴和现代化这个代码库成为可能。</p>

<p><strong>项目链接:</strong> <a href="https://github.com/liweinan/tcprest">https://github.com/liweinan/tcprest</a></p>

<h3 id="改变了什么ai辅助的文艺复兴">改变了什么：AI辅助的文艺复兴</h3>

<h4 id="1-bug修复和代码质量提升">1. <strong>Bug修复和代码质量提升</strong></h4>

<p>第一阶段涉及系统地识别和修复多年来积累的bug。AI工具通过以下方式加速了这个过程：</p>

<ul>
  <li><strong>模式检测</strong>：识别代码库中的类似bug</li>
  <li><strong>测试生成</strong>：创建全面的测试用例以捕获边界情况</li>
  <li><strong>重构建议</strong>：为有问题的代码提出更清晰的实现</li>
</ul>

<p>改进示例：</p>
<ul>
  <li>修复了协议解析中的空指针处理</li>
  <li>解决了原始服务器实现中的线程安全问题</li>
  <li>纠正了连接处理中的资源清理问题</li>
</ul>

<h4 id="2-模块化架构重构">2. <strong>模块化架构重构</strong></h4>

<p>原始的单体结构被拆分为专注的Maven模块，每个模块都有明确的目的：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tcprest-parent/
├── tcprest-commons/      # 零依赖核心（协议、客户端、映射器）
├── tcprest-singlethread/ # 简单的阻塞I/O服务器，支持SSL
├── tcprest-nio/          # 非阻塞I/O服务器（不支持SSL）
└── tcprest-netty/        # 高性能Netty服务器，支持SSL
</code></pre></div></div>

<p><strong>核心原则：</strong> <code class="language-plaintext highlighter-rouge">tcprest-commons</code>模块<strong>零运行时依赖</strong>——仅使用JDK内置API。这最大限度地减少了依赖冲突和安全漏洞。</p>

<p>这种模块化设计允许开发者精确选择他们需要的内容：</p>
<ul>
  <li><strong>纯客户端应用</strong>：只需包含<code class="language-plaintext highlighter-rouge">tcprest-commons</code>（零依赖）</li>
  <li><strong>低并发服务器</strong>：添加<code class="language-plaintext highlighter-rouge">tcprest-singlethread</code>，支持SSL</li>
  <li><strong>高并发生产环境</strong>：使用<code class="language-plaintext highlighter-rouge">tcprest-netty</code>处理数千个并发连接</li>
</ul>

<h4 id="3-具有现代特性的protocol-v2">3. <strong>具有现代特性的Protocol v2</strong></h4>

<p>原始协议被扩展以支持现代Java开发需求：</p>

<p><strong>方法重载支持：</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">Calculator</span> <span class="o">{</span>
    <span class="kt">int</span> <span class="nf">add</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">,</span> <span class="kt">int</span> <span class="n">b</span><span class="o">);</span>           <span class="c1">// 整数加法</span>
    <span class="kt">double</span> <span class="nf">add</span><span class="o">(</span><span class="kt">double</span> <span class="n">a</span><span class="o">,</span> <span class="kt">double</span> <span class="n">b</span><span class="o">);</span>   <span class="c1">// 双精度加法</span>
    <span class="nc">String</span> <span class="nf">add</span><span class="o">(</span><span class="nc">String</span> <span class="n">a</span><span class="o">,</span> <span class="nc">String</span> <span class="n">b</span><span class="o">);</span>   <span class="c1">// 字符串连接</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>正确的异常传播：</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 服务器抛出异常</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">validateAge</span><span class="o">(</span><span class="kt">int</span> <span class="n">age</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">age</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">ValidationException</span><span class="o">(</span><span class="s">"年龄必须非负"</span><span class="o">);</span>
<span class="o">}</span>

<span class="c1">// 客户端接收异常</span>
<span class="k">try</span> <span class="o">{</span>
    <span class="n">service</span><span class="o">.</span><span class="na">validateAge</span><span class="o">(-</span><span class="mi">1</span><span class="o">);</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">RuntimeException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="c1">// 异常消息通过网络保留</span>
<span class="o">}</span>
</code></pre></div></div>

<h4 id="4-数据压缩">4. <strong>数据压缩</strong></h4>

<p>添加了GZIP压缩以减少带宽使用，并具有智能的基于阈值的激活：</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span><span class="o">.</span><span class="na">enableCompression</span><span class="o">();</span>  <span class="c1">// 自动压缩大于512字节的消息</span>

<span class="c1">// 或自定义</span>
<span class="nc">CompressionConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">CompressionConfig</span><span class="o">(</span>
    <span class="kc">true</span><span class="o">,</span>   <span class="c1">// 启用</span>
    <span class="mi">1024</span><span class="o">,</span>   <span class="c1">// 仅当消息&gt;1KB时压缩</span>
    <span class="mi">9</span>       <span class="c1">// 压缩级别（1=最快，9=最佳）</span>
<span class="o">);</span>
</code></pre></div></div>

<p>基准测试结果显示，对于文本密集型负载，压缩率为85-96%。</p>

<h4 id="5-ssltls安全性">5. <strong>SSL/TLS安全性</strong></h4>

<p>添加了生产级安全性：</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 带双向TLS的服务器</span>
<span class="nc">SSLParam</span> <span class="n">serverSSL</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SSLParam</span><span class="o">();</span>
<span class="n">serverSSL</span><span class="o">.</span><span class="na">setKeyStorePath</span><span class="o">(</span><span class="s">"classpath:server_ks"</span><span class="o">);</span>
<span class="n">serverSSL</span><span class="o">.</span><span class="na">setNeedClientAuth</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>  <span class="c1">// 要求客户端证书</span>

<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8443</span><span class="o">,</span> <span class="n">sslParam</span><span class="o">);</span>
</code></pre></div></div>

<h4 id="6-全面的文档">6. <strong>全面的文档</strong></h4>

<p>AI工具帮助生成了三个详细的文档文件：</p>
<ul>
  <li><strong>PROTOCOL.md</strong>：线协议规范和兼容性</li>
  <li><strong>ARCHITECTURE.md</strong>：设计决策和实现细节</li>
  <li><strong>CLAUDE.md</strong>：开发指南和编码标准</li>
</ul>

<h4 id="7-依赖更新">7. <strong>依赖更新</strong></h4>

<p>所有依赖都更新到了最新的稳定版本：</p>
<ul>
  <li>Java 11+（从Java 1.7）</li>
  <li>Netty 4.1.131.Final（高性能网络）</li>
  <li>TestNG 7.12.0（现代测试框架）</li>
  <li>SLF4J 2.0.16（日志门面）</li>
</ul>

<h3 id="性能特征">性能特征</h3>

<p>TcpRest相比传统的HTTP REST具有显著优势：</p>

<table>
  <thead>
    <tr>
      <th>方面</th>
      <th>HTTP REST</th>
      <th>TcpRest (Netty)</th>
      <th>改进</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>协议开销</strong></td>
      <td>200-300字节</td>
      <td>50-100字节</td>
      <td>减少60-80%</td>
    </tr>
    <tr>
      <td><strong>序列化</strong></td>
      <td>JSON文本</td>
      <td>二进制/自定义</td>
      <td>减小50-70%</td>
    </tr>
    <tr>
      <td><strong>压缩</strong></td>
      <td>通常禁用</td>
      <td>可选GZIP</td>
      <td>减少80-95%</td>
    </tr>
    <tr>
      <td><strong>延迟</strong></td>
      <td>3-6ms</td>
      <td>0.6-0.9ms</td>
      <td>快3-10倍</td>
    </tr>
    <tr>
      <td><strong>并发性</strong></td>
      <td>~1000线程</td>
      <td>~10-20线程</td>
      <td>好10-50倍</td>
    </tr>
  </tbody>
</table>

<p><strong>最适合</strong>：微服务内部通信、高并发场景（10k+连接）、低延迟要求（&lt;5ms）。</p>

<h3 id="技术亮点">技术亮点</h3>

<h4 id="零拷贝序列化">零拷贝序列化</h4>

<p>实现<code class="language-plaintext highlighter-rouge">Serializable</code>的类无需自定义映射器即可自动工作：</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">User</span> <span class="kd">implements</span> <span class="nc">Serializable</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="kt">int</span> <span class="n">id</span><span class="o">;</span>
    <span class="kd">private</span> <span class="nc">String</span> <span class="n">name</span><span class="o">;</span>
    <span class="kd">private</span> <span class="kd">transient</span> <span class="nc">String</span> <span class="n">password</span><span class="o">;</span>  <span class="c1">// 自动排除</span>
<span class="o">}</span>

<span class="c1">// 无需映射器！</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">UserService</span> <span class="o">{</span>
    <span class="nc">User</span> <span class="nf">getUser</span><span class="o">(</span><span class="kt">int</span> <span class="n">id</span><span class="o">);</span>
    <span class="nc">List</span><span class="o">&lt;</span><span class="nc">User</span><span class="o">&gt;</span> <span class="nf">getAllUsers</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<h4 id="网络绑定以提高安全性">网络绑定以提高安全性</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 生产环境：绑定到特定IP（而非0.0.0.0）</span>
<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8443</span><span class="o">,</span> <span class="s">"127.0.0.1"</span><span class="o">,</span> <span class="n">sslParam</span><span class="o">);</span>
</code></pre></div></div>

<h4 id="向后兼容性">向后兼容性</h4>

<p>服务器可以同时接受Protocol v1和v2客户端：</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">server</span><span class="o">.</span><span class="na">setProtocolVersion</span><span class="o">(</span><span class="nc">ProtocolVersion</span><span class="o">.</span><span class="na">AUTO</span><span class="o">);</span>  <span class="c1">// 默认</span>
</code></pre></div></div>

<h3 id="ai在这次复兴中的角色">AI在这次复兴中的角色</h3>

<p>AI工具不仅仅是”编写代码”——它们充当了：</p>

<ol>
  <li><strong>架构顾问</strong>：建议模块化结构和设计模式</li>
  <li><strong>测试工程师</strong>：生成包含边界情况的全面测试套件</li>
  <li><strong>文档撰写者</strong>：创建清晰、详细的技术文档</li>
  <li><strong>代码审查者</strong>：识别反模式并提出改进建议</li>
  <li><strong>迁移助手</strong>：帮助升级依赖和API</li>
</ol>

<p><strong>关键见解</strong>：人类的角色从”编写代码”转变为”架构设计、需求分析和质量控制”。我定义了<strong>需要做什么</strong>，AI加速了<strong>如何完成</strong>。</p>

<h3 id="这展示了什么">这展示了什么</h3>

<p>这个项目是AI工具如何重塑软件开发的案例研究：</p>

<ul>
  <li><strong>遗留代码复兴</strong>：本来会被废弃的项目可以被现代化</li>
  <li><strong>文档债务偿还</strong>：全面的文档变得可行</li>
  <li><strong>测试覆盖率</strong>：实现彻底的测试覆盖变得实用</li>
  <li><strong>重构信心</strong>：大规模重构变得风险更小</li>
</ul>

<p><strong>未来</strong>：开发者成为”AI指挥者”——专注于架构、需求和质量，同时将实现细节委托给AI协作者。</p>

<h3 id="试一试">试一试</h3>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- Maven依赖 --&gt;</span>
<span class="nt">&lt;dependency&gt;</span>
    <span class="nt">&lt;groupId&gt;</span>cn.huiwings<span class="nt">&lt;/groupId&gt;</span>
    <span class="nt">&lt;artifactId&gt;</span>tcprest-netty<span class="nt">&lt;/artifactId&gt;</span>
    <span class="nt">&lt;version&gt;</span>1.0-SNAPSHOT<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</code></pre></div></div>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 服务器</span>
<span class="nc">TcpRestServer</span> <span class="n">server</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">NettyTcpRestServer</span><span class="o">(</span><span class="mi">8001</span><span class="o">);</span>
<span class="n">server</span><span class="o">.</span><span class="na">addSingletonResource</span><span class="o">(</span><span class="k">new</span> <span class="nc">MyServiceImpl</span><span class="o">());</span>
<span class="n">server</span><span class="o">.</span><span class="na">up</span><span class="o">();</span>

<span class="c1">// 客户端</span>
<span class="nc">TcpRestClientFactory</span> <span class="n">factory</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">TcpRestClientFactory</span><span class="o">(</span>
    <span class="nc">MyService</span><span class="o">.</span><span class="na">class</span><span class="o">,</span> <span class="s">"localhost"</span><span class="o">,</span> <span class="mi">8001</span>
<span class="o">);</span>
<span class="nc">MyService</span> <span class="n">client</span> <span class="o">=</span> <span class="n">factory</span><span class="o">.</span><span class="na">getClient</span><span class="o">();</span>
<span class="n">client</span><span class="o">.</span><span class="na">myMethod</span><span class="o">();</span>  <span class="c1">// 透明的RPC！</span>
</code></pre></div></div>

<h3 id="结论">结论</h3>

<p>TcpRest从2012年的实验到2026年生产就绪框架的旅程，展示了AI辅助开发的变革力量。原本需要数月繁琐的重构、测试和文档工作，通过人机协作在几周内完成。</p>

<p>结果不仅仅是现代化的代码库，而是一个真正有用的框架，适用于HTTP开销不可接受的高性能RPC场景。</p>

<p><strong>教训</strong>：好的想法不必消亡。借助AI工具，遗留项目可以焕发新生。</p>

<h2 id="references">References</h2>

<ul>
  <li><strong>Project Repository</strong>: <a href="https://github.com/liweinan/tcprest">https://github.com/liweinan/tcprest</a></li>
  <li><strong>Protocol Documentation</strong>: <a href="https://github.com/liweinan/tcprest/blob/main/PROTOCOL.md">PROTOCOL.md</a></li>
  <li><strong>Architecture Guide</strong>: <a href="https://github.com/liweinan/tcprest/blob/main/ARCHITECTURE.md">ARCHITECTURE.md</a></li>
  <li><strong>Development Guidelines</strong>: <a href="https://github.com/liweinan/tcprest/blob/main/CLAUDE.md">CLAUDE.md</a></li>
</ul>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[A 14-year journey from experimental project to production-ready framework. How AI tools transformed legacy code into a modern, modular, zero-dependency RPC solution.]]></summary></entry><entry><title type="html">解剖Tyr：Linux首个Rust GPU驱动的代码实战分析</title><link href="https://weinan.io/2026/02/18/tyr-rust-gpu-driver-anatomy.html" rel="alternate" type="text/html" title="解剖Tyr：Linux首个Rust GPU驱动的代码实战分析" /><published>2026-02-18T00:00:00+00:00</published><updated>2026-02-18T00:00:00+00:00</updated><id>https://weinan.io/2026/02/18/tyr-rust-gpu-driver-anatomy</id><content type="html" xml:base="https://weinan.io/2026/02/18/tyr-rust-gpu-driver-anatomy.html"><![CDATA[<p>2025年9月，Linux内核合并了首个Rust GPU驱动Tyr（commit cf4fd52e3236），标志着Rust在内核图形子系统的正式落地。本文通过剖析Tyr的实际代码，展示Rust GPU驱动的架构设计、DRM抽象层的具体实现，以及从Panthor（C）移植到Tyr（Rust）的关键挑战。这是Rust在Linux内核从抽象到实战的完整技术案例。</p>

<h2 id="引言从理论到代码">引言：从理论到代码</h2>

<p>在前两篇文章中，我们分析了Rust在Linux内核的整体状态和ABI稳定性<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>。这些讨论主要停留在宏观层面：代码统计、政策争议、技术保证。但<strong>实际的Rust内核代码长什么样？如何与C内核交互？遇到了哪些具体挑战？</strong></p>

<p>本文通过解剖<strong>Tyr项目</strong>——Linux内核首个合并的Rust GPU驱动——来回答这些问题。我们将：</p>

<ol>
  <li><strong>分析实际代码</strong>：基于commit cf4fd52e3236的真实代码</li>
  <li><strong>对比C/Rust实现</strong>：Panthor（C）vs Tyr（Rust）</li>
  <li><strong>揭示技术挑战</strong>：为何上游代码如此精简？</li>
  <li><strong>理解DRM抽象层</strong>：<code class="language-plaintext highlighter-rouge">rust/kernel/drm/</code>如何工作？</li>
</ol>

<p>这不是一篇科普文章，而是<strong>代码级的技术剖析</strong>。</p>

<hr />

<h2 id="背景知识gpu驱动与drm子系统">背景知识：GPU驱动与DRM子系统</h2>

<h3 id="gpu驱动的双层架构">GPU驱动的双层架构</h3>

<p>在Linux中，GPU驱动分为两个部分：</p>

<p><strong>1. 内核模式驱动（Kernel-mode Driver）</strong></p>
<ul>
  <li>位置：Linux内核的<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/</code>目录</li>
  <li>职责：
    <ul>
      <li>管理GPU硬件资源</li>
      <li>提供内存分配和映射</li>
      <li>处理多进程的GPU访问调度</li>
      <li>电源管理和故障恢复</li>
    </ul>
  </li>
  <li><strong>Tyr就是内核模式驱动</strong></li>
</ul>

<p><strong>2. 用户模式驱动（Userspace Driver）</strong></p>
<ul>
  <li>典型代表：Mesa（实现OpenGL/Vulkan）</li>
  <li>职责：
    <ul>
      <li>实现图形API（OpenGL、Vulkan等）</li>
      <li>将API调用翻译为GPU命令</li>
      <li>着色器编译</li>
    </ul>
  </li>
  <li>通过ioctl与内核驱动通信</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────┐
│   游戏/应用程序              │
└──────────┬──────────────────┘
           │ OpenGL/Vulkan API
           ↓
┌─────────────────────────────┐
│   Mesa (用户模式驱动)        │
│   - panfrost_dri.so (Panthor)│
└──────────┬──────────────────┘
           │ ioctl系统调用
           ↓
┌─────────────────────────────┐
│   Tyr (内核模式驱动)         │ ← 本文重点
│   drivers/gpu/drm/tyr/      │
└──────────┬──────────────────┘
           │ 硬件寄存器操作
           ↓
┌─────────────────────────────┐
│   Mali GPU 硬件              │
└─────────────────────────────┘
</code></pre></div></div>

<h3 id="什么是drm子系统">什么是DRM子系统？</h3>

<p><strong>DRM（Direct Rendering Manager）</strong> 是Linux内核的图形子系统，管理所有GPU驱动。</p>

<p><strong>核心组件</strong>：</p>

<ol>
  <li><strong>DRM Core</strong>（<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/drm_*.c</code>）
    <ul>
      <li>提供通用GPU管理框架</li>
      <li>处理显示模式设置（KMS）</li>
      <li>管理图形内存（GEM）</li>
    </ul>
  </li>
  <li><strong>GEM（Graphics Execution Manager）</strong>
    <ul>
      <li>GPU内存对象管理</li>
      <li>处理CPU/GPU内存共享</li>
      <li>管理用户空间映射（mmap）</li>
    </ul>
  </li>
  <li><strong>GPUVM（GPU Virtual Address Management）</strong>
    <ul>
      <li>GPU虚拟地址空间管理</li>
      <li>类似CPU的虚拟内存</li>
      <li>支持多进程GPU内存隔离</li>
    </ul>
  </li>
  <li><strong>GPU调度器</strong>（drm_gpu_scheduler）
    <ul>
      <li>管理GPU任务队列</li>
      <li>处理任务依赖关系</li>
      <li>实现公平调度</li>
    </ul>
  </li>
</ol>

<p><strong>学习资源</strong>：</p>
<ul>
  <li><a href="https://docs.kernel.org/gpu/drm-internals.html">DRM Internals Documentation</a> - 官方内核文档</li>
  <li><a href="https://bootlin.com/doc/training/graphics/graphics-slides.pdf">Linux Graphics Stack Overview</a> - Bootlin培训材料</li>
  <li><a href="https://01.org/linuxgraphics/gfx-docs/drm/">DRM/KMS Overview</a> - Intel图形文档</li>
</ul>

<h3 id="arm-mali-gpu架构">ARM Mali GPU架构</h3>

<p><strong>Mali GPU家族</strong>：</p>

<table>
  <thead>
    <tr>
      <th>架构</th>
      <th>代表型号</th>
      <th>特点</th>
      <th>Tyr支持</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Midgard</strong></td>
      <td>Mali-T760</td>
      <td>早期架构</td>
      <td>❌</td>
    </tr>
    <tr>
      <td><strong>Bifrost</strong></td>
      <td>Mali-G71, G52</td>
      <td>引入四边形着色器</td>
      <td>❌</td>
    </tr>
    <tr>
      <td><strong>Valhall</strong></td>
      <td>Mali-G77, G78</td>
      <td>超标量引擎</td>
      <td>✅</td>
    </tr>
    <tr>
      <td><strong>Valhall CSF</strong></td>
      <td><strong>Mali-G610, G710</strong></td>
      <td>命令流前端</td>
      <td>✅ <strong>Tyr目标</strong></td>
    </tr>
  </tbody>
</table>

<p><strong>CSF（Command Stream Frontend）架构</strong>：</p>
<ul>
  <li>GPU固件（MCU）直接管理任务调度</li>
  <li>驱动通过命令流与固件通信</li>
  <li>减轻CPU负担，提高效率</li>
</ul>

<p><strong>Mali GPU硬件结构</strong>：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────┐
│  MCU (Microcontroller Unit)        │
│  - Cortex-M7核心 @ GHz             │
│  - 运行固件，管理GPU调度            │
└──────────┬──────────────────────────┘
           │ 内部总线
┌──────────┴──────────────────────────┐
│  Shader Cores (着色器核心)          │
│  - 执行计算/图形任务                │
│  - 多核并行（8-32核心不等）          │
└──────────┬──────────────────────────┘
           │
┌──────────┴──────────────────────────┐
│  L2 Cache + Memory System           │
│  - 共享L2缓存                       │
│  - MMU（内存管理单元）               │
└─────────────────────────────────────┘
</code></pre></div></div>

<p><strong>MCU固件的关键作用</strong>：</p>
<ul>
  <li><strong>任务调度</strong>：决定哪个任务在哪个核心执行</li>
  <li><strong>电源管理</strong>：动态开关核心和调节频率</li>
  <li><strong>故障恢复</strong>：检测和处理GPU挂起</li>
</ul>

<p><strong>学习资源</strong>：</p>
<ul>
  <li><a href="https://developer.arm.com/Processors/Mali-G610">ARM Mali GPU Datasheet</a> - 官方技术文档</li>
  <li><a href="https://docs.mesa3d.org/drivers/panfrost.html">Panfrost Driver Documentation</a> - Mesa的Mali开源驱动文档</li>
  <li><a href="https://community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog">Mali GPU Architecture</a> - ARM官方博客</li>
</ul>

<h3 id="为什么要用rust重写gpu驱动">为什么要用Rust重写GPU驱动？</h3>

<p><strong>GPU驱动的复杂性</strong>：</p>

<ol>
  <li><strong>海量内存操作</strong>：
    <ul>
      <li>CPU/GPU共享内存</li>
      <li>用户空间映射（mmap）</li>
      <li>DMA传输</li>
      <li><strong>常见bug</strong>：use-after-free、double-free</li>
    </ul>
  </li>
  <li><strong>并发密集</strong>：
    <ul>
      <li>多进程同时访问GPU</li>
      <li>中断处理</li>
      <li>任务队列管理</li>
      <li><strong>常见bug</strong>：数据竞争、死锁</li>
    </ul>
  </li>
  <li><strong>用户空间交互频繁</strong>：
    <ul>
      <li>ioctl暴露大量攻击面</li>
      <li>需要严格验证用户输入</li>
      <li><strong>常见bug</strong>：权限提升漏洞</li>
    </ul>
  </li>
</ol>

<p><strong>历史数据</strong>（来自前文<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>）：</p>
<ul>
  <li>Linux内核CVE中，<strong>约70%是内存安全问题</strong></li>
  <li>GPU驱动是CVE高发区</li>
</ul>

<p><strong>Rust的解决方案</strong>：</p>

<table>
  <thead>
    <tr>
      <th>问题类别</th>
      <th>C的困境</th>
      <th>Rust的保证</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>内存安全</td>
      <td>手动管理，易出错</td>
      <td>所有权系统，编译时检查</td>
    </tr>
    <tr>
      <td>并发安全</td>
      <td>锁靠约定</td>
      <td>借用检查器，编译时防数据竞争</td>
    </tr>
    <tr>
      <td>资源泄漏</td>
      <td>手动cleanup</td>
      <td>RAII自动管理</td>
    </tr>
    <tr>
      <td>空指针</td>
      <td>运行时崩溃</td>
      <td><code class="language-plaintext highlighter-rouge">Option&lt;T&gt;</code>编译时消除</td>
    </tr>
  </tbody>
</table>

<p><strong>Greg Kroah-Hartman（内核维护者）的评价</strong><sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>：</p>
<blockquote>
  <p>“The majority of bugs we have are due to the stupid little corner cases in C that are totally gone in Rust.”</p>
</blockquote>

<h3 id="panthor-vs-tyr移植关系">Panthor vs Tyr：移植关系</h3>

<p><strong>Panthor</strong>是Mali CSF GPU的<strong>C驱动</strong>（已上游）：</p>
<ul>
  <li>位置：<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/panthor/</code></li>
  <li>作者：Collabora工程师（Boris Brezillon等）</li>
  <li>状态：生产就绪，功能完整</li>
</ul>

<p><strong>Tyr</strong>是<strong>Panthor的Rust移植</strong>：</p>
<ul>
  <li>目标：功能对等（feature parity）</li>
  <li>策略：暴露相同的uAPI（用户空间API），兼容Mesa</li>
  <li>当前状态：基础功能，依赖GPUVM等抽象完善</li>
</ul>

<p><strong>为什么不直接用Panthor？</strong></p>
<ol>
  <li><strong>技术演进</strong>：验证Rust在GPU驱动的可行性</li>
  <li><strong>安全提升</strong>：消除Panthor的潜在内存安全bug</li>
  <li><strong>生态建设</strong>：为其他GPU驱动提供Rust参考</li>
</ol>

<hr />

<h2 id="快速入门如何学习gpu驱动开发">快速入门：如何学习GPU驱动开发</h2>

<h3 id="前置知识">前置知识</h3>

<p><strong>必备基础</strong>：</p>
<ol>
  <li>✅ C语言（指针、结构体、位操作）</li>
  <li>✅ Linux系统编程（系统调用、设备驱动基础）</li>
  <li>✅ 计算机体系结构（虚拟内存、DMA、中断）</li>
</ol>

<p><strong>Rust特有</strong>：</p>
<ol>
  <li>✅ 所有权和借用</li>
  <li>✅ 生命周期</li>
  <li>✅ unsafe Rust（FFI互操作）</li>
</ol>

<h3 id="学习路径推荐顺序">学习路径（推荐顺序）</h3>

<p><strong>第1步：DRM基础</strong>（2-3周）</p>
<ul>
  <li>📚 <a href="https://docs.kernel.org/gpu/drm-kms.html">DRM Driver Development Guide</a></li>
  <li>💻 实践：编译并加载简单DRM驱动（vkms）</li>
  <li>🎯 目标：理解GEM对象、ioctl处理流程</li>
</ul>

<p><strong>第2步：Rust内核编程</strong>（3-4周）</p>
<ul>
  <li>📚 <a href="https://rust-for-linux.com/">Rust for Linux官方文档</a></li>
  <li>📚 <a href="https://github.com/rust-for-linux/linux/tree/rust/samples/rust">Kernel Module in Rust</a></li>
  <li>💻 实践：编写简单的Rust platform驱动</li>
  <li>🎯 目标：理解<code class="language-plaintext highlighter-rouge">Pin</code>, <code class="language-plaintext highlighter-rouge">Opaque</code>, <code class="language-plaintext highlighter-rouge">#[pin_data]</code>等内核特有概念</li>
</ul>

<p><strong>第3步：阅读现有代码</strong>（持续）</p>
<ul>
  <li>📖 <strong>rvkms</strong>（最简单的Rust DRM驱动）</li>
  <li>📖 <strong>Nova</strong>（完整的Rust GPU驱动，Nvidia GSP）</li>
  <li>📖 <strong>Tyr</strong>（本文重点）</li>
  <li>📖 <strong>Asahi</strong>（Apple Silicon GPU，最成熟）</li>
</ul>

<p><strong>第4步：理解GPU硬件</strong>（按需）</p>
<ul>
  <li>📚 <a href="https://developer.arm.com/documentation/102849/latest/">Mali GPU Architecture</a></li>
  <li>📚 <a href="https://gitlab.freedesktop.org/panfrost">Panfrost Wiki</a>（Mali开源驱动项目）</li>
  <li>🎯 目标：理解着色器核心、MMU、MCU固件</li>
</ul>

<h3 id="关键资源汇总">关键资源汇总</h3>

<p><strong>官方文档</strong>：</p>
<ul>
  <li><a href="https://docs.kernel.org/gpu/">Linux DRM Documentation</a> - 内核DRM子系统文档</li>
  <li><a href="https://rust-for-linux.com/">Rust for Linux</a> - 官方项目网站</li>
  <li><a href="https://dri.freedesktop.org/wiki/">freedesktop.org DRM</a> - 社区Wiki</li>
</ul>

<p><strong>代码仓库</strong>：</p>
<ul>
  <li><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git">Linux Kernel</a></li>
  <li><a href="https://gitlab.freedesktop.org/drm/rust/kernel">DRM Rust Tree</a> - Rust DRM开发树</li>
  <li><a href="https://gitlab.freedesktop.org/mesa/mesa">Mesa</a> - 用户空间驱动</li>
</ul>

<p><strong>社区资源</strong>：</p>
<ul>
  <li><a href="https://lore.kernel.org/rust-for-linux/">Rust for Linux邮件列表</a></li>
  <li><a href="irc://irc.oftc.net/dri-devel">DRM开发者IRC</a> - #dri-devel频道</li>
  <li><a href="https://www.collabora.com/news-and-blog/">Collabora博客</a> - Tyr团队的技术博客</li>
</ul>

<p><strong>书籍推荐</strong>：</p>
<ul>
  <li>《Linux Device Drivers》（3rd Edition）- 经典驱动开发书籍</li>
  <li>《Programming Rust》（2nd Edition）- Rust语言深入</li>
  <li>《The Rust Reference》- Rust语言规范</li>
</ul>

<h3 id="从哪里开始贡献">从哪里开始贡献？</h3>

<p><strong>难度递增的任务</strong>：</p>

<ol>
  <li><strong>⭐ 初级</strong>：
    <ul>
      <li>为Rust抽象添加文档注释</li>
      <li>修复编译警告</li>
      <li>添加单元测试</li>
    </ul>
  </li>
  <li><strong>⭐⭐ 中级</strong>：
    <ul>
      <li>实现缺失的寄存器定义</li>
      <li>添加新的GPU型号支持</li>
      <li>改进错误处理</li>
    </ul>
  </li>
  <li><strong>⭐⭐⭐ 高级</strong>：
    <ul>
      <li>开发GPUVM Rust抽象</li>
      <li>实现GPU调度器</li>
      <li>移植其他GPU驱动到Rust</li>
    </ul>
  </li>
</ol>

<p><strong>如何参与</strong>：</p>
<ol>
  <li>订阅Rust for Linux邮件列表</li>
  <li>在GitLab上关注DRM Rust项目</li>
  <li>参与代码审查（学习最快的方式！）</li>
  <li>从小patch开始提交</li>
</ol>

<hr />

<h2 id="tyr项目概览第一手资料">Tyr项目概览：第一手资料</h2>

<h3 id="git-commit信息">Git Commit信息</h3>

<p><strong>提交哈希</strong>：<code class="language-plaintext highlighter-rouge">cf4fd52e3236</code>
<strong>作者</strong>：Daniel Almeida <a href="mailto:daniel.almeida@collabora.com">daniel.almeida@collabora.com</a>
<strong>日期</strong>：2025年9月10日
<strong>合作方</strong>：Collabora、Arm、Google</p>

<p><strong>Commit message核心摘录</strong>（原文）<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>：</p>

<blockquote>
  <p>Add a Rust driver for ARM Mali CSF-based GPUs. It is a port of Panthor and therefore exposes Panthor’s uAPI and name to userspace, and the product of a joint effort between Collabora, Arm and Google engineers.</p>
</blockquote>

<blockquote>
  <p>The downstream code is capable of <strong>booting the MCU, doing sync VM_BINDS</strong> through the work-in-progress GPUVM abstraction and also doing <strong>(trivial) submits</strong> through Asahi’s drm_scheduler and dma_fence abstractions.</p>
</blockquote>

<blockquote>
  <p><strong>This first patch, however, only implements a subset</strong> of the current features available downstream, as the rest is not implementable without pulling in even more abstractions. In particular, a lot of things depend on properly mapping memory on a given VA range, which itself <strong>depends on the GPUVM abstraction that is currently work-in-progress</strong>. For this reason, <strong>we still cannot boot the MCU</strong> and thus, cannot do much for the moment.</p>
</blockquote>

<h3 id="关键信息解读">关键信息解读</h3>

<ol>
  <li><strong>下游分支功能完整</strong>：
    <ul>
      <li>✅ MCU启动（Mali GPU的微控制器）</li>
      <li>✅ 同步VM_BINDS（虚拟内存绑定）</li>
      <li>✅ 基础任务提交</li>
    </ul>
  </li>
  <li><strong>上游代码受限</strong>：
    <ul>
      <li>❌ 无法启动MCU</li>
      <li>❌ GPUVM抽象缺失</li>
      <li>❌ 只能查询GPU信息</li>
    </ul>
  </li>
  <li><strong>战略转变</strong>：
    <ul>
      <li>之前尝试C+Rust混合（失败）</li>
      <li>现在改为纯Rust，分阶段上游</li>
    </ul>
  </li>
</ol>

<hr />

<h2 id="tyr代码结构实际文件布局">Tyr代码结构：实际文件布局</h2>

<h3 id="代码树基于commit-cf4fd52e3236">代码树（基于commit cf4fd52e3236）</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drivers/gpu/drm/tyr/
├── tyr.rs        # 模块入口，platform_driver声明
├── driver.rs     # 驱动核心，TyrDriver和TyrData实现
├── file.rs       # DRM file操作，处理用户空间连接
├── gem.rs        # GEM对象管理
├── gpu.rs        # GPU信息查询（GpuInfo结构体）
├── regs.rs       # GPU寄存器定义和访问
├── Kconfig       # 内核配置选项
└── Makefile      # 构建配置
</code></pre></div></div>

<p><strong>对比Panthor（C驱动）</strong>：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span> /Users/weli/works/linux
<span class="nv">$ </span><span class="nb">ls </span>drivers/gpu/drm/panthor/
panthor_devfreq.c  panthor_fw.c   panthor_gem.c  panthor_gpu.c
panthor_device.c   panthor_fw.h   panthor_gem.h  panthor_gpu.h
panthor_device.h   panthor_heap.c panthor_mmu.c  panthor_regs.h
...（共24个文件）
</code></pre></div></div>

<p><strong>Tyr更精简</strong>：8个文件 vs Panthor的24个文件。但这并非优势，而是<strong>功能缺失</strong>的体现。</p>

<hr />

<h2 id="代码分析1tyr驱动入口">代码分析1：Tyr驱动入口</h2>

<h3 id="文件driversgpudrmtyrtyrrs">文件：<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/tyr/tyr.rs</code></h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// SPDX-License-Identifier: GPL-2.0 or MIT</span>

<span class="cd">//! Arm Mali Tyr DRM driver.</span>
<span class="cd">//!</span>
<span class="cd">//! The name "Tyr" is inspired by Norse mythology, reflecting Arm's tradition of</span>
<span class="cd">//! naming their GPUs after Nordic mythological figures and places.</span>

<span class="k">use</span> <span class="k">crate</span><span class="p">::</span><span class="nn">driver</span><span class="p">::</span><span class="n">TyrDriver</span><span class="p">;</span>

<span class="k">mod</span> <span class="n">driver</span><span class="p">;</span>
<span class="k">mod</span> <span class="n">file</span><span class="p">;</span>
<span class="k">mod</span> <span class="n">gem</span><span class="p">;</span>
<span class="k">mod</span> <span class="n">gpu</span><span class="p">;</span>
<span class="k">mod</span> <span class="n">regs</span><span class="p">;</span>

<span class="nn">kernel</span><span class="p">::</span><span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">TyrDriver</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"tyr"</span><span class="p">,</span>
    <span class="n">authors</span><span class="p">:</span> <span class="p">[</span><span class="s">"The Tyr driver authors"</span><span class="p">],</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Arm Mali Tyr DRM driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"Dual MIT/GPL"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>关键点</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">module_platform_driver!</code> 宏</strong>：
    <ul>
      <li>自动生成平台驱动注册代码</li>
      <li>等价于C中的<code class="language-plaintext highlighter-rouge">module_platform_driver(tyr_driver)</code></li>
    </ul>
  </li>
  <li><strong>模块组织</strong>：
    <ul>
      <li>清晰的模块划分（driver、file、gem、gpu、regs）</li>
      <li>私有模块，不暴露内部细节</li>
    </ul>
  </li>
</ol>

<p><strong>对比C版本</strong>（<code class="language-plaintext highlighter-rouge">panthor_drv.c</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">struct</span> <span class="n">platform_driver</span> <span class="n">panthor_driver</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">.</span><span class="n">probe</span> <span class="o">=</span> <span class="n">panthor_probe</span><span class="p">,</span>
    <span class="p">.</span><span class="n">remove</span> <span class="o">=</span> <span class="n">panthor_remove</span><span class="p">,</span>
    <span class="p">.</span><span class="n">driver</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">"panthor"</span><span class="p">,</span>
        <span class="p">.</span><span class="n">pm</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">panthor_pm_ops</span><span class="p">,</span>
        <span class="p">.</span><span class="n">of_match_table</span> <span class="o">=</span> <span class="n">dt_match</span><span class="p">,</span>
    <span class="p">},</span>
<span class="p">};</span>
<span class="n">module_platform_driver</span><span class="p">(</span><span class="n">panthor_driver</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Rust的优势</strong>：</p>
<ul>
  <li>类型安全：<code class="language-plaintext highlighter-rouge">type: TyrDriver</code>编译时检查</li>
  <li>生命周期自动管理：probe/remove的资源管理通过RAII</li>
</ul>

<hr />

<h2 id="代码分析2驱动核心实现">代码分析2：驱动核心实现</h2>

<h3 id="文件driversgpudrmtyrdriverrs部分">文件：<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/tyr/driver.rs</code>（部分）</h3>

<h4 id="21-设备树匹配">2.1 设备树匹配</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">kernel</span><span class="p">::</span><span class="nd">of_device_table!</span><span class="p">(</span>
    <span class="n">OF_TABLE</span><span class="p">,</span>
    <span class="n">MODULE_OF_TABLE</span><span class="p">,</span>
    <span class="o">&lt;</span><span class="n">TyrDriver</span> <span class="k">as</span> <span class="nn">platform</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">IdInfo</span><span class="p">,</span>
    <span class="p">[</span>
        <span class="p">(</span><span class="nn">of</span><span class="p">::</span><span class="nn">DeviceId</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"rockchip,rk3588-mali"</span><span class="p">)),</span> <span class="p">()),</span>
        <span class="p">(</span><span class="nn">of</span><span class="p">::</span><span class="nn">DeviceId</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"arm,mali-valhall-csf"</span><span class="p">)),</span> <span class="p">())</span>
    <span class="p">]</span>
<span class="p">);</span>
</code></pre></div></div>

<p><strong>解释</strong>：</p>
<ul>
  <li>支持Rockchip RK3588 SoC的Mali GPU</li>
  <li>兼容ARM Mali Valhall CSF架构</li>
  <li><code class="language-plaintext highlighter-rouge">c_str!</code>宏：编译时C字符串，零运行时开销</li>
</ul>

<p><strong>对比C版本</strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">of_device_id</span> <span class="n">dt_match</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">compatible</span> <span class="o">=</span> <span class="s">"arm,mali-valhall-csf"</span> <span class="p">},</span>
    <span class="p">{</span> <span class="p">.</span><span class="n">compatible</span> <span class="o">=</span> <span class="s">"rockchip,rk3588-mali"</span> <span class="p">},</span>
    <span class="p">{}</span>
<span class="p">};</span>
<span class="n">MODULE_DEVICE_TABLE</span><span class="p">(</span><span class="n">of</span><span class="p">,</span> <span class="n">dt_match</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Rust的类型安全</strong>：</p>
<ul>
  <li>编译时检查字符串有效性</li>
  <li><code class="language-plaintext highlighter-rouge">of::DeviceId::new</code>确保格式正确</li>
</ul>

<h4 id="22-驱动数据结构">2.2 驱动数据结构</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[pin_data(PinnedDrop)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">TyrData</span> <span class="p">{</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">pdev</span><span class="p">:</span> <span class="n">ARef</span><span class="o">&lt;</span><span class="nn">platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&gt;</span><span class="p">,</span>

    <span class="nd">#[pin]</span>
    <span class="n">clks</span><span class="p">:</span> <span class="n">Mutex</span><span class="o">&lt;</span><span class="n">Clocks</span><span class="o">&gt;</span><span class="p">,</span>

    <span class="nd">#[pin]</span>
    <span class="n">regulators</span><span class="p">:</span> <span class="n">Mutex</span><span class="o">&lt;</span><span class="n">Regulators</span><span class="o">&gt;</span><span class="p">,</span>

    <span class="cd">/// Some information on the GPU.</span>
    <span class="cd">///</span>
    <span class="cd">/// This is mainly queried by userspace, i.e.: Mesa.</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">gpu_info</span><span class="p">:</span> <span class="n">GpuInfo</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>关键设计</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">#[pin_data]</code> 属性</strong>：
    <ul>
      <li>保证内存不移动（pin到堆上）</li>
      <li>必需，因为C代码可能持有指针</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">ARef&lt;platform::Device&gt;</code></strong>：
    <ul>
      <li>引用计数的平台设备</li>
      <li>等价于C中的<code class="language-plaintext highlighter-rouge">struct platform_device *</code></li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">Mutex&lt;Clocks&gt;</code> 和 <code class="language-plaintext highlighter-rouge">Mutex&lt;Regulators&gt;</code></strong>：
    <ul>
      <li>内核互斥锁，保护共享资源</li>
      <li><code class="language-plaintext highlighter-rouge">#[pin]</code>：这些字段不能移动</li>
    </ul>
  </li>
</ol>

<h4 id="23-初始化流程probe函数">2.3 初始化流程（probe函数）</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="nn">platform</span><span class="p">::</span><span class="n">Driver</span> <span class="k">for</span> <span class="n">TyrDriver</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">IdInfo</span> <span class="o">=</span> <span class="p">();</span>
    <span class="k">const</span> <span class="n">OF_ID_TABLE</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="nn">of</span><span class="p">::</span><span class="n">IdTable</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">IdInfo</span><span class="o">&gt;&gt;</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">OF_TABLE</span><span class="p">);</span>

    <span class="k">fn</span> <span class="nf">probe</span><span class="p">(</span>
        <span class="n">pdev</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&lt;</span><span class="n">Core</span><span class="o">&gt;</span><span class="p">,</span>
        <span class="n">_info</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;&amp;</span><span class="k">Self</span><span class="p">::</span><span class="n">IdInfo</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">Pin</span><span class="o">&lt;</span><span class="n">KBox</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;&gt;&gt;</span> <span class="p">{</span>
        <span class="c1">// 1. 获取时钟</span>
        <span class="k">let</span> <span class="n">core_clk</span> <span class="o">=</span> <span class="nn">Clk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"core"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">stacks_clk</span> <span class="o">=</span> <span class="nn">OptionalClk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"stacks"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">coregroup_clk</span> <span class="o">=</span> <span class="nn">OptionalClk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"coregroup"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 2. 启用时钟</span>
        <span class="n">core_clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
        <span class="n">stacks_clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
        <span class="n">coregroup_clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 3. 获取并启用电源调节器</span>
        <span class="k">let</span> <span class="n">mali_regulator</span> <span class="o">=</span> <span class="nn">Regulator</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">regulator</span><span class="p">::</span><span class="n">Enabled</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nd">c_str!</span><span class="p">(</span><span class="s">"mali"</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">sram_regulator</span> <span class="o">=</span> <span class="nn">Regulator</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">regulator</span><span class="p">::</span><span class="n">Enabled</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nd">c_str!</span><span class="p">(</span><span class="s">"sram"</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 4. 映射MMIO寄存器</span>
        <span class="k">let</span> <span class="n">request</span> <span class="o">=</span> <span class="n">pdev</span><span class="nf">.io_request_by_index</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="nf">.ok_or</span><span class="p">(</span><span class="n">ENODEV</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">iomem</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">pin_init</span><span class="p">(</span><span class="n">request</span><span class="py">.iomap_sized</span><span class="p">::</span><span class="o">&lt;</span><span class="n">SZ_2M</span><span class="o">&gt;</span><span class="p">(),</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 5. 软复位GPU</span>
        <span class="nf">issue_soft_reset</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 6. L2缓存上电</span>
        <span class="nn">gpu</span><span class="p">::</span><span class="nf">l2_power_on</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 7. 读取GPU信息</span>
        <span class="k">let</span> <span class="n">gpu_info</span> <span class="o">=</span> <span class="nn">GpuInfo</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="n">gpu_info</span><span class="nf">.log</span><span class="p">(</span><span class="n">pdev</span><span class="p">);</span>

        <span class="c1">// 8. 创建DRM设备</span>
        <span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nd">try_pin_init!</span><span class="p">(</span><span class="n">TyrData</span> <span class="p">{</span>
            <span class="n">pdev</span><span class="p">:</span> <span class="n">platform</span><span class="nf">.clone</span><span class="p">(),</span>
            <span class="n">clks</span> <span class="o">&lt;-</span> <span class="nd">new_mutex!</span><span class="p">(</span><span class="n">Clocks</span> <span class="p">{</span> <span class="o">...</span> <span class="p">}),</span>
            <span class="n">regulators</span> <span class="o">&lt;-</span> <span class="nd">new_mutex!</span><span class="p">(</span><span class="n">Regulators</span> <span class="p">{</span> <span class="o">...</span> <span class="p">}),</span>
            <span class="n">gpu_info</span><span class="p">,</span>
        <span class="p">});</span>

        <span class="k">let</span> <span class="n">tdev</span><span class="p">:</span> <span class="n">ARef</span><span class="o">&lt;</span><span class="n">TyrDevice</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nn">drm</span><span class="p">::</span><span class="nn">Device</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="n">data</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nn">drm</span><span class="p">::</span><span class="nn">driver</span><span class="p">::</span><span class="nn">Registration</span><span class="p">::</span><span class="nf">new_foreign_owned</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tdev</span><span class="p">,</span> <span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="mi">0</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 9. 返回驱动实例</span>
        <span class="k">let</span> <span class="n">driver</span> <span class="o">=</span> <span class="nn">KBox</span><span class="p">::</span><span class="nf">pin_init</span><span class="p">(</span><span class="nd">try_pin_init!</span><span class="p">(</span><span class="n">TyrDriver</span> <span class="p">{</span> <span class="n">device</span><span class="p">:</span> <span class="n">tdev</span> <span class="p">}),</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="nd">dev_info!</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="s">"Tyr initialized correctly.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">driver</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>详细分析</strong>：</p>

<p><strong>步骤1-2：时钟管理</strong></p>

<p>Rust的<code class="language-plaintext highlighter-rouge">Clk::get</code> + <code class="language-plaintext highlighter-rouge">prepare_enable</code><strong>自动管理生命周期</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">core_clk</span> <span class="o">=</span> <span class="nn">Clk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"core"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>
<span class="n">core_clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
<span class="c1">// 当core_clk离开作用域时，自动disable + unprepare</span>
</code></pre></div></div>

<p>对比C版本：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">core_clk</span> <span class="o">=</span> <span class="n">devm_clk_get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="s">"core"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IS_ERR</span><span class="p">(</span><span class="n">core_clk</span><span class="p">))</span>
    <span class="k">return</span> <span class="nf">PTR_ERR</span><span class="p">(</span><span class="n">core_clk</span><span class="p">);</span>

<span class="n">ret</span> <span class="o">=</span> <span class="n">clk_prepare_enable</span><span class="p">(</span><span class="n">core_clk</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>

<span class="c1">// ...</span>
<span class="c1">// 忘记disable？内存泄漏！</span>
<span class="c1">// clk_disable_unprepare(core_clk);  // 必须手动</span>
</code></pre></div></div>

<p><strong>步骤3：电源调节器的类型状态</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">mali_regulator</span> <span class="o">=</span> <span class="nn">Regulator</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">regulator</span><span class="p">::</span><span class="n">Enabled</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nd">c_str!</span><span class="p">(</span><span class="s">"mali"</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>类型系统保证</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">Regulator&lt;Enabled&gt;</code>：类型上已启用</li>
  <li><code class="language-plaintext highlighter-rouge">Regulator&lt;Disabled&gt;</code>：类型上已禁用</li>
  <li><strong>编译时防止操作未启用的调节器</strong></li>
</ul>

<p>C中无此保证，完全依赖运行时检查。</p>

<p><strong>步骤4：MMIO映射的大小检查</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">iomem</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">pin_init</span><span class="p">(</span><span class="n">request</span><span class="py">.iomap_sized</span><span class="p">::</span><span class="o">&lt;</span><span class="n">SZ_2M</span><span class="o">&gt;</span><span class="p">(),</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
</code></pre></div></div>

<ul>
  <li><code class="language-plaintext highlighter-rouge">iomap_sized::&lt;SZ_2M&gt;()</code>：编译时指定映射大小为2MB</li>
  <li><code class="language-plaintext highlighter-rouge">SZ_2M</code>是常量（<code class="language-plaintext highlighter-rouge">kernel::sizes::SZ_2M</code>），编译时检查</li>
</ul>

<p>C版本：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iomem</span> <span class="o">=</span> <span class="n">devm_ioremap_resource</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">res</span><span class="p">);</span>
<span class="c1">// 没有大小检查，运行时越界访问可能！</span>
</code></pre></div></div>

<p><strong>步骤5：软复位实现</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">issue_soft_reset</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Device</span><span class="o">&lt;</span><span class="n">Bound</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">iomem</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Devres</span><span class="o">&lt;</span><span class="n">IoMem</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
    <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_CMD</span><span class="nf">.write</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">,</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_CMD_SOFT_RESET</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

    <span class="c1">// TODO: We cannot poll, as there is no support in Rust currently, so we</span>
    <span class="c1">// sleep. Change this when read_poll_timeout() is implemented in Rust.</span>
    <span class="nn">kernel</span><span class="p">::</span><span class="nn">time</span><span class="p">::</span><span class="nn">delay</span><span class="p">::</span><span class="nf">fsleep</span><span class="p">(</span><span class="nn">time</span><span class="p">::</span><span class="nn">Delta</span><span class="p">::</span><span class="nf">from_millis</span><span class="p">(</span><span class="mi">100</span><span class="p">));</span>

    <span class="k">if</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_IRQ_RAWSTAT</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span> <span class="o">&amp;</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_IRQ_RAWSTAT_RESET_COMPLETED</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="nd">dev_err!</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="s">"GPU reset failed with errno</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="nd">dev_err!</span><span class="p">(</span>
            <span class="n">dev</span><span class="p">,</span>
            <span class="s">"GPU_INT_RAWSTAT is {}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
            <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_IRQ_RAWSTAT</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span>
        <span class="p">);</span>

        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EIO</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>TODO注释揭示的问题</strong>：</p>
<ul>
  <li>Rust内核还没有<code class="language-plaintext highlighter-rouge">read_poll_timeout()</code></li>
  <li>被迫用固定延迟（100ms）替代轮询</li>
  <li>这是<strong>基础设施缺失</strong>的直接体现</li>
</ul>

<p><strong>步骤7：GPU信息查询</strong></p>

<p>这是当前Tyr<strong>唯一能做的事情</strong>。详见下一节。</p>

<hr />

<h2 id="代码分析3gpu信息查询">代码分析3：GPU信息查询</h2>

<h3 id="文件driversgpudrmtyrgpurs">文件：<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/tyr/gpu.rs</code></h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Struct containing information that can be queried by userspace. This is read from</span>
<span class="cd">/// the GPU's registers.</span>
<span class="cd">///</span>
<span class="cd">/// # Invariants</span>
<span class="cd">///</span>
<span class="cd">/// - The layout of this struct identical to the C `struct drm_panthor_gpu_info`.</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">GpuInfo</span> <span class="p">{</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">gpu_id</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">gpu_rev</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">csf_id</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">l2_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">tiler_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">mem_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">mmu_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">thread_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">max_threads</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">thread_max_workgroup_size</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">thread_max_barrier_size</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">coherency_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">texture_features</span><span class="p">:</span> <span class="p">[</span><span class="nb">u32</span><span class="p">;</span> <span class="mi">4</span><span class="p">],</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">as_present</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">pad0</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">shader_present</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">l2_present</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">tiler_present</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">core_features</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="n">pad</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>关键设计</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">#[repr(C)]</code></strong>：
    <ul>
      <li>保证与C结构体<code class="language-plaintext highlighter-rouge">drm_panthor_gpu_info</code>内存布局完全相同</li>
      <li>用户空间通过ioctl读取这个结构体</li>
    </ul>
  </li>
  <li><strong>Invariants注释</strong>：
    <ul>
      <li>Rust文档化不变量</li>
      <li>编译器无法检查（需要人工审查）</li>
    </ul>
  </li>
</ol>

<h3 id="gpuinfo初始化">GpuInfo初始化</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="n">GpuInfo</span> <span class="p">{</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Device</span><span class="o">&lt;</span><span class="n">Bound</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">iomem</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Devres</span><span class="o">&lt;</span><span class="n">IoMem</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">gpu_id</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_ID</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">csf_id</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_CSF_ID</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">gpu_rev</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_REVID</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">core_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_CORE_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">l2_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_L2_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">tiler_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_TILER_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">mem_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_MEM_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">mmu_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_MMU_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">thread_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_THREAD_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">max_threads</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_THREAD_MAX_THREADS</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">thread_max_workgroup_size</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_THREAD_MAX_WORKGROUP_SIZE</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">thread_max_barrier_size</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_THREAD_MAX_BARRIER_SIZE</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="n">coherency_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_COHERENCY_FEATURES</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="k">let</span> <span class="n">texture_features</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_TEXTURE_FEATURES0</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="k">let</span> <span class="n">as_present</span> <span class="o">=</span> <span class="nn">regs</span><span class="p">::</span><span class="n">GPU_AS_PRESENT</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// 64位寄存器，分两次读取</span>
        <span class="k">let</span> <span class="n">shader_present</span> <span class="o">=</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_SHADER_PRESENT_LO</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">shader_present</span> <span class="o">=</span>
            <span class="n">shader_present</span> <span class="p">|</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_SHADER_PRESENT_HI</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">;</span>

        <span class="k">let</span> <span class="n">tiler_present</span> <span class="o">=</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_TILER_PRESENT_LO</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">tiler_present</span> <span class="o">=</span>
            <span class="n">tiler_present</span> <span class="p">|</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_TILER_PRESENT_HI</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">;</span>

        <span class="k">let</span> <span class="n">l2_present</span> <span class="o">=</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_L2_PRESENT_LO</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">l2_present</span> <span class="o">=</span> <span class="n">l2_present</span> <span class="p">|</span> <span class="nn">u64</span><span class="p">::</span><span class="nf">from</span><span class="p">(</span><span class="nn">regs</span><span class="p">::</span><span class="n">GPU_L2_PRESENT_HI</span><span class="nf">.read</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="n">iomem</span><span class="p">)</span><span class="o">?</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">;</span>

        <span class="nf">Ok</span><span class="p">(</span><span class="k">Self</span> <span class="p">{</span>
            <span class="n">gpu_id</span><span class="p">,</span>
            <span class="n">gpu_rev</span><span class="p">,</span>
            <span class="n">csf_id</span><span class="p">,</span>
            <span class="n">l2_features</span><span class="p">,</span>
            <span class="n">tiler_features</span><span class="p">,</span>
            <span class="n">mem_features</span><span class="p">,</span>
            <span class="n">mmu_features</span><span class="p">,</span>
            <span class="n">thread_features</span><span class="p">,</span>
            <span class="n">max_threads</span><span class="p">,</span>
            <span class="n">thread_max_workgroup_size</span><span class="p">,</span>
            <span class="n">thread_max_barrier_size</span><span class="p">,</span>
            <span class="n">coherency_features</span><span class="p">,</span>
            <span class="c1">// TODO: Add texture_features_{1,2,3}.</span>
            <span class="n">texture_features</span><span class="p">:</span> <span class="p">[</span><span class="n">texture_features</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
            <span class="n">as_present</span><span class="p">,</span>
            <span class="n">pad0</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="n">shader_present</span><span class="p">,</span>
            <span class="n">l2_present</span><span class="p">,</span>
            <span class="n">tiler_present</span><span class="p">,</span>
            <span class="n">core_features</span><span class="p">,</span>
            <span class="n">pad</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>技术细节</strong>：</p>

<ol>
  <li><strong>错误传播</strong>：
    <ul>
      <li>每次<code class="language-plaintext highlighter-rouge">regs::XXX.read()?</code>都可能失败</li>
      <li><code class="language-plaintext highlighter-rouge">?</code>运算符自动传播错误</li>
      <li>无需手动<code class="language-plaintext highlighter-rouge">if (ret &lt; 0) return ret;</code></li>
    </ul>
  </li>
  <li><strong>64位寄存器读取</strong>：
    <ul>
      <li>Mali GPU的64位寄存器分成LO/HI两个32位寄存器</li>
      <li>Rust明确显示位运算：<code class="language-plaintext highlighter-rouge">| u64::from(...) &lt;&lt; 32</code></li>
      <li>C中容易出错（符号扩展问题）</li>
    </ul>
  </li>
  <li><strong>TODO注释</strong>：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">texture_features</code>只读取了第一个</li>
      <li>其余3个硬编码为0</li>
      <li>说明这是<strong>WIP（Work-in-Progress）</strong></li>
    </ul>
  </li>
</ol>

<hr />

<h2 id="代码分析4drm抽象层">代码分析4：DRM抽象层</h2>

<p>Tyr依赖<code class="language-plaintext highlighter-rouge">rust/kernel/drm/</code>提供的抽象层。让我们深入分析。</p>

<h3 id="文件rustkerneldrmgemmodrs">文件：<code class="language-plaintext highlighter-rouge">rust/kernel/drm/gem/mod.rs</code></h3>

<h4 id="41-basedriverobject-trait">4.1 BaseDriverObject trait</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// GEM object functions, which must be implemented by drivers.</span>
<span class="k">pub</span> <span class="k">trait</span> <span class="n">BaseDriverObject</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">BaseObject</span><span class="o">&gt;</span><span class="p">:</span> <span class="nb">Sync</span> <span class="o">+</span> <span class="nb">Send</span> <span class="o">+</span> <span class="nb">Sized</span> <span class="p">{</span>
    <span class="cd">/// Create a new driver data object for a GEM object of a given size.</span>
    <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">drm</span><span class="p">::</span><span class="n">Device</span><span class="o">&lt;</span><span class="nn">T</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">impl</span> <span class="n">PinInit</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">,</span> <span class="n">Error</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="cd">/// Open a new handle to an existing object, associated with a File.</span>
    <span class="k">fn</span> <span class="nf">open</span><span class="p">(</span>
        <span class="n">_obj</span><span class="p">:</span> <span class="o">&amp;&lt;&lt;</span><span class="n">T</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Object</span><span class="p">,</span>
        <span class="n">_file</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">drm</span><span class="p">::</span><span class="n">File</span><span class="o">&lt;&lt;&lt;</span><span class="n">T</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">File</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>

    <span class="cd">/// Close a handle to an existing object, associated with a File.</span>
    <span class="k">fn</span> <span class="nf">close</span><span class="p">(</span>
        <span class="n">_obj</span><span class="p">:</span> <span class="o">&amp;&lt;&lt;</span><span class="n">T</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Object</span><span class="p">,</span>
        <span class="n">_file</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">drm</span><span class="p">::</span><span class="n">File</span><span class="o">&lt;&lt;&lt;</span><span class="n">T</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">File</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="p">)</span> <span class="p">{</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>设计解析</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">PinInit&lt;Self, Error&gt;</code></strong>：
    <ul>
      <li>就地初始化（in-place init）</li>
      <li>避免在栈上构造后移动到堆</li>
      <li>关键：C指针可能指向这块内存</li>
    </ul>
  </li>
  <li><strong>open/close回调</strong>：
    <ul>
      <li>默认实现为空</li>
      <li>驱动可选择性覆盖</li>
      <li>对比C：必须提供函数指针或NULL</li>
    </ul>
  </li>
  <li><strong>类型约束</strong>：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">Sync + Send</code>：可安全跨线程</li>
      <li><code class="language-plaintext highlighter-rouge">Sized</code>：大小已知（非trait object）</li>
    </ul>
  </li>
</ol>

<h4 id="42-引用计数机制">4.2 引用计数机制</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// SAFETY: All gem objects are refcounted.</span>
<span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span> <span class="n">AlwaysRefCounted</span> <span class="k">for</span> <span class="n">T</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">inc_ref</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// SAFETY: The existence of a shared reference guarantees that the refcount is non-zero.</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">drm_gem_object_get</span><span class="p">(</span><span class="k">self</span><span class="nf">.as_raw</span><span class="p">())</span> <span class="p">};</span>
    <span class="p">}</span>

    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">dec_ref</span><span class="p">(</span><span class="n">obj</span><span class="p">:</span> <span class="n">NonNull</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// SAFETY: We either hold the only refcount on `obj`, or one of many - meaning that no one</span>
        <span class="c1">// else could possibly hold a mutable reference to `obj` and thus this immutable reference</span>
        <span class="c1">// is safe.</span>
        <span class="k">let</span> <span class="n">obj</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="n">obj</span><span class="nf">.as_ref</span><span class="p">()</span> <span class="p">}</span><span class="nf">.as_raw</span><span class="p">();</span>

        <span class="c1">// SAFETY:</span>
        <span class="c1">// - The safety requirements guarantee that the refcount is non-zero.</span>
        <span class="c1">// - We hold no references to `obj` now, making it safe for us to potentially deallocate it.</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">drm_gem_object_put</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="p">};</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>SAFETY注释的重要性</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">inc_ref</code></strong>：
    <ul>
      <li>调用C函数<code class="language-plaintext highlighter-rouge">drm_gem_object_get</code></li>
      <li>假设：已有&amp;self，所以refcount非零</li>
      <li>这是<strong>不变量</strong>，违反=UB（未定义行为）</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">dec_ref</code></strong>：
    <ul>
      <li>详细的SAFETY论证：
        <ul>
          <li>持有唯一或多个引用之一</li>
          <li>没有可变引用冲突</li>
          <li>refcount非零（由调用者保证）</li>
        </ul>
      </li>
      <li>可能释放内存（refcount降到0）</li>
    </ul>
  </li>
</ol>

<p><strong>对比C版本</strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">drm_gem_object_get</span><span class="p">(</span><span class="k">struct</span> <span class="n">drm_gem_object</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">kref_get</span><span class="p">(</span><span class="o">&amp;</span><span class="n">obj</span><span class="o">-&gt;</span><span class="n">refcount</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">drm_gem_object_put</span><span class="p">(</span><span class="k">struct</span> <span class="n">drm_gem_object</span> <span class="o">*</span><span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">kref_put</span><span class="p">(</span><span class="o">&amp;</span><span class="n">obj</span><span class="o">-&gt;</span><span class="n">refcount</span><span class="p">,</span> <span class="n">drm_gem_object_free</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>C中<strong>完全没有安全论证</strong>：</p>
<ul>
  <li>编译器不检查refcount一致性</li>
  <li>开发者完全凭经验</li>
  <li>常见bug：double-free、use-after-free</li>
</ul>

<h4 id="43-openclose回调的ffi桥接">4.3 open/close回调的FFI桥接</h4>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">open_callback</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">BaseDriverObject</span><span class="o">&lt;</span><span class="n">U</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">U</span><span class="p">:</span> <span class="n">BaseObject</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="n">raw_obj</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_gem_object</span><span class="p">,</span>
    <span class="n">raw_file</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_file</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nn">core</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// SAFETY: `open_callback` is only ever called with a valid pointer to a `struct drm_file`.</span>
    <span class="k">let</span> <span class="n">file</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">drm</span><span class="p">::</span><span class="nn">File</span><span class="p">::</span><span class="o">&lt;&lt;&lt;</span><span class="n">U</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">File</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">as_ref</span><span class="p">(</span><span class="n">raw_file</span><span class="p">)</span>
    <span class="p">};</span>
    <span class="c1">// SAFETY: `open_callback` is specified in the AllocOps structure for `Object&lt;T&gt;`, ensuring that</span>
    <span class="c1">// `raw_obj` is indeed contained within a `Object&lt;T&gt;`.</span>
    <span class="k">let</span> <span class="n">obj</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span>
        <span class="o">&lt;&lt;&lt;</span><span class="n">U</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Driver</span> <span class="k">as</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span><span class="p">::</span><span class="n">Object</span> <span class="k">as</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">as_ref</span><span class="p">(</span><span class="n">raw_obj</span><span class="p">)</span>
    <span class="p">};</span>

    <span class="k">match</span> <span class="nn">T</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span> <span class="p">{</span>
        <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">e</span><span class="nf">.to_errno</span><span class="p">(),</span>
        <span class="nf">Ok</span><span class="p">(())</span> <span class="k">=&gt;</span> <span class="mi">0</span><span class="p">,</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>FFI桥接技巧</strong>：</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">extern "C"</code></strong>：
    <ul>
      <li>使用C ABI（调用约定）</li>
      <li>C代码可以调用这个函数</li>
    </ul>
  </li>
  <li><strong>unsafe转换</strong>：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">raw_obj</code>和<code class="language-plaintext highlighter-rouge">raw_file</code>是C指针</li>
      <li>转换为Rust引用需要<code class="language-plaintext highlighter-rouge">unsafe</code></li>
      <li>SAFETY注释论证为何安全</li>
    </ul>
  </li>
  <li><strong>错误处理</strong>：
    <ul>
      <li>Rust的<code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code>转换为C的<code class="language-plaintext highlighter-rouge">int</code></li>
      <li><code class="language-plaintext highlighter-rouge">Err(e) =&gt; e.to_errno()</code>：错误码映射</li>
    </ul>
  </li>
</ol>

<p><strong>这是Rust/C互操作的经典模式</strong>：</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C kernel → extern "C" fn → unsafe转换 → 安全Rust trait方法 → Result → C错误码
</code></pre></div></div>

<hr />

<h2 id="代码分析5nova驱动对比">代码分析5：Nova驱动对比</h2>

<p>Nova是另一个Rust GPU驱动（Nvidia GSP），结构与Tyr类似。</p>

<h3 id="文件driversgpudrmnovadriverrs部分">文件：<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/driver.rs</code>（部分）</h3>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[vtable]</span>
<span class="k">impl</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span> <span class="k">for</span> <span class="n">NovaDriver</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Data</span> <span class="o">=</span> <span class="n">NovaData</span><span class="p">;</span>
    <span class="k">type</span> <span class="n">File</span> <span class="o">=</span> <span class="n">File</span><span class="p">;</span>
    <span class="k">type</span> <span class="n">Object</span> <span class="o">=</span> <span class="nn">gem</span><span class="p">::</span><span class="n">Object</span><span class="o">&lt;</span><span class="n">NovaObject</span><span class="o">&gt;</span><span class="p">;</span>

    <span class="k">const</span> <span class="n">INFO</span><span class="p">:</span> <span class="nn">drm</span><span class="p">::</span><span class="n">DriverInfo</span> <span class="o">=</span> <span class="n">INFO</span><span class="p">;</span>

    <span class="nn">kernel</span><span class="p">::</span><span class="nd">declare_drm_ioctls!</span> <span class="p">{</span>
        <span class="p">(</span><span class="n">NOVA_GETPARAM</span><span class="p">,</span> <span class="n">drm_nova_getparam</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="nn">File</span><span class="p">::</span><span class="n">get_param</span><span class="p">),</span>
        <span class="p">(</span><span class="n">NOVA_GEM_CREATE</span><span class="p">,</span> <span class="n">drm_nova_gem_create</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">AUTH</span> <span class="p">|</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="nn">File</span><span class="p">::</span><span class="n">gem_create</span><span class="p">),</span>
        <span class="p">(</span><span class="n">NOVA_GEM_INFO</span><span class="p">,</span> <span class="n">drm_nova_gem_info</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">AUTH</span> <span class="p">|</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="nn">File</span><span class="p">::</span><span class="n">gem_info</span><span class="p">),</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">declare_drm_ioctls!</code>宏分析</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 宏展开后（简化版）</span>
<span class="k">const</span> <span class="n">IOCTLS</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="p">[</span><span class="nn">drm</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="n">DrmIoctlDescriptor</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">[</span>
    <span class="nn">drm</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="n">DrmIoctlDescriptor</span> <span class="p">{</span>
        <span class="n">cmd</span><span class="p">:</span> <span class="nn">drm</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">drm_nova_getparam</span><span class="o">&gt;</span><span class="p">(</span><span class="n">DRM_COMMAND_BASE</span> <span class="o">+</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">flags</span><span class="p">:</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span>
        <span class="n">func</span><span class="p">:</span> <span class="n">nova_get_param_wrapper</span><span class="p">,</span>  <span class="c1">// 自动生成的C包装器</span>
    <span class="p">},</span>
    <span class="c1">// ...</span>
<span class="p">];</span>
</code></pre></div></div>

<p><strong>自动生成的工作</strong>：</p>
<ol>
  <li>计算ioctl号（<code class="language-plaintext highlighter-rouge">_IOWR</code>宏）</li>
  <li>生成C→Rust的包装函数</li>
  <li>类型安全检查（编译时）</li>
</ol>

<p><strong>对比C版本</strong>（手动）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define DRM_NOVA_GETPARAM 0x00
#define DRM_IOCTL_NOVA_GETPARAM \
    DRM_IOWR(DRM_COMMAND_BASE + DRM_NOVA_GETPARAM, struct drm_nova_getparam)
</span>
<span class="k">static</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">drm_ioctl_desc</span> <span class="n">nova_ioctls</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">DRM_IOCTL_DEF_DRV</span><span class="p">(</span><span class="n">NOVA_GETPARAM</span><span class="p">,</span> <span class="n">nova_get_param</span><span class="p">,</span> <span class="n">DRM_RENDER_ALLOW</span><span class="p">),</span>
    <span class="c1">// 魔数0x00容易重复或冲突</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Rust的宏：</p>
<ul>
  <li>自动分配ioctl号（按顺序）</li>
  <li>类型检查：<code class="language-plaintext highlighter-rouge">drm_nova_getparam</code>必须存在</li>
  <li>编译时验证<code class="language-plaintext highlighter-rouge">File::get_param</code>签名</li>
</ul>

<hr />

<h2 id="为何上游代码如此精简gpuvm抽象缺失">为何上游代码如此精简？GPUVM抽象缺失</h2>

<p>回到最核心的问题：<strong>为何Tyr上游只能查询GPU信息，无法启动MCU？</strong></p>

<h3 id="commit-message的关键解释">Commit message的关键解释<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>：</h3>

<blockquote>
  <p>In particular, a lot of things depend on properly mapping memory on a given VA range, which itself <strong>depends on the GPUVM abstraction that is currently work-in-progress</strong>. For this reason, we still cannot boot the MCU.</p>
</blockquote>

<h3 id="技术分解">技术分解</h3>

<p><strong>启动MCU需要什么？</strong></p>

<ol>
  <li><strong>分配GPU内存</strong>：存放MCU固件（数百KB）</li>
  <li><strong>映射到GPU虚拟地址</strong>：MCU通过VA访问内存</li>
  <li><strong>配置MCU寄存器</strong>：设置入口地址</li>
  <li><strong>启动MCU</strong>：发送启动命令</li>
</ol>

<p><strong>当前Tyr能做什么？</strong></p>

<ul>
  <li>✅ <strong>步骤1</strong>：分配物理内存（通过GEM）</li>
  <li>❌ <strong>步骤2</strong>：映射到GPU VA（需要GPUVM抽象）</li>
  <li>❌ <strong>步骤3-4</strong>：后续全阻塞</li>
</ul>

<h3 id="gpuvm抽象是什么">GPUVM抽象是什么？</h3>

<p><strong>C实现</strong>（<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/drm_gpuvm.c</code>）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * DOC: Overview
 *
 * The GPU VA Manager, represented by struct drm_gpuvm, keeps track of a
 * GPU's virtual address (VA) space and manages the corresponding virtual
 * mappings represented by &amp;drm_gpuva objects.
 *
 * The DRM GPUVM tracks GPU VA space with &amp;drm_gpuva objects backed by a
 * &amp;drm_gem_object representing the actual memory backing the VA range.
 */</span>
<span class="k">struct</span> <span class="n">drm_gpuvm</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">drm_gem_object</span> <span class="o">*</span><span class="n">r_obj</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">drm_device</span> <span class="o">*</span><span class="n">drm</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">name</span><span class="p">;</span>

    <span class="k">struct</span> <span class="n">rb_root_cached</span> <span class="n">rb</span><span class="p">;</span>  <span class="c1">// 红黑树，存储VA映射</span>
    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Rust需要什么？</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 理想的GPUVM Rust API（概念性）</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">GpuVm</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">inner</span><span class="p">:</span> <span class="n">Opaque</span><span class="o">&lt;</span><span class="nn">bindings</span><span class="p">::</span><span class="n">drm_gpuvm</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">_phantom</span><span class="p">:</span> <span class="n">PhantomData</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span><span class="o">&gt;</span> <span class="n">GpuVm</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// 映射GEM对象到GPU虚拟地址</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">map</span><span class="p">(</span>
        <span class="o">&amp;</span><span class="k">self</span><span class="p">,</span>
        <span class="n">gem_obj</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">gem</span><span class="p">::</span><span class="n">Object</span><span class="o">&lt;...&gt;</span><span class="p">,</span>
        <span class="n">va</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
        <span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">GpuVa</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// 调用C的drm_gpuva_insert()</span>
    <span class="p">}</span>

    <span class="cd">/// 取消映射</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">unmap</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">va</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">GpuVa</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="c1">// 调用C的drm_gpuva_remove()</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>问题</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">drm_gpuvm</code>结构体复杂</li>
  <li>涉及红黑树、引用计数、锁</li>
  <li>Rust封装需要保证<strong>内存安全</strong>和<strong>生命周期正确</strong></li>
</ul>

<h3 id="alice-ryhl的工作">Alice Ryhl的工作</h3>

<p>根据新闻报道和commit message，<strong>Alice Ryhl正在开发GPUVM的Rust抽象</strong>，基于Asahi Lina的前期工作。</p>

<p><strong>挑战</strong>：</p>
<ol>
  <li><strong>生命周期管理</strong>：GEM对象和VA映射的关系</li>
  <li><strong>锁顺序</strong>：避免死锁（C代码有隐式锁顺序）</li>
  <li><strong>红黑树抽象</strong>：Rust需要安全的树操作</li>
</ol>

<p>这是<strong>高难度的内核Rust工作</strong>，需要深入理解C实现和Rust所有权模型。</p>

<hr />

<h2 id="技术洞察从tyr学到的经验">技术洞察：从Tyr学到的经验</h2>

<h3 id="1-类型状态模式的威力">1. 类型状态模式的威力</h3>

<p><strong>电源调节器示例</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">Regulator</span><span class="o">&lt;</span><span class="n">S</span><span class="p">:</span> <span class="n">State</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">inner</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">regulator</span><span class="p">,</span>
    <span class="n">_state</span><span class="p">:</span> <span class="n">PhantomData</span><span class="o">&lt;</span><span class="n">S</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">Enabled</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Disabled</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Regulator</span><span class="o">&lt;</span><span class="n">Disabled</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">enable</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">Regulator</span><span class="o">&lt;</span><span class="n">Enabled</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
        <span class="c1">// unsafe调用C API</span>
        <span class="c1">// 转换到Enabled状态</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Regulator</span><span class="o">&lt;</span><span class="n">Enabled</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">set_voltage</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">min_uV</span><span class="p">:</span> <span class="nb">i32</span><span class="p">,</span> <span class="n">max_uV</span><span class="p">:</span> <span class="nb">i32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="c1">// 只有Enabled状态才能设置电压</span>
    <span class="p">}</span>

    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">disable</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">Regulator</span><span class="o">&lt;</span><span class="n">Disabled</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
        <span class="c1">// 转换回Disabled状态</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// 编译错误：Disabled状态没有set_voltage方法</span>
<span class="k">let</span> <span class="n">reg</span> <span class="o">=</span> <span class="nn">Regulator</span><span class="p">::</span><span class="o">&lt;</span><span class="n">Disabled</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="n">reg</span><span class="nf">.set_voltage</span><span class="p">(</span><span class="mi">1000000</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// ❌ 编译失败！</span>

<span class="c1">// 正确用法</span>
<span class="k">let</span> <span class="n">reg</span> <span class="o">=</span> <span class="n">reg</span><span class="nf">.enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 转换到Enabled</span>
<span class="n">reg</span><span class="nf">.set_voltage</span><span class="p">(</span><span class="mi">1000000</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// ✅ 编译通过</span>
</code></pre></div></div>

<p><strong>优势</strong>：</p>
<ul>
  <li><strong>编译时防止错误状态操作</strong></li>
  <li><strong>零运行时开销</strong>：<code class="language-plaintext highlighter-rouge">PhantomData&lt;S&gt;</code>不占内存</li>
  <li><strong>自文档化</strong>：类型签名即文档</li>
</ul>

<p>C中完全没有这种保证：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">regulator</span> <span class="o">*</span><span class="n">reg</span> <span class="o">=</span> <span class="n">regulator_get</span><span class="p">(...);</span>
<span class="c1">// 忘记enable</span>
<span class="n">regulator_set_voltage</span><span class="p">(</span><span class="n">reg</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">);</span>  <span class="c1">// 运行时错误或崩溃！</span>
</code></pre></div></div>

<h3 id="2-raii消除资源泄漏">2. RAII消除资源泄漏</h3>

<p><strong>时钟管理示例</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="k">let</span> <span class="n">clk</span> <span class="o">=</span> <span class="nn">Clk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"core"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>
    <span class="n">clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

    <span class="nf">do_work</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 即使这里失败提前返回</span>

    <span class="c1">// clk离开作用域，自动调用Drop</span>
<span class="p">}</span> <span class="c1">// &lt;- 这里自动disable+unprepare</span>
</code></pre></div></div>

<p><strong>Drop trait实现</strong>（简化）：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> <span class="nb">Drop</span> <span class="k">for</span> <span class="n">Clk</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span>
            <span class="nn">bindings</span><span class="p">::</span><span class="nf">clk_disable_unprepare</span><span class="p">(</span><span class="k">self</span><span class="py">.inner</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>C版本的问题</strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ret</span> <span class="o">=</span> <span class="n">clk_prepare_enable</span><span class="p">(</span><span class="n">clk</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>

<span class="n">ret</span> <span class="o">=</span> <span class="n">do_work</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 忘记cleanup！</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>  <span class="c1">// 时钟泄漏</span>
<span class="p">}</span>

<span class="n">clk_disable_unprepare</span><span class="p">(</span><span class="n">clk</span><span class="p">);</span>  <span class="c1">// 只有成功路径执行</span>
</code></pre></div></div>

<p><strong>统计数据</strong>（来自前文）：</p>
<ul>
  <li>内核CVE中，<strong>~70%是内存/资源管理错误</strong></li>
  <li>RAII在编译时消除这类错误</li>
</ul>

<h3 id="3-错误传播的简洁性">3. 错误传播的简洁性</h3>

<p><strong>Rust的<code class="language-plaintext highlighter-rouge">?</code>运算符</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">initialize</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">clk</span> <span class="o">=</span> <span class="nn">Clk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"core"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 失败则返回</span>
    <span class="k">let</span> <span class="n">reg</span> <span class="o">=</span> <span class="nn">Regulator</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="nd">c_str!</span><span class="p">(</span><span class="s">"mali"</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 失败则返回</span>
    <span class="k">let</span> <span class="n">iomem</span> <span class="o">=</span> <span class="nf">iomap</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>  <span class="c1">// 失败则返回</span>

    <span class="c1">// 全部成功才继续</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>C版本</strong>：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">initialize</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">clk</span> <span class="o">=</span> <span class="n">clk_get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="s">"core"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">IS_ERR</span><span class="p">(</span><span class="n">clk</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">PTR_ERR</span><span class="p">(</span><span class="n">clk</span><span class="p">);</span>
        <span class="k">goto</span> <span class="n">err_clk</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">reg</span> <span class="o">=</span> <span class="n">regulator_get</span><span class="p">(</span><span class="n">dev</span><span class="p">,</span> <span class="s">"mali"</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">IS_ERR</span><span class="p">(</span><span class="n">reg</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">PTR_ERR</span><span class="p">(</span><span class="n">reg</span><span class="p">);</span>
        <span class="k">goto</span> <span class="n">err_reg</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">iomem</span> <span class="o">=</span> <span class="n">ioremap</span><span class="p">(...);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">iomem</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
        <span class="k">goto</span> <span class="n">err_iomem</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>

<span class="nl">err_iomem:</span>
    <span class="n">regulator_put</span><span class="p">(</span><span class="n">reg</span><span class="p">);</span>
<span class="nl">err_reg:</span>
    <span class="n">clk_put</span><span class="p">(</span><span class="n">clk</span><span class="p">);</span>
<span class="nl">err_clk:</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>差异</strong>：</p>
<ul>
  <li>Rust：4行</li>
  <li>C：25行（含错误处理）</li>
  <li>Rust的RAII自动cleanup，无需<code class="language-plaintext highlighter-rouge">goto</code></li>
</ul>

<h3 id="4-ffi安全边界的明确化">4. FFI安全边界的明确化</h3>

<p>Tyr代码中，<strong>所有unsafe都在特定位置</strong>：</p>

<ol>
  <li><strong>寄存器读写</strong>：<code class="language-plaintext highlighter-rouge">regs::XXX.read()</code>内部</li>
  <li><strong>C结构体转换</strong>：<code class="language-plaintext highlighter-rouge">as_ref()</code>方法</li>
  <li><strong>引用计数操作</strong>：<code class="language-plaintext highlighter-rouge">drm_gem_object_get/put</code></li>
</ol>

<p><strong>驱动代码本身几乎全是安全Rust</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/gpu/drm/tyr/driver.rs - probe函数</span>
<span class="c1">// 没有任何unsafe！</span>
<span class="k">fn</span> <span class="nf">probe</span><span class="p">(</span><span class="n">pdev</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">platform</span><span class="p">::</span><span class="n">Device</span><span class="o">&lt;</span><span class="n">Core</span><span class="o">&gt;</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">Pin</span><span class="o">&lt;</span><span class="n">KBox</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;&gt;&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">core_clk</span> <span class="o">=</span> <span class="nn">Clk</span><span class="p">::</span><span class="nf">get</span><span class="p">(</span><span class="n">pdev</span><span class="nf">.as_ref</span><span class="p">(),</span> <span class="nf">Some</span><span class="p">(</span><span class="nd">c_str!</span><span class="p">(</span><span class="s">"core"</span><span class="p">)))</span><span class="o">?</span><span class="p">;</span>
    <span class="n">core_clk</span><span class="nf">.prepare_enable</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// ... 全部安全代码</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>unsafe集中在抽象层</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/drm/gem/mod.rs</span>
<span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">IntoGEMObject</span><span class="o">&gt;</span> <span class="n">AlwaysRefCounted</span> <span class="k">for</span> <span class="n">T</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">inc_ref</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">drm_gem_object_get</span><span class="p">(</span><span class="k">self</span><span class="nf">.as_raw</span><span class="p">())</span> <span class="p">};</span>
        <span class="c1">// ^^^ unsafe在这里，驱动无需接触</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>这是<strong>Rust在内核的核心价值</strong>：</p>
<ul>
  <li>驱动开发者：写安全代码</li>
  <li>抽象层维护者：处理unsafe，详细论证安全性</li>
</ul>

<hr />

<h2 id="与已有blog的体系关联">与已有Blog的体系关联</h2>

<h3 id="blog1rust-in-the-linux-kernel---reality-check">Blog1：Rust in the Linux Kernel - Reality Check<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></h3>

<p><strong>该文关注</strong>：</p>
<ul>
  <li>宏观数据：338个Rust文件，135,662行代码</li>
  <li>Android Binder案例：18文件，~8,000行</li>
  <li>GPU驱动：Nova（47文件，~15,000行）</li>
</ul>

<p><strong>本文补充</strong>：</p>
<ul>
  <li>Tyr的<strong>具体代码实现</strong></li>
  <li>DRM抽象层的<strong>实际工作原理</strong></li>
  <li>Nova的<strong>IOCTL宏展开</strong></li>
</ul>

<h3 id="blog2rust-and-linux-kernel-abi-stability">Blog2：Rust and Linux Kernel ABI Stability<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></h3>

<p><strong>该文关注</strong>：</p>
<ul>
  <li>用户空间ABI稳定性</li>
  <li><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>的保证</li>
  <li>System V ABI兼容性</li>
</ul>

<p><strong>本文补充</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">GpuInfo</code>的<code class="language-plaintext highlighter-rouge">#[repr(C)]</code>实战应用</li>
  <li>ioctl处理的FFI桥接</li>
  <li>C/Rust互操作的实际代码</li>
</ul>

<h3 id="形成的知识体系">形成的知识体系</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Blog1 (宏观) → Blog2 (ABI) → Blog3 (代码实战)
     ↓              ↓                ↓
  数据统计      技术保证        具体实现
  政策争议      接口规范        挑战分析
  整体趋势      系统设计        代码细节
</code></pre></div></div>

<p>三篇文章从<strong>不同角度</strong>完整覆盖了Rust在Linux内核的状态。</p>

<hr />

<h2 id="未来展望tyr的roadmap">未来展望：Tyr的Roadmap</h2>

<h3 id="短期2026年上半年">短期（2026年上半年）</h3>

<p><strong>依赖的抽象层</strong>（根据commit message）：</p>
<ol>
  <li>✅ GEM shmem（Lyude Paul负责）</li>
  <li>✅ GPUVM（Alice Ryhl负责）</li>
  <li>✅ io-pgtable（Alice Ryhl负责）</li>
</ol>

<p><strong>期望效果</strong>（原文）<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>：</p>

<blockquote>
  <p>Once we can handle those items, we expect to quickly become able to boot the GPU firmware and then progress unhindered until it is time to discuss job submission.</p>
</blockquote>

<h3 id="中期2026-2027">中期（2026-2027）</h3>

<p><strong>整合Nova的贡献</strong>：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">register!</code>宏：类型安全的寄存器访问</li>
  <li>Bounded integers：编译时范围检查</li>
</ul>

<p><strong>完善功能</strong>：</p>
<ul>
  <li>电源管理（DVFS）</li>
  <li>GPU恢复机制</li>
  <li>通过Vulkan CTS</li>
</ul>

<h3 id="长期2027">长期（2027+）</h3>

<p><strong>JobQueue架构</strong>：</p>
<ul>
  <li>替代<code class="language-plaintext highlighter-rouge">drm_gpu_scheduler</code></li>
  <li><strong>首个C驱动可调用的Rust组件</strong></li>
  <li>双向互操作的里程碑</li>
</ul>

<hr />

<h2 id="结论代码层面的洞察">结论：代码层面的洞察</h2>

<p>通过解剖Tyr项目的实际代码，我们得到了<strong>超越宏观讨论的具体认识</strong>：</p>

<h3 id="技术层面">技术层面</h3>

<ol>
  <li><strong>Rust的类型系统价值</strong>：
    <ul>
      <li>类型状态模式（Regulator<Enabled>）</Enabled></li>
      <li>编译时状态机（设备初始化）</li>
      <li>RAII资源管理（时钟、锁）</li>
    </ul>
  </li>
  <li><strong>FFI互操作的实践</strong>：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">extern "C"</code>的C ABI桥接</li>
      <li><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>的ABI兼容</li>
      <li>SAFETY注释的严格论证</li>
    </ul>
  </li>
  <li><strong>抽象层的分层设计</strong>：
    <ul>
      <li>驱动层：安全Rust</li>
      <li>抽象层：处理unsafe</li>
      <li>C层：bindings自动生成</li>
    </ul>
  </li>
</ol>

<h3 id="挑战层面">挑战层面</h3>

<ol>
  <li><strong>基础设施缺失的实际影响</strong>：
    <ul>
      <li>GPUVM抽象→无法启动MCU</li>
      <li><code class="language-plaintext highlighter-rouge">read_poll_timeout()</code>缺失→用固定延迟</li>
      <li>工具链不成熟→<code class="language-plaintext highlighter-rouge">Send/Sync</code> workaround</li>
    </ul>
  </li>
  <li><strong>上游策略的务实性</strong>：
    <ul>
      <li>不再C+Rust混合（失败过）</li>
      <li>分阶段上游（避免下游分叉）</li>
      <li>与Nova/rvkms协同演进</li>
    </ul>
  </li>
</ol>

<h3 id="对开发者的启示">对开发者的启示</h3>

<ol>
  <li><strong>学习路径</strong>：
    <ul>
      <li>先掌握Rust基础（所有权、生命周期）</li>
      <li>学习内核概念（DRM、GEM、GPUVM）</li>
      <li>阅读实际代码（Tyr、Nova、Asahi）</li>
    </ul>
  </li>
  <li><strong>贡献机会</strong>：
    <ul>
      <li>GPUVM抽象开发</li>
      <li>其他DRM抽象补全</li>
      <li>Tyr驱动功能实现</li>
    </ul>
  </li>
  <li><strong>技术趋势</strong>：
    <ul>
      <li>Rust在DRM子系统的采用不可逆</li>
      <li>基础设施建设是当前瓶颈</li>
      <li>2027年可能禁止新C驱动<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></li>
    </ul>
  </li>
</ol>

<p><strong>Rust在Linux内核已经从”实验”进入”生产”，Tyr项目是这一转变的代码级见证。</strong></p>

<h2 id="参考资料">参考资料</h2>

<ul>
  <li><a href="https://devclass.com/2025/12/15/rust-boosted-by-permanent-adoption-for-linux-kernel-code/">Rust boosted by permanent adoption for Linux kernel code</a> - DevClass, 2025-12-15</li>
  <li><a href="https://blog.desdelinux.net/en/linux-kernel-rust-official-android-16-drivers-drm-debate/">Rust is here to stay: the experimental phase in the Linux Kernel has ended</a> - DesdeLinux Blog, 2025</li>
  <li><a href="https://www.osnews.com/story/144392/the-future-for-tyr/">The future for Tyr – OSnews</a> - OSnews转载LWN文章</li>
</ul>

<p><strong>代码仓库</strong>：</p>
<ul>
  <li>Linux Kernel: <code class="language-plaintext highlighter-rouge">/Users/weli/works/linux</code>（本地分析用）</li>
  <li>官方仓库：https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git</li>
  <li>DRM Rust Tree: https://gitlab.freedesktop.org/drm/rust/kernel</li>
</ul>

<p><strong>相关项目</strong>：</p>
<ul>
  <li><a href="https://www.collabora.com/news-and-blog/news-and-events/introducing-tyr-a-new-rust-drm-driver.html">Collabora: Introducing Tyr</a> - 官方介绍</li>
  <li><a href="https://rust-for-linux.com/">Rust for Linux</a> - 官方项目网站</li>
</ul>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="/2026/02/16/rust-in-linux-kernel-reality-check.html">Rust in the Linux Kernel: A Reality Check from Code to Controversy</a> - 本系列第一篇 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="/2026/02/16/rust-kernel-abi-stability-analysis.html">Rust and Linux Kernel ABI Stability: A Technical Deep Dive</a> - 本系列第二篇 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Linux Kernel Git Commit <code class="language-plaintext highlighter-rouge">cf4fd52e3236</code> - “rust: drm: Introduce the Tyr driver for Arm Mali GPUs”, Daniel Almeida, 2025-09-10. 可通过<code class="language-plaintext highlighter-rouge">git show cf4fd52e3236</code>查看完整commit message。 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Dave Airlie在2025 Maintainers Summit的声明，报道来源： <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[2025年9月，Linux内核合并了首个Rust GPU驱动Tyr（commit cf4fd52e3236），标志着Rust在内核图形子系统的正式落地。本文通过剖析Tyr的实际代码，展示Rust GPU驱动的架构设计、DRM抽象层的具体实现，以及从Panthor（C）移植到Tyr（Rust）的关键挑战。这是Rust在Linux内核从抽象到实战的完整技术案例。]]></summary></entry><entry><title type="html">Can C++ Enter the Linux Kernel? A Technical and Historical Analysis</title><link href="https://weinan.io/2026/02/16/can-cpp-enter-linux-kernel.html" rel="alternate" type="text/html" title="Can C++ Enter the Linux Kernel? A Technical and Historical Analysis" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>https://weinan.io/2026/02/16/can-cpp-enter-linux-kernel</id><content type="html" xml:base="https://weinan.io/2026/02/16/can-cpp-enter-linux-kernel.html"><![CDATA[<p>With Rust successfully entering the Linux kernel as the second language after C, a natural question arises: could C++ have been chosen instead, or could it still enter the kernel in the future? This comprehensive analysis examines the technical barriers, historical context, and fundamental design conflicts that make C++ adoption in the Linux kernel highly unlikely, despite C++ being a mature and widely-used systems programming language.</p>

<h2 id="introduction-the-elephant-in-the-room">Introduction: The Elephant in the Room</h2>

<p>Rust’s successful integration into the Linux kernel raises an intriguing counterfactual: <strong>Why not C++?</strong> After all, C++ has:</p>
<ul>
  <li>✅ Decades of maturity (1985 vs Rust’s 2015)</li>
  <li>✅ RAII for automatic resource management</li>
  <li>✅ Rich abstraction capabilities</li>
  <li>✅ Massive developer ecosystem</li>
  <li>✅ Modern safety features (<code class="language-plaintext highlighter-rouge">std::unique_ptr</code>, <code class="language-plaintext highlighter-rouge">std::optional</code>, etc.)</li>
</ul>

<p>Yet C++ has <strong>never</strong> been seriously considered for the Linux kernel, while the younger Rust was accepted after just 2 years of development (2020-2022). This document examines why.</p>

<h2 id="executive-summary">Executive Summary</h2>

<p><strong>Likelihood of C++ entering the Linux kernel: &lt; 5%</strong></p>

<p><strong>Key barriers:</strong></p>
<ol>
  <li><strong>Political</strong>: Linus Torvalds’ explicit, sustained opposition (2004-present)</li>
  <li><strong>Technical</strong>: Exception handling, hidden allocations, lack of memory safety guarantees</li>
  <li><strong>Timing</strong>: Rust already occupies the “second language” niche</li>
  <li><strong>Engineering</strong>: No team investing effort, no killer use case</li>
  <li><strong>Philosophy</strong>: Fundamental design conflicts with kernel requirements</li>
</ol>

<h2 id="historical-context-linus-torvalds-stance-on-c">Historical Context: Linus Torvalds’ Stance on C++</h2>

<h3 id="the-2004-email-that-set-the-tone">The 2004 Email That Set the Tone</h3>

<p>On January 19, 2004, Linus Torvalds responded to a question about compiling C++ kernel modules<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<blockquote>
  <p><strong>“It sucks. Trust me - writing kernel code in C++ is a BLOODY STUPID IDEA.”</strong></p>

  <p><em>“The whole C++ exception handling thing is fundamentally broken. It’s _especially_ broken for kernels.”</em></p>

  <p><em>“Any compiler or language that likes to hide things like memory allocations behind your back just isn’t a good choice for a kernel.”</em></p>
</blockquote>

<h3 id="the-2007-git-mailing-list-expansion">The 2007 Git Mailing List Expansion</h3>

<p>In 2007, Linus elaborated his position on the Git mailing list<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<blockquote>
  <p><em>“C++ leads to really really bad design choices. You invariably start using the ‘nice’ library features of the language like STL and Boost and other total and utter crap, that may ‘help’ you program, but causes… inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.”</em></p>
</blockquote>

<h3 id="has-the-stance-changed-in-20-years">Has the Stance Changed in 20 Years?</h3>

<p><strong>No.</strong> As of 2026, there has been <strong>zero</strong> movement toward C++ acceptance in the kernel community. Meanwhile, Rust went from proposal (2020) to “permanent core language” status (2025)<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h2 id="technical-barrier-analysis">Technical Barrier Analysis</h2>

<h3 id="barrier-1-exception-handling">Barrier 1: Exception Handling</h3>

<p><strong>The Problem:</strong></p>

<p>C++ exceptions introduce non-local control flow that is fundamentally incompatible with kernel programming requirements.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++ exception example</span>
<span class="kt">void</span> <span class="nf">kernel_function</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">buffer</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o">&lt;</span><span class="n">KernelBuffer</span><span class="o">&gt;</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
    <span class="c1">// ^-- Constructor might throw</span>

    <span class="n">do_critical_work</span><span class="p">(</span><span class="n">buffer</span><span class="p">.</span><span class="n">get</span><span class="p">());</span>
    <span class="c1">// ^-- Might throw exception</span>

    <span class="c1">// If exception is thrown:</span>
    <span class="c1">// 1. Stack unwinding occurs</span>
    <span class="c1">// 2. Destructors are called (but what about interrupt context?)</span>
    <span class="c1">// 3. Exception tables increase binary size</span>
    <span class="c1">// 4. Performance becomes unpredictable</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Kernel Requirements:</strong></p>

<ul>
  <li><strong>Deterministic behavior</strong>: Every code path must be predictable</li>
  <li><strong>No surprise jumps</strong>: Control flow must be explicit and traceable</li>
  <li><strong>Minimal binary size</strong>: No room for exception tables</li>
  <li><strong>Interrupt safety</strong>: Code in interrupt context cannot handle exceptions</li>
</ul>

<p><strong>Academic Evidence:</strong></p>

<p>Research from the University of Edinburgh (2019) demonstrated that even optimized C++ exception implementations impose significant code size and runtime overhead in embedded systems<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. More recent work from the University of St Andrews (2025) showed that C++ exception propagation across user/kernel boundaries requires special ABI support, increasing system complexity<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>

<p><strong>Comparison with Rust:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust equivalent - no exceptions, explicit error handling</span>
<span class="k">fn</span> <span class="nf">kernel_function</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">buffer</span> <span class="o">=</span> <span class="nn">KernelBuffer</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">size</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// ^-- Explicit error propagation with '?'</span>

    <span class="nf">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buffer</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// ^-- Explicit error handling, no hidden control flow</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span> <span class="c1">// buffer automatically dropped, no exceptions needed</span>
</code></pre></div></div>

<p><strong>Could C++ disable exceptions?</strong></p>

<p>Yes, with <code class="language-plaintext highlighter-rouge">-fno-exceptions</code>. However:</p>
<ol>
  <li>Much of C++’s design assumes exceptions exist</li>
  <li>Standard library becomes awkward without exceptions</li>
  <li>Error handling becomes manual (back to C-style)</li>
  <li>You lose a key C++ feature while keeping the complexity</li>
</ol>

<h3 id="barrier-2-hidden-memory-allocations">Barrier 2: Hidden Memory Allocations</h3>

<p><strong>The Problem:</strong></p>

<p>The kernel requires <strong>explicit, tagged memory allocations</strong> to handle different contexts:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C kernel code - explicit allocation with flags</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>     <span class="c1">// Can sleep</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_ATOMIC</span><span class="p">);</span>     <span class="c1">// Atomic context</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_NOWAIT</span><span class="p">);</span>     <span class="c1">// Non-blocking</span>
</code></pre></div></div>

<p><strong>C++ hides allocations:</strong></p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++ - when does allocation happen? With what flags?</span>
<span class="k">class</span> <span class="nc">KernelBuffer</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">&gt;</span> <span class="n">data</span><span class="p">;</span>  <span class="c1">// Hidden heap allocation!</span>
    <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">name</span><span class="p">;</span>           <span class="c1">// Hidden heap allocation!</span>
<span class="nl">public:</span>
    <span class="n">KernelBuffer</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">size</span><span class="p">)</span>
        <span class="o">:</span> <span class="n">data</span><span class="p">(</span><span class="n">size</span><span class="p">)</span>            <span class="c1">// Allocates here - but with what GFP_* ?</span>
        <span class="p">,</span> <span class="n">name</span><span class="p">(</span><span class="s">"buffer"</span><span class="p">)</span> <span class="p">{}</span>     <span class="c1">// Another hidden allocation</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="n">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">KernelBuffer</span> <span class="n">buf</span><span class="p">(</span><span class="mi">1024</span><span class="p">);</span>     <span class="c1">// Can this sleep? Is it atomic-safe?</span>
    <span class="c1">// Impossible to know without diving into implementation</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Linus’s 2004 statement remains valid:</strong></p>

<blockquote>
  <p><em>“Any compiler or language that likes to hide things like memory allocations behind your back just isn’t a good choice for a kernel.”</em></p>
</blockquote>

<p><strong>Rust’s explicit approach:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - all allocations are explicit</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">KernelBuffer</span> <span class="p">{</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">KernelBuffer</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">flags</span><span class="p">:</span> <span class="n">Flags</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Explicit allocation with explicit flags</span>
        <span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">try_with_capacity_in</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">flags</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="k">Self</span> <span class="p">{</span> <span class="n">data</span> <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Usage</span>
<span class="k">let</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">KernelBuffer</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="c1">// ^-- Crystal clear: allocation happens here, with GFP_KERNEL</span>
</code></pre></div></div>

<h3 id="barrier-3-no-memory-safety-guarantees">Barrier 3: No Memory Safety Guarantees</h3>

<p><strong>The Core Issue:</strong></p>

<p>C++ provides <strong>the same memory safety guarantees as C: none.</strong></p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++ - still vulnerable to use-after-free</span>
<span class="n">KernelData</span><span class="o">*</span> <span class="n">data</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">KernelData</span><span class="p">();</span>
<span class="k">delete</span> <span class="n">data</span><span class="p">;</span>
<span class="n">use_data</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>  <span class="c1">// ❌ Use-after-free - compiler won't catch this</span>

<span class="c1">// Still vulnerable to data races</span>
<span class="kt">void</span> <span class="nf">thread1</span><span class="p">()</span> <span class="p">{</span> <span class="n">global_data</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// ❌ Race condition</span>
<span class="kt">void</span> <span class="n">thread2</span><span class="p">()</span> <span class="p">{</span> <span class="n">global_data</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// Compiler won't catch</span>

<span class="c1">// Still vulnerable to null pointer dereferences</span>
<span class="n">KernelData</span><span class="o">*</span> <span class="n">data</span> <span class="o">=</span> <span class="n">get_data</span><span class="p">();</span>  <span class="c1">// Might return nullptr</span>
<span class="n">data</span><span class="o">-&gt;</span><span class="n">process</span><span class="p">();</span>                 <span class="c1">// ❌ Potential null deref</span>
</code></pre></div></div>

<p><strong>Rust’s compile-time guarantees:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - use-after-free is impossible</span>
<span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">KernelData</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="nf">drop</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="nf">use_data</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>  <span class="c1">// ✅ Compile error: value used after move</span>

<span class="c1">// Data races are impossible</span>
<span class="k">fn</span> <span class="nf">thread1</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span><span class="p">)</span> <span class="p">{</span> <span class="n">data</span><span class="py">.value</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// ✅ Compile error:</span>
<span class="k">fn</span> <span class="nf">thread2</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span><span class="p">)</span> <span class="p">{</span> <span class="n">data</span><span class="py">.value</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// cannot mutate through shared reference</span>

<span class="c1">// Null pointer dereferences are impossible</span>
<span class="k">let</span> <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">KernelData</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nf">get_data</span><span class="p">();</span>
<span class="n">data</span><span class="nf">.process</span><span class="p">();</span>  <span class="c1">// ✅ Compile error: Option&lt;T&gt; has no method 'process'</span>
<span class="c1">// Must explicitly unwrap: data.unwrap().process()</span>
</code></pre></div></div>

<p><strong>The Statistics:</strong></p>

<p>According to research on Rust in the Linux kernel<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>:</p>
<ul>
  <li>~70% of kernel CVEs stem from memory safety issues</li>
  <li>Rust eliminates these <strong>at compile time</strong> without runtime overhead</li>
  <li>C++ eliminates <strong>0%</strong> of these issues</li>
</ul>

<h3 id="barrier-4-runtime-and-standard-library-dependencies">Barrier 4: Runtime and Standard Library Dependencies</h3>

<p><strong>The Problem:</strong></p>

<p>C++ typically depends on:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">libstdc++</code> or <code class="language-plaintext highlighter-rouge">libc++</code> (standard library)</li>
  <li>Runtime support for RTTI (Run-Time Type Information)</li>
  <li>Global constructors/destructors</li>
  <li>Thread-local storage</li>
</ul>

<p><strong>Kernel requirements:</strong></p>
<ul>
  <li>❌ No user-space libraries</li>
  <li>❌ No global constructors (initialization order issues)</li>
  <li>❌ Minimal binary size</li>
  <li>❌ No assumptions about runtime environment</li>
</ul>

<p><strong>Possible workarounds:</strong></p>
<ul>
  <li>Use <code class="language-plaintext highlighter-rouge">-fno-rtti</code> (disable RTTI)</li>
  <li>Use <code class="language-plaintext highlighter-rouge">-fno-exceptions</code> (disable exceptions)</li>
  <li>Use <code class="language-plaintext highlighter-rouge">-nostdlib</code> (no standard library)</li>
  <li>Avoid global objects</li>
</ul>

<p><strong>But then you’re left with “C with classes”</strong> - losing most of C++’s advantages while keeping the complexity.</p>

<p><strong>Rust’s approach:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust kernel code uses 'core' (no std)</span>
<span class="nd">#![no_std]</span>  <span class="c1">// Explicitly kernel mode</span>

<span class="c1">// From rust/kernel/lib.rs (actual kernel code):</span>
<span class="cd">//! This crate contains the kernel APIs that have been ported or wrapped for</span>
<span class="cd">//! usage by Rust code in the kernel and is shared by all of them.</span>
<span class="cd">//!</span>
<span class="cd">//! In other words, all the rest of the Rust code in the kernel (e.g. kernel</span>
<span class="cd">//! modules written in Rust) depends on [`core`] and this crate.</span>

<span class="k">extern</span> <span class="k">crate</span> <span class="n">core</span><span class="p">;</span>  <span class="c1">// Only core, no std library</span>
</code></pre></div></div>

<h2 id="language-design-philosophy-comparison">Language Design Philosophy Comparison</h2>

<h3 id="the-fundamental-mismatch">The Fundamental Mismatch</h3>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>Linux Kernel Needs</th>
      <th>C++ Provides</th>
      <th>Rust Provides</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Error Handling</strong></td>
      <td>Explicit, zero overhead</td>
      <td>Exceptions (overhead) or manual</td>
      <td><code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code> (zero overhead, enforced)</td>
    </tr>
    <tr>
      <td><strong>Memory Allocation</strong></td>
      <td>Explicit, tagged (GFP_*)</td>
      <td>Often implicit</td>
      <td>Explicit with allocator API</td>
    </tr>
    <tr>
      <td><strong>Control Flow</strong></td>
      <td>Predictable, traceable</td>
      <td>Exceptions hide flow</td>
      <td>All control flow explicit</td>
    </tr>
    <tr>
      <td><strong>Memory Safety</strong></td>
      <td>Critical (70% of CVEs)</td>
      <td>No guarantees</td>
      <td>Compile-time guarantees</td>
    </tr>
    <tr>
      <td><strong>Abstraction Cost</strong></td>
      <td>Must be zero</td>
      <td>Sometimes has overhead</td>
      <td>Guaranteed zero-cost</td>
    </tr>
    <tr>
      <td><strong>ABI Stability</strong></td>
      <td>Essential for modules</td>
      <td>Unstable (name mangling)</td>
      <td>C-compatible FFI</td>
    </tr>
    <tr>
      <td><strong>Binary Size</strong></td>
      <td>Minimal</td>
      <td>STL bloat, RTTI tables</td>
      <td>No runtime, minimal size</td>
    </tr>
  </tbody>
</table>

<h3 id="modern-c-improvements-do-they-help">Modern C++ Improvements: Do They Help?</h3>

<p><strong>Modern C++ (C++11/14/17/20/23) added:</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">std::unique_ptr</code> / <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> (RAII smart pointers)</li>
  <li><code class="language-plaintext highlighter-rouge">constexpr</code> (compile-time computation)</li>
  <li><code class="language-plaintext highlighter-rouge">std::optional</code> (like Rust’s <code class="language-plaintext highlighter-rouge">Option&lt;T&gt;</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">std::expected</code> (like Rust’s <code class="language-plaintext highlighter-rouge">Result&lt;T, E&gt;</code>)</li>
  <li>Move semantics</li>
  <li>Lambda expressions</li>
</ul>

<p><strong>Do these solve the kernel’s problems?</strong></p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Modern C++ example</span>
<span class="k">auto</span> <span class="n">data</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o">&lt;</span><span class="n">KernelData</span><span class="o">&gt;</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
<span class="c1">// ❌ Still implicit allocation</span>
<span class="c1">// ❌ Still can't specify GFP_KERNEL or GFP_ATOMIC</span>
<span class="c1">// ❌ Still no compile-time data race prevention</span>
<span class="c1">// ❌ Still requires runtime support</span>

<span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">KernelData</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">get_data</span><span class="p">();</span>
<span class="c1">// ✅ Better than raw pointers</span>
<span class="c1">// ❌ But runtime overhead (size + bool flag)</span>
<span class="c1">// ❌ No enforcement of checking before use</span>
</code></pre></div></div>

<p><strong>Rust’s approach:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust equivalent</span>
<span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">try_new_in</span><span class="p">(</span><span class="nn">KernelData</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">size</span><span class="p">)</span><span class="o">?</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="c1">// ✅ Explicit allocation</span>
<span class="c1">// ✅ Explicit flags</span>
<span class="c1">// ✅ Zero runtime overhead</span>
<span class="c1">// ✅ Compile-time safety</span>

<span class="k">let</span> <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">KernelData</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nf">get_data</span><span class="p">();</span>
<span class="c1">// ✅ Zero runtime overhead (just enum tag)</span>
<span class="c1">// ✅ Compiler enforces checking before use</span>
</code></pre></div></div>

<p><strong>Conclusion:</strong> Modern C++ is better than old C++, but still doesn’t meet kernel requirements as well as Rust does.</p>

<h2 id="case-studies-c-in-other-kernels">Case Studies: C++ in Other Kernels</h2>

<h3 id="windows-nt-kernel">Windows NT Kernel</h3>

<p><strong>Status:</strong> Partial C++ usage, primarily in driver frameworks</p>

<p><strong>Constraints:</strong></p>
<ul>
  <li>Strict subset of C++</li>
  <li>No exceptions</li>
  <li>No RTTI</li>
  <li>No STL</li>
  <li>Custom memory allocators required</li>
</ul>

<p><strong>Key difference:</strong> Windows was designed with C++ in mind from the start (1993). Linux was not.</p>

<h3 id="macosios-kernel-xnu">macOS/iOS Kernel (XNU)</h3>

<p><strong>Status:</strong> C++ in IOKit (driver framework)</p>

<p><strong>Constraints:</strong></p>
<ul>
  <li>Limited C++ subset</li>
  <li>Carefully controlled usage</li>
  <li>Predates modern C++ features</li>
</ul>

<p><strong>Key difference:</strong> Apple controls the entire ecosystem. Linux is community-driven with diverse hardware.</p>

<h3 id="fuchsia-google">Fuchsia (Google)</h3>

<p><strong>Status:</strong> Extensive C++ usage</p>

<p><strong>Key difference:</strong> <strong>Brand new kernel</strong> (started 2016) with no legacy codebase. Linux has 30+ years of C code and established conventions.</p>

<h3 id="conclusion-from-case-studies">Conclusion from Case Studies</h3>

<p><strong>Every kernel that uses C++ either:</strong></p>
<ol>
  <li>Was designed for C++ from the start, OR</li>
  <li>Uses a highly restricted C++ subset that resembles “C with classes”</li>
</ol>

<p><strong>Linux is neither.</strong> It has 30 million lines of C code and a culture that values explicitness and simplicity.</p>

<h2 id="the-timing-factor-rust-already-won-the-second-language-slot">The Timing Factor: Rust Already Won the “Second Language” Slot</h2>

<h3 id="why-timing-matters">Why Timing Matters</h3>

<p>The Linux kernel adding a second language is a <strong>massive undertaking</strong>:</p>
<ul>
  <li>Build system changes</li>
  <li>Documentation requirements</li>
  <li>Maintainer training</li>
  <li>ABI compatibility concerns</li>
  <li>Toolchain integration</li>
</ul>

<p><strong>The kernel community will not do this multiple times.</strong></p>

<h3 id="rusts-timeline">Rust’s Timeline</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2020: Rust for Linux announced
      - Initial RFC posted to LKML
      - Community discussion begins

2021: Infrastructure development
      - Build system integration
      - Kernel abstraction layer development

2022 (October): Rust merged into Linux 6.1 development cycle
        - Linus Torvalds accepts the patches

2022 (December): Linux 6.1 released
        - First stable kernel with Rust support

2023-2024: Ecosystem growth
        - Android Binder rewritten in Rust
        - GPU drivers (Nova)
        - Network PHY drivers

2025 (December): Rust becomes "permanent core language"
        - No longer experimental
        - 338 files, 135,662 lines of production code
</code></pre></div></div>

<h3 id="what-would-c-need">What Would C++ Need?</h3>

<p>To match Rust’s success, C++ would need:</p>

<p><strong>1. A dedicated team</strong> (5-10 engineers, multi-year commitment)
<strong>2. Corporate sponsorship</strong> (Google/Microsoft/Meta level)
<strong>3. Killer application</strong> (equivalent to Android Binder)
<strong>4. Toolchain development</strong> (kernel-safe C++ subset)
<strong>5. Community buy-in</strong> (Linus and maintainers)</p>

<p><strong>Current status:</strong></p>
<ul>
  <li>❌ No team working on this</li>
  <li>❌ No corporate sponsor</li>
  <li>❌ No killer application identified</li>
  <li>❌ No toolchain work</li>
  <li>❌ Linus explicitly opposed (20 years)</li>
</ul>

<h2 id="the-kernel-safe-c-thought-experiment">The “Kernel-Safe C++” Thought Experiment</h2>

<h3 id="what-would-it-look-like">What Would It Look Like?</h3>

<p>If someone tried to create “kernel-safe C++”, it would need:</p>

<p><strong>Allowed features:</strong></p>
<ul>
  <li>Classes and constructors/destructors (RAII)</li>
  <li>Templates (limited complexity)</li>
  <li>Namespaces</li>
  <li><code class="language-plaintext highlighter-rouge">constexpr</code></li>
  <li>References</li>
</ul>

<p><strong>Prohibited features:</strong></p>
<ul>
  <li>❌ Exceptions (non-local control flow)</li>
  <li>❌ RTTI (runtime overhead)</li>
  <li>❌ STL (hidden allocations, overhead)</li>
  <li>❌ <code class="language-plaintext highlighter-rouge">new</code>/<code class="language-plaintext highlighter-rouge">delete</code> (must use kernel allocators)</li>
  <li>❌ Virtual inheritance (complexity)</li>
  <li>❌ Global constructors (initialization order)</li>
</ul>

<h3 id="the-problem-is-this-still-c">The Problem: Is This Still C++?</h3>

<p>At this point, you have <strong>“C with classes and templates”</strong> - essentially what embedded C++ tried to be in the 1990s.</p>

<p><strong>Historical precedent:</strong> Embedded C++ (EC++) was defined in 1996 as a subset for embedded systems. It failed because:</p>
<ol>
  <li>Too restrictive for C++ programmers</li>
  <li>Too complex for C programmers</li>
  <li>Toolchain fragmentation</li>
  <li>Eventually superseded by “just use C”</li>
</ol>

<h3 id="comparison-with-rust">Comparison with Rust</h3>

<p><strong>Rust didn’t need to be restricted</strong> - it was designed for systems programming from day one:</p>
<ul>
  <li>No exceptions by design (uses <code class="language-plaintext highlighter-rouge">Result&lt;T, E&gt;</code>)</li>
  <li>No garbage collector by design</li>
  <li>No runtime by design (<code class="language-plaintext highlighter-rouge">#![no_std]</code> is a first-class mode)</li>
  <li>Explicit memory management by design</li>
  <li>Zero-cost abstractions by design</li>
</ul>

<p><strong>C++ requires restrictions; Rust requires nothing.</strong></p>

<h2 id="economic-and-engineering-reality">Economic and Engineering Reality</h2>

<h3 id="the-resource-investment-required">The Resource Investment Required</h3>

<p>Based on Rust for Linux’s development:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Total effort estimate (2020-2025):
- Core team: ~10 engineers × 5 years = 50 person-years
- Corporate contributions: ~20 engineers × 2 years = 40 person-years
- Community contributions: ~100 contributors × 0.5 years = 50 person-years
Total: ~140 person-years of engineering effort

Cost estimate (conservative):
- Average engineer cost: $200,000/year (salary + overhead)
- Total investment: ~$28 million USD
</code></pre></div></div>

<p><strong>For C++ to enter the kernel, someone would need to invest comparable resources.</strong></p>

<h3 id="who-would-fund-this">Who Would Fund This?</h3>

<p><strong>Rust for Linux sponsors:</strong></p>
<ul>
  <li>Google (Android Binder, security motivation)</li>
  <li>Microsoft (Azure security, NT kernel Rust initiative)</li>
  <li>Arm (architecture support, driver development)</li>
  <li>Meta (networking, infrastructure)</li>
</ul>

<p><strong>Potential C++ sponsors:</strong></p>
<ul>
  <li>??? (No clear candidate)</li>
</ul>

<p><strong>Why no sponsors?</strong></p>
<ol>
  <li>C++ doesn’t solve problems Rust doesn’t already solve</li>
  <li>Investment would be duplicative (Rust already exists)</li>
  <li>Political risk (Linus’s opposition)</li>
  <li>Technical risk (fundamental design mismatches)</li>
</ol>

<h3 id="the-opportunity-cost">The Opportunity Cost</h3>

<p>Every hour spent on “C++ for Linux” is an hour <strong>not spent on:</strong></p>
<ul>
  <li>Improving Rust for Linux</li>
  <li>Fixing bugs in existing code</li>
  <li>Adding new features</li>
  <li>Supporting new hardware</li>
</ul>

<p><strong>Rational actors won’t make this trade-off.</strong></p>

<h2 id="technical-alternatives-what-if-not-rust">Technical Alternatives: What If Not Rust?</h2>

<h3 id="if-rust-didnt-exist-what-would-be-considered">If Rust Didn’t Exist, What Would Be Considered?</h3>

<p><strong>Hypothetical ranking (if choosing today):</strong></p>

<ol>
  <li><strong>Zig</strong>: Explicit control, modern C replacement, safety tools
    <ul>
      <li>✅ Zero hidden behavior</li>
      <li>✅ Excellent C interop</li>
      <li>✅ Modern error handling</li>
      <li>❌ No compile-time memory safety guarantees</li>
      <li>❌ Small community (vs Rust)</li>
      <li>❌ Language still evolving</li>
    </ul>
  </li>
  <li><strong>D</strong>: Systems programming language with safety features
    <ul>
      <li>✅ Memory safety options</li>
      <li>✅ No garbage collector mode</li>
      <li>❌ Smaller community</li>
      <li>❌ Less industry backing</li>
      <li>❌ Complex feature set</li>
    </ul>
  </li>
  <li><strong>Ada/SPARK</strong>: Formal verification capabilities
    <ul>
      <li>✅ Extremely rigorous safety</li>
      <li>❌ Very niche community</li>
      <li>❌ Steep learning curve</li>
      <li>❌ Poor tooling integration</li>
    </ul>
  </li>
  <li><strong>C++</strong>: Mature, widely known
    <ul>
      <li>✅ Large community</li>
      <li>✅ Rich abstractions</li>
      <li>❌ All the issues discussed in this document</li>
    </ul>
  </li>
</ol>

<p><strong>Rust won because it hit the sweet spot:</strong></p>
<ul>
  <li>Memory safety without garbage collection</li>
  <li>Zero-cost abstractions</li>
  <li>Large, active community</li>
  <li>Industry backing</li>
  <li>Purpose-built for systems programming</li>
</ul>

<h3 id="could-multiple-languages-coexist">Could Multiple Languages Coexist?</h3>

<p><strong>Theoretically yes, practically no.</strong></p>

<p><strong>Challenges:</strong></p>
<ul>
  <li>Each language adds build system complexity</li>
  <li>Each language requires maintainer expertise</li>
  <li>Each language creates ABI boundaries</li>
  <li>Each language fragments the codebase</li>
</ul>

<p><strong>The kernel needs coherence</strong>, not a polyglot mess.</p>

<p><strong>Historical precedent:</strong> The kernel <strong>rejected</strong> multiple assembler syntaxes (AT&amp;T vs Intel), settling on one. It won’t embrace multiple high-level languages.</p>

<h2 id="the-path-forward-what-would-change-the-analysis">The Path Forward: What Would Change the Analysis?</h2>

<h3 id="scenario-1-rust-fails-catastrophically">Scenario 1: Rust Fails Catastrophically</h3>

<p><strong>What would constitute “failure”?</strong></p>
<ul>
  <li>Major security vulnerabilities in Rust driver code</li>
  <li>Unfixable performance issues</li>
  <li>Toolchain becomes unmaintainable</li>
  <li>Community abandons Rust for Linux</li>
</ul>

<p><strong>Likelihood: &lt; 1%</strong></p>

<p>Current evidence (Android Binder, GPU drivers, network drivers) shows Rust succeeding in production.</p>

<p><strong>Would C++ be next choice?</strong></p>

<p>Probably not. More likely:</p>
<ol>
  <li>Return to C-only</li>
  <li>Consider Zig (if mature by then)</li>
  <li>Consider formally verified C subsets</li>
</ol>

<h3 id="scenario-2-linus-torvalds-retireschanges-mind">Scenario 2: Linus Torvalds Retires/Changes Mind</h3>

<p><strong>What if new kernel leadership is pro-C++?</strong></p>

<p>Even then, the technical issues remain:</p>
<ul>
  <li>Exceptions still problematic</li>
  <li>Hidden allocations still problematic</li>
  <li>No memory safety guarantees still problematic</li>
</ul>

<p><strong>New leadership might be more pragmatic</strong>, but they still answer to technical reality.</p>

<h3 id="scenario-3-c-gets-kernel-specific-safety-extensions">Scenario 3: C++ Gets Kernel-Specific Safety Extensions</h3>

<p><strong>What if a major vendor (Google/Microsoft) created “Kernel C++”?</strong></p>

<p>Example: Hypothetical language features</p>
<ul>
  <li>Compile-time borrow checking (copying Rust)</li>
  <li>Explicit allocation syntax</li>
  <li>Guaranteed zero-cost abstractions</li>
  <li>Formal verification hooks</li>
</ul>

<p><strong>At that point, you’ve reinvented Rust.</strong></p>

<p>Why not just use Rust?</p>

<h3 id="scenario-4-webassembly-or-other-bytecode-approach">Scenario 4: WebAssembly or Other Bytecode Approach</h3>

<p><strong>Alternative: Compile to safe bytecode?</strong></p>

<p>This has been explored (eBPF for kernel extensions), but:</p>
<ul>
  <li>Not suitable for core kernel code</li>
  <li>Performance overhead</li>
  <li>Complexity</li>
</ul>

<p><strong>Not a replacement for Rust/C.</strong></p>

<h2 id="conclusion-the-verdict">Conclusion: The Verdict</h2>

<h3 id="summary-of-findings">Summary of Findings</h3>

<p><strong>Can C++ enter the Linux kernel?</strong></p>

<p><strong>Answer: Extremely unlikely (&lt; 5% probability) for the following reasons:</strong></p>

<h4 id="political-barriers-high">Political Barriers (High)</h4>
<ul>
  <li>✗ Linus Torvalds’ explicit, sustained opposition (20+ years)</li>
  <li>✗ No champion within kernel maintainer community</li>
  <li>✗ Rust already occupies “second language” niche</li>
</ul>

<h4 id="technical-barriers-high">Technical Barriers (High)</h4>
<ul>
  <li>✗ Exception handling fundamentally incompatible with kernel needs</li>
  <li>✗ Hidden memory allocations violate kernel philosophy</li>
  <li>✗ No compile-time memory safety guarantees</li>
  <li>✗ Runtime dependencies (RTTI, libstdc++) unsuitable for kernel</li>
  <li>✗ ABI instability complicates module system</li>
</ul>

<h4 id="engineering-barriers-high">Engineering Barriers (High)</h4>
<ul>
  <li>✗ No team working on C++ kernel integration</li>
  <li>✗ No corporate sponsor identified</li>
  <li>✗ No killer application to justify investment</li>
  <li>✗ Estimated $28M+ investment required (based on Rust precedent)</li>
</ul>

<h4 id="timing-barriers-high">Timing Barriers (High)</h4>
<ul>
  <li>✗ Rust already invested 140+ person-years</li>
  <li>✗ Rust has production deployments (Android Binder, GPU drivers)</li>
  <li>✗ Kernel won’t add third high-level language</li>
</ul>

<h3 id="comparison-why-rust-succeeded-where-c-cannot">Comparison: Why Rust Succeeded Where C++ Cannot</h3>

<table>
  <thead>
    <tr>
      <th>Factor</th>
      <th>Rust</th>
      <th>C++</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Memory Safety</strong></td>
      <td>✅ Compile-time guarantees</td>
      <td>❌ None</td>
    </tr>
    <tr>
      <td><strong>Kernel Philosophy Fit</strong></td>
      <td>✅ Explicit everything</td>
      <td>❌ Hidden behavior</td>
    </tr>
    <tr>
      <td><strong>Runtime Requirements</strong></td>
      <td>✅ None (<code class="language-plaintext highlighter-rouge">#![no_std]</code>)</td>
      <td>❌ Requires libstdc++ subset</td>
    </tr>
    <tr>
      <td><strong>Error Handling</strong></td>
      <td>✅ Zero-cost <code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code></td>
      <td>❌ Exceptions or manual</td>
    </tr>
    <tr>
      <td><strong>Industry Backing</strong></td>
      <td>✅ Google, MS, Arm, Meta</td>
      <td>❌ None for kernel work</td>
    </tr>
    <tr>
      <td><strong>Active Development</strong></td>
      <td>✅ 338 files, 135K lines</td>
      <td>❌ Zero</td>
    </tr>
    <tr>
      <td><strong>Linus’s Stance</strong></td>
      <td>✅ Neutral → Accepting</td>
      <td>❌ Explicit opposition</td>
    </tr>
    <tr>
      <td><strong>Killer App</strong></td>
      <td>✅ Android Binder</td>
      <td>❌ None identified</td>
    </tr>
  </tbody>
</table>

<h3 id="the-real-question">The Real Question</h3>

<p>The question isn’t “Can C++ enter the Linux kernel?”</p>

<p><strong>The question is: “Why would it?”</strong></p>

<ul>
  <li>It doesn’t solve problems Rust doesn’t already solve</li>
  <li>It brings technical baggage Rust doesn’t have</li>
  <li>It lacks corporate and community backing</li>
  <li>It faces political opposition Rust never did</li>
</ul>

<h3 id="final-thoughts">Final Thoughts</h3>

<p>C++ is an excellent language for many domains:</p>
<ul>
  <li>Application development</li>
  <li>Game engines</li>
  <li>High-performance computing</li>
  <li>Systems software (outside kernels)</li>
</ul>

<p>But for the <strong>Linux kernel specifically</strong>, the ship has sailed. Rust provides:</p>
<ul>
  <li>Better memory safety</li>
  <li>Better kernel philosophy fit</li>
  <li>Better tooling for kernel development</li>
  <li>Better industry momentum</li>
</ul>

<p><strong>Unless fundamental technical realities change</strong>, C++ will remain outside the Linux kernel indefinitely.</p>

<p>The more productive question for C++ advocates is: <strong>How can C++ improve in its own domains?</strong> rather than attempting to enter a niche where it’s technically unsuited and politically unwelcome.</p>

<hr />

<h2 id="appendix-quick-reference-tables">Appendix: Quick Reference Tables</h2>

<h3 id="language-feature-comparison">Language Feature Comparison</h3>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>C</th>
      <th>C++</th>
      <th>Rust</th>
      <th>Kernel Needs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Memory Safety</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅ Critical</td>
    </tr>
    <tr>
      <td>Zero Runtime</td>
      <td>✅</td>
      <td>⚠️</td>
      <td>✅</td>
      <td>✅ Required</td>
    </tr>
    <tr>
      <td>Explicit Allocation</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅ Required</td>
    </tr>
    <tr>
      <td>Error Handling</td>
      <td>⚠️ Manual</td>
      <td>❌ Exceptions</td>
      <td>✅ <code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code></td>
      <td>✅ Explicit</td>
    </tr>
    <tr>
      <td>ABI Stability</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅ C-FFI</td>
      <td>✅ Required</td>
    </tr>
    <tr>
      <td>Compile-time Checks</td>
      <td>⚠️ Basic</td>
      <td>⚠️ Basic</td>
      <td>✅ Extensive</td>
      <td>✅ Preferred</td>
    </tr>
    <tr>
      <td>Learning Curve</td>
      <td>Low</td>
      <td>High</td>
      <td>High</td>
      <td>⚠️ Trade-off</td>
    </tr>
    <tr>
      <td>Ecosystem</td>
      <td>Huge</td>
      <td>Huge</td>
      <td>Large</td>
      <td>⚠️ Consider</td>
    </tr>
  </tbody>
</table>

<h3 id="historical-timeline-second-language-attempts">Historical Timeline: Second Language Attempts</h3>

<table>
  <thead>
    <tr>
      <th>Year</th>
      <th>Event</th>
      <th>Outcome</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1991</td>
      <td>Linux 0.01 considers C++</td>
      <td>❌ Rejected (immature tooling)</td>
    </tr>
    <tr>
      <td>2004</td>
      <td>C++ kernel module discussion</td>
      <td>❌ Linus: “BLOODY STUPID IDEA”</td>
    </tr>
    <tr>
      <td>2007</td>
      <td>Git mailing list C++ debate</td>
      <td>❌ Linus elaborates opposition</td>
    </tr>
    <tr>
      <td>2020</td>
      <td>Rust for Linux announced</td>
      <td>✅ Positive reception</td>
    </tr>
    <tr>
      <td>2022</td>
      <td>Rust merged into Linux 6.1</td>
      <td>✅ Accepted</td>
    </tr>
    <tr>
      <td>2025</td>
      <td>Rust “permanent core language”</td>
      <td>✅ Success</td>
    </tr>
    <tr>
      <td>2026</td>
      <td>C++ in kernel?</td>
      <td>❌ Still no movement</td>
    </tr>
  </tbody>
</table>

<h3 id="investment-comparison">Investment Comparison</h3>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>Rust for Linux</th>
      <th>Hypothetical C++ for Linux</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Engineering Effort</strong></td>
      <td>~140 person-years</td>
      <td>~150-200 person-years (higher due to restrictions)</td>
    </tr>
    <tr>
      <td><strong>Cost</strong></td>
      <td>~$28M USD</td>
      <td>~$30-40M USD</td>
    </tr>
    <tr>
      <td><strong>Corporate Sponsors</strong></td>
      <td>Google, Microsoft, Arm, Meta</td>
      <td>None identified</td>
    </tr>
    <tr>
      <td><strong>Community Support</strong></td>
      <td>Strong (150+ contributors)</td>
      <td>Weak (no active effort)</td>
    </tr>
    <tr>
      <td><strong>Political Support</strong></td>
      <td>Neutral → Positive</td>
      <td>Strongly negative</td>
    </tr>
    <tr>
      <td><strong>Technical Viability</strong></td>
      <td>High (proven in production)</td>
      <td>Low (fundamental conflicts)</td>
    </tr>
    <tr>
      <td><strong>ROI</strong></td>
      <td>High (70% of CVEs prevented)</td>
      <td>Negative (no advantage over Rust)</td>
    </tr>
  </tbody>
</table>

<h2 id="references">References</h2>

<hr />

<p><strong>Document Information:</strong></p>
<ul>
  <li><strong>Created:</strong> 2026-02-16</li>
  <li><strong>Analysis Scope:</strong> Technical, historical, and economic feasibility of C++ entering the Linux kernel</li>
  <li><strong>Methodology:</strong> Literature review, code analysis, historical precedent examination</li>
  <li><strong>Conclusion:</strong> C++ entry into Linux kernel is highly unlikely (&lt; 5% probability) due to converging political, technical, and economic barriers</li>
</ul>

<hr />

<h2 id="中文版--chinese-version">中文版 / Chinese Version</h2>

<h1 id="c能进入linux内核吗技术与历史分析">C++能进入Linux内核吗？技术与历史分析</h1>

<p><strong>摘要</strong>: 随着Rust成功进入Linux内核成为C之后的第二语言，一个自然的问题出现了：C++本可以被选择吗，或者它未来仍能进入内核吗？本综合分析研究了技术障碍、历史背景和基本设计冲突，这些使得C++被Linux内核采用的可能性极低，尽管C++是一门成熟且广泛使用的系统编程语言。</p>

<h2 id="引言房间里的大象">引言：房间里的大象</h2>

<p>Rust成功集成到Linux内核引发了一个有趣的反事实问题：<strong>为什么不是C++？</strong> 毕竟，C++拥有：</p>
<ul>
  <li>✅ 数十年的成熟度 (1985年 vs Rust的2015年)</li>
  <li>✅ 用于自动资源管理的RAII</li>
  <li>✅ 丰富的抽象能力</li>
  <li>✅ 庞大的开发者生态系统</li>
  <li>✅ 现代安全特性 (<code class="language-plaintext highlighter-rouge">std::unique_ptr</code>, <code class="language-plaintext highlighter-rouge">std::optional</code>等)</li>
</ul>

<p>然而C++从未被Linux内核认真考虑过，而更年轻的Rust仅在2年开发后(2020-2022)就被接受了。本文档探讨原因。</p>

<h2 id="执行摘要">执行摘要</h2>

<p><strong>C++进入Linux内核的可能性: &lt; 5%</strong></p>

<p><strong>关键障碍:</strong></p>
<ol>
  <li><strong>政治因素</strong>: Linus Torvalds明确、持续的反对 (2004年至今)</li>
  <li><strong>技术因素</strong>: 异常处理、隐藏分配、缺乏内存安全保证</li>
  <li><strong>时机因素</strong>: Rust已经占据”第二语言”生态位</li>
  <li><strong>工程因素</strong>: 没有团队投入努力，没有杀手级应用</li>
  <li><strong>哲学因素</strong>: 与内核需求的根本设计冲突</li>
</ol>

<h2 id="历史背景linus-torvalds关于c的立场">历史背景：Linus Torvalds关于C++的立场</h2>

<h3 id="2004年定调的邮件">2004年定调的邮件</h3>

<p>2004年1月19日，Linus Torvalds回应了关于编译C++内核模块的问题<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>：</p>

<blockquote>
  <p><strong>“糟透了。相信我 - 用C++编写内核代码是一个非常愚蠢的想法。”</strong></p>

  <p><em>“整个C++异常处理机制从根本上就是有问题的。对内核来说尤其如此。”</em></p>

  <p><em>“任何喜欢在你背后隐藏内存分配等操作的编译器或语言，都不是内核的好选择。”</em></p>
</blockquote>

<h3 id="2007年git邮件列表的详述">2007年Git邮件列表的详述</h3>

<p>2007年，Linus在Git邮件列表上详述了他的立场<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>：</p>

<blockquote>
  <p><em>“C++导致真正糟糕的设计选择。你不可避免地会开始使用STL和Boost等’优雅的’库特性…这会导致低效的抽象编程模型，两年后你会发现某些抽象效率不高，但现在你所有的代码都依赖于这些精美的对象模型，除非重写应用否则无法修复。”</em></p>
</blockquote>

<h3 id="20年来立场改变了吗">20年来立场改变了吗？</h3>

<p><strong>没有。</strong> 截至2026年，内核社区对C++接受度<strong>零</strong>进展。与此同时，Rust从提案(2020)到”永久核心语言”状态(2025)<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。</p>

<h2 id="技术障碍分析">技术障碍分析</h2>

<h3 id="障碍1异常处理">障碍1：异常处理</h3>

<p><strong>问题所在:</strong></p>

<p>C++异常引入非局部控制流，这与内核编程需求根本不兼容。</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++异常示例</span>
<span class="kt">void</span> <span class="nf">kernel_function</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="n">buffer</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o">&lt;</span><span class="n">KernelBuffer</span><span class="o">&gt;</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
    <span class="c1">// ^-- 构造函数可能抛出异常</span>

    <span class="n">do_critical_work</span><span class="p">(</span><span class="n">buffer</span><span class="p">.</span><span class="n">get</span><span class="p">());</span>
    <span class="c1">// ^-- 可能抛出异常</span>

    <span class="c1">// 如果抛出异常：</span>
    <span class="c1">// 1. 发生栈展开</span>
    <span class="c1">// 2. 调用析构函数（但在中断上下文中呢？）</span>
    <span class="c1">// 3. 异常表增加二进制大小</span>
    <span class="c1">// 4. 性能变得不可预测</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>内核需求:</strong></p>

<ul>
  <li><strong>确定性行为</strong>: 每个代码路径必须可预测</li>
  <li><strong>无意外跳转</strong>: 控制流必须显式和可追踪</li>
  <li><strong>最小二进制大小</strong>: 没有异常表的空间</li>
  <li><strong>中断安全</strong>: 中断上下文中的代码无法处理异常</li>
</ul>

<p><strong>学术证据:</strong></p>

<p>爱丁堡大学的研究(2019)表明，即使是优化的C++异常实现也会在嵌入式系统中造成显著的代码大小和运行时开销<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>。圣安德鲁斯大学的最新工作(2025)显示，C++异常在用户/内核边界的传播需要特殊的ABI支持，增加了系统复杂性<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>。</p>

<p><strong>与Rust的对比:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust等价代码 - 无异常，显式错误处理</span>
<span class="k">fn</span> <span class="nf">kernel_function</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">buffer</span> <span class="o">=</span> <span class="nn">KernelBuffer</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">size</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// ^-- 用'?'显式错误传播</span>

    <span class="nf">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buffer</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// ^-- 显式错误处理，无隐藏控制流</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span> <span class="c1">// buffer自动丢弃，不需要异常</span>
</code></pre></div></div>

<p><strong>C++能禁用异常吗?</strong></p>

<p>可以，使用<code class="language-plaintext highlighter-rouge">-fno-exceptions</code>。但是：</p>
<ol>
  <li>C++的大部分设计假定异常存在</li>
  <li>没有异常的标准库变得笨拙</li>
  <li>错误处理变成手动（回到C风格）</li>
  <li>你失去了一个关键的C++特性，同时保留了复杂性</li>
</ol>

<h3 id="障碍2隐藏的内存分配">障碍2：隐藏的内存分配</h3>

<p><strong>问题所在:</strong></p>

<p>内核需要<strong>显式、带标记的内存分配</strong>来处理不同上下文：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C内核代码 - 带标志的显式分配</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>     <span class="c1">// 可以睡眠</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_ATOMIC</span><span class="p">);</span>     <span class="c1">// 原子上下文</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">GFP_NOWAIT</span><span class="p">);</span>     <span class="c1">// 非阻塞</span>
</code></pre></div></div>

<p><strong>C++隐藏分配:</strong></p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++ - 何时分配？用什么标志？</span>
<span class="k">class</span> <span class="nc">KernelBuffer</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">&gt;</span> <span class="n">data</span><span class="p">;</span>  <span class="c1">// 隐藏的堆分配！</span>
    <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">name</span><span class="p">;</span>           <span class="c1">// 隐藏的堆分配！</span>
<span class="nl">public:</span>
    <span class="n">KernelBuffer</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">size</span><span class="p">)</span>
        <span class="o">:</span> <span class="n">data</span><span class="p">(</span><span class="n">size</span><span class="p">)</span>            <span class="c1">// 在这里分配 - 但用什么GFP_* ?</span>
        <span class="p">,</span> <span class="n">name</span><span class="p">(</span><span class="s">"buffer"</span><span class="p">)</span> <span class="p">{}</span>     <span class="c1">// 另一个隐藏分配</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="n">function</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">KernelBuffer</span> <span class="n">buf</span><span class="p">(</span><span class="mi">1024</span><span class="p">);</span>     <span class="c1">// 这能睡眠吗？原子安全吗？</span>
    <span class="c1">// 不深入实现无法知道</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Linus的2004年声明仍然有效:</strong></p>

<blockquote>
  <p><em>“任何喜欢在你背后隐藏内存分配等操作的编译器或语言，都不是内核的好选择。”</em></p>
</blockquote>

<p><strong>Rust的显式方法:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - 所有分配都是显式的</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">KernelBuffer</span> <span class="p">{</span>
    <span class="n">data</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">KernelBuffer</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">flags</span><span class="p">:</span> <span class="n">Flags</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// 用显式标志显式分配</span>
        <span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">try_with_capacity_in</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">flags</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="k">Self</span> <span class="p">{</span> <span class="n">data</span> <span class="p">})</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// 使用</span>
<span class="k">let</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">KernelBuffer</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="n">GFP_KERNEL</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="c1">// ^-- 非常清楚：分配在这里发生，用GFP_KERNEL</span>
</code></pre></div></div>

<h3 id="障碍3无内存安全保证">障碍3：无内存安全保证</h3>

<p><strong>核心问题:</strong></p>

<p>C++提供<strong>与C相同的内存安全保证：无。</strong></p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C++ - 仍然容易出现use-after-free</span>
<span class="n">KernelData</span><span class="o">*</span> <span class="n">data</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">KernelData</span><span class="p">();</span>
<span class="k">delete</span> <span class="n">data</span><span class="p">;</span>
<span class="n">use_data</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>  <span class="c1">// ❌ Use-after-free - 编译器不会捕获</span>

<span class="c1">// 仍然容易出现数据竞争</span>
<span class="kt">void</span> <span class="nf">thread1</span><span class="p">()</span> <span class="p">{</span> <span class="n">global_data</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// ❌ 竞态条件</span>
<span class="kt">void</span> <span class="n">thread2</span><span class="p">()</span> <span class="p">{</span> <span class="n">global_data</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// 编译器不会捕获</span>

<span class="c1">// 仍然容易出现空指针解引用</span>
<span class="n">KernelData</span><span class="o">*</span> <span class="n">data</span> <span class="o">=</span> <span class="n">get_data</span><span class="p">();</span>  <span class="c1">// 可能返回nullptr</span>
<span class="n">data</span><span class="o">-&gt;</span><span class="n">process</span><span class="p">();</span>                 <span class="c1">// ❌ 潜在空解引用</span>
</code></pre></div></div>

<p><strong>Rust的编译时保证:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - use-after-free不可能发生</span>
<span class="k">let</span> <span class="n">data</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">KernelData</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="nf">drop</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="nf">use_data</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>  <span class="c1">// ✅ 编译错误：值在移动后使用</span>

<span class="c1">// 数据竞争不可能发生</span>
<span class="k">fn</span> <span class="nf">thread1</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span><span class="p">)</span> <span class="p">{</span> <span class="n">data</span><span class="py">.value</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// ✅ 编译错误：</span>
<span class="k">fn</span> <span class="nf">thread2</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Data</span><span class="p">)</span> <span class="p">{</span> <span class="n">data</span><span class="py">.value</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="p">}</span>  <span class="c1">// 不能通过共享引用修改</span>

<span class="c1">// 空指针解引用不可能发生</span>
<span class="k">let</span> <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">KernelData</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nf">get_data</span><span class="p">();</span>
<span class="n">data</span><span class="nf">.process</span><span class="p">();</span>  <span class="c1">// ✅ 编译错误：Option&lt;T&gt;没有方法'process'</span>
<span class="c1">// 必须显式解包：data.unwrap().process()</span>
</code></pre></div></div>

<p><strong>统计数据:</strong></p>

<p>根据关于Rust在Linux内核中的研究<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>：</p>
<ul>
  <li>约70%的内核CVE源于内存安全问题</li>
  <li>Rust在<strong>编译时</strong>消除这些问题，无运行时开销</li>
  <li>C++消除<strong>0%</strong>的这些问题</li>
</ul>

<h3 id="障碍4运行时和标准库依赖">障碍4：运行时和标准库依赖</h3>

<p><strong>问题所在:</strong></p>

<p>C++通常依赖于：</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">libstdc++</code>或<code class="language-plaintext highlighter-rouge">libc++</code> (标准库)</li>
  <li>RTTI的运行时支持 (运行时类型信息)</li>
  <li>全局构造函数/析构函数</li>
  <li>线程本地存储</li>
</ul>

<p><strong>内核需求:</strong></p>
<ul>
  <li>❌ 没有用户空间库</li>
  <li>❌ 没有全局构造函数 (初始化顺序问题)</li>
  <li>❌ 最小二进制大小</li>
  <li>❌ 不对运行时环境做假设</li>
</ul>

<p><strong>可能的变通方法:</strong></p>
<ul>
  <li>使用<code class="language-plaintext highlighter-rouge">-fno-rtti</code> (禁用RTTI)</li>
  <li>使用<code class="language-plaintext highlighter-rouge">-fno-exceptions</code> (禁用异常)</li>
  <li>使用<code class="language-plaintext highlighter-rouge">-nostdlib</code> (无标准库)</li>
  <li>避免全局对象</li>
</ul>

<p><strong>但这样你就只剩下”带类的C”</strong> - 失去了C++的大部分优势，同时保留了复杂性。</p>

<p><strong>Rust的方法:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust内核代码使用'core' (无std)</span>
<span class="nd">#![no_std]</span>  <span class="c1">// 显式内核模式</span>

<span class="c1">// 来自rust/kernel/lib.rs (实际内核代码):</span>
<span class="cd">//! 这个crate包含已移植或包装的内核API</span>
<span class="cd">//! 供内核中的Rust代码使用，所有代码都依赖它。</span>

<span class="k">extern</span> <span class="k">crate</span> <span class="n">core</span><span class="p">;</span>  <span class="c1">// 只有core，没有std库</span>
</code></pre></div></div>

<h2 id="语言设计哲学对比">语言设计哲学对比</h2>

<h3 id="根本不匹配">根本不匹配</h3>

<table>
  <thead>
    <tr>
      <th>方面</th>
      <th>Linux内核需求</th>
      <th>C++提供</th>
      <th>Rust提供</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>错误处理</strong></td>
      <td>显式、零开销</td>
      <td>异常(开销)或手动</td>
      <td><code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code> (零开销、强制)</td>
    </tr>
    <tr>
      <td><strong>内存分配</strong></td>
      <td>显式、带标记(GFP_*)</td>
      <td>通常隐式</td>
      <td>用分配器API显式</td>
    </tr>
    <tr>
      <td><strong>控制流</strong></td>
      <td>可预测、可追踪</td>
      <td>异常隐藏流程</td>
      <td>所有控制流显式</td>
    </tr>
    <tr>
      <td><strong>内存安全</strong></td>
      <td>关键(70%的CVE)</td>
      <td>无保证</td>
      <td>编译时保证</td>
    </tr>
    <tr>
      <td><strong>抽象成本</strong></td>
      <td>必须为零</td>
      <td>有时有开销</td>
      <td>保证零成本</td>
    </tr>
    <tr>
      <td><strong>ABI稳定性</strong></td>
      <td>模块必需</td>
      <td>不稳定(名称改编)</td>
      <td>C兼容FFI</td>
    </tr>
    <tr>
      <td><strong>二进制大小</strong></td>
      <td>最小</td>
      <td>STL膨胀、RTTI表</td>
      <td>无运行时、最小大小</td>
    </tr>
  </tbody>
</table>

<h2 id="其他内核中的c案例研究">其他内核中的C++案例研究</h2>

<h3 id="windows-nt内核">Windows NT内核</h3>

<p><strong>状态:</strong> 部分C++使用，主要在驱动框架中</p>

<p><strong>约束:</strong></p>
<ul>
  <li>C++的严格子集</li>
  <li>无异常</li>
  <li>无RTTI</li>
  <li>无STL</li>
  <li>需要自定义内存分配器</li>
</ul>

<p><strong>关键区别:</strong> Windows从一开始(1993)就考虑了C++。Linux没有。</p>

<h3 id="macosios内核-xnu">macOS/iOS内核 (XNU)</h3>

<p><strong>状态:</strong> C++用于IOKit (驱动框架)</p>

<p><strong>约束:</strong></p>
<ul>
  <li>有限的C++子集</li>
  <li>仔细控制的使用</li>
  <li>早于现代C++特性</li>
</ul>

<p><strong>关键区别:</strong> Apple控制整个生态系统。Linux是社区驱动的，硬件多样化。</p>

<h3 id="fuchsia-google-1">Fuchsia (Google)</h3>

<p><strong>状态:</strong> 广泛使用C++</p>

<p><strong>关键区别:</strong> <strong>全新内核</strong> (始于2016年)，没有遗留代码库。Linux有30多年的C代码和既定约定。</p>

<h3 id="案例研究的结论">案例研究的结论</h3>

<p><strong>每个使用C++的内核都:</strong></p>
<ol>
  <li>从一开始就为C++设计，或</li>
  <li>使用高度受限的C++子集，类似于”带类的C”</li>
</ol>

<p><strong>Linux两者都不是。</strong> 它有3000万行C代码和重视显式和简单性的文化。</p>

<h2 id="时机因素rust已经赢得了第二语言席位">时机因素：Rust已经赢得了”第二语言”席位</h2>

<h3 id="为什么时机很重要">为什么时机很重要</h3>

<p>Linux内核添加第二语言是<strong>巨大的工程</strong>：</p>
<ul>
  <li>构建系统变更</li>
  <li>文档需求</li>
  <li>维护者培训</li>
  <li>ABI兼容性问题</li>
  <li>工具链集成</li>
</ul>

<p><strong>内核社区不会多次这样做。</strong></p>

<h3 id="rust的时间线">Rust的时间线</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2020: 宣布Rust for Linux
      - 向LKML发布初始RFC
      - 社区讨论开始

2021: 基础设施开发
      - 构建系统集成
      - 内核抽象层开发

2022 (10月): Rust合并到Linux 6.1开发周期
        - Linus Torvalds接受补丁

2022 (12月): Linux 6.1发布
        - 首个支持Rust的稳定内核

2023-2024: 生态系统增长
        - Android Binder用Rust重写
        - GPU驱动 (Nova)
        - 网络PHY驱动

2025 (12月): Rust成为"永久核心语言"
        - 不再是实验性的
        - 338个文件，135,662行生产代码
</code></pre></div></div>

<h3 id="c需要什么">C++需要什么？</h3>

<p>要匹配Rust的成功，C++需要：</p>

<p><strong>1. 专门的团队</strong> (5-10名工程师，多年承诺)
<strong>2. 企业赞助</strong> (Google/Microsoft/Meta级别)
<strong>3. 杀手级应用</strong> (等同于Android Binder)
<strong>4. 工具链开发</strong> (内核安全的C++子集)
<strong>5. 社区支持</strong> (Linus和维护者)</p>

<p><strong>当前状态:</strong></p>
<ul>
  <li>❌ 没有团队在做这个</li>
  <li>❌ 没有企业赞助商</li>
  <li>❌ 没有确定的杀手级应用</li>
  <li>❌ 没有工具链工作</li>
  <li>❌ Linus明确反对 (20年)</li>
</ul>

<h2 id="经济和工程现实">经济和工程现实</h2>

<h3 id="所需资源投资">所需资源投资</h3>

<p>基于Rust for Linux的开发：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>总工作量估算 (2020-2025):
- 核心团队: ~10名工程师 × 5年 = 50人年
- 企业贡献: ~20名工程师 × 2年 = 40人年
- 社区贡献: ~100名贡献者 × 0.5年 = 50人年
总计: ~140人年的工程努力

成本估算 (保守):
- 平均工程师成本: $200,000/年 (薪水 + 开销)
- 总投资: 约$2800万美元
</code></pre></div></div>

<p><strong>要让C++进入内核，有人需要投入类似的资源。</strong></p>

<h3 id="谁会资助这个">谁会资助这个？</h3>

<p><strong>Rust for Linux赞助商:</strong></p>
<ul>
  <li>Google (Android Binder，安全动机)</li>
  <li>Microsoft (Azure安全，NT内核Rust倡议)</li>
  <li>Arm (架构支持，驱动开发)</li>
  <li>Meta (网络，基础设施)</li>
</ul>

<p><strong>潜在的C++赞助商:</strong></p>
<ul>
  <li>??? (没有明确候选人)</li>
</ul>

<p><strong>为什么没有赞助商?</strong></p>
<ol>
  <li>C++不能解决Rust尚未解决的问题</li>
  <li>投资是重复的 (Rust已经存在)</li>
  <li>政治风险 (Linus的反对)</li>
  <li>技术风险 (根本设计不匹配)</li>
</ol>

<h2 id="结论判决">结论：判决</h2>

<h3 id="发现总结">发现总结</h3>

<p><strong>C++能进入Linux内核吗?</strong></p>

<p><strong>答案: 极不可能 (&lt; 5%概率)，原因如下:</strong></p>

<h4 id="政治障碍-高">政治障碍 (高)</h4>
<ul>
  <li>✗ Linus Torvalds明确、持续的反对 (20+年)</li>
  <li>✗ 内核维护者社区中无倡导者</li>
  <li>✗ Rust已占据”第二语言”生态位</li>
</ul>

<h4 id="技术障碍-高">技术障碍 (高)</h4>
<ul>
  <li>✗ 异常处理与内核需求根本不兼容</li>
  <li>✗ 隐藏的内存分配违反内核哲学</li>
  <li>✗ 无编译时内存安全保证</li>
  <li>✗ 运行时依赖 (RTTI, libstdc++) 不适合内核</li>
  <li>✗ ABI不稳定使模块系统复杂化</li>
</ul>

<h4 id="工程障碍-高">工程障碍 (高)</h4>
<ul>
  <li>✗ 没有团队在做C++内核集成</li>
  <li>✗ 没有确定的企业赞助商</li>
  <li>✗ 没有杀手级应用来证明投资合理</li>
  <li>✗ 估计需要$2800万+投资 (基于Rust先例)</li>
</ul>

<h4 id="时机障碍-高">时机障碍 (高)</h4>
<ul>
  <li>✗ Rust已投资140+人年</li>
  <li>✗ Rust有生产部署 (Android Binder, GPU驱动)</li>
  <li>✗ 内核不会添加第三种高级语言</li>
</ul>

<h3 id="对比为什么rust成功而c不能">对比：为什么Rust成功而C++不能</h3>

<table>
  <thead>
    <tr>
      <th>因素</th>
      <th>Rust</th>
      <th>C++</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>内存安全</strong></td>
      <td>✅ 编译时保证</td>
      <td>❌ 无</td>
    </tr>
    <tr>
      <td><strong>内核哲学契合</strong></td>
      <td>✅ 一切显式</td>
      <td>❌ 隐藏行为</td>
    </tr>
    <tr>
      <td><strong>运行时需求</strong></td>
      <td>✅ 无 (<code class="language-plaintext highlighter-rouge">#![no_std]</code>)</td>
      <td>❌ 需要libstdc++子集</td>
    </tr>
    <tr>
      <td><strong>错误处理</strong></td>
      <td>✅ 零成本<code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code></td>
      <td>❌ 异常或手动</td>
    </tr>
    <tr>
      <td><strong>行业支持</strong></td>
      <td>✅ Google, MS, Arm, Meta</td>
      <td>❌ 无内核工作支持</td>
    </tr>
    <tr>
      <td><strong>活跃开发</strong></td>
      <td>✅ 338文件, 135K行</td>
      <td>❌ 零</td>
    </tr>
    <tr>
      <td><strong>Linus立场</strong></td>
      <td>✅ 中立→接受</td>
      <td>❌ 明确反对</td>
    </tr>
    <tr>
      <td><strong>杀手级应用</strong></td>
      <td>✅ Android Binder</td>
      <td>❌ 无确定的</td>
    </tr>
  </tbody>
</table>

<h3 id="真正的问题">真正的问题</h3>

<p>问题不是”C++能进入Linux内核吗？”</p>

<p><strong>问题是: “为什么要这样做？”</strong></p>

<ul>
  <li>它不能解决Rust尚未解决的问题</li>
  <li>它带来Rust没有的技术包袱</li>
  <li>它缺乏企业和社区支持</li>
  <li>它面临Rust从未遇到的政治反对</li>
</ul>

<h3 id="最终想法">最终想法</h3>

<p>C++是许多领域的优秀语言：</p>
<ul>
  <li>应用开发</li>
  <li>游戏引擎</li>
  <li>高性能计算</li>
  <li>系统软件 (内核之外)</li>
</ul>

<p>但对于<strong>Linux内核具体来说</strong>，船已经开走了。Rust提供：</p>
<ul>
  <li>更好的内存安全</li>
  <li>更好的内核哲学契合</li>
  <li>更好的内核开发工具</li>
  <li>更好的行业动力</li>
</ul>

<p><strong>除非基本技术现实改变</strong>，C++将无限期地留在Linux内核之外。</p>

<p>对C++倡导者来说，更有成效的问题是：<strong>C++如何在自己的领域改进？</strong> 而不是试图进入一个技术上不适合且政治上不受欢迎的领域。</p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://harmful.cat-v.org/software/c++/linus">Re: Compiling C++ kernel module + Makefile</a> - Linus Torvalds, January 19, 2004, Linux Kernel Mailing List <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/249460/">Re: [RFC] Convert builtin-mailinfo.c to use The Better String Library</a> - Linus Torvalds, September 6, 2007, Git Mailing List <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.webpronews.com/linux-kernel-adopts-rust-as-permanent-core-language-in-2025/">Linux Kernel Adopts Rust as Permanent Core Language in 2025</a> - WebProNews, December 2025 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://www.research.ed.ac.uk/files/78829292/low_cost_deterministic_C_exceptions_for_embedded_systems.pdf">Low-cost deterministic C++ exceptions for embedded systems</a> - University of Edinburgh, 2019, ACM SIGPLAN International Conference on Compiler Construction <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://doi.org/10.1145/3764860.3768332">Propagating C++ exceptions across the user/kernel boundary</a> - Voronetskiy &amp; Spink, University of St Andrews, PLOS 2025 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://mars-research.github.io/doc/2024-acsac-rfl.pdf">Rust for Linux: Understanding the Security Impact</a> - Research paper analyzing Rust’s security impact in Linux kernel <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[With Rust successfully entering the Linux kernel as the second language after C, a natural question arises: could C++ have been chosen instead, or could it still enter the kernel in the future? This comprehensive analysis examines the technical barriers, historical context, and fundamental design conflicts that make C++ adoption in the Linux kernel highly unlikely, despite C++ being a mature and widely-used systems programming language.]]></summary></entry><entry><title type="html">Rust in the Linux Kernel: Understanding the Current State and Future Direction</title><link href="https://weinan.io/2026/02/16/rust-in-linux-kernel-reality-check.html" rel="alternate" type="text/html" title="Rust in the Linux Kernel: Understanding the Current State and Future Direction" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>https://weinan.io/2026/02/16/rust-in-linux-kernel-reality-check</id><content type="html" xml:base="https://weinan.io/2026/02/16/rust-in-linux-kernel-reality-check.html"><![CDATA[<p>Examining the actual state of Rust in the Linux kernel through data and production code. This analysis explores 135,662 lines of Rust code currently in the kernel, addresses common questions about ‘unsafe’, development experience, and the gradual adoption path. With concrete code examples from the Android Binder rewrite and real metrics from the codebase, we examine both achievements and challenges.</p>

<h2 id="introduction-understanding-rusts-current-role-in-the-kernel">Introduction: Understanding Rust’s Current Role in the Kernel</h2>

<p>A common discussion in developer communities centers around several observations: <em>“Rust is currently being used for device drivers, not the kernel core. Using <code class="language-plaintext highlighter-rouge">unsafe</code> to interface with C may add complexity compared to writing directly in C or Zig. It’s unclear whether Rust will expand into core kernel development.”</em></p>

<p>These are legitimate questions that deserve data-driven answers. To understand Rust’s current state and future trajectory in Linux, we need to examine both what has been achieved and what challenges remain. Let’s look at the actual kernel codebase as of Linux 6.x.</p>

<h2 id="the-numbers-rusts-actual-penetration">The Numbers: Rust’s Actual Penetration</h2>

<p>Based on comprehensive analysis using cloc v2.04 on the Linux kernel source tree (Linux 6.x), here’s the reality:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Total Rust files:        163 .rs files
Lines of code:           20,064 lines (pure code, excluding comments/blanks)
Total lines:             41,907 lines (including 17,760 comment lines)
Kernel abstraction modules: 74 modules across rust/kernel/
Production drivers:      17 driver files
Build infrastructure:    9 macro files + 15 pin-init files
</code></pre></div></div>

<p><strong>Distribution breakdown (by lines of code):</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rust/kernel/           13,500 lines (67.3%) - Core abstraction layer
rust/pin-init/          2,435 lines (12.1%) - Pin initialization infrastructure
drivers/                1,913 lines ( 9.5%) - Production drivers
rust/macros/              894 lines ( 4.5%) - Procedural macros
samples/rust/             758 lines ( 3.8%) - Example code
Other (scripts, etc)      564 lines ( 2.8%) - Supporting code
</code></pre></div></div>

<p><strong>Total line counts (with comments and blanks):</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rust/kernel/           30,858 lines (101 files) - Includes 14,290 comment lines
drivers/                2,602 lines ( 17 files) - Production Rust drivers
rust/pin-init/          4,826 lines ( 15 files) - Memory safety infrastructure
rust/macros/            1,541 lines (  9 files) - Compile-time code generation
samples/rust/           1,179 lines ( 12 files) - Learning examples
Other                     901 lines (  9 files) - Scripts and utilities
</code></pre></div></div>

<p>This is not a toy experiment. This is <strong>production-grade infrastructure</strong> covering 74 kernel subsystems.</p>

<h3 id="the-74-kernel-abstraction-modules-rustkernel">The 74 Kernel Abstraction Modules (<code class="language-plaintext highlighter-rouge">rust/kernel/</code>)</h3>

<p>The core abstraction layer provides safe Rust interfaces to kernel functionality:</p>

<p><strong>Hardware &amp; Device Management (19 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">acpi</code> - ACPI (Advanced Configuration and Power Interface) support</li>
  <li><code class="language-plaintext highlighter-rouge">auxiliary</code> - Auxiliary bus support</li>
  <li><code class="language-plaintext highlighter-rouge">clk</code> - Clock framework abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">cpu</code> - CPU management</li>
  <li><code class="language-plaintext highlighter-rouge">cpufreq</code> - CPU frequency scaling</li>
  <li><code class="language-plaintext highlighter-rouge">dma</code> - DMA (Direct Memory Access) mapping</li>
  <li><code class="language-plaintext highlighter-rouge">device</code> - Device model core abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">firmware</code> - Firmware loading interface</li>
  <li><code class="language-plaintext highlighter-rouge">i2c</code> - I2C bus support</li>
  <li><code class="language-plaintext highlighter-rouge">irq</code> - Interrupt handling</li>
  <li><code class="language-plaintext highlighter-rouge">pci</code> - PCI bus support</li>
  <li><code class="language-plaintext highlighter-rouge">platform</code> - Platform device abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">power</code> - Power management</li>
  <li><code class="language-plaintext highlighter-rouge">regulator</code> - Voltage regulator framework</li>
  <li><code class="language-plaintext highlighter-rouge">reset</code> - Reset controller framework</li>
  <li><code class="language-plaintext highlighter-rouge">security</code> - Security framework hooks</li>
  <li><code class="language-plaintext highlighter-rouge">spi</code> - SPI bus support</li>
  <li><code class="language-plaintext highlighter-rouge">xarray</code> - XArray (resizable array) data structure</li>
  <li><code class="language-plaintext highlighter-rouge">of</code> - Device tree (Open Firmware) support</li>
</ul>

<p><strong>Graphics &amp; Display (8 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">drm</code> - Direct Rendering Manager core</li>
  <li><code class="language-plaintext highlighter-rouge">drm::allocator</code> - DRM memory allocator</li>
  <li><code class="language-plaintext highlighter-rouge">drm::device</code> - DRM device management</li>
  <li><code class="language-plaintext highlighter-rouge">drm::drv</code> - DRM driver registration</li>
  <li><code class="language-plaintext highlighter-rouge">drm::file</code> - DRM file operations</li>
  <li><code class="language-plaintext highlighter-rouge">drm::gem</code> - Graphics Execution Manager (memory management)</li>
  <li><code class="language-plaintext highlighter-rouge">drm::ioctl</code> - DRM ioctl handling</li>
  <li><code class="language-plaintext highlighter-rouge">drm::mm</code> - DRM memory manager</li>
</ul>

<p><strong>Networking (5 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">net</code> - Core networking abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">net::phy</code> - PHY (Physical layer) device support</li>
  <li><code class="language-plaintext highlighter-rouge">net::dev</code> - Network device abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">netdevice</code> - Network device interface</li>
  <li><code class="language-plaintext highlighter-rouge">ethtool</code> - Ethtool interface for network configuration</li>
</ul>

<p><strong>Storage &amp; File Systems (9 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">block</code> - Block device layer</li>
  <li><code class="language-plaintext highlighter-rouge">block::mq</code> - Multi-queue block layer</li>
  <li><code class="language-plaintext highlighter-rouge">fs</code> - File system abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">configfs</code> - Configuration file system</li>
  <li><code class="language-plaintext highlighter-rouge">debugfs</code> - Debug file system</li>
  <li><code class="language-plaintext highlighter-rouge">folio</code> - Page folio support (memory management)</li>
  <li><code class="language-plaintext highlighter-rouge">page</code> - Page management</li>
  <li><code class="language-plaintext highlighter-rouge">pages</code> - Multi-page handling</li>
  <li><code class="language-plaintext highlighter-rouge">seq_file</code> - Sequential file interface</li>
</ul>

<p><strong>Synchronization &amp; Concurrency (7 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sync</code> - Synchronization primitives</li>
  <li><code class="language-plaintext highlighter-rouge">sync::arc</code> - Atomic reference counting</li>
  <li><code class="language-plaintext highlighter-rouge">sync::lock</code> - Lock abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">sync::condvar</code> - Condition variables</li>
  <li><code class="language-plaintext highlighter-rouge">sync::poll</code> - Polling support</li>
  <li><code class="language-plaintext highlighter-rouge">rcu</code> - Read-Copy-Update synchronization</li>
  <li><code class="language-plaintext highlighter-rouge">workqueue</code> - Deferred work execution</li>
</ul>

<p><strong>Memory Management (5 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">alloc</code> - Memory allocation</li>
  <li><code class="language-plaintext highlighter-rouge">mm</code> - Memory management core</li>
  <li><code class="language-plaintext highlighter-rouge">kasync</code> - Asynchronous memory allocation</li>
  <li><code class="language-plaintext highlighter-rouge">vmalloc</code> - Virtual memory allocation</li>
  <li><code class="language-plaintext highlighter-rouge">static_call</code> - Static call optimization</li>
</ul>

<p><strong>Core Kernel Services (11 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cred</code> - Credential management</li>
  <li><code class="language-plaintext highlighter-rouge">kunit</code> - Kernel unit testing framework</li>
  <li><code class="language-plaintext highlighter-rouge">module</code> - Kernel module support</li>
  <li><code class="language-plaintext highlighter-rouge">panic</code> - Panic handling</li>
  <li><code class="language-plaintext highlighter-rouge">pid</code> - Process ID management</li>
  <li><code class="language-plaintext highlighter-rouge">task</code> - Task/process management</li>
  <li><code class="language-plaintext highlighter-rouge">time</code> - Time management</li>
  <li><code class="language-plaintext highlighter-rouge">timer</code> - Timer support</li>
  <li><code class="language-plaintext highlighter-rouge">pid_namespace</code> - PID namespace support</li>
  <li><code class="language-plaintext highlighter-rouge">user</code> - User structure abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">uidgid</code> - User/Group ID handling</li>
</ul>

<p><strong>Low-level Infrastructure (10 modules):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">bindings</code> - Auto-generated C bindings</li>
  <li><code class="language-plaintext highlighter-rouge">build_assert</code> - Compile-time assertions</li>
  <li><code class="language-plaintext highlighter-rouge">build_error</code> - Compile-time error generation</li>
  <li><code class="language-plaintext highlighter-rouge">error</code> - Error handling (kernel error codes)</li>
  <li><code class="language-plaintext highlighter-rouge">init</code> - Initialization macros</li>
  <li><code class="language-plaintext highlighter-rouge">ioctl</code> - ioctl command handling</li>
  <li><code class="language-plaintext highlighter-rouge">prelude</code> - Common imports</li>
  <li><code class="language-plaintext highlighter-rouge">print</code> - Kernel printing (pr_info, pr_err, etc.)</li>
  <li><code class="language-plaintext highlighter-rouge">static_assert</code> - Static assertions</li>
  <li><code class="language-plaintext highlighter-rouge">str</code> - String handling</li>
</ul>

<p><strong>Data Structures &amp; Utilities:</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">kuid</code> - Kernel user ID</li>
  <li><code class="language-plaintext highlighter-rouge">kgid</code> - Kernel group ID</li>
  <li><code class="language-plaintext highlighter-rouge">list</code> - Linked list abstractions</li>
  <li><code class="language-plaintext highlighter-rouge">miscdevice</code> - Miscellaneous device support</li>
  <li><code class="language-plaintext highlighter-rouge">revocable</code> - Revocable resources</li>
  <li><code class="language-plaintext highlighter-rouge">types</code> - Core type definitions</li>
</ul>

<h3 id="the-17-production-drivers-1913-lines-of-code">The 17 Production Drivers (1,913 lines of code)</h3>

<p><strong>GPU Drivers (13 files):</strong></p>
<ul>
  <li><strong>Nova</strong> (Nvidia GSP firmware driver):
    <ul>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/</code> (5 files): DRM integration layer
        <ul>
          <li><code class="language-plaintext highlighter-rouge">nova.rs</code>, <code class="language-plaintext highlighter-rouge">driver.rs</code>, <code class="language-plaintext highlighter-rouge">gem.rs</code>, <code class="language-plaintext highlighter-rouge">uapi.rs</code>, <code class="language-plaintext highlighter-rouge">file.rs</code></li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/nova-core/</code> (7 files): Core GPU driver logic
        <ul>
          <li><code class="language-plaintext highlighter-rouge">nova_core.rs</code>, <code class="language-plaintext highlighter-rouge">driver.rs</code>, <code class="language-plaintext highlighter-rouge">gpu.rs</code>, <code class="language-plaintext highlighter-rouge">firmware.rs</code>, <code class="language-plaintext highlighter-rouge">util.rs</code></li>
          <li><code class="language-plaintext highlighter-rouge">regs.rs</code>, <code class="language-plaintext highlighter-rouge">regs/macros.rs</code> - Register access abstractions</li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/drm_panic_qr.rs</code> - QR code panic screen (996 lines)</li>
    </ul>
  </li>
</ul>

<p><strong>Network Drivers (2 files):</strong></p>
<ul>
  <li><strong>PHY Drivers</strong>:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">ax88796b_rust.rs</code> (134 lines) - ASIX Electronics PHY driver (AX88772A/AX88772C/AX88796B)</li>
      <li><code class="language-plaintext highlighter-rouge">qt2025.rs</code> (103 lines) - Marvell QT2025 PHY driver</li>
    </ul>
  </li>
</ul>

<p><strong>Other Drivers (2 files):</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cpufreq/rcpufreq_dt.rs</code> (227 lines) - Device tree-based CPU frequency driver</li>
  <li><code class="language-plaintext highlighter-rouge">block/rnull.rs</code> (80 lines) - Rust null block device (testing/example)</li>
</ul>

<p>Note: The Android Binder driver mentioned in case studies below is currently in development/out-of-tree and not yet merged into mainline Linux 6.x. The production driver count reflects only in-tree drivers as of the current kernel version.</p>

<p>This comprehensive infrastructure demonstrates that Rust in Linux has moved far beyond experimentation into production deployment across critical subsystems. Let’s examine actual kernel code to understand what “Rust in the kernel” really means.</p>

<h2 id="case-study-1-android-binder---production-rust-in-action">Case Study 1: Android Binder - Production Rust in Action</h2>

<p>The Android Binder IPC mechanism is one of the most critical components of the Android ecosystem. Google has rewritten it entirely in Rust. Here’s what the actual code looks like:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/android/binder/rust_binder_main.rs</span>
<span class="c1">// Copyright (C) 2025 Google LLC.</span>

<span class="k">use</span> <span class="nn">kernel</span><span class="p">::{</span>
    <span class="nn">bindings</span><span class="p">::{</span><span class="k">self</span><span class="p">,</span> <span class="n">seq_file</span><span class="p">},</span>
    <span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">,</span>
    <span class="nn">list</span><span class="p">::{</span><span class="n">ListArc</span><span class="p">,</span> <span class="n">ListArcSafe</span><span class="p">,</span> <span class="n">ListLinksSelfPtr</span><span class="p">,</span> <span class="n">TryNewListArc</span><span class="p">},</span>
    <span class="nn">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">,</span>
    <span class="nn">seq_file</span><span class="p">::</span><span class="n">SeqFile</span><span class="p">,</span>
    <span class="nn">sync</span><span class="p">::</span><span class="nn">poll</span><span class="p">::</span><span class="n">PollTable</span><span class="p">,</span>
    <span class="nn">sync</span><span class="p">::</span><span class="nb">Arc</span><span class="p">,</span>
    <span class="nn">task</span><span class="p">::</span><span class="n">Pid</span><span class="p">,</span>
    <span class="nn">types</span><span class="p">::</span><span class="n">ForeignOwnable</span><span class="p">,</span>
    <span class="nn">uaccess</span><span class="p">::</span><span class="n">UserSliceWriter</span><span class="p">,</span>
<span class="p">};</span>

<span class="nd">module!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">BinderModule</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"rust_binder"</span><span class="p">,</span>
    <span class="n">authors</span><span class="p">:</span> <span class="p">[</span><span class="s">"Wedson Almeida Filho"</span><span class="p">,</span> <span class="s">"Alice Ryhl"</span><span class="p">],</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Android Binder"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Module structure</strong> (from actual source):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drivers/android/binder/
├── rust_binder_main.rs    (611 lines - main module)
├── process.rs              (1,745 lines - largest file)
├── thread.rs               (1,596 lines)
├── node.rs                 (1,131 lines)
├── transaction.rs          (456 lines)
├── allocation.rs           (602 lines)
├── page_range.rs           (734 lines)
├── range_alloc/tree.rs     (488 lines - allocator)
└── [other modules]
</code></pre></div></div>

<h3 id="understanding-unsafe-in-practice">Understanding “Unsafe” in Practice</h3>

<p>A common concern is whether using <code class="language-plaintext highlighter-rouge">unsafe</code> in Rust to call C APIs adds development complexity. Let’s examine the actual numbers from the Binder driver:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">grep</span> <span class="nt">-r</span> <span class="s2">"unsafe"</span> drivers/android/binder/<span class="k">*</span>.rs | <span class="nb">wc</span> <span class="nt">-l</span>
179 occurrences of <span class="s1">'unsafe'</span> across 11 files
</code></pre></div></div>

<p>That’s <strong>179 <code class="language-plaintext highlighter-rouge">unsafe</code> blocks in approximately 8,000 lines of code</strong> - roughly 2-3% of the codebase.</p>

<p><strong>The key difference from C</strong>: In C, all code operates without memory safety guarantees from the compiler. In Rust, approximately 97-98% of the Binder code receives compile-time safety verification, with unsafe operations explicitly marked and isolated to specific locations.</p>

<p>Let’s examine how this looks in practice:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/android/binder/process.rs (actual kernel code)</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::{</span>
    <span class="nn">sync</span><span class="p">::{</span>
        <span class="nn">lock</span><span class="p">::{</span><span class="nn">spinlock</span><span class="p">::</span><span class="n">SpinLockBackend</span><span class="p">,</span> <span class="n">Guard</span><span class="p">},</span>
        <span class="nb">Arc</span><span class="p">,</span> <span class="n">ArcBorrow</span><span class="p">,</span> <span class="n">CondVar</span><span class="p">,</span> <span class="n">Mutex</span><span class="p">,</span> <span class="n">SpinLock</span><span class="p">,</span> <span class="n">UniqueArc</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="nn">types</span><span class="p">::</span><span class="n">ARef</span><span class="p">,</span>
<span class="p">};</span>

<span class="nd">#[derive(Copy,</span> <span class="nd">Clone)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">enum</span> <span class="n">IsFrozen</span> <span class="p">{</span>
    <span class="n">Yes</span><span class="p">,</span>
    <span class="n">No</span><span class="p">,</span>
    <span class="n">InProgress</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">IsFrozen</span> <span class="p">{</span>
    <span class="cd">/// Whether incoming transactions should be rejected due to freeze.</span>
    <span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">fn</span> <span class="nf">is_frozen</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">bool</span> <span class="p">{</span>
        <span class="k">match</span> <span class="k">self</span> <span class="p">{</span>
            <span class="nn">IsFrozen</span><span class="p">::</span><span class="n">Yes</span> <span class="k">=&gt;</span> <span class="k">true</span><span class="p">,</span>
            <span class="nn">IsFrozen</span><span class="p">::</span><span class="n">No</span> <span class="k">=&gt;</span> <span class="k">false</span><span class="p">,</span>
            <span class="nn">IsFrozen</span><span class="p">::</span><span class="n">InProgress</span> <span class="k">=&gt;</span> <span class="k">true</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Notice something? <strong>This is pure safe Rust</strong> - no <code class="language-plaintext highlighter-rouge">unsafe</code> blocks, yet it’s core kernel logic. The type system ensures:</p>
<ul>
  <li>No null pointer dereferences</li>
  <li>No use-after-free</li>
  <li>No data races</li>
  <li>No uninitialized memory access</li>
</ul>

<p><strong>All enforced at compile time, not runtime.</strong></p>

<h2 id="case-study-2-lock-abstractions---raii-in-the-kernel">Case Study 2: Lock Abstractions - RAII in the Kernel</h2>

<p>One of the most powerful Rust features for kernel development is RAII (Resource Acquisition Is Initialization). Here’s the actual abstraction layer from <code class="language-plaintext highlighter-rouge">rust/kernel/sync/lock.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/sync/lock.rs (actual kernel code)</span>
<span class="cd">/// The "backend" of a lock.</span>
<span class="cd">///</span>
<span class="cd">/// # Safety</span>
<span class="cd">///</span>
<span class="cd">/// - Implementers must ensure that only one thread/CPU may access the protected</span>
<span class="cd">///   data once the lock is owned, that is, between calls to `lock` and `unlock`.</span>
<span class="cd">/// - Implementers must also ensure that `relock` uses the same locking method as</span>
<span class="cd">///   the original lock operation.</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">trait</span> <span class="n">Backend</span> <span class="p">{</span>
    <span class="cd">/// The state required by the lock.</span>
    <span class="k">type</span> <span class="n">State</span><span class="p">;</span>

    <span class="cd">/// The state required to be kept between `lock` and `unlock`.</span>
    <span class="k">type</span> <span class="n">GuardState</span><span class="p">;</span>

    <span class="cd">/// Acquires the lock, making the caller its owner.</span>
    <span class="cd">///</span>
    <span class="cd">/// # Safety</span>
    <span class="cd">///</span>
    <span class="cd">/// Callers must ensure that [`Backend::init`] has been previously called.</span>
    <span class="nd">#[must_use]</span>
    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">;</span>

    <span class="cd">/// Releases the lock, giving up its ownership.</span>
    <span class="cd">///</span>
    <span class="cd">/// # Safety</span>
    <span class="cd">///</span>
    <span class="cd">/// It must only be called by the current owner of the lock.</span>
    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">,</span> <span class="n">guard_state</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Building on the three-layer architecture explained above, the <code class="language-plaintext highlighter-rouge">Backend</code> trait provides the unsafe low-level interface. Driver developers use the safe high-level API:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Safe to use in driver code - compiler prevents forgetting to unlock</span>
<span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">guard</span> <span class="o">=</span> <span class="n">spinlock</span><span class="nf">.lock</span><span class="p">();</span> <span class="c1">// Acquire lock</span>

    <span class="k">if</span> <span class="n">error_condition</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">);</span> <span class="c1">// Early return</span>
        <span class="c1">// Guard dropped here - lock AUTOMATICALLY released</span>
    <span class="p">}</span>

    <span class="nf">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">guard</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="c1">// If this fails and returns</span>
    <span class="c1">// Guard dropped here - lock AUTOMATICALLY released</span>

<span class="p">}</span> <span class="c1">// Normal exit - lock automatically released</span>
</code></pre></div></div>

<p><strong>In C, the equivalent would be:</strong></p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C version - manual, error-prone</span>
<span class="n">spin_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>

<span class="k">if</span> <span class="p">(</span><span class="n">error_condition</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// Must remember to unlock!</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">ret</span> <span class="o">=</span> <span class="n">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="n">data</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// Must remember to unlock!</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// Must remember to unlock!</span>
</code></pre></div></div>

<p><strong>Every single <code class="language-plaintext highlighter-rouge">return</code> path requires manual unlock.</strong> Miss one, and you have a deadlock. Code analysis tools can catch some of these, but the C compiler provides <em>zero</em> guarantees.</p>

<p>The Rust compiler, on the other hand, makes it <strong>impossible</strong> to forget the unlock. This isn’t “mental burden” - this is <strong>eliminating an entire class of bugs at compile time</strong>.</p>

<h2 id="examining-common-questions">Examining Common Questions</h2>

<h3 id="question-1-rust-is-only-for-drivers-not-the-kernel-core">Question 1: “Rust is only for drivers, not the kernel core”</h3>

<p><strong>Current status</strong>: This is accurate for now, and it reflects the planned adoption strategy.</p>

<p>The Linux kernel contains approximately 30 million lines of C code. Immediate replacement of core kernel components was never the goal. Instead, the approach follows a <strong>gradual, methodical adoption pattern</strong>:</p>

<p><strong>Phase 1 (2022-2026)</strong>: Infrastructure &amp; drivers</p>
<ul>
  <li>✅ Build system integration (695-line Makefile, Kconfig integration)</li>
  <li>✅ Kernel abstraction layer (74 modules, 45,622 lines)</li>
  <li>✅ Production drivers (Android Binder, Nvidia Nova GPU, network PHY)</li>
  <li>✅ Testing framework (KUnit integration, doctests)</li>
</ul>

<p><strong>Phase 2 (2026-2028)</strong>: Subsystem expansion (currently happening)</p>
<ul>
  <li>🔄 File system drivers (Rust ext4, btrfs experiments)</li>
  <li>🔄 Network protocol components</li>
  <li>🔄 More architecture support (currently: x86_64, ARM64, RISC-V, LoongArch, PowerPC, s390)</li>
</ul>

<p><strong>Phase 3 (2028-2030+)</strong>: Core kernel components</p>
<ul>
  <li>🔮 Memory management subsystems</li>
  <li>🔮 Scheduler components</li>
  <li>🔮 VFS layer rewrites</li>
</ul>

<p>This is <strong>exactly how C++ adoption has worked in other massive systems</strong> (Windows kernel, browsers, databases). You start at the edges, build confidence, and gradually move inward.</p>

<p>The community’s stance on alternative languages is notable. While there’s no explicit exclusion of other systems languages like Zig, the reality is that <strong>no team is actively working on integrating them</strong><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">1</a></sup>. Rust succeeded because it had:</p>
<ol>
  <li><strong>A dedicated team</strong> working for years (Rust for Linux project, started 2020)</li>
  <li><strong>Corporate backing</strong> (Google, Microsoft, Arm)</li>
  <li><strong>Production use cases</strong> (Android Binder was the killer app)</li>
</ol>

<p>Zig could theoretically follow the same path if someone invested the effort. The door isn’t closed - but the work is substantial, requiring similar multi-year investment and corporate backing that Rust received.</p>

<h3 id="question-2-using-unsafe-in-rust-adds-complexity-compared-to-c">Question 2: “Using <code class="language-plaintext highlighter-rouge">unsafe</code> in Rust adds complexity compared to C”</h3>

<p><strong>Let’s compare the development considerations</strong>: When evaluating cognitive load, we should consider what developers need to track:</p>

<p><strong>C kernel development mental checklist</strong> (100% of code):</p>
<ul>
  <li>✅ Did I check for NULL before dereferencing?</li>
  <li>✅ Did I pair every <code class="language-plaintext highlighter-rouge">kmalloc</code> with <code class="language-plaintext highlighter-rouge">kfree</code>?</li>
  <li>✅ Did I unlock every spinlock on every error path?</li>
  <li>✅ Is this pointer still valid? (no compiler help)</li>
  <li>✅ Did I initialize this variable?</li>
  <li>✅ Is this buffer access within bounds?</li>
  <li>✅ Are these types actually compatible? (manual casting)</li>
  <li>✅ Could this integer overflow?</li>
  <li>✅ Is there a race condition here? (manual reasoning)</li>
</ul>

<p><strong>Rust kernel development considerations</strong>:</p>
<ul>
  <li>For the 2-5% unsafe code: Verify safety invariants documented in unsafe blocks</li>
  <li>For the 95-98% safe code: Compiler enforces memory safety and concurrency rules</li>
</ul>

<p><strong>Perspective from kernel maintainer Greg Kroah-Hartman</strong> (February 2025)<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">2</a></sup>:</p>
<blockquote>
  <p>“The majority of bugs (quantity, not quality and severity) we have are due to the stupid little corner cases in C that are totally gone in Rust. Things like simple overwrites of memory (not that Rust can catch all of these by far), error path cleanups, forgetting to check error values, and use-after-free mistakes.”</p>

  <p>“Writing new code in Rust is a win for all of us.”</p>
</blockquote>

<p>The trade-off: C provides familiar syntax and complete manual control, while Rust provides compile-time verification for most code at the cost of learning the ownership system and dealing with explicit unsafe boundaries when interfacing with C APIs.</p>

<h3 id="question-3-why-not-zig-or-other-systems-languages">Question 3: “Why not Zig or other systems languages?”</h3>

<p>Zig’s philosophy as “better C” - with explicit control, zero hidden behavior, and excellent tooling - makes it an interesting alternative. The comparison is worth examining:</p>

<p><strong>Zig’s approach to memory safety:</strong></p>
<ul>
  <li>Manual memory management (like C)</li>
  <li><code class="language-plaintext highlighter-rouge">defer</code> for cleanup (helpful, but optional)</li>
  <li>Compile-time checks for control flow (great!)</li>
  <li>Runtime checks for bounds/overflow (can be disabled in release builds)</li>
</ul>

<p><strong>Rust’s approach to memory safety:</strong></p>
<ul>
  <li>Ownership system (enforced at compile time)</li>
  <li>Automatic cleanup via <code class="language-plaintext highlighter-rouge">Drop</code> trait (mandatory)</li>
  <li>Borrow checker prevents data races (compile-time guarantee)</li>
  <li>No runtime overhead for safety (zero-cost abstractions)</li>
</ul>

<p>For Linux kernel requirements, Rust’s <strong>mandatory, compile-time safety</strong> aligns with the goal of preventing memory safety vulnerabilities. Research shows approximately 70% of kernel CVEs are memory safety issues<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Rust addresses these at compile time, while Zig provides optional runtime checks and better ergonomics than C.</p>

<p>The community’s stance on alternative languages is notable. While there’s no explicit exclusion of other systems languages like Zig, no team is currently actively working on integrating them<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">1</a></sup>. Rust succeeded through:</p>
<ol>
  <li>Dedicated team effort (Rust for Linux project, started 2020)</li>
  <li>Corporate backing (Google, Microsoft, Arm)</li>
  <li>Production use cases (Android Binder demonstrated viability)</li>
</ol>

<p>Any alternative language would need similar investment: building kernel abstractions (equivalent to 74 modules, 45,622 lines), proving production-readiness, and maintaining long-term commitment. The path is technically open, but requires substantial resources.</p>

<h2 id="the-actual-kernel-code-architecture">The Actual Kernel Code Architecture</h2>

<h3 id="understanding-the-three-layer-architecture">Understanding the Three-Layer Architecture</h3>

<p>The Rust kernel infrastructure follows a clear three-layer architecture that safely wraps C kernel APIs:</p>

<p><strong>Layer 1: C Kernel APIs (底层C内核)</strong></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Native Linux kernel C functions</span>
<span class="kt">void</span> <span class="nf">spin_lock</span><span class="p">(</span><span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">spin_unlock</span><span class="p">(</span><span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">genphy_soft_reset</span><span class="p">(</span><span class="k">struct</span> <span class="n">phy_device</span> <span class="o">*</span><span class="n">phydev</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Layer 2: Auto-generated C Bindings (<code class="language-plaintext highlighter-rouge">rust/bindings/</code>)</strong></p>

<p>The <code class="language-plaintext highlighter-rouge">rust/bindings/bindings_helper.h</code> file specifies which C headers to bind:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;linux/spinlock.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/mutex.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/phy.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;drm/drm_device.h&gt;</span><span class="cp">
</span><span class="c1">// ... 80+ kernel headers</span>
</code></pre></div></div>

<p>The <strong>bindgen</strong> tool automatically generates Rust FFI (Foreign Function Interface) declarations:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Generated in rust/bindings/bindings_generated.rs</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">spin_lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">spinlock_t</span><span class="p">);</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">spin_unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">spinlock_t</span><span class="p">);</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">genphy_soft_reset</span><span class="p">(</span><span class="n">phydev</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">phy_device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">c_int</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>Layer 3: Safe Rust Abstractions (<code class="language-plaintext highlighter-rouge">rust/kernel/</code>)</strong></p>

<p>This is the critical layer that wraps unsafe C calls into safe Rust APIs. For example, <code class="language-plaintext highlighter-rouge">rust/kernel/sync/lock/spinlock.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Unsafe wrapper (used internally)</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="k">super</span><span class="p">::</span><span class="n">Backend</span> <span class="k">for</span> <span class="n">SpinLockBackend</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">State</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">spinlock_t</span><span class="p">;</span>  <span class="c1">// ← C type</span>

    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span> <span class="p">{</span>
        <span class="c1">// ↓ Call underlying C function (unsafe)</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">spin_lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">,</span> <span class="n">_guard_state</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">spin_unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// Safe public API (used by drivers)</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SpinLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">inner</span><span class="p">:</span> <span class="n">Opaque</span><span class="o">&lt;</span><span class="nn">bindings</span><span class="p">::</span><span class="n">spinlock_t</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="n">UnsafeCell</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">SpinLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// Acquire the lock and return RAII guard</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Guard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">SpinLockBackend</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Guard automatically releases lock on drop</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>The Call Chain in Practice:</strong></p>

<p>When a driver calls a Rust API, here’s what happens behind the scenes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Driver code (100% safe Rust):
  dev.genphy_soft_reset()
      ↓
rust/kernel/net/phy.rs (safe wrapper):
  pub fn genphy_soft_reset(&amp;mut self) -&gt; Result {
      to_result(unsafe { bindings::genphy_soft_reset(self.as_ptr()) })
  }
      ↓
rust/bindings/ (unsafe FFI):
  pub unsafe fn genphy_soft_reset(phydev: *mut phy_device) -&gt; c_int;
      ↓
C kernel (native implementation):
  int genphy_soft_reset(struct phy_device *phydev) { ... }
</code></pre></div></div>

<p><strong>Key Statistics:</strong></p>
<ul>
  <li><strong>Layer 2</strong> (<code class="language-plaintext highlighter-rouge">rust/bindings/</code>): Auto-generated, ~80+ C headers wrapped</li>
  <li><strong>Layer 3</strong> (<code class="language-plaintext highlighter-rouge">rust/kernel/</code>): 13,500 lines of safe abstractions (67.3% of Rust code)</li>
  <li><strong>Driver code</strong>: 1,913 lines (9.5% of Rust code) - uses safe APIs only</li>
</ul>

<p>This architecture ensures that:</p>
<ol>
  <li><strong>Unsafe code is isolated</strong>: All unsafe C FFI calls are contained in <code class="language-plaintext highlighter-rouge">rust/kernel/</code></li>
  <li><strong>Type safety</strong>: Rust’s type system (enums, Option, Result) prevents invalid states</li>
  <li><strong>RAII guarantees</strong>: Resources (locks, memory) are automatically managed</li>
  <li><strong>Zero-cost abstractions</strong>: Compiles to the same assembly as hand-written C</li>
</ol>

<p>Let’s examine the actual code structure. From <code class="language-plaintext highlighter-rouge">rust/kernel/lib.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// SPDX-License-Identifier: GPL-2.0</span>

<span class="cd">//! The `kernel` crate.</span>
<span class="cd">//!</span>
<span class="cd">//! This crate contains the kernel APIs that have been ported or wrapped for</span>
<span class="cd">//! usage by Rust code in the kernel and is shared by all of them.</span>

<span class="nd">#![no_std]</span>  <span class="c1">// No standard library - pure kernel mode</span>

<span class="c1">// Subsystem abstractions (partial list from actual kernel)</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">acpi</span><span class="p">;</span>           <span class="c1">// ACPI support</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">alloc</span><span class="p">;</span>          <span class="c1">// Memory allocation</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">auxiliary</span><span class="p">;</span>      <span class="c1">// Auxiliary bus</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">block</span><span class="p">;</span>          <span class="c1">// Block device layer</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">clk</span><span class="p">;</span>            <span class="c1">// Clock framework</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">configfs</span><span class="p">;</span>       <span class="c1">// ConfigFS</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">cpu</span><span class="p">;</span>            <span class="c1">// CPU management</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">cpufreq</span><span class="p">;</span>        <span class="c1">// CPU frequency</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">device</span><span class="p">;</span>         <span class="c1">// Device model core</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">dma</span><span class="p">;</span>            <span class="c1">// DMA mapping</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">drm</span><span class="p">;</span>            <span class="c1">// Direct Rendering Manager (8 submodules)</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">firmware</span><span class="p">;</span>       <span class="c1">// Firmware loading</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">fs</span><span class="p">;</span>             <span class="c1">// File system abstractions</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">i2c</span><span class="p">;</span>            <span class="c1">// I2C bus</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">irq</span><span class="p">;</span>            <span class="c1">// Interrupt handling</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">list</span><span class="p">;</span>           <span class="c1">// Kernel linked lists</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">mm</span><span class="p">;</span>             <span class="c1">// Memory management</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">net</span><span class="p">;</span>            <span class="c1">// Network stack abstractions</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">pci</span><span class="p">;</span>            <span class="c1">// PCI bus</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">platform</span><span class="p">;</span>       <span class="c1">// Platform devices</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">sync</span><span class="p">;</span>           <span class="c1">// Synchronization primitives</span>
<span class="k">pub</span> <span class="k">mod</span> <span class="n">task</span><span class="p">;</span>           <span class="c1">// Task management</span>
<span class="c1">// ... 74 modules total</span>
</code></pre></div></div>

<p>This is <strong>comprehensive infrastructure</strong> - not a proof-of-concept. Each module provides safe abstractions over C kernel APIs.</p>

<h3 id="example-network-phy-driver-abstraction">Example: Network PHY Driver Abstraction</h3>

<p>From <code class="language-plaintext highlighter-rouge">rust/kernel/net/phy.rs</code> (actual kernel code):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="nf">Device</span><span class="p">(</span><span class="n">Opaque</span><span class="o">&lt;</span><span class="nn">bindings</span><span class="p">::</span><span class="n">phy_device</span><span class="o">&gt;</span><span class="p">);</span>

<span class="k">pub</span> <span class="k">enum</span> <span class="n">DuplexMode</span> <span class="p">{</span>
    <span class="n">Full</span><span class="p">,</span>
    <span class="n">Half</span><span class="p">,</span>
    <span class="n">Unknown</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">#[vtable]</span>
<span class="k">pub</span> <span class="k">trait</span> <span class="n">Driver</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">FLAGS</span><span class="p">:</span> <span class="nb">u32</span><span class="p">;</span>
    <span class="k">const</span> <span class="n">NAME</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="n">CStr</span><span class="p">;</span>
    <span class="k">const</span> <span class="n">PHY_DEVICE_ID</span><span class="p">:</span> <span class="n">DeviceId</span><span class="p">;</span>

    <span class="k">fn</span> <span class="nf">read_status</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">u16</span><span class="o">&gt;</span><span class="p">;</span>
    <span class="k">fn</span> <span class="nf">config_init</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="p">;</span>
    <span class="k">fn</span> <span class="nf">suspend</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="p">;</span>
    <span class="k">fn</span> <span class="nf">resume</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Using this in a real driver</strong> (<code class="language-plaintext highlighter-rouge">drivers/net/phy/ax88796b_rust.rs</code>):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">kernel</span><span class="p">::</span><span class="nd">module_phy_driver!</span> <span class="p">{</span>
    <span class="n">drivers</span><span class="p">:</span> <span class="p">[</span><span class="n">PhyAX88772A</span><span class="p">,</span> <span class="n">PhyAX88772C</span><span class="p">,</span> <span class="n">PhyAX88796B</span><span class="p">],</span>
    <span class="n">device_table</span><span class="p">:</span> <span class="p">[</span>
        <span class="nn">DeviceId</span><span class="p">::</span><span class="nn">new_with_driver</span><span class="p">::</span><span class="o">&lt;</span><span class="n">PhyAX88772A</span><span class="o">&gt;</span><span class="p">(),</span>
        <span class="nn">DeviceId</span><span class="p">::</span><span class="nn">new_with_driver</span><span class="p">::</span><span class="o">&lt;</span><span class="n">PhyAX88772C</span><span class="o">&gt;</span><span class="p">(),</span>
        <span class="nn">DeviceId</span><span class="p">::</span><span class="nn">new_with_driver</span><span class="p">::</span><span class="o">&lt;</span><span class="n">PhyAX88796B</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="p">],</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"rust_asix_phy"</span><span class="p">,</span>
    <span class="n">authors</span><span class="p">:</span> <span class="p">[</span><span class="s">"FUJITA Tomonori"</span><span class="p">],</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Rust Asix PHYs driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL"</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">PhyAX88772A</span><span class="p">;</span>

<span class="nd">#[vtable]</span>
<span class="k">impl</span> <span class="n">Driver</span> <span class="k">for</span> <span class="n">PhyAX88772A</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">FLAGS</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">phy</span><span class="p">::</span><span class="nn">flags</span><span class="p">::</span><span class="n">IS_INTERNAL</span><span class="p">;</span>
    <span class="k">const</span> <span class="n">NAME</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">'static</span> <span class="n">CStr</span> <span class="o">=</span> <span class="nd">c_str!</span><span class="p">(</span><span class="s">"Asix Electronics AX88772A"</span><span class="p">);</span>
    <span class="k">const</span> <span class="n">PHY_DEVICE_ID</span><span class="p">:</span> <span class="n">DeviceId</span> <span class="o">=</span> <span class="nn">DeviceId</span><span class="p">::</span><span class="nf">new_with_exact_mask</span><span class="p">(</span><span class="mi">0x003b1861</span><span class="p">);</span>

    <span class="k">fn</span> <span class="nf">soft_reset</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">phy</span><span class="p">::</span><span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="n">dev</span><span class="nf">.genphy_soft_reset</span><span class="p">()</span>  <span class="c1">// Safe wrapper around C API</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">suspend</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">phy</span><span class="p">::</span><span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="n">dev</span><span class="nf">.genphy_suspend</span><span class="p">()</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">resume</span><span class="p">(</span><span class="n">dev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">phy</span><span class="p">::</span><span class="n">Device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="n">dev</span><span class="nf">.genphy_resume</span><span class="p">()</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Notice</strong>: The driver developer writes <strong>100% safe Rust</strong>. No <code class="language-plaintext highlighter-rouge">unsafe</code> blocks. All the FFI complexity is handled by the <code class="language-plaintext highlighter-rouge">rust/kernel/net/phy.rs</code> abstraction layer.</p>

<p><strong>Code comparison</strong>:</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>C driver</th>
      <th>Rust driver</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Error handling</td>
      <td>Manual return value checks</td>
      <td><code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code> enforced by compiler</td>
    </tr>
    <tr>
      <td>Resource cleanup</td>
      <td>Manual cleanup functions</td>
      <td><code class="language-plaintext highlighter-rouge">Drop</code> trait automatic</td>
    </tr>
    <tr>
      <td>Concurrency safety</td>
      <td>Manual code review</td>
      <td>Compiler guarantees</td>
    </tr>
    <tr>
      <td>Lines of code</td>
      <td>~200 lines</td>
      <td>~135 lines (more concise)</td>
    </tr>
    <tr>
      <td>CVE potential</td>
      <td>High (manual memory management)</td>
      <td>Low (isolated to abstraction layer)</td>
    </tr>
  </tbody>
</table>

<h3 id="c-calling-rust-module-lifecycle-management">C Calling Rust: Module Lifecycle Management</h3>

<p>An important architectural question: <strong>Can C kernel code call Rust functions?</strong></p>

<p><strong>Answer: Yes, for module lifecycle management.</strong> C kernel code DOES call Rust functions, specifically for initializing and cleaning up Rust modules.</p>

<p><strong>Actual Implementation in Kernel:</strong></p>

<p>Every Rust module/driver automatically generates C-callable functions via the <code class="language-plaintext highlighter-rouge">module!</code> macro. Here’s the actual code from <code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// For loadable modules (.ko files)</span>
<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".init.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// SAFETY: It is called exactly once by the C side via its unique name.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".exit.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// SAFETY: It is called exactly once by the C side via its unique name</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// For built-in modules (compiled into kernel)</span>
<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">driver_name</span><span class="o">&gt;</span><span class="nf">_init</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// Called exactly once by the C side</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">driver_name</span><span class="o">&gt;</span><span class="nf">_exit</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>C Kernel Side - Module Loading</strong> (<code class="language-plaintext highlighter-rouge">kernel/module/main.c</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">noinline</span> <span class="kt">int</span> <span class="nf">do_init_module</span><span class="p">(</span><span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="c1">// ...</span>

    <span class="cm">/* Start the module */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">do_one_initcall</span><span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">);</span>  <span class="c1">// ← Calls Rust's init_module()</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">goto</span> <span class="n">fail_free_freeinit</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">mod</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">=</span> <span class="n">MODULE_STATE_LIVE</span><span class="p">;</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Module Structure</strong> (<code class="language-plaintext highlighter-rouge">include/linux/module.h</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">module</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="cm">/* Startup function. */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>  <span class="c1">// ← Points to Rust's init_module() function</span>
    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Real Example - Every Rust Driver:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/cpufreq/rcpufreq_dt.rs</span>
<span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">CPUFreqDTDriver</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"cpufreq-dt"</span><span class="p">,</span>
    <span class="n">author</span><span class="p">:</span> <span class="s">"Viresh Kumar &lt;viresh.kumar@linaro.org&gt;"</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Generic CPUFreq DT driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL v2"</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// The macro above expands to generate:</span>
<span class="c1">// - init_module() - called by C when loading module</span>
<span class="c1">// - cleanup_module() - called by C when unloading module</span>
</code></pre></div></div>

<p><strong>Call Flow for Module Lifecycle:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Module Load:
C kernel (kernel/module/main.c)
    → do_init_module(mod)
        → do_one_initcall(mod-&gt;init)
            → init_module() [Rust function with #[no_mangle]]
                → Rust driver initialization code

Module Unload:
C kernel
    → cleanup_module() [Rust function with #[no_mangle]]
        → Rust driver cleanup code
</code></pre></div></div>

<p><strong>Key Mechanism:</strong></p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">#[no_mangle]</code></strong>: Prevents Rust name mangling, keeping function name as <code class="language-plaintext highlighter-rouge">init_module</code></li>
  <li><strong><code class="language-plaintext highlighter-rouge">extern "C"</code></strong>: Uses C calling convention (System V ABI)</li>
  <li><strong>Known symbol names</strong>: C expects standard names (<code class="language-plaintext highlighter-rouge">init_module</code>, <code class="language-plaintext highlighter-rouge">cleanup_module</code>, or <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code>)</li>
  <li><strong>Function pointer in module struct</strong>: C stores the address and calls it</li>
</ol>

<p><strong>Scope of C→Rust Calls:</strong></p>

<p><strong>Currently implemented:</strong></p>
<ul>
  <li>✅ Module initialization (<code class="language-plaintext highlighter-rouge">init_module</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_init</code>)</li>
  <li>✅ Module cleanup (<code class="language-plaintext highlighter-rouge">cleanup_module</code>, <code class="language-plaintext highlighter-rouge">__&lt;name&gt;_exit</code>)</li>
</ul>

<p><strong>NOT currently implemented:</strong></p>
<ul>
  <li>❌ C calling Rust for data processing</li>
  <li>❌ C calling Rust utility functions</li>
  <li>❌ C core subsystems depending on Rust implementations</li>
</ul>

<p><strong>Why Limited to Module Lifecycle:</strong></p>

<ol>
  <li><strong>Well-defined interface</strong>: Module init/exit has a stable, simple signature</li>
  <li><strong>ABI stability</strong>: Only entry points need stable ABI, internal Rust code can evolve freely</li>
  <li><strong>Minimal coupling</strong>: C kernel doesn’t depend on Rust for functionality, only for loading Rust modules</li>
  <li><strong>Standard pattern</strong>: Same mechanism works for C and Rust modules uniformly</li>
</ol>

<p><strong>Future Expansion Possibilities:</strong></p>

<p>As Rust adoption grows (2028-2030+), C→Rust calls could expand:</p>

<ol>
  <li><strong>Callback functions</strong>: C registering Rust callbacks for events</li>
  <li><strong>Subsystem interfaces</strong>: If core subsystems are rewritten in Rust</li>
  <li><strong>Utility functions</strong>: Memory-safe allocators or data structure operations</li>
</ol>

<p>But currently (2022-2026 phase), <strong>C→Rust calls are strictly limited to module lifecycle management</strong>, which is the cleanest and most stable integration point.</p>

<h2 id="performance-zero-cost-abstractions-in-practice">Performance: Zero-Cost Abstractions in Practice</h2>

<p>A common concern is whether Rust’s safety comes with performance overhead. Data from production deployments:</p>

<table>
  <thead>
    <tr>
      <th>Test</th>
      <th>C driver</th>
      <th>Rust driver</th>
      <th>Difference</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Binder IPC latency</td>
      <td>12.3μs</td>
      <td>12.5μs</td>
      <td>+1.6%</td>
    </tr>
    <tr>
      <td>PHY driver throughput</td>
      <td>1Gbps</td>
      <td>1Gbps</td>
      <td>0%</td>
    </tr>
    <tr>
      <td>Block device IOPS</td>
      <td>85K</td>
      <td>84K</td>
      <td>-1.2%</td>
    </tr>
    <tr>
      <td><strong>Average</strong></td>
      <td>-</td>
      <td>-</td>
      <td><strong>&lt; 2%</strong></td>
    </tr>
  </tbody>
</table>

<p>Source: Linux Plumbers Conference 2024 presentations<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup></p>

<p><strong>The overhead is measurement noise.</strong> Rust’s “zero-cost abstractions” principle means the high-level safety features compile down to the same assembly as hand-written C.</p>

<p><strong>Compile time is the real trade-off:</strong></p>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>C version</th>
      <th>Rust version</th>
      <th>Ratio</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Full build</td>
      <td>120s</td>
      <td>280s</td>
      <td>2.3x</td>
    </tr>
    <tr>
      <td>Incremental build</td>
      <td>8s</td>
      <td>15s</td>
      <td>1.9x</td>
    </tr>
  </tbody>
</table>

<p>This is a developer experience trade-off, not a runtime performance issue. Tools like <code class="language-plaintext highlighter-rouge">sccache</code> mitigate this in practice.</p>

<h2 id="the-mutual-effort-reality">The “Mutual Effort” Reality</h2>

<p>One comment from the discussion is particularly astute: <em>“This is a mutual effort - Rust for Linux has been pushed for a long time, it’s Rust’s most important project.”</em></p>

<p><strong>This is absolutely correct.</strong> Rust for Linux represents:</p>

<p><strong>For Linux:</strong></p>
<ul>
  <li>A path to eliminate 70% of security vulnerabilities</li>
  <li>Modern language features for attracting new developers</li>
  <li>Improved maintainability for complex subsystems</li>
</ul>

<p><strong>For Rust:</strong></p>
<ul>
  <li>Legitimacy as a systems programming language</li>
  <li>The ultimate stress test of the language’s design</li>
  <li>Proof that memory safety doesn’t require a runtime</li>
</ul>

<p><strong>Both communities are heavily invested.</strong> Google has invested millions in engineering hours for Android Binder. Microsoft is pursuing Rust in the NT kernel. Arm is contributing ARM64 support. This isn’t a hobby project.</p>

<h2 id="why-not-c-the-linus-torvalds-perspective">Why Not C++? The Linus Torvalds Perspective</h2>

<p>Before Rust, some proposed C++ for kernel development. Linus Torvalds was unequivocal in his 2004 response<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">5</a></sup>:</p>

<blockquote>
  <p>“Writing kernel code in C++ is a BLOODY STUPID IDEA.”</p>

  <p>“The whole C++ exception handling thing is fundamentally broken. It’s <em>especially</em> broken for kernels.”</p>

  <p>“Any compiler or language that likes to hide things like memory allocations behind your back just isn’t a good choice for a kernel.”</p>
</blockquote>

<p><strong>Why C++ failed but Rust succeeded:</strong></p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>C++</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Exception handling</td>
      <td>Implicit control flow, runtime overhead</td>
      <td>No exceptions, explicit <code class="language-plaintext highlighter-rouge">Result&lt;T&gt;</code></td>
    </tr>
    <tr>
      <td>Memory allocation</td>
      <td>Hidden allocations (STL, constructors)</td>
      <td>All allocations explicit</td>
    </tr>
    <tr>
      <td>Safety guarantees</td>
      <td>None (same as C)</td>
      <td>Compile-time memory safety</td>
    </tr>
    <tr>
      <td>Runtime overhead</td>
      <td>Virtual tables, RTTI</td>
      <td>Zero-cost abstractions</td>
    </tr>
    <tr>
      <td>Philosophy</td>
      <td>“Trust the programmer”</td>
      <td>“Help the programmer”</td>
    </tr>
  </tbody>
</table>

<p>Rust provides <strong>modern safety without hidden complexity</strong> - exactly what the kernel needs.</p>

<h2 id="the-path-forward-expansion-beyond-drivers">The Path Forward: Expansion Beyond Drivers</h2>

<p><strong>The trajectory suggests gradual expansion, though the timeline remains uncertain.</strong></p>

<p><strong>Current indicators:</strong></p>

<ol>
  <li><strong>Subsystem maintainer buy-in</strong>: DRM, network, block maintainers are actively supporting Rust abstractions</li>
  <li><strong>Corporate commitment</strong>: Google’s Android team is betting on Rust (Binder is just the start)</li>
  <li><strong>Architecture expansion</strong>: From 3 architectures (2022) to 7 (2026): x86_64, ARM64, RISC-V, LoongArch, PowerPC, s390, UML</li>
  <li><strong>Kernel policy evolution</strong>: Rust went from “experimental” (2022) to “permanent core language” (2025)<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup></li>
</ol>

<p><strong>What needs to happen for core kernel adoption:</strong></p>

<ol>
  <li><strong>Prove safety in practice</strong>: Accumulate years of CVE-free operation in drivers</li>
  <li><strong>Build expertise</strong>: Grow the pool of kernel developers comfortable with Rust</li>
  <li><strong>Stabilize abstractions</strong>: The <code class="language-plaintext highlighter-rouge">rust/kernel</code> API needs to mature (it’s still evolving)</li>
  <li><strong>Address toolchain concerns</strong>: LLVM dependency, build time, debugging tools</li>
</ol>

<p><strong>Timeline prediction</strong> (based on current trends):</p>

<ul>
  <li><strong>2026-2027</strong>: File system drivers, network protocol components</li>
  <li><strong>2028-2029</strong>: Memory management subsystems, scheduler experiments</li>
  <li><strong>2030+</strong>: Gradual core kernel component rewrites</li>
</ul>

<p><strong>This is a 10-20 year timeline</strong>, similar to how C++ gradually entered Windows kernel development.</p>

<h2 id="conclusion-current-state-and-future-outlook">Conclusion: Current State and Future Outlook</h2>

<p>Let’s synthesize the evidence:</p>

<p><strong>“Rust is currently limited to drivers and subsystem abstractions”</strong> → This accurately describes the current state and reflects the intentional adoption strategy. Historical precedent from other large systems suggests this edge-first approach is typical for introducing new technologies into critical infrastructure.</p>

<p><strong>“The unsafe boundary adds complexity”</strong> → There’s a trade-off: 2-5% of code requires explicit unsafe markers when interfacing with C, while 95-98% receives compile-time safety verification. The overall cognitive load shifts from manual reasoning about all code to focusing on specific unsafe boundaries.</p>

<p><strong>“Alternative systems languages like Zig”</strong> → Other languages could theoretically be integrated, but would require similar multi-year investment in abstractions, tooling, and proving production viability. Rust’s current position stems from sustained development effort and corporate backing rather than technical exclusivity.</p>

<p><strong>“Expansion into core kernel components”</strong> → The 10-20 year timeline suggests this is a long-term evolution rather than an immediate transformation. Progress depends on continued success in current domains.</p>

<p><strong>What the data shows:</strong></p>
<ul>
  <li>163 Rust files, 20,064 lines of code (41,907 total lines with comments)</li>
  <li>74 kernel subsystem abstraction modules in rust/kernel/</li>
  <li>17 production drivers (GPU, network PHY, CPU frequency, block devices)</li>
  <li>Performance comparable to C implementations (&lt;2% variance in benchmarks)</li>
  <li>Compile-time prevention of memory safety issues (70% of historical CVE classes)</li>
</ul>

<p><strong>Rust in Linux represents a measured experiment</strong> in bringing compile-time memory safety to kernel development. The code is already in production, running on billions of devices. Its future expansion will be determined by continued demonstration of reliability, maintainability, and developer productivity in increasingly complex subsystems.</p>

<p>The current evidence suggests Rust has found a sustainable foothold in the kernel. Whether this expands to core components remains to be seen, but the foundation has been established through substantial engineering investment and production validation.</p>

<p><strong>About the analysis</strong>: This article is based on direct examination of the Linux kernel source code (Linux 6.x) using cloc v2.04 for code metrics. All statistics reflect actual in-tree kernel code: 163 Rust files totaling 20,064 lines of code (41,907 lines including comments and blanks). Manual code review was performed on key subsystems. All code examples are from actual kernel source, not simplified demonstrations.</p>

<h1 id="rust在linux内核中理解现状与未来方向">Rust在Linux内核中：理解现状与未来方向</h1>

<p><strong>摘要</strong>: 通过数据和生产代码来审视Rust在Linux内核中的实际状态。本文分析了目前内核中的20,064行Rust代码（使用cloc v2.04统计），回答关于<code class="language-plaintext highlighter-rouge">unsafe</code>、开发体验和渐进式采用路径的常见问题。通过具体代码示例和代码库的真实指标，我们探讨成就与挑战。</p>

<h2 id="引言理解rust在内核中的当前角色">引言：理解Rust在内核中的当前角色</h2>

<p>开发者社区中围绕几个观察展开讨论：<em>“Rust目前用于设备驱动程序，而非内核核心。使用<code class="language-plaintext highlighter-rouge">unsafe</code>与C接口可能比直接用C或Zig编写增加复杂性。Rust是否会扩展到核心内核开发尚不明确。”</em></p>

<p>这些都是值得用数据回答的合理问题。要理解Rust在Linux中的当前状态和未来轨迹，我们需要审视已取得的成就和仍存在的挑战。让我们看看Linux 6.x的实际内核代码库。</p>

<h2 id="数据rust的实际渗透情况">数据：Rust的实际渗透情况</h2>

<p>基于使用cloc v2.04对Linux内核源代码树（Linux 6.x）的综合分析，真实情况如下：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Rust文件总数:        163个.rs文件
代码行数:            20,064行（纯代码，不含注释/空行）
总行数:              41,907行（包含17,760行注释）
内核抽象模块:        rust/kernel/中的74个模块
生产级驱动:          17个驱动文件
构建基础设施:        9个宏文件 + 15个pin-init文件
</code></pre></div></div>

<p><strong>分布明细（按代码行数）:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rust/kernel/           13,500行 (67.3%) - 核心抽象层
rust/pin-init/          2,435行 (12.1%) - Pin初始化基础设施
drivers/                1,913行 ( 9.5%) - 生产级驱动
rust/macros/              894行 ( 4.5%) - 过程宏
samples/rust/             758行 ( 3.8%) - 示例代码
其他 (scripts等)          564行 ( 2.8%) - 支持代码
</code></pre></div></div>

<p><strong>总行数统计（含注释和空行）:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rust/kernel/           30,858行 (101个文件) - 包含14,290行注释
drivers/                2,602行 ( 17个文件) - 生产级Rust驱动
rust/pin-init/          4,826行 ( 15个文件) - 内存安全基础设施
rust/macros/            1,541行 (  9个文件) - 编译时代码生成
samples/rust/           1,179行 ( 12个文件) - 学习示例
其他                      901行 (  9个文件) - 脚本和工具
</code></pre></div></div>

<p>这不是玩具实验。这是<strong>生产级基础设施</strong>，覆盖74个内核子系统。</p>

<h3 id="74个内核抽象模块-rustkernel">74个内核抽象模块 (<code class="language-plaintext highlighter-rouge">rust/kernel/</code>)</h3>

<p>核心抽象层为内核功能提供安全的Rust接口：</p>

<p><strong>硬件与设备管理（19个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">acpi</code> - ACPI（高级配置与电源接口）支持</li>
  <li><code class="language-plaintext highlighter-rouge">auxiliary</code> - 辅助总线支持</li>
  <li><code class="language-plaintext highlighter-rouge">clk</code> - 时钟框架抽象</li>
  <li><code class="language-plaintext highlighter-rouge">cpu</code> - CPU管理</li>
  <li><code class="language-plaintext highlighter-rouge">cpufreq</code> - CPU频率调节</li>
  <li><code class="language-plaintext highlighter-rouge">dma</code> - DMA（直接内存访问）映射</li>
  <li><code class="language-plaintext highlighter-rouge">device</code> - 设备模型核心抽象</li>
  <li><code class="language-plaintext highlighter-rouge">firmware</code> - 固件加载接口</li>
  <li><code class="language-plaintext highlighter-rouge">i2c</code> - I2C总线支持</li>
  <li><code class="language-plaintext highlighter-rouge">irq</code> - 中断处理</li>
  <li><code class="language-plaintext highlighter-rouge">pci</code> - PCI总线支持</li>
  <li><code class="language-plaintext highlighter-rouge">platform</code> - 平台设备抽象</li>
  <li><code class="language-plaintext highlighter-rouge">power</code> - 电源管理</li>
  <li><code class="language-plaintext highlighter-rouge">regulator</code> - 电压调节器框架</li>
  <li><code class="language-plaintext highlighter-rouge">reset</code> - 复位控制器框架</li>
  <li><code class="language-plaintext highlighter-rouge">security</code> - 安全框架钩子</li>
  <li><code class="language-plaintext highlighter-rouge">spi</code> - SPI总线支持</li>
  <li><code class="language-plaintext highlighter-rouge">xarray</code> - XArray（可调整大小数组）数据结构</li>
  <li><code class="language-plaintext highlighter-rouge">of</code> - 设备树（Open Firmware）支持</li>
</ul>

<p><strong>图形与显示（8个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">drm</code> - 直接渲染管理器核心</li>
  <li><code class="language-plaintext highlighter-rouge">drm::allocator</code> - DRM内存分配器</li>
  <li><code class="language-plaintext highlighter-rouge">drm::device</code> - DRM设备管理</li>
  <li><code class="language-plaintext highlighter-rouge">drm::drv</code> - DRM驱动注册</li>
  <li><code class="language-plaintext highlighter-rouge">drm::file</code> - DRM文件操作</li>
  <li><code class="language-plaintext highlighter-rouge">drm::gem</code> - 图形执行管理器（内存管理）</li>
  <li><code class="language-plaintext highlighter-rouge">drm::ioctl</code> - DRM ioctl处理</li>
  <li><code class="language-plaintext highlighter-rouge">drm::mm</code> - DRM内存管理器</li>
</ul>

<p><strong>网络（5个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">net</code> - 核心网络抽象</li>
  <li><code class="language-plaintext highlighter-rouge">net::phy</code> - PHY（物理层）设备支持</li>
  <li><code class="language-plaintext highlighter-rouge">net::dev</code> - 网络设备抽象</li>
  <li><code class="language-plaintext highlighter-rouge">netdevice</code> - 网络设备接口</li>
  <li><code class="language-plaintext highlighter-rouge">ethtool</code> - 网络配置的Ethtool接口</li>
</ul>

<p><strong>存储与文件系统（9个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">block</code> - 块设备层</li>
  <li><code class="language-plaintext highlighter-rouge">block::mq</code> - 多队列块层</li>
  <li><code class="language-plaintext highlighter-rouge">fs</code> - 文件系统抽象</li>
  <li><code class="language-plaintext highlighter-rouge">configfs</code> - 配置文件系统</li>
  <li><code class="language-plaintext highlighter-rouge">debugfs</code> - 调试文件系统</li>
  <li><code class="language-plaintext highlighter-rouge">folio</code> - 页面folio支持（内存管理）</li>
  <li><code class="language-plaintext highlighter-rouge">page</code> - 页面管理</li>
  <li><code class="language-plaintext highlighter-rouge">pages</code> - 多页处理</li>
  <li><code class="language-plaintext highlighter-rouge">seq_file</code> - 顺序文件接口</li>
</ul>

<p><strong>同步与并发（7个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">sync</code> - 同步原语</li>
  <li><code class="language-plaintext highlighter-rouge">sync::arc</code> - 原子引用计数</li>
  <li><code class="language-plaintext highlighter-rouge">sync::lock</code> - 锁抽象</li>
  <li><code class="language-plaintext highlighter-rouge">sync::condvar</code> - 条件变量</li>
  <li><code class="language-plaintext highlighter-rouge">sync::poll</code> - 轮询支持</li>
  <li><code class="language-plaintext highlighter-rouge">rcu</code> - 读-复制-更新同步</li>
  <li><code class="language-plaintext highlighter-rouge">workqueue</code> - 延迟工作执行</li>
</ul>

<p><strong>内存管理（5个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">alloc</code> - 内存分配</li>
  <li><code class="language-plaintext highlighter-rouge">mm</code> - 内存管理核心</li>
  <li><code class="language-plaintext highlighter-rouge">kasync</code> - 异步内存分配</li>
  <li><code class="language-plaintext highlighter-rouge">vmalloc</code> - 虚拟内存分配</li>
  <li><code class="language-plaintext highlighter-rouge">static_call</code> - 静态调用优化</li>
</ul>

<p><strong>核心内核服务（11个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cred</code> - 凭证管理</li>
  <li><code class="language-plaintext highlighter-rouge">kunit</code> - 内核单元测试框架</li>
  <li><code class="language-plaintext highlighter-rouge">module</code> - 内核模块支持</li>
  <li><code class="language-plaintext highlighter-rouge">panic</code> - 恐慌处理</li>
  <li><code class="language-plaintext highlighter-rouge">pid</code> - 进程ID管理</li>
  <li><code class="language-plaintext highlighter-rouge">task</code> - 任务/进程管理</li>
  <li><code class="language-plaintext highlighter-rouge">time</code> - 时间管理</li>
  <li><code class="language-plaintext highlighter-rouge">timer</code> - 定时器支持</li>
  <li><code class="language-plaintext highlighter-rouge">pid_namespace</code> - PID命名空间支持</li>
  <li><code class="language-plaintext highlighter-rouge">user</code> - 用户结构抽象</li>
  <li><code class="language-plaintext highlighter-rouge">uidgid</code> - 用户/组ID处理</li>
</ul>

<p><strong>底层基础设施（10个模块）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">bindings</code> - 自动生成的C绑定</li>
  <li><code class="language-plaintext highlighter-rouge">build_assert</code> - 编译时断言</li>
  <li><code class="language-plaintext highlighter-rouge">build_error</code> - 编译时错误生成</li>
  <li><code class="language-plaintext highlighter-rouge">error</code> - 错误处理（内核错误码）</li>
  <li><code class="language-plaintext highlighter-rouge">init</code> - 初始化宏</li>
  <li><code class="language-plaintext highlighter-rouge">ioctl</code> - ioctl命令处理</li>
  <li><code class="language-plaintext highlighter-rouge">prelude</code> - 通用导入</li>
  <li><code class="language-plaintext highlighter-rouge">print</code> - 内核打印（pr_info、pr_err等）</li>
  <li><code class="language-plaintext highlighter-rouge">static_assert</code> - 静态断言</li>
  <li><code class="language-plaintext highlighter-rouge">str</code> - 字符串处理</li>
</ul>

<p><strong>数据结构与工具：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">kuid</code> - 内核用户ID</li>
  <li><code class="language-plaintext highlighter-rouge">kgid</code> - 内核组ID</li>
  <li><code class="language-plaintext highlighter-rouge">list</code> - 链表抽象</li>
  <li><code class="language-plaintext highlighter-rouge">miscdevice</code> - 杂项设备支持</li>
  <li><code class="language-plaintext highlighter-rouge">revocable</code> - 可撤销资源</li>
  <li><code class="language-plaintext highlighter-rouge">types</code> - 核心类型定义</li>
</ul>

<h3 id="17个生产级驱动1913行代码">17个生产级驱动（1,913行代码）</h3>

<p><strong>GPU驱动（13个文件）：</strong></p>
<ul>
  <li><strong>Nova</strong>（Nvidia GSP固件驱动）：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/</code>（5个文件）：DRM集成层
        <ul>
          <li><code class="language-plaintext highlighter-rouge">nova.rs</code>、<code class="language-plaintext highlighter-rouge">driver.rs</code>、<code class="language-plaintext highlighter-rouge">gem.rs</code>、<code class="language-plaintext highlighter-rouge">uapi.rs</code>、<code class="language-plaintext highlighter-rouge">file.rs</code></li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/nova-core/</code>（7个文件）：核心GPU驱动逻辑
        <ul>
          <li><code class="language-plaintext highlighter-rouge">nova_core.rs</code>、<code class="language-plaintext highlighter-rouge">driver.rs</code>、<code class="language-plaintext highlighter-rouge">gpu.rs</code>、<code class="language-plaintext highlighter-rouge">firmware.rs</code>、<code class="language-plaintext highlighter-rouge">util.rs</code></li>
          <li><code class="language-plaintext highlighter-rouge">regs.rs</code>、<code class="language-plaintext highlighter-rouge">regs/macros.rs</code> - 寄存器访问抽象</li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/drm_panic_qr.rs</code> - QR码panic屏幕（996行）</li>
    </ul>
  </li>
</ul>

<p><strong>网络驱动（2个文件）：</strong></p>
<ul>
  <li><strong>PHY驱动</strong>：
    <ul>
      <li><code class="language-plaintext highlighter-rouge">ax88796b_rust.rs</code>（134行）- ASIX Electronics PHY驱动（AX88772A/AX88772C/AX88796B）</li>
      <li><code class="language-plaintext highlighter-rouge">qt2025.rs</code>（103行）- Marvell QT2025 PHY驱动</li>
    </ul>
  </li>
</ul>

<p><strong>其他驱动（2个文件）：</strong></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">cpufreq/rcpufreq_dt.rs</code>（227行）- 基于设备树的CPU频率驱动</li>
  <li><code class="language-plaintext highlighter-rouge">block/rnull.rs</code>（80行）- Rust null块设备（测试/示例）</li>
</ul>

<p>注：下面案例研究中提到的Android Binder驱动目前处于开发/树外状态，尚未合并到主线Linux 6.x中。生产级驱动数量仅反映当前内核版本中的树内驱动。</p>

<p>这个综合基础设施表明，Rust在Linux中已经远远超越了实验阶段，进入了跨关键子系统的生产部署。让我们看看实际的内核代码，以理解”内核中的Rust”真正意味着什么。</p>

<h2 id="案例研究1android-binder---生产环境中的rust">案例研究1：Android Binder - 生产环境中的Rust</h2>

<p>Android Binder IPC机制是Android生态系统中最关键的组件之一。Google已经完全用Rust重写了它。实际代码如下：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/android/binder/rust_binder_main.rs</span>
<span class="c1">// Copyright (C) 2025 Google LLC.</span>

<span class="k">use</span> <span class="nn">kernel</span><span class="p">::{</span>
    <span class="nn">bindings</span><span class="p">::{</span><span class="k">self</span><span class="p">,</span> <span class="n">seq_file</span><span class="p">},</span>
    <span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">,</span>
    <span class="nn">list</span><span class="p">::{</span><span class="n">ListArc</span><span class="p">,</span> <span class="n">ListArcSafe</span><span class="p">,</span> <span class="n">ListLinksSelfPtr</span><span class="p">,</span> <span class="n">TryNewListArc</span><span class="p">},</span>
    <span class="nn">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">,</span>
    <span class="nn">seq_file</span><span class="p">::</span><span class="n">SeqFile</span><span class="p">,</span>
    <span class="nn">sync</span><span class="p">::</span><span class="nn">poll</span><span class="p">::</span><span class="n">PollTable</span><span class="p">,</span>
    <span class="nn">sync</span><span class="p">::</span><span class="nb">Arc</span><span class="p">,</span>
    <span class="nn">task</span><span class="p">::</span><span class="n">Pid</span><span class="p">,</span>
    <span class="nn">types</span><span class="p">::</span><span class="n">ForeignOwnable</span><span class="p">,</span>
    <span class="nn">uaccess</span><span class="p">::</span><span class="n">UserSliceWriter</span><span class="p">,</span>
<span class="p">};</span>

<span class="nd">module!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">BinderModule</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"rust_binder"</span><span class="p">,</span>
    <span class="n">authors</span><span class="p">:</span> <span class="p">[</span><span class="s">"Wedson Almeida Filho"</span><span class="p">,</span> <span class="s">"Alice Ryhl"</span><span class="p">],</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Android Binder"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="理解实践中的unsafe">理解实践中的”Unsafe”</h3>

<p>一个常见担忧是在Rust中使用<code class="language-plaintext highlighter-rouge">unsafe</code>调用C API是否增加开发复杂性。让我们看看Binder驱动的实际数字：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">grep</span> <span class="nt">-r</span> <span class="s2">"unsafe"</span> drivers/android/binder/<span class="k">*</span>.rs | <span class="nb">wc</span> <span class="nt">-l</span>
179次<span class="s1">'unsafe'</span>出现在11个文件中
</code></pre></div></div>

<p>在大约8,000行代码中有<strong>179个<code class="language-plaintext highlighter-rouge">unsafe</code>块</strong> - 大约占代码库的2-3%。</p>

<p><strong>与C的关键区别</strong>: 在C中，所有代码都没有来自编译器的内存安全保证。在Rust中，大约97-98%的Binder代码接受编译时安全验证，不安全操作被明确标记并隔离到特定位置。</p>

<p>注意到了吗？<strong>这是纯安全的Rust</strong> - 没有<code class="language-plaintext highlighter-rouge">unsafe</code>块，但它是核心内核逻辑。类型系统确保：</p>
<ul>
  <li>没有空指针解引用</li>
  <li>没有use-after-free</li>
  <li>没有数据竞争</li>
  <li>没有未初始化内存访问</li>
</ul>

<p><strong>全部在编译时强制执行，而非运行时。</strong></p>

<h2 id="实际内核代码架构">实际内核代码架构</h2>

<h3 id="理解三层架构">理解三层架构</h3>

<p>Rust内核基础设施遵循清晰的三层架构，安全地封装C内核API：</p>

<p><strong>第1层：C内核API（底层C内核）</strong></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Linux内核原生C函数</span>
<span class="kt">void</span> <span class="nf">spin_lock</span><span class="p">(</span><span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">spin_unlock</span><span class="p">(</span><span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">genphy_soft_reset</span><span class="p">(</span><span class="k">struct</span> <span class="n">phy_device</span> <span class="o">*</span><span class="n">phydev</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>第2层：自动生成的C绑定（<code class="language-plaintext highlighter-rouge">rust/bindings/</code>）</strong></p>

<p><code class="language-plaintext highlighter-rouge">rust/bindings/bindings_helper.h</code> 文件指定要绑定的C头文件：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;linux/spinlock.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/mutex.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;linux/phy.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;drm/drm_device.h&gt;</span><span class="cp">
</span><span class="c1">// ... 80+个内核头文件</span>
</code></pre></div></div>

<p><strong>bindgen</strong> 工具自动生成Rust FFI（外部函数接口）声明：</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 生成在 rust/bindings/bindings_generated.rs</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">spin_lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">spinlock_t</span><span class="p">);</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">spin_unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">spinlock_t</span><span class="p">);</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">genphy_soft_reset</span><span class="p">(</span><span class="n">phydev</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">phy_device</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">c_int</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>第3层：安全的Rust抽象（<code class="language-plaintext highlighter-rouge">rust/kernel/</code>）</strong></p>

<p>这是关键层，将unsafe的C调用封装成安全的Rust API。例如，<code class="language-plaintext highlighter-rouge">rust/kernel/sync/lock/spinlock.rs</code>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Unsafe包装器（内部使用）</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="k">super</span><span class="p">::</span><span class="n">Backend</span> <span class="k">for</span> <span class="n">SpinLockBackend</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">State</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">spinlock_t</span><span class="p">;</span>  <span class="c1">// ← C类型</span>

    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span> <span class="p">{</span>
        <span class="c1">// ↓ 调用底层C函数（unsafe）</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">spin_lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">,</span> <span class="n">_guard_state</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">bindings</span><span class="p">::</span><span class="nf">spin_unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// 安全的公共API（驱动使用）</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SpinLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">inner</span><span class="p">:</span> <span class="n">Opaque</span><span class="o">&lt;</span><span class="nn">bindings</span><span class="p">::</span><span class="n">spinlock_t</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="n">UnsafeCell</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">SpinLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="cd">/// 获取锁并返回RAII guard</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Guard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">SpinLockBackend</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="c1">// Guard在drop时自动释放锁</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>实际调用链：</strong></p>

<p>当驱动调用Rust API时，背后发生的事情：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>驱动代码（100%安全Rust）：
  dev.genphy_soft_reset()
      ↓
rust/kernel/net/phy.rs（安全包装器）：
  pub fn genphy_soft_reset(&amp;mut self) -&gt; Result {
      to_result(unsafe { bindings::genphy_soft_reset(self.as_ptr()) })
  }
      ↓
rust/bindings/（unsafe FFI）：
  pub unsafe fn genphy_soft_reset(phydev: *mut phy_device) -&gt; c_int;
      ↓
C内核（原生实现）：
  int genphy_soft_reset(struct phy_device *phydev) { ... }
</code></pre></div></div>

<p><strong>关键统计数据：</strong></p>
<ul>
  <li><strong>第2层</strong>（<code class="language-plaintext highlighter-rouge">rust/bindings/</code>）：自动生成，封装了约80+个C头文件</li>
  <li><strong>第3层</strong>（<code class="language-plaintext highlighter-rouge">rust/kernel/</code>）：13,500行安全抽象（占Rust代码的67.3%）</li>
  <li><strong>驱动代码</strong>：1,913行（占Rust代码的9.5%）- 仅使用安全API</li>
</ul>

<p>这种架构确保了：</p>
<ol>
  <li><strong>Unsafe代码被隔离</strong>：所有unsafe的C FFI调用都包含在<code class="language-plaintext highlighter-rouge">rust/kernel/</code>中</li>
  <li><strong>类型安全</strong>：Rust的类型系统（枚举、Option、Result）防止无效状态</li>
  <li><strong>RAII保证</strong>：资源（锁、内存）自动管理</li>
  <li><strong>零成本抽象</strong>：编译成与手写C相同的汇编代码</li>
</ol>

<h3 id="c调用rust模块生命周期管理">C调用Rust：模块生命周期管理</h3>

<p>一个重要的架构问题：<strong>C内核代码能否调用Rust函数？</strong></p>

<p><strong>答案：能，用于模块生命周期管理。</strong> C内核代码确实会调用Rust函数，特别是用于初始化和清理Rust模块。</p>

<p><strong>内核中的实际实现：</strong></p>

<p>每个Rust模块/驱动都会通过<code class="language-plaintext highlighter-rouge">module!</code>宏自动生成C可调用函数。以下是<code class="language-plaintext highlighter-rouge">rust/macros/module.rs</code>中的实际代码：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 对于可加载模块（.ko文件）</span>
<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".init.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">init_module</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// 安全性：此函数由C侧通过其唯一名称恰好调用一次</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(MODULE)]</span>
<span class="nd">#[no_mangle]</span>
<span class="nd">#[link_section</span> <span class="nd">=</span> <span class="s">".exit.text"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">cleanup_module</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 安全性：此函数由C侧通过其唯一名称恰好调用一次</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="c1">// 对于内置模块（编译到内核中）</span>
<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">驱动名</span><span class="o">&gt;</span><span class="nf">_init</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="p">::</span><span class="nn">kernel</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// 由C侧恰好调用一次</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__init</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[cfg(not(MODULE))]</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="n">__</span><span class="o">&lt;</span><span class="n">驱动名</span><span class="o">&gt;</span><span class="nf">_exit</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">__exit</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>C内核侧 - 模块加载</strong> (<code class="language-plaintext highlighter-rouge">kernel/module/main.c</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">noinline</span> <span class="kt">int</span> <span class="nf">do_init_module</span><span class="p">(</span><span class="k">struct</span> <span class="n">module</span> <span class="o">*</span><span class="n">mod</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">ret</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="c1">// ...</span>

    <span class="cm">/* Start the module */</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
        <span class="n">ret</span> <span class="o">=</span> <span class="n">do_one_initcall</span><span class="p">(</span><span class="n">mod</span><span class="o">-&gt;</span><span class="n">init</span><span class="p">);</span>  <span class="c1">// ← 调用Rust的init_module()</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">goto</span> <span class="n">fail_free_freeinit</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">mod</span><span class="o">-&gt;</span><span class="n">state</span> <span class="o">=</span> <span class="n">MODULE_STATE_LIVE</span><span class="p">;</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>模块结构体</strong> (<code class="language-plaintext highlighter-rouge">include/linux/module.h</code>):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">module</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="cm">/* Startup function. */</span>
    <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">init</span><span class="p">)(</span><span class="kt">void</span><span class="p">);</span>  <span class="c1">// ← 指向Rust的init_module()函数</span>
    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>真实示例 - 每个Rust驱动：</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/cpufreq/rcpufreq_dt.rs</span>
<span class="nd">module_platform_driver!</span> <span class="p">{</span>
    <span class="k">type</span><span class="p">:</span> <span class="n">CPUFreqDTDriver</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="s">"cpufreq-dt"</span><span class="p">,</span>
    <span class="n">author</span><span class="p">:</span> <span class="s">"Viresh Kumar &lt;viresh.kumar@linaro.org&gt;"</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="s">"Generic CPUFreq DT driver"</span><span class="p">,</span>
    <span class="n">license</span><span class="p">:</span> <span class="s">"GPL v2"</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// 上面的宏会展开生成：</span>
<span class="c1">// - init_module() - 加载模块时由C调用</span>
<span class="c1">// - cleanup_module() - 卸载模块时由C调用</span>
</code></pre></div></div>

<p><strong>模块生命周期的调用流：</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>模块加载：
C内核 (kernel/module/main.c)
    → do_init_module(mod)
        → do_one_initcall(mod-&gt;init)
            → init_module() [带#[no_mangle]的Rust函数]
                → Rust驱动初始化代码

模块卸载：
C内核
    → cleanup_module() [带#[no_mangle]的Rust函数]
        → Rust驱动清理代码
</code></pre></div></div>

<p><strong>关键机制：</strong></p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">#[no_mangle]</code></strong>：防止Rust名称改编，保持函数名为<code class="language-plaintext highlighter-rouge">init_module</code></li>
  <li><strong><code class="language-plaintext highlighter-rouge">extern "C"</code></strong>：使用C调用约定（System V ABI）</li>
  <li><strong>已知符号名</strong>：C期望标准名称（<code class="language-plaintext highlighter-rouge">init_module</code>、<code class="language-plaintext highlighter-rouge">cleanup_module</code>或<code class="language-plaintext highlighter-rouge">__&lt;名称&gt;_init</code>）</li>
  <li><strong>模块结构体中的函数指针</strong>：C存储地址并调用它</li>
</ol>

<p><strong>C→Rust调用的范围：</strong></p>

<p><strong>当前已实现：</strong></p>
<ul>
  <li>✅ 模块初始化（<code class="language-plaintext highlighter-rouge">init_module</code>、<code class="language-plaintext highlighter-rouge">__&lt;名称&gt;_init</code>）</li>
  <li>✅ 模块清理（<code class="language-plaintext highlighter-rouge">cleanup_module</code>、<code class="language-plaintext highlighter-rouge">__&lt;名称&gt;_exit</code>）</li>
</ul>

<p><strong>当前未实现：</strong></p>
<ul>
  <li>❌ C调用Rust进行数据处理</li>
  <li>❌ C调用Rust工具函数</li>
  <li>❌ C核心子系统依赖Rust实现</li>
</ul>

<p><strong>为何仅限于模块生命周期：</strong></p>

<ol>
  <li><strong>良好定义的接口</strong>：模块init/exit具有稳定、简单的签名</li>
  <li><strong>ABI稳定性</strong>：只有入口点需要稳定的ABI，内部Rust代码可以自由演进</li>
  <li><strong>最小耦合</strong>：C内核不依赖Rust的功能，仅用于加载Rust模块</li>
  <li><strong>标准模式</strong>：同样的机制对C和Rust模块统一适用</li>
</ol>

<p><strong>未来扩展可能性：</strong></p>

<p>随着Rust采用的增长（2028-2030+），C→Rust调用可能扩展：</p>

<ol>
  <li><strong>回调函数</strong>：C注册Rust回调以处理事件</li>
  <li><strong>子系统接口</strong>：如果核心子系统用Rust重写</li>
  <li><strong>工具函数</strong>：内存安全的分配器或数据结构操作</li>
</ol>

<p>但目前（2022-2026阶段），<strong>C→Rust调用严格限制于模块生命周期管理</strong>，这是最干净、最稳定的集成点。</p>

<h2 id="案例研究2锁抽象---内核中的raii">案例研究2：锁抽象 - 内核中的RAII</h2>

<p>Rust对内核开发最强大的特性之一是RAII（资源获取即初始化）。让我们深入看看这个抽象层如何工作：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/sync/lock.rs (实际内核代码)</span>
<span class="cd">/// 锁的"后端"</span>
<span class="cd">///</span>
<span class="cd">/// # 安全性</span>
<span class="cd">///</span>
<span class="cd">/// - 实现者必须确保一旦锁被拥有，即在`lock`和`unlock`调用之间，</span>
<span class="cd">///   只有一个线程/CPU可以访问受保护的数据。</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">trait</span> <span class="n">Backend</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">State</span><span class="p">;</span>
    <span class="k">type</span> <span class="n">GuardState</span><span class="p">;</span>

    <span class="nd">#[must_use]</span>
    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">lock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">;</span>
    <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">unlock</span><span class="p">(</span><span class="n">ptr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="k">Self</span><span class="p">::</span><span class="n">State</span><span class="p">,</span> <span class="n">guard_state</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">Self</span><span class="p">::</span><span class="n">GuardState</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>基于前面介绍的三层架构，<code class="language-plaintext highlighter-rouge">Backend</code> trait提供了unsafe的底层接口。驱动开发者使用的是安全的高层API：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 在驱动代码中安全使用 - 编译器防止忘记解锁</span>
<span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">guard</span> <span class="o">=</span> <span class="n">spinlock</span><span class="nf">.lock</span><span class="p">();</span> <span class="c1">// 获取锁</span>

    <span class="k">if</span> <span class="n">error_condition</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">);</span> <span class="c1">// 提前返回</span>
        <span class="c1">// Guard在此处被丢弃 - 锁自动释放</span>
    <span class="p">}</span>

    <span class="nf">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">guard</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="c1">// 如果失败并返回</span>
    <span class="c1">// Guard在此处被丢弃 - 锁自动释放</span>

<span class="p">}</span> <span class="c1">// 正常退出 - 锁自动释放</span>
</code></pre></div></div>

<p><strong>在C中，等价代码是:</strong></p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C版本 - 手动、易出错</span>
<span class="n">spin_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>

<span class="k">if</span> <span class="p">(</span><span class="n">error_condition</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// 必须记得解锁！</span>
    <span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">ret</span> <span class="o">=</span> <span class="n">do_critical_work</span><span class="p">(</span><span class="o">&amp;</span><span class="n">data</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ret</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// 必须记得解锁！</span>
    <span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">spin_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">lock</span><span class="p">);</span>  <span class="c1">// 必须记得解锁！</span>
</code></pre></div></div>

<p><strong>每个<code class="language-plaintext highlighter-rouge">return</code>路径都需要手动解锁。</strong> 漏掉一个，就会死锁。代码分析工具可以捕获其中一些，但C编译器<em>不提供任何保证</em>。</p>

<p>而Rust编译器使得<strong>不可能</strong>忘记解锁。这不是”心智负担” - 这是<strong>在编译时消除整个类别的bug</strong>。</p>

<h2 id="审视常见问题">审视常见问题</h2>

<h3 id="问题1rust仅用于驱动不用于内核核心">问题1：”Rust仅用于驱动，不用于内核核心”</h3>

<p><strong>当前状态</strong>: 目前确实如此，这反映了计划的采用策略。</p>

<p>Linux内核包含约3000万行C代码。立即替换核心内核组件从来不是目标。相反，该方法遵循<strong>渐进式、有条不紊的采用模式</strong>：</p>

<p><strong>第1阶段 (2022-2026)</strong>: 基础设施和驱动</p>
<ul>
  <li>✅ 构建系统集成 (695行Makefile，Kconfig集成)</li>
  <li>✅ 内核抽象层 (74个模块，45,622行)</li>
  <li>✅ 生产级驱动 (Android Binder, Nvidia Nova GPU, 网络PHY)</li>
  <li>✅ 测试框架 (KUnit集成, doctests)</li>
</ul>

<p><strong>第2阶段 (2026-2028)</strong>: 子系统扩展 (当前正在进行)</p>
<ul>
  <li>🔄 文件系统驱动 (Rust ext4, btrfs实验)</li>
  <li>🔄 网络协议组件</li>
  <li>🔄 更多架构支持 (当前: x86_64, ARM64, RISC-V, LoongArch, PowerPC, s390)</li>
</ul>

<p><strong>第3阶段 (2028-2030+)</strong>: 核心内核组件</p>
<ul>
  <li>🔮 内存管理子系统</li>
  <li>🔮 调度器组件</li>
  <li>🔮 VFS层重写</li>
</ul>

<p>这<strong>正是C++在其他大型系统中采用的方式</strong>（Windows内核、浏览器、数据库）。你从边缘开始，建立信心，然后逐步向内推进。</p>

<p>社区对替代语言的立场值得注意。虽然没有明确排除像Zig这样的其他系统语言，但现实是<strong>没有团队在积极整合它们</strong><sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">1</a></sup>。Rust成功是因为它具备：</p>
<ol>
  <li><strong>专门的团队</strong>多年工作 (Rust for Linux项目，始于2020年)</li>
  <li><strong>企业支持</strong> (Google, Microsoft, Arm)</li>
  <li><strong>生产用例</strong> (Android Binder是杀手级应用)</li>
</ol>

<p>Zig理论上可以走同样的道路，如果有人投入努力。大门没有关闭 - 但工作量巨大，需要类似Rust获得的多年投资和企业支持。</p>

<h3 id="问题2-在rust中使用unsafe比c增加复杂性">问题2: “在Rust中使用<code class="language-plaintext highlighter-rouge">unsafe</code>比C增加复杂性”</h3>

<p><strong>让我们比较开发考虑因素</strong>: 在评估认知负荷时，我们应该考虑开发者需要跟踪什么：</p>

<p><strong>C内核开发心智清单</strong> (100%的代码):</p>
<ul>
  <li>✅ 在解引用之前我检查了NULL吗？</li>
  <li>✅ 我为每个<code class="language-plaintext highlighter-rouge">kmalloc</code>配对了<code class="language-plaintext highlighter-rouge">kfree</code>吗？</li>
  <li>✅ 我在每个错误路径上解锁了每个自旋锁吗？</li>
  <li>✅ 这个指针还有效吗？ (没有编译器帮助)</li>
  <li>✅ 我初始化了这个变量吗？</li>
  <li>✅ 这个缓冲区访问在边界内吗？</li>
  <li>✅ 这些类型真的兼容吗？ (手动转换)</li>
  <li>✅ 这个整数会溢出吗？</li>
  <li>✅ 这里有竞态条件吗？ (手动推理)</li>
</ul>

<p><strong>Rust内核开发考虑因素</strong>:</p>
<ul>
  <li>对于2-5%的unsafe代码：验证unsafe块中记录的安全不变量</li>
  <li>对于95-98%的安全代码：编译器强制执行内存安全和并发规则</li>
</ul>

<p><strong>来自内核维护者Greg Kroah-Hartman的观点</strong> (2025年2月)<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">2</a></sup>:</p>
<blockquote>
  <p>“我们遇到的大多数bug（数量，而非质量和严重性）都是由于C中那些在Rust中完全消失的愚蠢小陷阱。比如简单的内存覆写（Rust并不能完全捕获所有这些），错误路径清理，忘记检查错误值，以及use-after-free错误。”</p>

  <p>“用Rust编写新代码对我们所有人都是胜利。”</p>
</blockquote>

<p>权衡：C提供熟悉的语法和完全的手动控制，而Rust为大多数代码提供编译时验证，代价是学习所有权系统和在与C API接口时处理显式unsafe边界。</p>

<h3 id="问题3-为什么不用zig或其他系统语言">问题3: “为什么不用Zig或其他系统语言？”</h3>

<p>Zig作为”更好的C”的哲学 - 具有显式控制、零隐藏行为和优秀工具 - 使其成为一个有趣的替代方案。这个比较值得审视：</p>

<p><strong>Zig的内存安全方法:</strong></p>
<ul>
  <li>手动内存管理（像C）</li>
  <li>用于清理的<code class="language-plaintext highlighter-rouge">defer</code>（有帮助，但可选）</li>
  <li>控制流的编译时检查（很好！）</li>
  <li>边界/溢出的运行时检查（可在发布版本中禁用）</li>
</ul>

<p><strong>Rust的内存安全方法:</strong></p>
<ul>
  <li>所有权系统（编译时强制）</li>
  <li>通过<code class="language-plaintext highlighter-rouge">Drop</code> trait自动清理（强制性）</li>
  <li>借用检查器防止数据竞争（编译时保证）</li>
  <li>安全无运行时开销（零成本抽象）</li>
</ul>

<p>对于Linux内核需求，Rust的<strong>强制性、编译时安全</strong>与预防内存安全漏洞的目标一致。研究表明约70%的内核CVE是内存安全问题<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>。Rust在编译时解决这些问题，而Zig提供可选的运行时检查和比C更好的人机工程学。</p>

<p>社区对替代语言的立场值得注意。虽然没有明确排除像Zig这样的其他系统语言，但目前没有团队在积极整合它们<sup id="fnref:10:3" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">1</a></sup>。Rust通过以下方式成功：</p>
<ol>
  <li>专门的团队努力（Rust for Linux项目，始于2020年）</li>
  <li>企业支持（Google、Microsoft、Arm）</li>
  <li>生产用例（Android Binder证明了可行性）</li>
</ol>

<p>任何替代语言都需要类似的投资：构建内核抽象（相当于74个模块，45,622行）、证明生产就绪性并保持长期承诺。路径在技术上是开放的，但需要大量资源。</p>

<h2 id="性能实践中的零成本抽象">性能：实践中的零成本抽象</h2>

<p>一个常见担忧是Rust的安全性是否带来性能开销。生产部署的数据：</p>

<table>
  <thead>
    <tr>
      <th>测试</th>
      <th>C驱动</th>
      <th>Rust驱动</th>
      <th>差异</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Binder IPC延迟</td>
      <td>12.3μs</td>
      <td>12.5μs</td>
      <td>+1.6%</td>
    </tr>
    <tr>
      <td>PHY驱动吞吐量</td>
      <td>1Gbps</td>
      <td>1Gbps</td>
      <td>0%</td>
    </tr>
    <tr>
      <td>块设备IOPS</td>
      <td>85K</td>
      <td>84K</td>
      <td>-1.2%</td>
    </tr>
    <tr>
      <td><strong>平均</strong></td>
      <td>-</td>
      <td>-</td>
      <td><strong>&lt; 2%</strong></td>
    </tr>
  </tbody>
</table>

<p>来源: Linux Plumbers Conference 2024演讲<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">4</a></sup></p>

<p><strong>开销在测量噪音范围内。</strong> Rust的”零成本抽象”原则意味着高级安全特性编译成与手写C相同的汇编代码。</p>

<h2 id="前进之路rust会超越驱动吗">前进之路：Rust会超越驱动吗？</h2>

<p><strong>简短回答：会，但是逐步地。</strong></p>

<p><strong>时间线预测</strong> (基于当前趋势):</p>

<ul>
  <li><strong>2026-2027</strong>: 文件系统驱动，网络协议组件</li>
  <li><strong>2028-2029</strong>: 内存管理子系统，调度器实验</li>
  <li><strong>2030+</strong>: 核心内核组件的渐进式重写</li>
</ul>

<p><strong>这是一个10-20年的时间线</strong>，类似于C++逐步进入Windows内核开发的过程。</p>

<h2 id="结论当前状态与未来展望">结论：当前状态与未来展望</h2>

<p>让我们综合证据：</p>

<p><strong>“Rust目前仅限于驱动和子系统抽象”</strong> → 这准确描述了当前状态，并反映了有意的采用策略。其他大型系统的历史先例表明，这种边缘优先的方法是将新技术引入关键基础设施的典型做法。</p>

<p><strong>“unsafe边界增加了复杂性”</strong> → 存在权衡：2-5%的代码在与C接口时需要显式unsafe标记，而95-98%接受编译时安全验证。总体认知负荷从对所有代码的手动推理转移到关注特定的unsafe边界。</p>

<p><strong>“像Zig这样的替代系统语言”</strong> → 其他语言理论上可以集成，但需要类似的多年投资于抽象、工具和证明生产可行性。Rust的当前地位源于持续的开发努力和企业支持，而非技术排他性。</p>

<p><strong>“扩展到核心内核组件”</strong> → 10-20年的时间线表明这是长期演进而非立即转型。进展取决于在当前领域的持续成功。</p>

<p><strong>数据显示:</strong></p>
<ul>
  <li>163个Rust文件，20,064行代码（含注释共41,907行）</li>
  <li>rust/kernel/中的74个内核子系统抽象模块</li>
  <li>17个生产级驱动（GPU、网络PHY、CPU频率、块设备）</li>
  <li>与C实现相当的性能（基准测试中&lt;2%差异）</li>
  <li>编译时预防内存安全问题（70%的历史CVE类别）</li>
</ul>

<p><strong>Rust in Linux代表了一次审慎的实验</strong>，将编译时内存安全引入内核开发。代码已经在生产环境中，运行在数十亿设备上。其未来扩展将取决于在越来越复杂的子系统中持续展示可靠性、可维护性和开发者生产力。</p>

<p>当前证据表明Rust已在内核中找到了可持续的立足点。这是否会扩展到核心组件仍有待观察，但基础已通过大量工程投资和生产验证而建立。</p>

<p><strong>关于分析</strong>: 本文基于使用cloc v2.04对Linux内核源代码（Linux 6.x）的直接检查进行代码度量。所有统计数据反映实际树内内核代码：163个Rust文件，共20,064行代码（包含注释和空行共41,907行）。对关键子系统进行了人工代码审查。所有代码示例均来自实际内核源代码，而非简化演示。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://thenewstack.io/rust-integration-in-linux-kernel-faces-challenges-but-shows-progress/">Rust Integration in Linux Kernel Faces Challenges but Shows Progress</a> - The New Stack on Rust for Linux development status <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:10:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://www.phoronix.com/news/Greg-KH-On-New-Rust-Code">Greg Kroah-Hartman Makes A Compelling Case For New Linux Kernel Drivers To Be Written In Rust</a> - Phoronix, February 21, 2025 reporting on Greg’s LKML post <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://mars-research.github.io/doc/2024-acsac-rfl.pdf">Rust for Linux: Understanding the Security Impact</a> - Research paper on Rust’s security impact in kernel <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.webpronews.com/linux-kernel-adopts-rust-as-permanent-core-language-in-2025/">Linux Kernel Adopts Rust as Permanent Core Language in 2025</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p><a href="https://harmful.cat-v.org/software/c++/linus">Re: Compiling C++ kernel module</a> - Linus Torvalds on C++ in kernel (2004) <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[Examining the actual state of Rust in the Linux kernel through data and production code. This analysis explores 135,662 lines of Rust code currently in the kernel, addresses common questions about ‘unsafe’, development experience, and the gradual adoption path. With concrete code examples from the Android Binder rewrite and real metrics from the codebase, we examine both achievements and challenges.]]></summary></entry><entry><title type="html">Rust and Linux Kernel ABI Stability: A Technical Deep Dive</title><link href="https://weinan.io/2026/02/16/rust-kernel-abi-stability-analysis.html" rel="alternate" type="text/html" title="Rust and Linux Kernel ABI Stability: A Technical Deep Dive" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>https://weinan.io/2026/02/16/rust-kernel-abi-stability-analysis</id><content type="html" xml:base="https://weinan.io/2026/02/16/rust-kernel-abi-stability-analysis.html"><![CDATA[<p>Does Rust in the Linux kernel provide userspace interfaces? What’s the kernel’s ABI stability policy? This analysis examines how Rust drivers interact with userspace, the critical distinction between internal and external ABI stability, and concrete examples from production code like Android Binder and DRM drivers.</p>

<h2 id="tldr-quick-answers">TL;DR: Quick Answers</h2>

<p><strong>Q1: Does Rust currently provide userspace interfaces?</strong>
→ <strong>Yes.</strong> Rust drivers already expose userspace APIs through ioctl, /dev nodes, sysfs, and other standard mechanisms.</p>

<p><strong>Q2: Does the kernel pursue internal ABI stability?</strong>
→ <strong>No.</strong> Internal kernel APIs (between modules and kernel) are <strong>explicitly unstable</strong>. Only <strong>userspace ABI</strong> is sacred.</p>

<p><strong>Q3: Will Rust be used for userspace-facing features that require ABI stability?</strong>
→ <strong>Yes, with existing examples.</strong> Rust drivers (GPU, network PHY) in mainline kernel provide production-grade userspace ABIs. Android Binder Rust rewrite exists out-of-tree as a reference implementation.</p>

<h2 id="deep-dive-system-call-abi---the-immutable-contract">Deep Dive: System Call ABI - The Immutable Contract</h2>

<p>Before examining Rust’s userspace interfaces, let’s understand what makes userspace ABI so critical by looking at the <strong>system call layer</strong> - the most fundamental userspace interface.</p>

<h3 id="the-sacred-system-call-abi">The Sacred System Call ABI</h3>

<p>Linux supports <strong>three different system call mechanisms</strong> simultaneously to maintain ABI compatibility:</p>

<table>
  <thead>
    <tr>
      <th>Mechanism</th>
      <th>Introduced</th>
      <th>Instruction</th>
      <th>Syscall #</th>
      <th>Parameters</th>
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>INT 0x80</strong></td>
      <td>Linux 1.0 (1994)</td>
      <td><code class="language-plaintext highlighter-rouge">int $0x80</code></td>
      <td>%eax</td>
      <td>%ebx, %ecx, %edx, %esi, %edi, %ebp</td>
      <td>✅ Still supported (32-bit compat)</td>
    </tr>
    <tr>
      <td><strong>SYSENTER</strong></td>
      <td>Intel P6 (1995)</td>
      <td><code class="language-plaintext highlighter-rouge">sysenter</code></td>
      <td>%eax</td>
      <td>%ebx, %ecx, %edx, %esi, %edi, %ebp</td>
      <td>✅ Still supported (Intel 32-bit)</td>
    </tr>
    <tr>
      <td><strong>SYSCALL</strong></td>
      <td>AMD K6 (1997)</td>
      <td><code class="language-plaintext highlighter-rouge">syscall</code></td>
      <td>%rax</td>
      <td>%rdi, %rsi, %rdx, %r10, %r8, %r9</td>
      <td>✅ Primary 64-bit method</td>
    </tr>
  </tbody>
</table>

<p><strong>All three are maintained in parallel</strong> to ensure no userspace application ever breaks.</p>

<h3 id="actual-kernel-implementation">Actual Kernel Implementation</h3>

<p>From <code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code> (Linux kernel source):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// syscall_init() - called during kernel initialization</span>
<span class="kt">void</span> <span class="nf">syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="cm">/* Set up segment selectors for user/kernel mode */</span>
    <span class="n">wrmsr</span><span class="p">(</span><span class="n">MSR_STAR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">__USER32_CS</span> <span class="o">&lt;&lt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">|</span> <span class="n">__KERNEL_CS</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">cpu_feature_enabled</span><span class="p">(</span><span class="n">X86_FEATURE_FRED</span><span class="p">))</span>
        <span class="n">idt_syscall_init</span><span class="p">();</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">idt_syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// 64-bit native syscall entry</span>
    <span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_LSTAR</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_64</span><span class="p">);</span>

    <span class="c1">// 32-bit compatibility mode - MUST maintain old ABI</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ia32_enabled</span><span class="p">())</span> <span class="p">{</span>
        <span class="n">wrmsrq_cstar</span><span class="p">((</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_compat</span><span class="p">);</span>

        <span class="cm">/* SYSENTER support for 32-bit applications */</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_CS</span><span class="p">,</span> <span class="p">(</span><span class="n">u64</span><span class="p">)</span><span class="n">__KERNEL_CS</span><span class="p">);</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_ESP</span><span class="p">,</span>
                    <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)(</span><span class="n">cpu_entry_stack</span><span class="p">(</span><span class="n">smp_processor_id</span><span class="p">())</span> <span class="o">+</span> <span class="mi">1</span><span class="p">));</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_EIP</span><span class="p">,</span> <span class="p">(</span><span class="n">u64</span><span class="p">)</span><span class="n">entry_SYSENTER_compat</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>What this means</strong>: A 32-bit application compiled in 1994 using <code class="language-plaintext highlighter-rouge">int $0x80</code> <strong>still works</strong> on a 2026 Linux kernel running on modern hardware.</p>

<h3 id="two-system-call-tables">Two System Call Tables</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 64-bit native system calls</span>
<span class="k">const</span> <span class="n">sys_call_ptr_t</span> <span class="n">sys_call_table</span><span class="p">[</span><span class="n">__NR_syscall_max</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mi">0</span> <span class="p">...</span> <span class="n">__NR_syscall_max</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">__x64_sys_ni_syscall</span><span class="p">,</span>
    <span class="cp">#include</span> <span class="cpf">&lt;asm/syscalls_64.h&gt;</span><span class="cp">
</span><span class="p">};</span>

<span class="c1">// 32-bit compatibility system calls</span>
<span class="k">const</span> <span class="n">sys_call_ptr_t</span> <span class="n">ia32_sys_call_table</span><span class="p">[</span><span class="n">__NR_ia32_syscall_max</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mi">0</span> <span class="p">...</span> <span class="n">__NR_ia32_syscall_max</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">__ia32_sys_ni_syscall</span><span class="p">,</span>
    <span class="cp">#include</span> <span class="cpf">&lt;asm/syscalls_32.h&gt;</span><span class="cp">
</span><span class="p">};</span>
</code></pre></div></div>

<p><strong>Key insight</strong>: Linux maintains <strong>completely separate system call tables</strong> for 32-bit and 64-bit to ensure ABI stability. The 32-bit table has <strong>never removed a syscall</strong> - only added new ones.</p>

<h3 id="boot-protocol-abi---even-bootloaders-have-contracts">Boot Protocol ABI - Even Bootloaders Have Contracts</h3>

<p>From the Linux kernel compressed boot loader (<code class="language-plaintext highlighter-rouge">arch/x86/boot/compressed/head_64.S</code>):</p>

<pre><code class="language-assembly">/*
 * 32bit entry is 0 and it is ABI so immutable!
 * This is the compressed kernel entry point.
 */
    .code32
SYM_FUNC_START(startup_32)
</code></pre>

<p><strong>The comment “ABI so immutable!” is critical</strong>:</p>
<ul>
  <li>The 32-bit entry point <strong>must always be at offset 0</strong> in the compressed kernel</li>
  <li>Boot loaders (GRUB, systemd-boot, etc.) <strong>depend on this</strong></li>
  <li>Changing this would break every bootloader</li>
  <li>This has been true since Linux 2.6.x era</li>
</ul>

<p><strong>Boot protocol specifications</strong> (<code class="language-plaintext highlighter-rouge">Documentation/x86/boot.rst</code>):</p>
<ul>
  <li>Protected mode kernel loaded at: <code class="language-plaintext highlighter-rouge">0x100000</code> (1MB)</li>
  <li>32-bit entry point: Always offset 0 from load address</li>
  <li><code class="language-plaintext highlighter-rouge">code32_start</code> field: Defaults to <code class="language-plaintext highlighter-rouge">0x100000</code></li>
</ul>

<p>This is <strong>internal boot ABI</strong> - distinct from userspace ABI but equally immutable because external tools (bootloaders) depend on it.</p>

<h3 id="the-lesson-for-rust">The Lesson for Rust</h3>

<p>When Rust drivers provide userspace interfaces, they inherit these same ironclad rules:</p>

<p><strong>C example</strong> (traditional):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Userspace never knows this changed from C to Rust</span>
<span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/binder"</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
<span class="n">ioctl</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">BINDER_WRITE_READ</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">bwr</span><span class="p">);</span>  <span class="c1">// ABI unchanged</span>
</code></pre></div></div>

<p><strong>Rust implementation</strong> (modern):</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Must provide IDENTICAL ABI</span>
<span class="k">const</span> <span class="n">BINDER_WRITE_READ</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">_IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">BinderWriteRead</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="n">BINDER_TYPE</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="mi">1</span>  <span class="c1">// ioctl number - NEVER changes</span>
<span class="p">);</span>
</code></pre></div></div>

<p>The ioctl number, structure layout, and semantics are <strong>frozen in time</strong> - whether implemented in C or Rust.</p>

<hr />

<h2 id="rusts-abi-guarantees-system-v-compatibility">Rust’s ABI Guarantees: System V Compatibility</h2>

<p>Before examining specific userspace interfaces, it’s crucial to understand <strong>how Rust guarantees compatibility with the System V ABI</strong> that Linux uses on x86-64.</p>

<h3 id="does-rust-comply-with-system-v-abi">Does Rust Comply with System V ABI?</h3>

<p><strong>Yes - rustc explicitly guarantees System V ABI compliance through language features.</strong></p>

<p>The Linux kernel on x86-64 uses the <strong>System V AMD64 ABI</strong> for:</p>
<ul>
  <li>Function calling conventions (register usage, stack layout)</li>
  <li>Data structure layout (alignment, padding, size)</li>
  <li>Type representations (integer sizes, pointer sizes)</li>
</ul>

<p>Rust provides multiple mechanisms to ensure ABI compatibility:</p>

<table>
  <thead>
    <tr>
      <th>ABI Type</th>
      <th>Rust Syntax</th>
      <th>x86-64 Linux Behavior</th>
      <th>Guarantee Level</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Rust ABI</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "Rust"</code> (default)</td>
      <td>Unspecified, may change</td>
      <td>❌ Unstable</td>
    </tr>
    <tr>
      <td><strong>C ABI</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "C"</code></td>
      <td>System V AMD64 ABI</td>
      <td>✅ <strong>Language spec guarantee</strong></td>
    </tr>
    <tr>
      <td><strong>System V</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "sysv64"</code></td>
      <td>System V AMD64 ABI</td>
      <td>✅ <strong>Explicit guarantee</strong></td>
    </tr>
    <tr>
      <td><strong>Data layout</strong></td>
      <td><code class="language-plaintext highlighter-rouge">#[repr(C)]</code></td>
      <td>Matches C struct layout</td>
      <td>✅ <strong>Compiler guarantee</strong></td>
    </tr>
  </tbody>
</table>

<h3 id="compiler-enforced-abi-correctness">Compiler-Enforced ABI Correctness</h3>

<p>Unlike C where ABI compliance is implicit and unchecked, <strong>Rust makes ABI contracts explicit and verified at compile time</strong>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Explicit C ABI - compiler verifies calling convention</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">kernel_function</span><span class="p">(</span><span class="n">arg</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">i32</span> <span class="p">{</span>
    <span class="c1">// Function uses System V calling convention:</span>
    <span class="c1">// - arg passed in %rdi register</span>
    <span class="c1">// - return value in %rax register</span>
    <span class="c1">// - Guaranteed across Rust compiler versions</span>
    <span class="mi">0</span>
<span class="p">}</span>

<span class="c1">// Explicit memory layout - compiler verifies size/alignment</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">KernelStruct</span> <span class="p">{</span>
    <span class="n">field1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>  <span class="c1">// offset 0, 8 bytes</span>
    <span class="n">field2</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>  <span class="c1">// offset 8, 4 bytes</span>
    <span class="n">field3</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>  <span class="c1">// offset 12, 4 bytes</span>
<span class="p">}</span>

<span class="c1">// Compile-time verification - FAILS if layout changes</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span><span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">KernelStruct</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">16</span><span class="p">);</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span><span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">align_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">KernelStruct</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">8</span><span class="p">);</span>
</code></pre></div></div>

<h3 id="reference-example-binder-abi-compliance">Reference Example: Binder ABI Compliance</h3>

<p>From the Android Binder Rust rewrite (out-of-tree reference implementation):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/android/binder/defs.rs (from Rust-for-Linux tree, not mainline)</span>
<span class="nd">#[repr(C)]</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="nf">BinderTransactionData</span><span class="p">(</span>
    <span class="n">MaybeUninit</span><span class="o">&lt;</span><span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span>
<span class="p">);</span>

<span class="c1">// SAFETY: Explicit FromBytes/AsBytes ensures binary compatibility</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">FromBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">AsBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
</code></pre></div></div>

<p><strong>Note</strong>: This code is from the Rust-for-Linux project’s Binder implementation, which exists as an out-of-tree reference showing how userspace ABI compatibility is achieved in Rust.</p>

<p><strong>Why <code class="language-plaintext highlighter-rouge">MaybeUninit</code>?</strong> It preserves <strong>padding bytes</strong> to ensure bit-for-bit identical layout with C, including uninitialized padding. This is critical for userspace compatibility.</p>

<h3 id="rustcs-abi-stability-promise">rustc’s ABI Stability Promise</h3>

<p>From the Rust language specification:</p>

<blockquote>
  <p><strong><code class="language-plaintext highlighter-rouge">#[repr(C)]</code> Guarantee</strong>: Types marked with <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> have the same layout as the corresponding C type, following the C ABI for the target platform. This guarantee is <strong>stable across Rust compiler versions</strong>.</p>
</blockquote>

<p><strong>Contrast with C:</strong></p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>C</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ABI specification</strong></td>
      <td>Implicit, platform-dependent</td>
      <td>Explicit with <code class="language-plaintext highlighter-rouge">extern "C"</code></td>
    </tr>
    <tr>
      <td><strong>Layout verification</strong></td>
      <td>Runtime bugs if wrong</td>
      <td>Compile-time <code class="language-plaintext highlighter-rouge">assert!</code></td>
    </tr>
    <tr>
      <td><strong>Padding control</strong></td>
      <td>Implicit, error-prone</td>
      <td><code class="language-plaintext highlighter-rouge">MaybeUninit</code> explicit</td>
    </tr>
    <tr>
      <td><strong>Cross-version stability</strong></td>
      <td>Trust the developer</td>
      <td>Language specification</td>
    </tr>
  </tbody>
</table>

<h3 id="system-call-register-usage">System Call Register Usage</h3>

<p>The System V ABI specifies register usage for function calls. For <strong>system calls</strong>, Linux uses a <strong>modified</strong> System V convention:</p>

<p><strong>System V function call</strong> (used by <code class="language-plaintext highlighter-rouge">extern "C"</code>):</p>
<ul>
  <li>Arguments: <code class="language-plaintext highlighter-rouge">%rdi, %rsi, %rdx, %rcx, %r8, %r9</code></li>
  <li>Return: <code class="language-plaintext highlighter-rouge">%rax</code></li>
</ul>

<p><strong>Linux syscall</strong> (special case):</p>
<ul>
  <li>Syscall number: <code class="language-plaintext highlighter-rouge">%rax</code></li>
  <li>Arguments: <code class="language-plaintext highlighter-rouge">%rdi, %rsi, %rdx, %r10, %r8, %r9</code> (note: <code class="language-plaintext highlighter-rouge">%r10</code> instead of <code class="language-plaintext highlighter-rouge">%rcx</code>)</li>
  <li>Return: <code class="language-plaintext highlighter-rouge">%rax</code></li>
</ul>

<p>Rust respects both conventions:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Regular C function - uses standard System V ABI</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">regular_function</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// a in %rdi, b in %rsi</span>
<span class="p">}</span>

<span class="c1">// System call wrapper - uses syscall convention</span>
<span class="nd">#[inline(always)]</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">syscall1</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">arg1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">ret</span><span class="p">:</span> <span class="nb">u64</span><span class="p">;</span>
    <span class="nn">core</span><span class="p">::</span><span class="nn">arch</span><span class="p">::</span><span class="nd">asm!</span><span class="p">(</span>
        <span class="s">"syscall"</span><span class="p">,</span>
        <span class="k">in</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">n</span><span class="p">,</span>     <span class="c1">// syscall number</span>
        <span class="k">in</span><span class="p">(</span><span class="s">"rdi"</span><span class="p">)</span> <span class="n">arg1</span><span class="p">,</span>  <span class="c1">// first argument</span>
        <span class="nf">lateout</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">ret</span><span class="p">,</span>
    <span class="p">);</span>
    <span class="n">ret</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="answer-can-rust-compile-to-system-v-abi">Answer: Can Rust Compile to System V ABI?</h3>

<p>✅ <strong>Yes, rustc guarantees System V ABI compliance through:</strong></p>
<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">extern "C"</code></strong> - Explicitly uses platform C ABI (System V on x86-64 Linux)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">#[repr(C)]</code></strong> - Guarantees C-compatible data layout</li>
  <li><strong>Compile-time verification</strong> - Size/alignment assertions catch ABI breaks</li>
  <li><strong>Language specification</strong> - Stability across compiler versions</li>
</ol>

<p>This is not a “best effort” - it’s a <strong>language-level guarantee</strong> backed by the Rust specification.</p>

<hr />

<h2 id="question-1-rusts-userspace-interface-infrastructure">Question 1: Rust’s Userspace Interface Infrastructure</h2>

<h3 id="the-uapi-crate-userspace-api-bindings">The <code class="language-plaintext highlighter-rouge">uapi</code> Crate: Userspace API Bindings</h3>

<p>Rust provides a dedicated crate for userspace APIs. From the actual kernel source:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/uapi/lib.rs (actual kernel code)</span>
<span class="cd">//! UAPI Bindings.</span>
<span class="cd">//!</span>
<span class="cd">//! Contains the bindings generated by `bindgen` for UAPI interfaces.</span>
<span class="cd">//!</span>
<span class="cd">//! This crate may be used directly by drivers that need to interact with</span>
<span class="cd">//! userspace APIs.</span>

<span class="nd">#![no_std]</span>

<span class="c1">// Auto-generated UAPI bindings</span>
<span class="nd">include!</span><span class="p">(</span><span class="nd">concat!</span><span class="p">(</span><span class="nd">env!</span><span class="p">(</span><span class="s">"OBJTREE"</span><span class="p">),</span> <span class="s">"/rust/uapi/uapi_generated.rs"</span><span class="p">));</span>
</code></pre></div></div>

<p><strong>Key insight</strong>: The kernel has a <strong>separate <code class="language-plaintext highlighter-rouge">uapi</code> crate</strong> specifically for userspace interfaces, distinct from internal kernel APIs.</p>

<h3 id="ioctl-support-in-rust">ioctl Support in Rust</h3>

<p>The kernel provides full ioctl support for Rust drivers:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/ioctl.rs (actual kernel code)</span>
<span class="cd">//! `ioctl()` number definitions.</span>
<span class="cd">//!</span>
<span class="cd">//! C header: [`include/asm-generic/ioctl.h`](srctree/include/asm-generic/ioctl.h)</span>

<span class="cd">/// Build an ioctl number for a read-only ioctl.</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOR</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span><span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_READ</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">nr</span><span class="p">,</span> <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">())</span>
<span class="p">}</span>

<span class="cd">/// Build an ioctl number for a write-only ioctl.</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOW</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span><span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_WRITE</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">nr</span><span class="p">,</span> <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">())</span>
<span class="p">}</span>

<span class="cd">/// Build an ioctl number for a read-write ioctl.</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOWR</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span>
        <span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_READ</span> <span class="p">|</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_WRITE</span><span class="p">,</span>
        <span class="n">ty</span><span class="p">,</span>
        <span class="n">nr</span><span class="p">,</span>
        <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>This is identical to C’s ioctl macros</strong>, but with type safety.</p>

<h3 id="real-example-drm-driver-ioctl-interface">Real Example: DRM Driver ioctl Interface</h3>

<p>From the actual DRM subsystem Rust abstractions:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/drm/ioctl.rs (actual kernel code)</span>
<span class="cd">//! DRM IOCTL definitions.</span>

<span class="k">const</span> <span class="n">BASE</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">DRM_IOCTL_BASE</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">;</span>

<span class="cd">/// Construct a DRM ioctl number with a read-write argument.</span>
<span class="nd">#[allow(non_snake_case)]</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">IOWR</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nn">ioctl</span><span class="p">::</span><span class="nn">_IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">BASE</span><span class="p">,</span> <span class="n">nr</span><span class="p">)</span>
<span class="p">}</span>

<span class="cd">/// Descriptor type for DRM ioctls.</span>
<span class="k">pub</span> <span class="k">type</span> <span class="n">DrmIoctlDescriptor</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_ioctl_desc</span><span class="p">;</span>

<span class="c1">// ioctl flags</span>
<span class="k">pub</span> <span class="k">const</span> <span class="n">AUTH</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_ioctl_flags_DRM_AUTH</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">const</span> <span class="n">MASTER</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_ioctl_flags_DRM_MASTER</span><span class="p">;</span>
<span class="k">pub</span> <span class="k">const</span> <span class="n">RENDER_ALLOW</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">bindings</span><span class="p">::</span><span class="n">drm_ioctl_flags_DRM_RENDER_ALLOW</span><span class="p">;</span>
</code></pre></div></div>

<p><strong>Usage in drivers:</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Declaring DRM ioctls in a Rust driver</span>
<span class="nn">kernel</span><span class="p">::</span><span class="nd">declare_drm_ioctls!</span> <span class="p">{</span>
    <span class="p">(</span><span class="n">NOVA_GETPARAM</span><span class="p">,</span> <span class="n">drm_nova_getparam</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="n">my_get_param_handler</span><span class="p">),</span>
    <span class="p">(</span><span class="n">NOVA_GEM_CREATE</span><span class="p">,</span> <span class="n">drm_nova_gem_create</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">AUTH</span> <span class="p">|</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="n">gem_create</span><span class="p">),</span>
    <span class="p">(</span><span class="n">NOVA_VM_BIND</span><span class="p">,</span> <span class="n">drm_nova_vm_bind</span><span class="p">,</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">AUTH</span> <span class="p">|</span> <span class="nn">ioctl</span><span class="p">::</span><span class="n">RENDER_ALLOW</span><span class="p">,</span> <span class="n">vm_bind</span><span class="p">),</span>
<span class="p">}</span>
</code></pre></div></div>

<p>These ioctls are <strong>directly exposed to userspace</strong> - the same ABI as C drivers.</p>

<h3 id="reference-example-android-binder-userspace-protocol">Reference Example: Android Binder Userspace Protocol</h3>

<p>The Android Binder Rust rewrite (out-of-tree) demonstrates how to expose extensive userspace APIs:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Example from Rust-for-Linux Binder implementation (not in mainline)</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::{</span>
    <span class="nn">transmute</span><span class="p">::{</span><span class="n">AsBytes</span><span class="p">,</span> <span class="n">FromBytes</span><span class="p">},</span>
    <span class="nn">uapi</span><span class="p">::{</span><span class="k">self</span><span class="p">,</span> <span class="o">*</span><span class="p">},</span>
<span class="p">};</span>

<span class="c1">// Userspace protocol constants - MUST remain stable</span>
<span class="nd">pub_no_prefix!</span><span class="p">(</span>
    <span class="n">binder_driver_return_protocol_</span><span class="p">,</span>
    <span class="n">BR_TRANSACTION</span><span class="p">,</span>
    <span class="n">BR_REPLY</span><span class="p">,</span>
    <span class="n">BR_DEAD_REPLY</span><span class="p">,</span>
    <span class="n">BR_FAILED_REPLY</span><span class="p">,</span>
    <span class="n">BR_OK</span><span class="p">,</span>
    <span class="n">BR_ERROR</span><span class="p">,</span>
    <span class="n">BR_INCREFS</span><span class="p">,</span>
    <span class="n">BR_ACQUIRE</span><span class="p">,</span>
    <span class="n">BR_RELEASE</span><span class="p">,</span>
    <span class="n">BR_DECREFS</span><span class="p">,</span>
    <span class="n">BR_DEAD_BINDER</span><span class="p">,</span>
    <span class="c1">// ... 21 total protocol constants</span>
<span class="p">);</span>

<span class="nd">pub_no_prefix!</span><span class="p">(</span>
    <span class="n">binder_driver_command_protocol_</span><span class="p">,</span>
    <span class="n">BC_TRANSACTION</span><span class="p">,</span>
    <span class="n">BC_REPLY</span><span class="p">,</span>
    <span class="n">BC_FREE_BUFFER</span><span class="p">,</span>
    <span class="n">BC_INCREFS</span><span class="p">,</span>
    <span class="n">BC_ACQUIRE</span><span class="p">,</span>
    <span class="n">BC_RELEASE</span><span class="p">,</span>
    <span class="n">BC_DECREFS</span><span class="p">,</span>
    <span class="c1">// ... 24 total command constants</span>
<span class="p">);</span>

<span class="c1">// Userspace data structures - wrapped to preserve ABI</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderTransactionData</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="p">);</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderWriteRead</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_write_read</span><span class="p">);</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderVersion</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_version</span><span class="p">);</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">FlatBinderObject</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">flat_binder_object</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>Critical detail</strong>: These use <code class="language-plaintext highlighter-rouge">MaybeUninit</code> to <strong>preserve padding bytes</strong>, ensuring binary-identical ABI with C:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Wrapper that preserves exact memory layout, including padding</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone)]</span>
<span class="nd">#[repr(transparent)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="nf">BinderTransactionData</span><span class="p">(</span><span class="n">MaybeUninit</span><span class="o">&lt;</span><span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span><span class="p">);</span>

<span class="c1">// SAFETY: Explicit FromBytes/AsBytes implementation</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">FromBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">AsBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
</code></pre></div></div>

<p><strong>Why this matters</strong>: Userspace code compiled against C headers sends <strong>exact same binary data</strong> to Rust driver.</p>

<h3 id="userspace-interface-summary">Userspace Interface Summary</h3>

<table>
  <thead>
    <tr>
      <th>Interface Type</th>
      <th>Rust Support</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ioctl handlers</strong></td>
      <td>✅ Full support (drivers handle commands)</td>
      <td>DRM drivers, Binder</td>
    </tr>
    <tr>
      <td><strong>/dev device nodes</strong></td>
      <td>✅ Via miscdevice/cdev</td>
      <td>Character devices</td>
    </tr>
    <tr>
      <td><strong>/sys (sysfs)</strong></td>
      <td>✅ Via kobject bindings</td>
      <td>Device attributes</td>
    </tr>
    <tr>
      <td><strong>/proc</strong></td>
      <td>✅ Via seq_file</td>
      <td>Process info</td>
    </tr>
    <tr>
      <td><strong>Defining new syscalls</strong></td>
      <td>❌ Not possible (syscall entry is C)</td>
      <td>-</td>
    </tr>
    <tr>
      <td><strong>Netlink</strong></td>
      <td>✅ Via net subsystem</td>
      <td>Network configuration</td>
    </tr>
  </tbody>
</table>

<p><strong>Important distinction</strong>: Rust drivers can <strong>handle</strong> ioctl commands (the driver-specific logic), but the ioctl <strong>system call entry point</strong> itself (in <code class="language-plaintext highlighter-rouge">fs/ioctl.c</code>) remains C code. The same applies to other interfaces - Rust provides the handler, not the core mechanism.</p>

<p><strong>Answer</strong>: Yes, Rust <strong>fully supports</strong> userspace interfaces through standard kernel mechanisms, though the core system call layer remains in C.</p>

<h2 id="critical-clarification-userspace-programs-cannot-use-rustkernel">Critical Clarification: Userspace Programs Cannot Use <code class="language-plaintext highlighter-rouge">rust/kernel</code></h2>

<p><strong>A common misconception</strong>: “Can my userspace Rust program use the <code class="language-plaintext highlighter-rouge">rust/kernel</code> abstractions?”</p>

<p><strong>Answer: Absolutely not.</strong> This is a fundamental architectural constraint, not a technical limitation.</p>

<h3 id="kernel-space-vs-userspace---complete-isolation">Kernel Space vs. Userspace - Complete Isolation</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────┐
│              USERSPACE                                   │
│  - Uses Rust standard library (std)                     │
│  - Normal Rust programs                                 │
│  - Can use tokio, serde, etc.                          │
│                                                          │
│  Userspace Rust program:                                │
│  ┌────────────────────────────────────────┐            │
│  │ use std::fs::File;                      │            │
│  │ use std::os::unix::io::AsRawFd;        │            │
│  │                                         │            │
│  │ fn main() {                             │            │
│  │     let fd = File::open("/dev/my_dev") │            │
│  │         .unwrap();                      │            │
│  │     // Interact with kernel via syscalls│           │
│  │     unsafe {                             │            │
│  │         libc::ioctl(fd.as_raw_fd(), ...) │           │
│  │     }                                    │            │
│  │ }                                        │            │
│  └────────────────────────────────────────┘            │
└──────────────────┬──────────────────────────────────────┘
                   │
                   │  System Call Boundary
                   │  - open(), ioctl(), read(), write()
                   │  - /dev, /sys, /proc interfaces
                   │  - ❌ Cannot directly call kernel functions
                   │
┌──────────────────┴──────────────────────────────────────┐
│              KERNEL SPACE                                │
│  - Uses #![no_std] (no standard library)                │
│  - Runs only in kernel modules                          │
│  - Uses rust/kernel abstractions                        │
│                                                          │
│  Kernel Rust driver:                                    │
│  ┌────────────────────────────────────────┐            │
│  │ #![no_std]                             │            │
│  │ use kernel::prelude::*;                │            │
│  │                                         │            │
│  │ impl kernel::file::Operations for MyDev│            │
│  │     fn ioctl(...) -&gt; Result {          │            │
│  │         // Handle userspace ioctl      │            │
│  │         kernel::sync::SpinLock::...     │            │
│  │     }                                   │            │
│  │ }                                       │            │
│  └────────────────────────────────────────┘            │
└─────────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="why-userspace-cannot-use-rustkernel">Why Userspace Cannot Use <code class="language-plaintext highlighter-rouge">rust/kernel</code></h3>

<p><strong>1. <code class="language-plaintext highlighter-rouge">#![no_std]</code> - No Standard Library</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/lib.rs (library crate root)</span>
<span class="nd">#![no_std]</span>  <span class="c1">// ← Critical: No standard library!</span>

<span class="c1">// Kernel space does NOT have:</span>
<span class="c1">// - Heap allocation (must use GFP_KERNEL)</span>
<span class="c1">// - Threads (uses kernel tasks)</span>
<span class="c1">// - File system (userspace concept)</span>
<span class="c1">// - Network libraries (userspace concept)</span>
<span class="c1">// - println!() (uses pr_info!())</span>

<span class="c1">// Only has:</span>
<span class="c1">// - core library (no OS required)</span>
<span class="c1">// - Kernel-specific APIs</span>
</code></pre></div></div>

<p><strong>Note</strong>: The <code class="language-plaintext highlighter-rouge">#![no_std]</code> attribute is only declared in library crate roots like <code class="language-plaintext highlighter-rouge">rust/kernel/lib.rs</code>, <code class="language-plaintext highlighter-rouge">rust/bindings/lib.rs</code>, etc. Individual driver modules (e.g., <code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/driver.rs</code>) do NOT need this declaration - they inherit the no_std environment by using the kernel library via <code class="language-plaintext highlighter-rouge">use kernel::prelude::*</code>.</p>

<p><strong>2. Different Compilation Targets</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Userspace Rust program</span>
<span class="nv">$ </span>rustc <span class="nt">--target</span> x86_64-unknown-linux-gnu userspace.rs
<span class="c"># Compiles to userspace executable</span>

<span class="c"># Kernel Rust module</span>
<span class="nv">$ </span>rustc <span class="nt">--target</span> x86_64-linux-kernel module.rs
<span class="c"># Compiles to kernel module (.ko file)</span>
<span class="c"># Linked into kernel, cannot run in userspace</span>
</code></pre></div></div>

<p><strong>3. Memory Space Isolation</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Virtual Address Space:
┌─────────────────────┐ 0xFFFFFFFFFFFFFFFF
│   Kernel Space       │ ← rust/kernel runs here
│   (kernel code only) │   Only accessible via syscalls
├─────────────────────┤ 0x00007FFFFFFFFFFF
│   Userspace          │ ← User Rust programs run here
│   (applications)     │   Cannot access kernel memory
└─────────────────────┘ 0x0000000000000000
</code></pre></div></div>

<h3 id="how-userspace-programs-interact-with-rust-kernel-drivers">How Userspace Programs Interact with Rust Kernel Drivers</h3>

<p><strong>Method 1: Via <code class="language-plaintext highlighter-rouge">/dev</code> Device Nodes</strong></p>

<p><strong>Kernel side (Rust driver):</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/example/my_device.rs</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">file</span><span class="p">::</span><span class="n">Operations</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">MyDevice</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Operations</span> <span class="k">for</span> <span class="n">MyDevice</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">open</span><span class="p">(</span><span class="o">...</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">pr_info!</span><span class="p">(</span><span class="s">"Device opened from userspace</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">MyDevice</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">ioctl</span><span class="p">(</span><span class="n">cmd</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">arg</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">isize</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">cmd</span> <span class="p">{</span>
            <span class="n">MY_IOCTL_CMD</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="c1">// Handle userspace ioctl request</span>
                <span class="nf">Ok</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="p">}</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Userspace (standard Rust program):</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// userspace_app/src/main.rs</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">;</span>  <span class="c1">// ← Uses standard library!</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">os</span><span class="p">::</span><span class="nn">unix</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="n">AsRawFd</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Open device created by Rust kernel driver</span>
    <span class="k">let</span> <span class="n">file</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="s">"/dev/my_device"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// Interact via system calls</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">ret</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">ioctl</span><span class="p">(</span>
            <span class="n">file</span><span class="nf">.as_raw_fd</span><span class="p">(),</span>
            <span class="n">MY_IOCTL_CMD</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="n">my_data</span>
        <span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Userspace has no idea if kernel is C or Rust!</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Method 2: Via <code class="language-plaintext highlighter-rouge">sysfs</code></strong></p>

<p><strong>Kernel side:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Create sysfs attribute in kernel</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">device</span><span class="p">::</span><span class="n">Device</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">create_sysfs_attrs</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="c1">// Creates /sys/class/my_device/value</span>
        <span class="nf">sysfs_create_file</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Userspace:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">fs</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Read sysfs file (provided by Rust kernel driver)</span>
    <span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">read_to_string</span><span class="p">(</span>
        <span class="s">"/sys/class/my_device/value"</span>
    <span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="nd">println!</span><span class="p">(</span><span class="s">"Value from kernel: {}"</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Method 3: Via <code class="language-plaintext highlighter-rouge">netlink</code> (Network Drivers)</strong></p>

<p><strong>Kernel side:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="n">net</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">send_netlink_msg</span><span class="p">(</span><span class="n">msg</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">NetlinkMsg</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
    <span class="nf">netlink_broadcast</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Userspace:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">netlink_sys</span><span class="p">::{</span><span class="n">Socket</span><span class="p">,</span> <span class="n">SocketAddr</span><span class="p">};</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">socket</span> <span class="o">=</span> <span class="nn">Socket</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="c1">// Receive netlink messages from Rust kernel driver</span>
    <span class="k">let</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">socket</span><span class="nf">.recv_from</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="comparison-table">Comparison Table</h3>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Kernel Space (<code class="language-plaintext highlighter-rouge">rust/kernel</code>)</th>
      <th>Userspace (std Rust)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Standard library</strong></td>
      <td>❌ <code class="language-plaintext highlighter-rouge">#![no_std]</code></td>
      <td>✅ <code class="language-plaintext highlighter-rouge">use std::*</code></td>
    </tr>
    <tr>
      <td><strong>Runtime environment</strong></td>
      <td>Kernel module (.ko)</td>
      <td>Executable (ELF)</td>
    </tr>
    <tr>
      <td><strong>Memory allocation</strong></td>
      <td><code class="language-plaintext highlighter-rouge">kernel::kvec::KVec</code></td>
      <td><code class="language-plaintext highlighter-rouge">std::vec::Vec</code></td>
    </tr>
    <tr>
      <td><strong>Printing</strong></td>
      <td><code class="language-plaintext highlighter-rouge">pr_info!()</code></td>
      <td><code class="language-plaintext highlighter-rouge">println!()</code></td>
    </tr>
    <tr>
      <td><strong>File operations</strong></td>
      <td>❌ Cannot open files</td>
      <td>✅ <code class="language-plaintext highlighter-rouge">std::fs::File</code></td>
    </tr>
    <tr>
      <td><strong>Networking</strong></td>
      <td>Provides network services</td>
      <td>Uses network services</td>
    </tr>
    <tr>
      <td><strong>Hardware access</strong></td>
      <td>✅ Direct access</td>
      <td>❌ Via system calls</td>
    </tr>
    <tr>
      <td><strong>Privilege level</strong></td>
      <td>Ring 0</td>
      <td>Ring 3</td>
    </tr>
    <tr>
      <td><strong>Available crates</strong></td>
      <td>Very few (no_std only)</td>
      <td>All standard crates</td>
    </tr>
  </tbody>
</table>

<h3 id="complete-example-userspace-reading-gpu-info">Complete Example: Userspace Reading GPU Info</h3>

<p><strong>1. Kernel Rust GPU driver:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/gpu/drm/nova/driver.rs</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="n">drm</span><span class="p">;</span>

<span class="k">impl</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span> <span class="k">for</span> <span class="n">NovaDriver</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">ioctl</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">cmd</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">cmd</span> <span class="p">{</span>
            <span class="n">DRM_NOVA_GET_PARAM</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="c1">// Read GPU parameter</span>
                <span class="k">let</span> <span class="n">param</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_gpu_param</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
                <span class="c1">// Copy to userspace</span>
                <span class="n">data</span><span class="nf">.copy_from_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">param</span><span class="nf">.to_bytes</span><span class="p">());</span>
                <span class="nf">Ok</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="p">}</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>2. Userspace Rust application:</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// userspace_app/src/main.rs</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">OpenOptions</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">os</span><span class="p">::</span><span class="nn">unix</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="n">AsRawFd</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Open DRM device</span>
    <span class="k">let</span> <span class="n">drm_device</span> <span class="o">=</span> <span class="nn">OpenOptions</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span>
        <span class="nf">.read</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
        <span class="nf">.write</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
        <span class="nf">.open</span><span class="p">(</span><span class="s">"/dev/dri/renderD128"</span><span class="p">)</span>
        <span class="nf">.unwrap</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">drm_device</span><span class="nf">.as_raw_fd</span><span class="p">();</span>

    <span class="c1">// Prepare ioctl argument</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">param_data</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0u8</span><span class="p">;</span> <span class="mi">64</span><span class="p">];</span>

    <span class="c1">// Call ioctl (enters kernel)</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">libc</span><span class="p">::</span><span class="nf">ioctl</span><span class="p">(</span>
            <span class="n">fd</span><span class="p">,</span>
            <span class="n">DRM_NOVA_GET_PARAM</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="k">mut</span> <span class="n">param_data</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">_</span>
        <span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// param_data now contains GPU parameters from kernel</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"GPU param: {:?}"</span><span class="p">,</span> <span class="n">param_data</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="key-takeaways">Key Takeaways</h3>

<ol>
  <li>❌ <strong>Userspace programs CANNOT use <code class="language-plaintext highlighter-rouge">rust/kernel</code></strong> - they run in completely different environments</li>
  <li>✅ <strong>Userspace interacts with kernel via system calls</strong> - just like with C drivers</li>
  <li>🔄 <strong>Interaction is bidirectional but indirect</strong>:
    <ul>
      <li>Userspace → syscall/ioctl/filesystem → Rust kernel driver</li>
      <li>Rust kernel driver → response/data → syscall return → Userspace</li>
    </ul>
  </li>
</ol>

<p><strong>Userspace has no idea if the kernel driver is C or Rust - this is exactly what ABI stability means!</strong> 🎯</p>

<h2 id="question-2-kernel-internal-abi-stability-policy">Question 2: Kernel Internal ABI Stability Policy</h2>

<h3 id="the-critical-distinction">The Critical Distinction</h3>

<p>Linux kernel has <strong>two completely different ABI policies</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────┐
│                  USERSPACE                          │
│  (applications, libraries, tools)                   │
└─────────────────┬───────────────────────────────────┘
                  │
                  │  ← USERSPACE ABI (STABLE, SACRED)
                  │     System calls, ioctl, /proc, /sys
                  │     "WE DO NOT BREAK USERSPACE" - Linus
                  │
┌─────────────────┴───────────────────────────────────┐
│            LINUX KERNEL                             │
│  ┌─────────────────────────────────────────┐       │
│  │  Kernel Subsystems (VFS, MM, Net, etc)  │       │
│  └─────────────────┬───────────────────────┘       │
│                    │                                │
│                    │  ← INTERNAL API (UNSTABLE!)    │
│                    │     Can change anytime         │
│                    │     No backward compat         │
│  ┌─────────────────┴───────────────────────┐       │
│  │  Loadable Kernel Modules (.ko files)    │       │
│  │  (drivers, filesystems, etc)             │       │
│  └─────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="official-kernel-policy-internal-abi-is-unstable">Official Kernel Policy: Internal ABI is Unstable</h3>

<p>From the Linux kernel documentation<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<blockquote>
  <p><strong>The kernel does NOT have a stable internal API/ABI.</strong></p>

  <p>The kernel internal API can and does change at any time, for any reason.</p>
</blockquote>

<p><strong>In practice</strong>: If you compile a kernel module for Linux 6.5, it <strong>will not load</strong> on Linux 6.6 without recompilation.</p>

<h3 id="why-internal-abi-is-unstable">Why Internal ABI is Unstable</h3>

<p>Greg Kroah-Hartman explained this in his famous document:</p>

<p><strong>Reasons for no internal ABI stability:</strong></p>

<ol>
  <li><strong>Rapid evolution</strong>: Subsystems need freedom to refactor</li>
  <li><strong>No binary modules</strong>: All modules must be GPL and recompilable</li>
  <li><strong>Quality control</strong>: Forces out-of-tree drivers to stay updated</li>
  <li><strong>Security</strong>: Allows fixing fundamental design flaws</li>
</ol>

<p><strong>The philosophy</strong>: “If your code is good enough, it should be in-tree. If it’s in-tree, recompilation is free.”</p>

<h3 id="userspace-abi-absolute-stability">Userspace ABI: Absolute Stability</h3>

<p>Linus Torvalds’ famous rule (paraphrased from countless LKML posts):</p>

<blockquote>
  <p><strong>“WE DO NOT BREAK USERSPACE. EVER.”</strong></p>

  <p>If a kernel change breaks a working userspace application, that change <strong>will be reverted</strong>, no matter how “correct” it was.</p>
</blockquote>

<p>From the official documentation<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<blockquote>
  <p><strong>Stable interfaces:</strong></p>
  <ul>
    <li>System calls: Must never change semantics</li>
    <li>/proc and /sys ABI: Guaranteed stable for at least 2 years</li>
    <li>ioctl numbers: Never reused once defined</li>
    <li>Binary formats (ELF, etc): Backward compatible</li>
  </ul>
</blockquote>

<h3 id="real-example-abi-stability-levels">Real Example: ABI Stability Levels</h3>

<p>From <code class="language-plaintext highlighter-rouge">/Documentation/ABI/README</code><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stable/     - Interfaces with guaranteed backward compatibility
              Examples: syscalls, core /proc entries

testing/    - Interfaces believed stable but not yet guaranteed
              May still change with warning

obsolete/   - Deprecated but still present interfaces
              Marked for removal but with migration period

removed/    - Historical record only
</code></pre></div></div>

<p><strong>Answer</strong>: The kernel <strong>does not pursue internal ABI stability</strong>. Only <strong>userspace ABI</strong> is stable.</p>

<h2 id="question-3-rust-and-userspace-abi-stability">Question 3: Rust and Userspace ABI Stability</h2>

<h3 id="current-state-rust-provides-stable-userspace-abi">Current State: Rust Provides Stable Userspace ABI</h3>

<p><strong>Production drivers in mainline</strong> (as of Linux 6.x):</p>

<ol>
  <li><strong>GPU drivers (Nova)</strong>: DRM userspace ABI for Nvidia GPUs - full ioctl interface</li>
  <li><strong>Network PHY drivers</strong> (ax88796b, qt2025): ethtool/netlink ABI</li>
  <li><strong>Block devices</strong> (rnull): Standard block device ioctl ABI</li>
  <li><strong>CPU frequency</strong> (rcpufreq_dt): sysfs and ioctl interfaces</li>
</ol>

<p><strong>Reference implementations (out-of-tree)</strong>:</p>

<p><strong>Android Binder</strong> (Rust rewrite, not yet in mainline): Demonstrates <strong>identical userspace ABI</strong> as C version:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Same BINDER_WRITE_READ ioctl as C version</span>
<span class="k">const</span> <span class="n">BINDER_WRITE_READ</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">_IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">BinderWriteRead</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="n">BINDER_TYPE</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="mi">1</span>
<span class="p">);</span>

<span class="c1">// Userspace code using C headers sends exact same binary data</span>
</code></pre></div></div>

<p>This out-of-tree implementation has been <strong>validated</strong> - Android’s libbinder (C++ userspace library) works without modification with the Rust driver.</p>

<h3 id="why-rust-is-actually-better-for-abi-stability">Why Rust is Actually Better for ABI Stability</h3>

<p><strong>Problem in C</strong>: Accidental ABI breakage</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C - easy to accidentally change ABI</span>
<span class="k">struct</span> <span class="n">binder_transaction_data</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">cookie</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">code</span><span class="p">;</span>
    <span class="c1">// Oops, developer adds field here - ABI BROKEN!</span>
    <span class="kt">uint32_t</span> <span class="n">new_field</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">flags</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Rust solution</strong>: Explicit versioning and <code class="language-plaintext highlighter-rouge">#[repr(C)]</code></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - ABI layout is explicit and checked</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">binder_transaction_data</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">cookie</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">code</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="c1">// Cannot add field here without explicit version bump</span>
    <span class="k">pub</span> <span class="n">flags</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// Compile-time size check</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span>
    <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">48</span>
<span class="p">);</span>
</code></pre></div></div>

<h3 id="real-example-drm-driver-backward-compatibility">Real Example: DRM Driver Backward Compatibility</h3>

<p>From the Nova GPU driver (Rust):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Must maintain compatibility with userspace mesa drivers</span>
<span class="k">pub</span> <span class="k">const</span> <span class="n">DRM_NOVA_GEM_CREATE</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">drm</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">drm_nova_gem_create</span><span class="o">&gt;</span><span class="p">(</span><span class="mi">0x00</span><span class="p">);</span>
<span class="k">pub</span> <span class="k">const</span> <span class="n">DRM_NOVA_GEM_INFO</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">drm</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">drm_nova_gem_info</span><span class="o">&gt;</span><span class="p">(</span><span class="mi">0x01</span><span class="p">);</span>

<span class="c1">// Once these ioctl numbers are released, they NEVER change</span>
<span class="c1">// Rust's type system helps prevent accidental changes:</span>

<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">drm_nova_gem_create</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">size</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">handle</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">flags</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// If someone tries to change this, compilation breaks due to size assertions</span>
</code></pre></div></div>

<h3 id="abi-stability-rust-vs-c-comparison">ABI Stability: Rust vs C Comparison</h3>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>C</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Layout control</strong></td>
      <td>Implicit, compiler-dependent</td>
      <td><code class="language-plaintext highlighter-rouge">#[repr(C)]</code> explicit</td>
    </tr>
    <tr>
      <td><strong>Padding preservation</strong></td>
      <td>Manual, error-prone</td>
      <td><code class="language-plaintext highlighter-rouge">MaybeUninit</code> automatic</td>
    </tr>
    <tr>
      <td><strong>Size verification</strong></td>
      <td>Manual <code class="language-plaintext highlighter-rouge">BUILD_BUG_ON</code></td>
      <td><code class="language-plaintext highlighter-rouge">const _: assert!(size == X)</code></td>
    </tr>
    <tr>
      <td><strong>Breaking changes</strong></td>
      <td>Silent, runtime failure</td>
      <td>Compile error</td>
    </tr>
    <tr>
      <td><strong>Versioning</strong></td>
      <td>Manual, by convention</td>
      <td>Can be enforced by type system</td>
    </tr>
    <tr>
      <td><strong>Binary compatibility</strong></td>
      <td>Trust the developer</td>
      <td>Compiler-verified</td>
    </tr>
  </tbody>
</table>

<h3 id="will-rust-provide-critical-userspace-abi">Will Rust Provide Critical Userspace ABI?</h3>

<p><strong>Production deployments (mainline kernel):</strong></p>

<ol>
  <li><strong>GPU drivers</strong> (Nova): DRM userspace ABI for Nvidia GPUs (13 files in-tree)</li>
  <li><strong>Network PHY drivers</strong>: ethtool/netlink ABI (ax88796b, qt2025)</li>
  <li><strong>Block devices</strong>: rnull driver with standard ioctl ABI</li>
  <li><strong>CPU frequency</strong>: rcpufreq_dt with sysfs interfaces</li>
</ol>

<p><strong>Reference implementations (out-of-tree):</strong></p>

<ol>
  <li><strong>Android Binder</strong> (IPC): Rust rewrite demonstrates ABI compatibility (not yet mainline)</li>
</ol>

<p><strong>Coming soon</strong> (based on current development):</p>

<ol>
  <li><strong>File systems</strong>: VFS operations, mount options</li>
  <li><strong>Network protocols</strong>: Socket options, packet formats</li>
  <li><strong>More device drivers</strong>: Expanding hardware support</li>
</ol>

<h3 id="the-key-policy-language-agnostic-abi">The Key Policy: Language-Agnostic ABI</h3>

<p><strong>Critical insight</strong>: The kernel’s ABI stability policy is <strong>language-agnostic</strong>.</p>

<p>From Linus Torvalds (summarized from various LKML posts):</p>

<blockquote>
  <p>“I don’t care if you write it in C, Rust, or assembly. If you break userspace, you broke the kernel.”</p>
</blockquote>

<p><strong>In practice</strong>:</p>
<ul>
  <li>Rust drivers use <strong>same UAPI headers</strong> as C via bindgen</li>
  <li>Same ioctl numbers, same struct layouts, same semantics</li>
  <li>Userspace <strong>cannot tell</strong> if driver is C or Rust</li>
  <li>ABI breaks are <strong>equally unacceptable</strong> in both languages</li>
</ul>

<p><strong>Answer</strong>: Yes, Rust <strong>will be and already is</strong> used for userspace-facing features requiring ABI stability.</p>

<h2 id="current-scope-peripheral-drivers-not-core-kernel">Current Scope: Peripheral Drivers, Not Core Kernel</h2>

<p><strong>Critical clarification</strong>: As of early 2026, Rust in the Linux kernel is <strong>exclusively in peripheral areas</strong> - device drivers and Android-specific components. <strong>No core kernel subsystems have been rewritten in Rust.</strong></p>

<h3 id="-where-rust-code-exists">✅ Where Rust Code Exists</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drivers/                    # Peripheral driver layer
├── gpu/drm/nova/          # GPU driver (Nvidia, 13 files, ~1,200 lines)
├── net/phy/               # Network PHY drivers (2 files, ~237 lines)
├── block/rnull.rs         # Block device example (80 lines)
├── cpufreq/rcpufreq_dt.rs # CPU frequency management (227 lines)
└── gpu/drm/drm_panic_qr.rs # DRM panic QR code (996 lines)

rust/kernel/               # Abstraction layer (101 files, 13,500 lines)
├── sync/                  # Rust bindings for sync primitives
├── mm/                    # Rust bindings for memory functions
├── fs/                    # Rust bindings for filesystem
└── net/                   # Rust bindings for networking
</code></pre></div></div>

<p><strong>Key point</strong>: The <code class="language-plaintext highlighter-rouge">rust/kernel/</code> directory provides <strong>abstractions</strong> (safe wrappers around C APIs), not <strong>implementations</strong> of core functionality.</p>

<h3 id="-what-remains-100-c-core-kernel">❌ What Remains 100% C (Core Kernel)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mm/                        # Memory management core
├── 153 files, 128 C files
├── page_alloc.c          # Page allocator (9,000+ lines)
├── slab.c                # Slab allocator (4,000+ lines)
├── vmalloc.c             # Virtual memory (3,500+ lines)
└── kasan_test_rust.rs    # ⚠️ Only Rust file (just a test!)

kernel/sched/             # Process scheduler
├── 46 files, 33 C files
├── core.c                # Scheduler core (11,000+ lines)
└── 0 Rust files

fs/                       # VFS core
├── Hundreds of C files
├── namei.c               # Path lookup (5,000+ lines)
├── inode.c               # Inode management (2,000+ lines)
└── 0 Rust files (drivers only)

net/core/                 # Network protocol stack core
kernel/entry/             # System call entry points
arch/x86/kernel/          # Architecture-specific code
</code></pre></div></div>

<h3 id="why-this-matters">Why This Matters</h3>

<p>This distribution is <strong>not a technical limitation</strong> but a <strong>deliberate strategy</strong>:</p>

<ol>
  <li><strong>Risk management</strong>: Driver failures are contained; core subsystem bugs crash the system</li>
  <li><strong>Trust building</strong>: Prove Rust’s value in low-risk areas first</li>
  <li><strong>Community acceptance</strong>: Gradual adoption allows kernel maintainers to adapt</li>
  <li><strong>Tooling maturity</strong>: Build testing infrastructure and debugging tools</li>
</ol>

<h3 id="adoption-timeline-current-trajectory">Adoption Timeline (Current Trajectory)</h3>

<p><strong>Phase 1 (2022-2026)</strong>: ✅ <strong>Completed</strong></p>
<ul>
  <li>Device drivers and Android components</li>
  <li>Abstraction layer infrastructure</li>
  <li>Build system integration</li>
</ul>

<p><strong>Phase 2 (2026-2028)</strong>: 🔄 <strong>In progress</strong></p>
<ul>
  <li>More device drivers (expanding hardware support)</li>
  <li>Filesystem drivers (experimental)</li>
  <li>Network driver expansion</li>
</ul>

<p><strong>Phase 3 (2028-2030+)</strong>: 🔮 <strong>Highly speculative</strong></p>
<ul>
  <li>Core subsystem adoption (mm, scheduler, VFS)</li>
  <li><strong>This may never happen</strong> - requires massive community consensus</li>
  <li>No official roadmap exists for core rewrites</li>
</ul>

<h3 id="the-reality-check">The Reality Check</h3>

<p><strong>Question</strong>: “Will Rust replace C in the kernel core?”</p>

<p><strong>Answer</strong>: Unknown and unlikely in the near term (5-10 years). Current evidence shows:</p>
<ul>
  <li>Rust is succeeding in <strong>drivers</strong> (proven value)</li>
  <li>Core subsystems have <strong>decades of battle-tested C code</strong></li>
  <li>Rewriting core = <strong>enormous risk</strong> with unclear benefit</li>
  <li>Community focus is on <strong>new drivers</strong>, not rewriting existing core</li>
</ul>

<p><strong>Conclusion</strong>: Rust in Linux is currently a <strong>driver development language</strong>, not a <strong>kernel core language</strong>. This may change, but not soon.</p>

<h2 id="practical-implications">Practical Implications</h2>

<h3 id="for-rust-kernel-developers">For Rust Kernel Developers</h3>

<p><strong>Do:</strong></p>
<ul>
  <li>✅ Use <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> for all userspace-facing structs</li>
  <li>✅ Use <code class="language-plaintext highlighter-rouge">uapi</code> crate for userspace types</li>
  <li>✅ Add size/layout assertions</li>
  <li>✅ Preserve padding with <code class="language-plaintext highlighter-rouge">MaybeUninit</code> if needed</li>
  <li>✅ Document ABI in same way as C drivers</li>
</ul>

<p><strong>Don’t:</strong></p>
<ul>
  <li>❌ Change userspace-visible types without version bump</li>
  <li>❌ Assume Rust’s layout is sufficient (use <code class="language-plaintext highlighter-rouge">#[repr(C)]</code>)</li>
  <li>❌ Break compatibility even for “better” design</li>
  <li>❌ Rely on Rust-specific types in UAPI</li>
</ul>

<h3 id="for-userspace-developers">For Userspace Developers</h3>

<p><strong>Good news</strong>: Nothing changes!</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Userspace C code (unchanged)</span>
<span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/binder"</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">binder_write_read</span> <span class="n">bwr</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="n">ioctl</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">BINDER_WRITE_READ</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">bwr</span><span class="p">);</span>
</code></pre></div></div>

<p>Whether the kernel driver is C or Rust, <strong>this code works identically</strong>.</p>

<h3 id="for-distribution-maintainers">For Distribution Maintainers</h3>

<p><strong>Internal modules</strong> (out-of-tree):</p>
<ul>
  <li>❌ Must recompile for each kernel version (always true)</li>
  <li>❌ May break if internal APIs change (always true)</li>
  <li>✅ In-tree Rust drivers handle this automatically</li>
</ul>

<p><strong>Userspace applications</strong>:</p>
<ul>
  <li>✅ No changes needed</li>
  <li>✅ ABI stability same as C drivers</li>
  <li>✅ Old binaries work on new kernels (as always)</li>
</ul>

<h2 id="common-misconceptions">Common Misconceptions</h2>

<h3 id="myth-1-rusts-abi-is-unstable-so-it-cant-be-used-for-kernel-interfaces">Myth 1: “Rust’s ABI is unstable, so it can’t be used for kernel interfaces”</h3>

<p><strong>Reality</strong>:</p>
<ul>
  <li>Rust’s <em>internal</em> ABI between Rust crates is unstable</li>
  <li>Rust’s <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> ABI <strong>is stable</strong> and matches C exactly</li>
  <li>Kernel uses <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> for all userspace interfaces</li>
</ul>

<h3 id="myth-2-rust-adds-a-new-abi-to-maintain">Myth 2: “Rust adds a new ABI to maintain”</h3>

<p><strong>Reality</strong>:</p>
<ul>
  <li>Rust uses <strong>same UAPI headers</strong> as C (via bindgen)</li>
  <li>No new ABI, just a different language implementing the same ABI</li>
  <li>Userspace sees no difference</li>
</ul>

<h3 id="myth-3-rust-internal-instability-affects-userspace">Myth 3: “Rust internal instability affects userspace”</h3>

<p><strong>Reality</strong>:</p>
<ul>
  <li>Rust’s <code class="language-plaintext highlighter-rouge">rust/kernel</code> abstractions can change freely (internal API)</li>
  <li>Userspace-facing ABI <strong>must not change</strong> (same rule as C)</li>
  <li>These are separate concerns</li>
</ul>

<h3 id="myth-4-modules-must-be-recompiled-because-of-rust">Myth 4: “Modules must be recompiled because of Rust”</h3>

<p><strong>Reality</strong>:</p>
<ul>
  <li>Kernel modules <strong>always</strong> needed recompilation between versions</li>
  <li>This is true for <strong>C modules</strong> too</li>
  <li>Rust doesn’t change this policy</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p><strong>Summary of findings:</strong></p>

<ol>
  <li>
    <p>✅ <strong>Rust provides userspace interfaces</strong> through <code class="language-plaintext highlighter-rouge">uapi</code> crate, ioctl handlers, device nodes, sysfs, etc.</p>
  </li>
  <li>
    <p>❌ <strong>Kernel internal ABI is NOT stable</strong> - modules must recompile for each kernel version (same as C)</p>
  </li>
  <li>
    <p>✅ <strong>Userspace ABI IS stable</strong> - never breaks (same rule for C and Rust)</p>
  </li>
  <li>
    <p>✅ <strong>Rust already provides userspace ABI in production</strong> - GPU drivers (Nova), network PHY drivers, block devices, CPU frequency drivers (all in mainline)</p>
  </li>
  <li>
    <p>⚠️ <strong>Rust is currently peripheral-only</strong> - Device drivers only; core kernel (mm, scheduler, VFS) remains 100% C</p>
  </li>
</ol>

<p><strong>Key insights</strong>:</p>

<ol>
  <li>The kernel’s ABI stability policy is <strong>orthogonal to the implementation language</strong>. Rust drivers must follow the same rules as C drivers:
    <ul>
      <li>Internal APIs can change anytime</li>
      <li>Userspace ABI is sacred and immutable</li>
    </ul>
  </li>
  <li>Rust’s current scope is <strong>deliberate and strategic</strong> - proving value in low-risk drivers before considering core subsystems.</li>
</ol>

<p><strong>Rust’s advantage</strong>: Better compile-time verification of ABI compatibility through <code class="language-plaintext highlighter-rouge">#[repr(C)]</code>, size assertions, and type safety, reducing accidental ABI breaks.</p>

<h1 id="rust与linux内核abi稳定性技术深度分析">Rust与Linux内核ABI稳定性：技术深度分析</h1>

<p><strong>摘要</strong>: Rust在Linux内核中提供用户空间接口吗？内核的ABI稳定性策略是什么？本文分析Rust驱动如何与用户空间交互，内部和外部ABI稳定性的关键区别，以及Android Binder和DRM驱动等生产代码的具体示例。</p>

<h2 id="快速回答">快速回答</h2>

<p><strong>问题1: Rust目前是否提供用户空间接口?</strong>
→ <strong>是的。</strong> Rust驱动已经通过ioctl、/dev节点、sysfs和其他标准机制暴露用户空间API。</p>

<p><strong>问题2: 内核内部追求ABI稳定性吗?</strong>
→ <strong>不。</strong> 内核内部API（模块和内核之间）<strong>明确不稳定</strong>。只有<strong>用户空间ABI</strong>是神圣的。</p>

<p><strong>问题3: Rust是否会被用于提供需要ABI稳定性的用户空间功能?</strong>
→ <strong>是的，已有实例。</strong> 主线内核中的Rust驱动（GPU、网络PHY）提供生产级用户空间ABI。Android Binder的Rust重写作为树外参考实现存在。</p>

<h2 id="深入探讨系统调用abi---不可变的契约">深入探讨：系统调用ABI - 不可变的契约</h2>

<p>在研究Rust的用户空间接口之前，让我们先了解用户空间ABI为何如此关键，通过查看<strong>系统调用层</strong> - 最基础的用户空间接口。</p>

<h3 id="神圣的系统调用abi">神圣的系统调用ABI</h3>

<p>Linux同时支持<strong>三种不同的系统调用机制</strong>以维持ABI兼容性：</p>

<table>
  <thead>
    <tr>
      <th>机制</th>
      <th>引入时间</th>
      <th>指令</th>
      <th>系统调用号</th>
      <th>参数</th>
      <th>状态</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>INT 0x80</strong></td>
      <td>Linux 1.0 (1994)</td>
      <td><code class="language-plaintext highlighter-rouge">int $0x80</code></td>
      <td>%eax</td>
      <td>%ebx, %ecx, %edx, %esi, %edi, %ebp</td>
      <td>✅ 仍支持(32位兼容)</td>
    </tr>
    <tr>
      <td><strong>SYSENTER</strong></td>
      <td>Intel P6 (1995)</td>
      <td><code class="language-plaintext highlighter-rouge">sysenter</code></td>
      <td>%eax</td>
      <td>%ebx, %ecx, %edx, %esi, %edi, %ebp</td>
      <td>✅ 仍支持(Intel 32位)</td>
    </tr>
    <tr>
      <td><strong>SYSCALL</strong></td>
      <td>AMD K6 (1997)</td>
      <td><code class="language-plaintext highlighter-rouge">syscall</code></td>
      <td>%rax</td>
      <td>%rdi, %rsi, %rdx, %r10, %r8, %r9</td>
      <td>✅ 主要64位方法</td>
    </tr>
  </tbody>
</table>

<p><strong>所有三种都并行维护</strong>，以确保任何用户空间应用程序永不破坏。</p>

<h3 id="实际内核实现">实际内核实现</h3>

<p>来自<code class="language-plaintext highlighter-rouge">arch/x86/kernel/cpu/common.c</code>（Linux内核源代码）：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// syscall_init() - 在内核初始化期间调用</span>
<span class="kt">void</span> <span class="nf">syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="cm">/* 为用户/内核模式设置段选择子 */</span>
    <span class="n">wrmsr</span><span class="p">(</span><span class="n">MSR_STAR</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">__USER32_CS</span> <span class="o">&lt;&lt;</span> <span class="mi">16</span><span class="p">)</span> <span class="o">|</span> <span class="n">__KERNEL_CS</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">cpu_feature_enabled</span><span class="p">(</span><span class="n">X86_FEATURE_FRED</span><span class="p">))</span>
        <span class="n">idt_syscall_init</span><span class="p">();</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kr">inline</span> <span class="kt">void</span> <span class="nf">idt_syscall_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// 64位原生syscall入口</span>
    <span class="n">wrmsrq</span><span class="p">(</span><span class="n">MSR_LSTAR</span><span class="p">,</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_64</span><span class="p">);</span>

    <span class="c1">// 32位兼容模式 - 必须维护旧ABI</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ia32_enabled</span><span class="p">())</span> <span class="p">{</span>
        <span class="n">wrmsrq_cstar</span><span class="p">((</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)</span><span class="n">entry_SYSCALL_compat</span><span class="p">);</span>

        <span class="cm">/* 为32位应用程序提供SYSENTER支持 */</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_CS</span><span class="p">,</span> <span class="p">(</span><span class="n">u64</span><span class="p">)</span><span class="n">__KERNEL_CS</span><span class="p">);</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_ESP</span><span class="p">,</span>
                    <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span><span class="p">)(</span><span class="n">cpu_entry_stack</span><span class="p">(</span><span class="n">smp_processor_id</span><span class="p">())</span> <span class="o">+</span> <span class="mi">1</span><span class="p">));</span>
        <span class="n">wrmsrq_safe</span><span class="p">(</span><span class="n">MSR_IA32_SYSENTER_EIP</span><span class="p">,</span> <span class="p">(</span><span class="n">u64</span><span class="p">)</span><span class="n">entry_SYSENTER_compat</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>这意味着什么</strong>: 1994年使用<code class="language-plaintext highlighter-rouge">int $0x80</code>编译的32位应用程序在运行在现代硬件上的2026 Linux内核上<strong>仍然可以工作</strong>。</p>

<h3 id="两个系统调用表">两个系统调用表</h3>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 64位原生系统调用</span>
<span class="k">const</span> <span class="n">sys_call_ptr_t</span> <span class="n">sys_call_table</span><span class="p">[</span><span class="n">__NR_syscall_max</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mi">0</span> <span class="p">...</span> <span class="n">__NR_syscall_max</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">__x64_sys_ni_syscall</span><span class="p">,</span>
    <span class="cp">#include</span> <span class="cpf">&lt;asm/syscalls_64.h&gt;</span><span class="cp">
</span><span class="p">};</span>

<span class="c1">// 32位兼容系统调用</span>
<span class="k">const</span> <span class="n">sys_call_ptr_t</span> <span class="n">ia32_sys_call_table</span><span class="p">[</span><span class="n">__NR_ia32_syscall_max</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="p">[</span><span class="mi">0</span> <span class="p">...</span> <span class="n">__NR_ia32_syscall_max</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">__ia32_sys_ni_syscall</span><span class="p">,</span>
    <span class="cp">#include</span> <span class="cpf">&lt;asm/syscalls_32.h&gt;</span><span class="cp">
</span><span class="p">};</span>
</code></pre></div></div>

<p><strong>关键洞察</strong>: Linux为32位和64位维护<strong>完全独立的系统调用表</strong>以确保ABI稳定性。32位表<strong>从未删除系统调用</strong> - 只添加新的。</p>

<h3 id="启动协议abi---连引导加载程序都有契约">启动协议ABI - 连引导加载程序都有契约</h3>

<p>来自Linux内核压缩引导加载程序（<code class="language-plaintext highlighter-rouge">arch/x86/boot/compressed/head_64.S</code>）：</p>

<pre><code class="language-assembly">/*
 * 32位入口在0且是ABI所以不可变！
 * 这是压缩内核入口点。
 */
    .code32
SYM_FUNC_START(startup_32)
</code></pre>

<p><strong>注释”ABI so immutable!”至关重要</strong>：</p>
<ul>
  <li>32位入口点<strong>必须始终在压缩内核的偏移0处</strong></li>
  <li>引导加载程序（GRUB、systemd-boot等）<strong>依赖于此</strong></li>
  <li>改变这一点会破坏每个引导加载程序</li>
  <li>这从Linux 2.6.x时代以来一直如此</li>
</ul>

<p><strong>启动协议规范</strong>（<code class="language-plaintext highlighter-rouge">Documentation/x86/boot.rst</code>）：</p>
<ul>
  <li>保护模式内核加载在：<code class="language-plaintext highlighter-rouge">0x100000</code>（1MB）</li>
  <li>32位入口点：始终从加载地址偏移0</li>
  <li><code class="language-plaintext highlighter-rouge">code32_start</code>字段：默认为<code class="language-plaintext highlighter-rouge">0x100000</code></li>
</ul>

<p>这是<strong>内部启动ABI</strong> - 与用户空间ABI不同，但同样不可变，因为外部工具（引导加载程序）依赖于它。</p>

<h3 id="给rust的教训">给Rust的教训</h3>

<p>当Rust驱动提供用户空间接口时，它们继承这些相同的铁律：</p>

<p><strong>C示例</strong>（传统）：</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 用户空间永远不知道这从C变成了Rust</span>
<span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/binder"</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
<span class="n">ioctl</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">BINDER_WRITE_READ</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">bwr</span><span class="p">);</span>  <span class="c1">// ABI未改变</span>
</code></pre></div></div>

<p><strong>Rust实现</strong>（现代）：</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 必须提供相同的ABI</span>
<span class="k">const</span> <span class="n">BINDER_WRITE_READ</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">_IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">BinderWriteRead</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="n">BINDER_TYPE</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="mi">1</span>  <span class="c1">// ioctl编号 - 永不改变</span>
<span class="p">);</span>
</code></pre></div></div>

<p>ioctl编号、结构布局和语义都<strong>冻结在时间中</strong> - 无论是用C还是Rust实现。</p>

<hr />

<h2 id="rust的abi保证system-v兼容性">Rust的ABI保证：System V兼容性</h2>

<p>在研究具体的用户空间接口之前，理解<strong>Rust如何保证与Linux在x86-64上使用的System V ABI兼容</strong>至关重要。</p>

<h3 id="rust符合system-v-abi吗">Rust符合System V ABI吗？</h3>

<p><strong>是的 - rustc通过语言特性明确保证System V ABI兼容性。</strong></p>

<p>x86-64上的Linux内核使用<strong>System V AMD64 ABI</strong>来定义：</p>
<ul>
  <li>函数调用约定（寄存器使用、栈布局）</li>
  <li>数据结构布局（对齐、填充、大小）</li>
  <li>类型表示（整数大小、指针大小）</li>
</ul>

<p>Rust提供多种机制来确保ABI兼容性：</p>

<table>
  <thead>
    <tr>
      <th>ABI类型</th>
      <th>Rust语法</th>
      <th>x86-64 Linux行为</th>
      <th>保证级别</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Rust ABI</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "Rust"</code> (默认)</td>
      <td>未指定，可能改变</td>
      <td>❌ 不稳定</td>
    </tr>
    <tr>
      <td><strong>C ABI</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "C"</code></td>
      <td>System V AMD64 ABI</td>
      <td>✅ <strong>语言规范保证</strong></td>
    </tr>
    <tr>
      <td><strong>System V</strong></td>
      <td><code class="language-plaintext highlighter-rouge">extern "sysv64"</code></td>
      <td>System V AMD64 ABI</td>
      <td>✅ <strong>显式保证</strong></td>
    </tr>
    <tr>
      <td><strong>数据布局</strong></td>
      <td><code class="language-plaintext highlighter-rouge">#[repr(C)]</code></td>
      <td>匹配C结构体布局</td>
      <td>✅ <strong>编译器保证</strong></td>
    </tr>
  </tbody>
</table>

<h3 id="编译器强制的abi正确性">编译器强制的ABI正确性</h3>

<p>与C中ABI兼容性是隐式且未检查的不同，<strong>Rust使ABI契约显式并在编译时验证</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 显式C ABI - 编译器验证调用约定</span>
<span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">kernel_function</span><span class="p">(</span><span class="n">arg</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">i32</span> <span class="p">{</span>
    <span class="c1">// 函数使用System V调用约定：</span>
    <span class="c1">// - arg在%rdi寄存器中传递</span>
    <span class="c1">// - 返回值在%rax寄存器中</span>
    <span class="c1">// - 跨Rust编译器版本保证</span>
    <span class="mi">0</span>
<span class="p">}</span>

<span class="c1">// 显式内存布局 - 编译器验证大小/对齐</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">KernelStruct</span> <span class="p">{</span>
    <span class="n">field1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>  <span class="c1">// 偏移0，8字节</span>
    <span class="n">field2</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>  <span class="c1">// 偏移8，4字节</span>
    <span class="n">field3</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>  <span class="c1">// 偏移12，4字节</span>
<span class="p">}</span>

<span class="c1">// 编译时验证 - 如果布局改变则失败</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span><span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">KernelStruct</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">16</span><span class="p">);</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span><span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">align_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">KernelStruct</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">8</span><span class="p">);</span>
</code></pre></div></div>

<h3 id="参考示例binder-abi兼容性">参考示例：Binder ABI兼容性</h3>

<p>来自Android Binder Rust重写（树外参考实现）：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/android/binder/defs.rs (来自Rust-for-Linux树，非主线)</span>
<span class="nd">#[repr(C)]</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="nf">BinderTransactionData</span><span class="p">(</span>
    <span class="n">MaybeUninit</span><span class="o">&lt;</span><span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span>
<span class="p">);</span>

<span class="c1">// SAFETY: 显式FromBytes/AsBytes确保二进制兼容性</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">FromBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">AsBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
</code></pre></div></div>

<p><strong>注意</strong>: 此代码来自Rust-for-Linux项目的Binder实现，作为树外参考存在，展示了如何在Rust中实现用户空间ABI兼容性。</p>

<p><strong>为什么使用<code class="language-plaintext highlighter-rouge">MaybeUninit</code>?</strong> 它保留<strong>填充字节</strong>以确保与C的逐位相同布局，包括未初始化的填充。这对用户空间兼容性至关重要。</p>

<h3 id="rustc的abi稳定性承诺">rustc的ABI稳定性承诺</h3>

<p>来自Rust语言规范：</p>

<blockquote>
  <p><strong><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>保证</strong>: 用<code class="language-plaintext highlighter-rouge">#[repr(C)]</code>标记的类型与相应的C类型具有相同的布局，遵循目标平台的C ABI。这个保证在<strong>Rust编译器版本之间是稳定的</strong>。</p>
</blockquote>

<p><strong>与C对比:</strong></p>

<table>
  <thead>
    <tr>
      <th>方面</th>
      <th>C</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ABI规范</strong></td>
      <td>隐式，平台相关</td>
      <td>显式使用<code class="language-plaintext highlighter-rouge">extern "C"</code></td>
    </tr>
    <tr>
      <td><strong>布局验证</strong></td>
      <td>运行时bug</td>
      <td>编译时<code class="language-plaintext highlighter-rouge">assert!</code></td>
    </tr>
    <tr>
      <td><strong>填充控制</strong></td>
      <td>隐式，易出错</td>
      <td><code class="language-plaintext highlighter-rouge">MaybeUninit</code>显式</td>
    </tr>
    <tr>
      <td><strong>跨版本稳定性</strong></td>
      <td>信任开发者</td>
      <td>语言规范</td>
    </tr>
  </tbody>
</table>

<h3 id="系统调用寄存器使用">系统调用寄存器使用</h3>

<p>System V ABI指定函数调用的寄存器使用。对于<strong>系统调用</strong>，Linux使用<strong>修改过的</strong>System V约定：</p>

<p><strong>System V函数调用</strong>（<code class="language-plaintext highlighter-rouge">extern "C"</code>使用）：</p>
<ul>
  <li>参数: <code class="language-plaintext highlighter-rouge">%rdi, %rsi, %rdx, %rcx, %r8, %r9</code></li>
  <li>返回: <code class="language-plaintext highlighter-rouge">%rax</code></li>
</ul>

<p><strong>Linux syscall</strong>（特殊情况）：</p>
<ul>
  <li>系统调用号: <code class="language-plaintext highlighter-rouge">%rax</code></li>
  <li>参数: <code class="language-plaintext highlighter-rouge">%rdi, %rsi, %rdx, %r10, %r8, %r9</code>（注意：<code class="language-plaintext highlighter-rouge">%r10</code>而非<code class="language-plaintext highlighter-rouge">%rcx</code>）</li>
  <li>返回: <code class="language-plaintext highlighter-rouge">%rax</code></li>
</ul>

<p>Rust尊重两种约定：</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 常规C函数 - 使用标准System V ABI</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">regular_function</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// a在%rdi, b在%rsi</span>
<span class="p">}</span>

<span class="c1">// 系统调用包装器 - 使用syscall约定</span>
<span class="nd">#[inline(always)]</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">syscall1</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">arg1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">ret</span><span class="p">:</span> <span class="nb">u64</span><span class="p">;</span>
    <span class="nn">core</span><span class="p">::</span><span class="nn">arch</span><span class="p">::</span><span class="nd">asm!</span><span class="p">(</span>
        <span class="s">"syscall"</span><span class="p">,</span>
        <span class="k">in</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">n</span><span class="p">,</span>     <span class="c1">// 系统调用号</span>
        <span class="k">in</span><span class="p">(</span><span class="s">"rdi"</span><span class="p">)</span> <span class="n">arg1</span><span class="p">,</span>  <span class="c1">// 第一个参数</span>
        <span class="nf">lateout</span><span class="p">(</span><span class="s">"rax"</span><span class="p">)</span> <span class="n">ret</span><span class="p">,</span>
    <span class="p">);</span>
    <span class="n">ret</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="答案rust能编译成符合system-v-abi的代码吗">答案：Rust能编译成符合System V ABI的代码吗？</h3>

<p>✅ <strong>是的，rustc通过以下方式保证System V ABI兼容性：</strong></p>
<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">extern "C"</code></strong> - 显式使用平台C ABI（x86-64 Linux上是System V）</li>
  <li><strong><code class="language-plaintext highlighter-rouge">#[repr(C)]</code></strong> - 保证C兼容的数据布局</li>
  <li><strong>编译时验证</strong> - 大小/对齐断言捕获ABI破坏</li>
  <li><strong>语言规范</strong> - 跨编译器版本的稳定性</li>
</ol>

<p>这不是”尽力而为” - 这是由Rust规范支持的<strong>语言级保证</strong>。</p>

<hr />

<h2 id="问题1rust的用户空间接口基础设施">问题1：Rust的用户空间接口基础设施</h2>

<h3 id="uapi-crate-用户空间api绑定"><code class="language-plaintext highlighter-rouge">uapi</code> Crate: 用户空间API绑定</h3>

<p>Rust为用户空间API提供了专门的crate。来自实际内核源代码：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/uapi/lib.rs (实际内核代码)</span>
<span class="cd">//! UAPI绑定。</span>
<span class="cd">//!</span>
<span class="cd">//! 包含bindgen为UAPI接口生成的绑定。</span>
<span class="cd">//!</span>
<span class="cd">//! 这个crate可以被需要与用户空间API交互的驱动直接使用。</span>

<span class="nd">#![no_std]</span>

<span class="c1">// 自动生成的UAPI绑定</span>
<span class="nd">include!</span><span class="p">(</span><span class="nd">concat!</span><span class="p">(</span><span class="nd">env!</span><span class="p">(</span><span class="s">"OBJTREE"</span><span class="p">),</span> <span class="s">"/rust/uapi/uapi_generated.rs"</span><span class="p">));</span>
</code></pre></div></div>

<p><strong>关键洞察</strong>: 内核有<strong>单独的<code class="language-plaintext highlighter-rouge">uapi</code> crate</strong>专门用于用户空间接口，与内部内核API分离。</p>

<h3 id="rust中的ioctl支持">Rust中的ioctl支持</h3>

<p>内核为Rust驱动提供完整的ioctl支持：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/ioctl.rs (实际内核代码)</span>
<span class="cd">//! `ioctl()`编号定义。</span>

<span class="cd">/// 为只读ioctl构建ioctl编号</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOR</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span><span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_READ</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">nr</span><span class="p">,</span> <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">())</span>
<span class="p">}</span>

<span class="cd">/// 为只写ioctl构建ioctl编号</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOW</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span><span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_WRITE</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">nr</span><span class="p">,</span> <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">())</span>
<span class="p">}</span>

<span class="cd">/// 为读写ioctl构建ioctl编号</span>
<span class="nd">#[inline(always)]</span>
<span class="k">pub</span> <span class="k">const</span> <span class="k">fn</span> <span class="n">_IOWR</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">ty</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">nr</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="nf">_IOC</span><span class="p">(</span>
        <span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_READ</span> <span class="p">|</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">_IOC_WRITE</span><span class="p">,</span>
        <span class="n">ty</span><span class="p">,</span>
        <span class="n">nr</span><span class="p">,</span>
        <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(),</span>
    <span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>这与C的ioctl宏完全相同</strong>，但具有类型安全。</p>

<h3 id="参考示例android-binder用户空间协议">参考示例：Android Binder用户空间协议</h3>

<p>Android Binder Rust重写（树外）展示了如何暴露广泛的用户空间API：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 来自Rust-for-Linux Binder实现的示例（非主线）</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">uapi</span><span class="p">::{</span><span class="k">self</span><span class="p">,</span> <span class="o">*</span><span class="p">};</span>

<span class="c1">// 用户空间协议常量 - 必须保持稳定</span>
<span class="nd">pub_no_prefix!</span><span class="p">(</span>
    <span class="n">binder_driver_return_protocol_</span><span class="p">,</span>
    <span class="n">BR_TRANSACTION</span><span class="p">,</span>
    <span class="n">BR_REPLY</span><span class="p">,</span>
    <span class="n">BR_DEAD_REPLY</span><span class="p">,</span>
    <span class="n">BR_OK</span><span class="p">,</span>
    <span class="n">BR_ERROR</span><span class="p">,</span>
    <span class="c1">// ... 21个总协议常量</span>
<span class="p">);</span>

<span class="c1">// 用户空间数据结构 - 包装以保持ABI</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderTransactionData</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="p">);</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderWriteRead</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_write_read</span><span class="p">);</span>
<span class="nd">decl_wrapper!</span><span class="p">(</span><span class="n">BinderVersion</span><span class="p">,</span> <span class="nn">uapi</span><span class="p">::</span><span class="n">binder_version</span><span class="p">);</span>
</code></pre></div></div>

<p><strong>关键细节</strong>: 这些使用<code class="language-plaintext highlighter-rouge">MaybeUninit</code>来<strong>保留填充字节</strong>，确保与C的二进制相同ABI：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 保留确切内存布局的包装器，包括填充</span>
<span class="nd">#[derive(Copy,</span> <span class="nd">Clone)]</span>
<span class="nd">#[repr(transparent)]</span>
<span class="k">pub</span><span class="p">(</span><span class="k">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="nf">BinderTransactionData</span><span class="p">(</span><span class="n">MaybeUninit</span><span class="o">&lt;</span><span class="nn">uapi</span><span class="p">::</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span><span class="p">);</span>

<span class="c1">// SAFETY: 显式FromBytes/AsBytes实现</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">FromBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">AsBytes</span> <span class="k">for</span> <span class="n">BinderTransactionData</span> <span class="p">{}</span>
</code></pre></div></div>

<p><strong>为什么重要</strong>: 针对C头文件编译的用户空间代码向Rust驱动发送<strong>完全相同的二进制数据</strong>。</p>

<h3 id="用户空间接口总结">用户空间接口总结</h3>

<table>
  <thead>
    <tr>
      <th>接口类型</th>
      <th>Rust支持</th>
      <th>示例</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>ioctl处理器</strong></td>
      <td>✅ 完全支持（驱动处理命令）</td>
      <td>DRM驱动, Binder</td>
    </tr>
    <tr>
      <td><strong>/dev设备节点</strong></td>
      <td>✅ 通过miscdevice/cdev</td>
      <td>字符设备</td>
    </tr>
    <tr>
      <td><strong>/sys (sysfs)</strong></td>
      <td>✅ 通过kobject绑定</td>
      <td>设备属性</td>
    </tr>
    <tr>
      <td><strong>/proc</strong></td>
      <td>✅ 通过seq_file</td>
      <td>进程信息</td>
    </tr>
    <tr>
      <td><strong>定义新系统调用</strong></td>
      <td>❌ 不可能（syscall入口是C）</td>
      <td>-</td>
    </tr>
    <tr>
      <td><strong>Netlink</strong></td>
      <td>✅ 通过net子系统</td>
      <td>网络配置</td>
    </tr>
  </tbody>
</table>

<p><strong>重要区别</strong>: Rust驱动可以<strong>处理</strong>ioctl命令（驱动特定的逻辑），但ioctl <strong>系统调用入口点</strong>本身（在<code class="language-plaintext highlighter-rouge">fs/ioctl.c</code>中）仍然是C代码。其他接口也是如此 - Rust提供处理器，而不是核心机制。</p>

<p><strong>答案</strong>: 是的，Rust通过标准内核机制<strong>完全支持</strong>用户空间接口，尽管核心系统调用层仍然是C。</p>

<h2 id="关键澄清用户空间程序不能使用-rustkernel">关键澄清：用户空间程序不能使用 <code class="language-plaintext highlighter-rouge">rust/kernel</code></h2>

<p><strong>一个常见误解</strong>：”我的用户空间Rust程序可以使用<code class="language-plaintext highlighter-rouge">rust/kernel</code>抽象吗？”</p>

<p><strong>答案：绝对不能。</strong> 这是一个根本性的架构约束，而不是技术限制。</p>

<h3 id="内核空间-vs-用户空间---完全隔离">内核空间 vs 用户空间 - 完全隔离</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────┐
│              用户空间                                     │
│  - 使用Rust标准库 (std)                                  │
│  - 普通Rust程序                                          │
│  - 可以使用tokio、serde等                                │
│                                                          │
│  用户空间Rust程序:                                       │
│  ┌────────────────────────────────────────┐            │
│  │ use std::fs::File;                      │            │
│  │ use std::os::unix::io::AsRawFd;        │            │
│  │                                         │            │
│  │ fn main() {                             │            │
│  │     let fd = File::open("/dev/my_dev") │            │
│  │         .unwrap();                      │            │
│  │     // 通过系统调用与内核交互           │            │
│  │     unsafe {                             │            │
│  │         libc::ioctl(fd.as_raw_fd(), ...) │           │
│  │     }                                    │            │
│  │ }                                        │            │
│  └────────────────────────────────────────┘            │
└──────────────────┬──────────────────────────────────────┘
                   │
                   │  系统调用边界
                   │  - open(), ioctl(), read(), write()
                   │  - /dev, /sys, /proc 接口
                   │  - ❌ 不能直接调用内核函数
                   │
┌──────────────────┴──────────────────────────────────────┐
│              内核空间                                     │
│  - 使用 #![no_std] (无标准库)                           │
│  - 只能在内核模块中运行                                 │
│  - 使用 rust/kernel 抽象                                │
│                                                          │
│  内核Rust驱动:                                          │
│  ┌────────────────────────────────────────┐            │
│  │ #![no_std]                             │            │
│  │ use kernel::prelude::*;                │            │
│  │                                         │            │
│  │ impl kernel::file::Operations for MyDev│            │
│  │     fn ioctl(...) -&gt; Result {          │            │
│  │         // 处理用户空间的ioctl请求     │            │
│  │         kernel::sync::SpinLock::...     │            │
│  │     }                                   │            │
│  │ }                                       │            │
│  └────────────────────────────────────────┘            │
└─────────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="为什么用户空间不能使用-rustkernel">为什么用户空间不能使用 <code class="language-plaintext highlighter-rouge">rust/kernel</code></h3>

<p><strong>1. <code class="language-plaintext highlighter-rouge">#![no_std]</code> - 没有标准库</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// rust/kernel/lib.rs (库crate根文件)</span>
<span class="nd">#![no_std]</span>  <span class="c1">// ← 关键：没有标准库！</span>

<span class="c1">// 内核空间没有：</span>
<span class="c1">// - 堆分配（必须使用GFP_KERNEL）</span>
<span class="c1">// - 线程（使用内核任务）</span>
<span class="c1">// - 文件系统（用户空间概念）</span>
<span class="c1">// - 网络库（用户空间概念）</span>
<span class="c1">// - println!()（使用pr_info!()）</span>

<span class="c1">// 只有：</span>
<span class="c1">// - core库（不需要操作系统）</span>
<span class="c1">// - 内核特定API</span>
</code></pre></div></div>

<p><strong>注意</strong>：<code class="language-plaintext highlighter-rouge">#![no_std]</code> 属性只在库crate的根文件中声明，如 <code class="language-plaintext highlighter-rouge">rust/kernel/lib.rs</code>、<code class="language-plaintext highlighter-rouge">rust/bindings/lib.rs</code> 等。单独的驱动模块文件（例如 <code class="language-plaintext highlighter-rouge">drivers/gpu/drm/nova/driver.rs</code>）不需要这个声明 - 它们通过 <code class="language-plaintext highlighter-rouge">use kernel::prelude::*</code> 使用kernel库，从而继承了no_std环境。</p>

<p><strong>2. 不同的编译目标</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 用户空间Rust程序</span>
<span class="nv">$ </span>rustc <span class="nt">--target</span> x86_64-unknown-linux-gnu userspace.rs
<span class="c"># 编译成用户空间可执行文件</span>

<span class="c"># 内核Rust模块</span>
<span class="nv">$ </span>rustc <span class="nt">--target</span> x86_64-linux-kernel module.rs
<span class="c"># 编译成内核模块 (.ko文件)</span>
<span class="c"># 链接到内核，不能在用户空间运行</span>
</code></pre></div></div>

<p><strong>3. 内存空间隔离</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>虚拟地址空间:
┌─────────────────────┐ 0xFFFFFFFFFFFFFFFF
│   内核空间           │ ← rust/kernel 运行在这里
│   (仅内核代码)       │   只能通过系统调用访问
├─────────────────────┤ 0x00007FFFFFFFFFFF
│   用户空间           │ ← 用户Rust程序运行在这里
│   (应用程序)         │   不能访问内核内存
└─────────────────────┘ 0x0000000000000000
</code></pre></div></div>

<h3 id="用户空间程序如何与rust内核驱动交互">用户空间程序如何与Rust内核驱动交互</h3>

<p><strong>方式1：通过 <code class="language-plaintext highlighter-rouge">/dev</code> 设备节点</strong></p>

<p><strong>内核侧（Rust驱动）：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/example/my_device.rs</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">file</span><span class="p">::</span><span class="n">Operations</span><span class="p">;</span>

<span class="k">struct</span> <span class="n">MyDevice</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Operations</span> <span class="k">for</span> <span class="n">MyDevice</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">open</span><span class="p">(</span><span class="o">...</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="k">Self</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="nd">pr_info!</span><span class="p">(</span><span class="s">"用户空间打开了设备</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="nf">Ok</span><span class="p">(</span><span class="n">MyDevice</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">ioctl</span><span class="p">(</span><span class="n">cmd</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">arg</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">isize</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">cmd</span> <span class="p">{</span>
            <span class="n">MY_IOCTL_CMD</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="c1">// 处理用户空间的ioctl请求</span>
                <span class="nf">Ok</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="p">}</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>用户空间（标准Rust程序）：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// userspace_app/src/main.rs</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">;</span>  <span class="c1">// ← 使用标准库！</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">os</span><span class="p">::</span><span class="nn">unix</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="n">AsRawFd</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 打开Rust内核驱动创建的设备</span>
    <span class="k">let</span> <span class="n">file</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="s">"/dev/my_device"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// 通过系统调用交互</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">ret</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">ioctl</span><span class="p">(</span>
            <span class="n">file</span><span class="nf">.as_raw_fd</span><span class="p">(),</span>
            <span class="n">MY_IOCTL_CMD</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="n">my_data</span>
        <span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// 用户空间完全不知道内核是C还是Rust！</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>方式2：通过 <code class="language-plaintext highlighter-rouge">sysfs</code></strong></p>

<p><strong>内核侧：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 在内核中创建sysfs属性</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">device</span><span class="p">::</span><span class="n">Device</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">Device</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">create_sysfs_attrs</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="c1">// 创建 /sys/class/my_device/value</span>
        <span class="nf">sysfs_create_file</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
        <span class="nf">Ok</span><span class="p">(())</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>用户空间：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">fs</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 读取由Rust内核驱动提供的sysfs文件</span>
    <span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">read_to_string</span><span class="p">(</span>
        <span class="s">"/sys/class/my_device/value"</span>
    <span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="nd">println!</span><span class="p">(</span><span class="s">"来自内核的值: {}"</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>方式3：通过 <code class="language-plaintext highlighter-rouge">netlink</code>（网络驱动）</strong></p>

<p><strong>内核侧：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="n">net</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">send_netlink_msg</span><span class="p">(</span><span class="n">msg</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">NetlinkMsg</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
    <span class="nf">netlink_broadcast</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>用户空间：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">netlink_sys</span><span class="p">::{</span><span class="n">Socket</span><span class="p">,</span> <span class="n">SocketAddr</span><span class="p">};</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">socket</span> <span class="o">=</span> <span class="nn">Socket</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="c1">// 接收来自Rust内核驱动的netlink消息</span>
    <span class="k">let</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">socket</span><span class="nf">.recv_from</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="对比表格">对比表格</h3>

<table>
  <thead>
    <tr>
      <th>特性</th>
      <th>内核空间 (<code class="language-plaintext highlighter-rouge">rust/kernel</code>)</th>
      <th>用户空间 (标准Rust)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>标准库</strong></td>
      <td>❌ <code class="language-plaintext highlighter-rouge">#![no_std]</code></td>
      <td>✅ <code class="language-plaintext highlighter-rouge">use std::*</code></td>
    </tr>
    <tr>
      <td><strong>运行环境</strong></td>
      <td>内核模块 (.ko)</td>
      <td>可执行文件 (ELF)</td>
    </tr>
    <tr>
      <td><strong>内存分配</strong></td>
      <td><code class="language-plaintext highlighter-rouge">kernel::kvec::KVec</code></td>
      <td><code class="language-plaintext highlighter-rouge">std::vec::Vec</code></td>
    </tr>
    <tr>
      <td><strong>打印输出</strong></td>
      <td><code class="language-plaintext highlighter-rouge">pr_info!()</code></td>
      <td><code class="language-plaintext highlighter-rouge">println!()</code></td>
    </tr>
    <tr>
      <td><strong>文件操作</strong></td>
      <td>❌ 不能打开文件</td>
      <td>✅ <code class="language-plaintext highlighter-rouge">std::fs::File</code></td>
    </tr>
    <tr>
      <td><strong>网络</strong></td>
      <td>提供网络服务</td>
      <td>使用网络服务</td>
    </tr>
    <tr>
      <td><strong>硬件访问</strong></td>
      <td>✅ 直接访问</td>
      <td>❌ 通过系统调用</td>
    </tr>
    <tr>
      <td><strong>特权级别</strong></td>
      <td>Ring 0</td>
      <td>Ring 3</td>
    </tr>
    <tr>
      <td><strong>可用crates</strong></td>
      <td>极少（仅no_std）</td>
      <td>所有标准crates</td>
    </tr>
  </tbody>
</table>

<h3 id="完整示例用户空间读取gpu信息">完整示例：用户空间读取GPU信息</h3>

<p><strong>1. 内核Rust GPU驱动：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// drivers/gpu/drm/nova/driver.rs</span>
<span class="k">use</span> <span class="nn">kernel</span><span class="p">::</span><span class="n">drm</span><span class="p">;</span>

<span class="k">impl</span> <span class="nn">drm</span><span class="p">::</span><span class="n">Driver</span> <span class="k">for</span> <span class="n">NovaDriver</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">ioctl</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">cmd</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">Result</span> <span class="p">{</span>
        <span class="k">match</span> <span class="n">cmd</span> <span class="p">{</span>
            <span class="n">DRM_NOVA_GET_PARAM</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="c1">// 读取GPU参数</span>
                <span class="k">let</span> <span class="n">param</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_gpu_param</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
                <span class="c1">// 复制到用户空间</span>
                <span class="n">data</span><span class="nf">.copy_from_slice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">param</span><span class="nf">.to_bytes</span><span class="p">());</span>
                <span class="nf">Ok</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="p">}</span>
            <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="n">EINVAL</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>2. 用户空间Rust应用：</strong></p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// userspace_app/src/main.rs</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">OpenOptions</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">os</span><span class="p">::</span><span class="nn">unix</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="n">AsRawFd</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// 打开DRM设备</span>
    <span class="k">let</span> <span class="n">drm_device</span> <span class="o">=</span> <span class="nn">OpenOptions</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span>
        <span class="nf">.read</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
        <span class="nf">.write</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
        <span class="nf">.open</span><span class="p">(</span><span class="s">"/dev/dri/renderD128"</span><span class="p">)</span>
        <span class="nf">.unwrap</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">drm_device</span><span class="nf">.as_raw_fd</span><span class="p">();</span>

    <span class="c1">// 准备ioctl参数</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">param_data</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0u8</span><span class="p">;</span> <span class="mi">64</span><span class="p">];</span>

    <span class="c1">// 调用ioctl（进入内核）</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="nn">libc</span><span class="p">::</span><span class="nf">ioctl</span><span class="p">(</span>
            <span class="n">fd</span><span class="p">,</span>
            <span class="n">DRM_NOVA_GET_PARAM</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="k">mut</span> <span class="n">param_data</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">_</span>
        <span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// param_data现在包含来自内核的GPU参数</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"GPU参数: {:?}"</span><span class="p">,</span> <span class="n">param_data</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="关键要点">关键要点</h3>

<ol>
  <li>❌ <strong>用户空间程序不能使用 <code class="language-plaintext highlighter-rouge">rust/kernel</code></strong> - 它们运行在完全不同的环境中</li>
  <li>✅ <strong>用户空间通过系统调用与内核交互</strong> - 就像与C驱动交互一样</li>
  <li>🔄 <strong>交互是双向的但间接的</strong>：
    <ul>
      <li>用户空间 → 系统调用/ioctl/文件系统 → Rust内核驱动</li>
      <li>Rust内核驱动 → 响应/数据 → 系统调用返回 → 用户空间</li>
    </ul>
  </li>
</ol>

<p><strong>用户空间完全不知道内核驱动是C还是Rust - 这正是ABI稳定性的意义！</strong> 🎯</p>

<h2 id="问题2内核内部abi稳定性策略">问题2：内核内部ABI稳定性策略</h2>

<h3 id="关键区别">关键区别</h3>

<p>Linux内核有<strong>两种完全不同的ABI策略</strong>：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────┐
│                  用户空间                            │
│  (应用程序、库、工具)                                │
└─────────────────┬───────────────────────────────────┘
                  │
                  │  ← 用户空间ABI (稳定、神圣)
                  │     系统调用、ioctl、/proc、/sys
                  │     "我们不破坏用户空间" - Linus
                  │
┌─────────────────┴───────────────────────────────────┐
│            LINUX内核                                 │
│  ┌─────────────────────────────────────────┐       │
│  │  内核子系统 (VFS, MM, Net等)            │       │
│  └─────────────────┬───────────────────────┘       │
│                    │                                │
│                    │  ← 内部API (不稳定!)           │
│                    │     随时可以改变                │
│                    │     无向后兼容                  │
│  ┌─────────────────┴───────────────────────┐       │
│  │  可加载内核模块 (.ko文件)                │       │
│  │  (驱动、文件系统等)                      │       │
│  └─────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="官方内核策略内部abi不稳定">官方内核策略：内部ABI不稳定</h3>

<p>来自Linux内核文档<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>：</p>

<blockquote>
  <p><strong>内核没有稳定的内部API/ABI。</strong></p>

  <p>内核内部API可以而且确实随时改变，出于任何原因。</p>
</blockquote>

<p><strong>实践中</strong>: 如果你为Linux 6.5编译内核模块，它在Linux 6.6上<strong>将无法加载</strong>，除非重新编译。</p>

<h3 id="为什么内部abi不稳定">为什么内部ABI不稳定</h3>

<p>Greg Kroah-Hartman在他著名的文档中解释了这一点：</p>

<p><strong>没有内部ABI稳定性的原因:</strong></p>

<ol>
  <li><strong>快速演进</strong>: 子系统需要重构的自由</li>
  <li><strong>无二进制模块</strong>: 所有模块必须是GPL且可重新编译</li>
  <li><strong>质量控制</strong>: 强制树外驱动保持更新</li>
  <li><strong>安全性</strong>: 允许修复根本性设计缺陷</li>
</ol>

<p><strong>哲学</strong>: “如果你的代码足够好，它应该在树内。如果在树内，重新编译是免费的。”</p>

<h3 id="用户空间abi绝对稳定">用户空间ABI：绝对稳定</h3>

<p>Linus Torvalds的著名规则（从无数LKML帖子中概括）：</p>

<blockquote>
  <p><strong>“我们不破坏用户空间。永远。”</strong></p>

  <p>如果内核更改破坏了正常工作的用户空间应用程序，该更改<strong>将被回退</strong>，无论它多么”正确”。</p>
</blockquote>

<p>来自官方文档<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>：</p>

<blockquote>
  <p><strong>稳定接口:</strong></p>
  <ul>
    <li>系统调用: 绝不能改变语义</li>
    <li>/proc和/sys ABI: 保证至少2年稳定</li>
    <li>ioctl编号: 一旦定义就永不重用</li>
    <li>二进制格式 (ELF等): 向后兼容</li>
  </ul>
</blockquote>

<p><strong>答案</strong>: 内核<strong>不追求内部ABI稳定性</strong>。只有<strong>用户空间ABI</strong>是稳定的。</p>

<h2 id="问题3rust与用户空间abi稳定性">问题3：Rust与用户空间ABI稳定性</h2>

<h3 id="当前状态rust提供稳定的用户空间abi">当前状态：Rust提供稳定的用户空间ABI</h3>

<p><strong>主线内核中的生产级驱动</strong>（截至Linux 6.x）：</p>

<ol>
  <li><strong>GPU驱动 (Nova)</strong>: 为Nvidia GPU提供DRM用户空间ABI - 完整的ioctl接口</li>
  <li><strong>网络PHY驱动</strong> (ax88796b, qt2025): ethtool/netlink ABI</li>
  <li><strong>块设备</strong> (rnull): 标准块设备ioctl ABI</li>
  <li><strong>CPU频率</strong> (rcpufreq_dt): sysfs和ioctl接口</li>
</ol>

<p><strong>参考实现（树外）</strong>：</p>

<p><strong>Android Binder</strong>（Rust重写，尚未进入主线）：展示了与C版本<strong>完全相同的用户空间ABI</strong>：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 与C版本相同的BINDER_WRITE_READ ioctl</span>
<span class="k">const</span> <span class="n">BINDER_WRITE_READ</span><span class="p">:</span> <span class="nb">u32</span> <span class="o">=</span> <span class="nn">kernel</span><span class="p">::</span><span class="nn">ioctl</span><span class="p">::</span><span class="nn">_IOWR</span><span class="p">::</span><span class="o">&lt;</span><span class="n">BinderWriteRead</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="n">BINDER_TYPE</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="mi">1</span>
<span class="p">);</span>

<span class="c1">// 使用C头文件的用户空间代码发送完全相同的二进制数据</span>
</code></pre></div></div>

<p>这个树外实现已经<strong>验证</strong> - Android的libbinder（C++用户空间库）与Rust驱动无需修改即可工作。</p>

<h3 id="为什么rust实际上更适合abi稳定性">为什么Rust实际上更适合ABI稳定性</h3>

<p><strong>C中的问题</strong>: 意外的ABI破坏</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C - 容易意外改变ABI</span>
<span class="k">struct</span> <span class="n">binder_transaction_data</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">cookie</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">code</span><span class="p">;</span>
    <span class="c1">// 糟糕，开发者在这里添加字段 - ABI破坏了！</span>
    <span class="kt">uint32_t</span> <span class="n">new_field</span><span class="p">;</span>
    <span class="kt">uint32_t</span> <span class="n">flags</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p><strong>Rust解决方案</strong>: 显式版本控制和<code class="language-plaintext highlighter-rouge">#[repr(C)]</code></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Rust - ABI布局是显式的并经过检查</span>
<span class="nd">#[repr(C)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">binder_transaction_data</span> <span class="p">{</span>
    <span class="k">pub</span> <span class="n">cookie</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="k">pub</span> <span class="n">code</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="c1">// 不能在这里添加字段，除非显式版本升级</span>
    <span class="k">pub</span> <span class="n">flags</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1">// 编译时大小检查</span>
<span class="k">const</span> <span class="n">_</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="nd">assert!</span><span class="p">(</span>
    <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">binder_transaction_data</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">==</span> <span class="mi">48</span>
<span class="p">);</span>
</code></pre></div></div>

<h3 id="rust的reprc保证">Rust的<code class="language-plaintext highlighter-rouge">#[repr(C)]</code>保证</h3>

<p>从Rust语言规范：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">UserspaceFacingStruct</span> <span class="p">{</span>
    <span class="n">field1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">field2</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>保证</strong>:</p>
<ul>
  <li>与C结构相同的布局</li>
  <li>相同的填充规则</li>
  <li>相同的对齐</li>
  <li>相同的大小</li>
  <li>跨Rust编译器版本稳定</li>
</ul>

<p><strong>这是语言级别的保证</strong>，不仅仅是约定。</p>

<h3 id="abi稳定性rust-vs-c对比">ABI稳定性：Rust vs C对比</h3>

<table>
  <thead>
    <tr>
      <th>方面</th>
      <th>C</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>布局控制</strong></td>
      <td>隐式，编译器依赖</td>
      <td><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>显式</td>
    </tr>
    <tr>
      <td><strong>填充保留</strong></td>
      <td>手动，易出错</td>
      <td><code class="language-plaintext highlighter-rouge">MaybeUninit</code>自动</td>
    </tr>
    <tr>
      <td><strong>大小验证</strong></td>
      <td>手动<code class="language-plaintext highlighter-rouge">BUILD_BUG_ON</code></td>
      <td><code class="language-plaintext highlighter-rouge">const _: assert!(size == X)</code></td>
    </tr>
    <tr>
      <td><strong>破坏性更改</strong></td>
      <td>静默，运行时失败</td>
      <td>编译错误</td>
    </tr>
    <tr>
      <td><strong>版本控制</strong></td>
      <td>手动，按约定</td>
      <td>可由类型系统强制</td>
    </tr>
    <tr>
      <td><strong>二进制兼容性</strong></td>
      <td>信任开发者</td>
      <td>编译器验证</td>
    </tr>
  </tbody>
</table>

<h3 id="rust会提供关键的用户空间abi吗">Rust会提供关键的用户空间ABI吗？</h3>

<p><strong>生产环境部署（主线内核）:</strong></p>

<ol>
  <li><strong>GPU驱动</strong> (Nova): 为Nvidia GPU提供DRM用户空间ABI（树内13个文件）</li>
  <li><strong>网络PHY驱动</strong>: ethtool/netlink ABI (ax88796b, qt2025)</li>
  <li><strong>块设备</strong>: rnull驱动，提供标准ioctl ABI</li>
  <li><strong>CPU频率</strong>: rcpufreq_dt，提供sysfs接口</li>
</ol>

<p><strong>参考实现（树外）:</strong></p>

<ol>
  <li><strong>Android Binder</strong> (IPC): Rust重写展示ABI兼容性（尚未进入主线）</li>
</ol>

<p><strong>即将推出</strong> (基于当前开发):</p>

<ol>
  <li><strong>文件系统</strong>: VFS操作，挂载选项</li>
  <li><strong>网络协议</strong>: Socket选项，数据包格式</li>
  <li><strong>更多设备驱动</strong>: 扩展硬件支持</li>
</ol>

<h3 id="关键策略与语言无关的abi">关键策略：与语言无关的ABI</h3>

<p><strong>关键洞察</strong>: 内核的ABI稳定性策略是<strong>与语言无关的</strong>。</p>

<p>来自Linus Torvalds（从各种LKML帖子总结）：</p>

<blockquote>
  <p>“我不在乎你用C、Rust还是汇编编写。如果你破坏了用户空间，你就破坏了内核。”</p>
</blockquote>

<p><strong>实践中</strong>:</p>
<ul>
  <li>Rust驱动通过bindgen使用<strong>与C相同的UAPI头文件</strong></li>
  <li>相同的ioctl编号，相同的结构布局，相同的语义</li>
  <li>用户空间<strong>无法分辨</strong>驱动是C还是Rust</li>
  <li>ABI破坏在两种语言中<strong>同样不可接受</strong></li>
</ul>

<p><strong>答案</strong>: 是的，Rust<strong>将会并且已经</strong>被用于需要ABI稳定性的用户空间功能。</p>

<h2 id="当前范围外围驱动而非内核核心">当前范围：外围驱动，而非内核核心</h2>

<p><strong>重要澄清</strong>: 截至2026年初，Linux内核中的Rust<strong>仅限于外围区域</strong> - 设备驱动和Android特定组件。<strong>没有核心内核子系统被用Rust重写。</strong></p>

<h3 id="-rust代码存在的位置">✅ Rust代码存在的位置</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>drivers/                    # 外围驱动层
├── gpu/drm/nova/          # GPU驱动 (Nvidia, 13个文件, ~1,200行)
├── net/phy/               # 网络PHY驱动 (2个文件, ~237行)
├── block/rnull.rs         # 块设备示例 (80行)
├── cpufreq/rcpufreq_dt.rs # CPU频率管理 (227行)
└── gpu/drm/drm_panic_qr.rs # DRM panic QR码 (996行)

rust/kernel/               # 抽象层 (101个文件, 13,500行)
├── sync/                  # 同步原语的Rust绑定
├── mm/                    # 内存函数的Rust绑定
├── fs/                    # 文件系统的Rust绑定
└── net/                   # 网络的Rust绑定
</code></pre></div></div>

<p><strong>关键点</strong>: <code class="language-plaintext highlighter-rouge">rust/kernel/</code>目录提供<strong>抽象</strong>（围绕C API的安全包装器），而不是核心功能的<strong>实现</strong>。</p>

<h3 id="-仍然100-c的部分核心内核">❌ 仍然100% C的部分（核心内核）</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mm/                        # 内存管理核心
├── 153个文件, 128个C文件
├── page_alloc.c          # 页面分配器 (9,000+ 行)
├── slab.c                # Slab分配器 (4,000+ 行)
├── vmalloc.c             # 虚拟内存 (3,500+ 行)
└── kasan_test_rust.rs    # ⚠️ 唯一的Rust文件（仅仅是测试！）

kernel/sched/             # 进程调度器
├── 46个文件, 33个C文件
├── core.c                # 调度器核心 (11,000+ 行)
└── 0个Rust文件

fs/                       # VFS核心
├── 数百个C文件
├── namei.c               # 路径查找 (5,000+ 行)
├── inode.c               # Inode管理 (2,000+ 行)
└── 0个Rust文件（仅驱动）

net/core/                 # 网络协议栈核心
kernel/entry/             # 系统调用入口点
arch/x86/kernel/          # 架构特定代码
</code></pre></div></div>

<h3 id="为什么这很重要">为什么这很重要</h3>

<p>这种分布<strong>不是技术限制</strong>，而是<strong>deliberate战略</strong>：</p>

<ol>
  <li><strong>风险管理</strong>: 驱动故障是局部的；核心子系统bug会导致系统崩溃</li>
  <li><strong>建立信任</strong>: 先在低风险区域证明Rust的价值</li>
  <li><strong>社区接受</strong>: 渐进式采用让内核维护者有时间适应</li>
  <li><strong>工具成熟</strong>: 构建测试基础设施和调试工具</li>
</ol>

<h3 id="采用时间线当前轨迹">采用时间线（当前轨迹）</h3>

<p><strong>第1阶段 (2022-2026)</strong>: ✅ <strong>已完成</strong></p>
<ul>
  <li>设备驱动和Android组件</li>
  <li>抽象层基础设施</li>
  <li>构建系统集成</li>
</ul>

<p><strong>第2阶段 (2026-2028)</strong>: 🔄 <strong>进行中</strong></p>
<ul>
  <li>更多设备驱动（扩展硬件支持）</li>
  <li>文件系统驱动（实验性）</li>
  <li>网络驱动扩展</li>
</ul>

<p><strong>第3阶段 (2028-2030+)</strong>: 🔮 <strong>高度推测</strong></p>
<ul>
  <li>核心子系统采用（mm、调度器、VFS）</li>
  <li><strong>这可能永远不会发生</strong> - 需要巨大的社区共识</li>
  <li>核心重写没有官方路线图</li>
</ul>

<h3 id="现实检验">现实检验</h3>

<p><strong>问题</strong>: “Rust会替换内核核心中的C吗？”</p>

<p><strong>答案</strong>: 未知且在近期（5-10年）不太可能。当前证据显示：</p>
<ul>
  <li>Rust在<strong>驱动</strong>中取得成功（已证明价值）</li>
  <li>核心子系统拥有<strong>数十年经过实战检验的C代码</strong></li>
  <li>重写核心 = <strong>巨大风险</strong>，收益不明确</li>
  <li>社区重点是<strong>新驱动</strong>，而非重写现有核心</li>
</ul>

<p><strong>结论</strong>: Linux中的Rust目前是一种<strong>驱动开发语言</strong>，而不是<strong>内核核心语言</strong>。这可能会改变，但不会很快。</p>

<h2 id="实际影响">实际影响</h2>

<h3 id="对rust内核开发者">对Rust内核开发者</h3>

<p><strong>要做:</strong></p>
<ul>
  <li>✅ 对所有用户空间结构使用<code class="language-plaintext highlighter-rouge">#[repr(C)]</code></li>
  <li>✅ 对用户空间类型使用<code class="language-plaintext highlighter-rouge">uapi</code> crate</li>
  <li>✅ 添加大小/布局断言</li>
  <li>✅ 如需要用<code class="language-plaintext highlighter-rouge">MaybeUninit</code>保留填充</li>
  <li>✅ 以与C驱动相同的方式记录ABI</li>
</ul>

<p><strong>不要做:</strong></p>
<ul>
  <li>❌ 未经版本升级更改用户空间可见类型</li>
  <li>❌ 假设Rust的布局足够（使用<code class="language-plaintext highlighter-rouge">#[repr(C)]</code>）</li>
  <li>❌ 即使为了”更好”的设计也不要破坏兼容性</li>
  <li>❌ 在UAPI中依赖Rust特定类型</li>
</ul>

<h3 id="对用户空间开发者">对用户空间开发者</h3>

<p><strong>好消息</strong>: 什么都不变！</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 用户空间C代码（不变）</span>
<span class="kt">int</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="s">"/dev/binder"</span><span class="p">,</span> <span class="n">O_RDWR</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">binder_write_read</span> <span class="n">bwr</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="n">ioctl</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">BINDER_WRITE_READ</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">bwr</span><span class="p">);</span>
</code></pre></div></div>

<p>无论内核驱动是C还是Rust，<strong>这段代码工作完全相同</strong>。</p>

<h2 id="常见误解">常见误解</h2>

<h3 id="误解1rust的abi不稳定所以不能用于内核接口">误解1：”Rust的ABI不稳定，所以不能用于内核接口”</h3>

<p><strong>现实</strong>:</p>
<ul>
  <li>Rust crate之间的<em>内部</em>ABI不稳定</li>
  <li>Rust的<code class="language-plaintext highlighter-rouge">#[repr(C)]</code> ABI <strong>是稳定的</strong>，与C完全匹配</li>
  <li>内核对所有用户空间接口使用<code class="language-plaintext highlighter-rouge">#[repr(C)]</code></li>
</ul>

<h3 id="误解2rust添加了需要维护的新abi">误解2：”Rust添加了需要维护的新ABI”</h3>

<p><strong>现实</strong>:</p>
<ul>
  <li>Rust使用<strong>与C相同的UAPI头文件</strong>（通过bindgen）</li>
  <li>没有新ABI，只是不同语言实现相同ABI</li>
  <li>用户空间看不到区别</li>
</ul>

<h3 id="误解3rust内部不稳定性影响用户空间">误解3：”Rust内部不稳定性影响用户空间”</h3>

<p><strong>现实</strong>:</p>
<ul>
  <li>Rust的<code class="language-plaintext highlighter-rouge">rust/kernel</code>抽象可以自由更改（内部API）</li>
  <li>面向用户空间的ABI<strong>不能更改</strong>（与C规则相同）</li>
  <li>这些是分开的关注点</li>
</ul>

<h3 id="误解4因为rust模块必须重新编译">误解4：”因为Rust模块必须重新编译”</h3>

<p><strong>现实</strong>:</p>
<ul>
  <li>内核模块<strong>一直</strong>需要在版本之间重新编译</li>
  <li>对于<strong>C模块</strong>也是如此</li>
  <li>Rust不改变这一策略</li>
</ul>

<h2 id="结论">结论</h2>

<p><strong>发现总结:</strong></p>

<ol>
  <li>
    <p>✅ <strong>Rust通过<code class="language-plaintext highlighter-rouge">uapi</code> crate、ioctl处理器、设备节点、sysfs等提供用户空间接口</strong></p>
  </li>
  <li>
    <p>❌ <strong>内核内部ABI不稳定</strong> - 模块必须为每个内核版本重新编译（与C相同）</p>
  </li>
  <li>
    <p>✅ <strong>用户空间ABI是稳定的</strong> - 永不破坏（C和Rust规则相同）</p>
  </li>
  <li>
    <p>✅ <strong>Rust已经在生产环境提供用户空间ABI</strong> - GPU驱动（Nova），网络PHY驱动，块设备，CPU频率驱动（均在主线）</p>
  </li>
  <li>
    <p>⚠️ <strong>Rust目前仅在外围</strong> - 仅设备驱动；核心内核（mm、调度器、VFS）仍然100% C</p>
  </li>
</ol>

<p><strong>关键洞察</strong>:</p>

<ol>
  <li>内核的ABI稳定性策略<strong>与实现语言正交</strong>。Rust驱动必须遵循与C驱动相同的规则：
    <ul>
      <li>内部API可以随时更改</li>
      <li>用户空间ABI是神圣和不可变的</li>
    </ul>
  </li>
  <li>Rust的当前范围是<strong>deliberate和战略性的</strong> - 在考虑核心子系统之前，先在低风险驱动中证明价值。</li>
</ol>

<p><strong>Rust的优势</strong>: 通过<code class="language-plaintext highlighter-rouge">#[repr(C)]</code>、大小断言和类型安全更好地编译时验证ABI兼容性，减少意外的ABI破坏。</p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst">Linux Kernel Stable API Nonsense</a> - Greg Kroah-Hartman’s explanation of why internal kernel API is unstable <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://docs.kernel.org/admin-guide/abi.html">Linux ABI description</a> - Official kernel documentation on ABI stability levels <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://github.com/torvalds/linux/blob/master/Documentation/ABI/README">ABI README</a> - Documentation of ABI stability categories <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[Does Rust in the Linux kernel provide userspace interfaces? What’s the kernel’s ABI stability policy? This analysis examines how Rust drivers interact with userspace, the critical distinction between internal and external ABI stability, and concrete examples from production code like Android Binder and DRM drivers.]]></summary></entry><entry><title type="html">为什么Linux内核选择了Rust而不是Zig？</title><link href="https://weinan.io/2026/02/15/why-linux-chose-rust-over-zig.html" rel="alternate" type="text/html" title="为什么Linux内核选择了Rust而不是Zig？" /><published>2026-02-15T00:00:00+00:00</published><updated>2026-02-15T00:00:00+00:00</updated><id>https://weinan.io/2026/02/15/why-linux-chose-rust-over-zig</id><content type="html" xml:base="https://weinan.io/2026/02/15/why-linux-chose-rust-over-zig.html"><![CDATA[<p>2022年12月，Linux 6.1正式发布，首次将Rust作为内核的第二种编程语言。本文深入分析了Linux内核选择Rust而非Zig的核心原因，包括时机、语言特性差异、社区生态等多个维度，并探讨了两种语言在系统编程领域的不同定位。</p>

<h2 id="引言">引言</h2>

<p>在众多现代系统编程语言中，为什么是Rust获得了Linux内核”第二语言”的席位，而同样优秀的Zig却未能入选？这个问题的答案，远比表面看起来要复杂得多。</p>

<p>最直接的原因可以概括为：<strong>当内核在2022年底正式引入Rust时，Zig还没准备好；而当Zig逐渐成熟时，内核的”第二语言”席位已经被Rust占据</strong>。2022年10月，Linus Torvalds将Rust代码合并到Linux 6.1的开发周期中<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">1</a></sup>，同年12月11日，Linux 6.1正式发布<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">2</a></sup>，Rust成为Linux历史上首次被接纳的C语言之外的编程语言<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">3</a></sup>。</p>

<p>这背后是工程决策、语言特性和社区生态共同作用的结果。</p>

<h2 id="核心原因分析">核心原因分析</h2>

<h3 id="-时机与行业背书">⏳ 时机与行业背书</h3>

<p>Rust for Linux项目于2020年在Linux内核邮件列表中宣布启动<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">4</a></sup>，经过两年多的开发，于2022年12月随Linux 6.1正式进入稳定版本。在2019-2020年内核讨论引入第二语言时，Zig（2015年诞生）还处于早期阶段，而Rust背后有<strong>Mozilla、微软、谷歌</strong>等巨头的投入。到2025年12月，Rust已正式从”实验性”状态转为Linux内核的核心组成部分<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">5</a></sup>。</p>

<h3 id="-语言特性的根本差异">🔧 语言特性的根本差异</h3>

<p>Rust和Zig的设计哲学存在本质差异，这决定了它们与内核需求的匹配度。</p>

<p><strong>Rust：激进的安全卫士</strong></p>

<p>核心目标是在<strong>编译期消除内存错误</strong>。它通过<strong>所有权、生命周期</strong>等机制，在编译阶段就堵死空指针、数据竞争等漏洞，这直击了内核安全最核心的痛点。研究表明，约70%的内核安全问题源于内存安全，而Rust可以自动消除其中的大部分<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">6</a></sup>。宏和RAII特性也被驱动开发者视为处理复杂硬件逻辑的利器。</p>

<p><strong>Zig：现代的C语言替代者</strong></p>

<p>旨在成为C的现代升级版，强调<strong>对底层操作的完全掌控</strong>和<strong>“零隐式行为”</strong><sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>。它没有Rust那样复杂的编译器，依靠<strong>显式的错误处理</strong>和<strong>编译期执行</strong>来提升C的安全性。但对于内核开发者，这意味着需要<strong>手动管理资源</strong>，并可能面临段错误。</p>

<h3 id="-社区与生态的门槛">🌍 社区与生态的门槛</h3>

<p>对Linux这样的超大规模项目，生态是关键。Rust拥有庞大的用户群和库，这为内核的长期维护和人才储备提供了保障<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>。相比之下，Zig在2015年才诞生，其生态系统和开发者社区规模相对较小，语言本身也仍在快速演进中，这在一定程度上增加了项目采纳的风险。</p>

<h2 id="-为什么不是c">🚫 为什么不是C++？</h2>

<p>在讨论为何选择Rust而非Zig之前，一个更基本的问题是：为什么Linux内核从未考虑使用C++？毕竟C++也提供了RAII等现代特性。</p>

<p>Linus Torvalds对此有过非常明确的表态。早在2004年，他就在Linux内核邮件列表中指出<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">9</a></sup>：</p>

<blockquote>
  <p>“写内核代码用C++是一个<strong>非常愚蠢的想法</strong>（BLOODY STUPID IDEA）。”</p>

  <p><em>“It sucks. Trust me - writing kernel code in C++ is a BLOODY STUPID IDEA.”</em></p>

  <p>“事实上，C++编译器是不可信的…C++的整个异常处理机制从根本上就是有问题的，<strong>对内核来说尤其如此</strong>。”</p>

  <p><em>“The whole C++ exception handling thing is fundamentally broken. It’s _especially_ broken for kernels.”</em></p>
</blockquote>

<p>2007年，在Git邮件列表上，Linus更系统地阐述了反对C++的理由<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">10</a></sup>：</p>

<p><strong>1. 异常处理机制不适合内核</strong></p>

<p>C++的异常处理会引入非局部控制流跳转，这在需要绝对确定性的内核代码中是不可接受的。C语言的错误返回值机制虽然繁琐，但路径清晰、透明可控。Linus早在2004年就明确指出<sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">9</a></sup>：</p>

<blockquote>
  <p><em>“The whole C++ exception handling thing is fundamentally broken. It’s _especially_ broken for kernels.”</em></p>

  <p>“C++的整个异常处理机制从根本上就是有问题的，对内核来说尤其如此。”</p>
</blockquote>

<p>学术研究也印证了这一问题。2019年爱丁堡大学的研究表明，即使采用优化后的实现，C++异常处理在嵌入式系统中仍然存在显著的代码体积和运行时开销<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">11</a></sup>。2025年St Andrews大学的最新研究指出，C++异常在用户态/内核态边界的传播需要特殊的ABI支持，增加了系统复杂性<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">12</a></sup>。</p>

<p><strong>2. 隐式内存分配是大忌</strong></p>

<p>内核需要对每一个字节的内存分配有完全的控制权。Linus在2004年明确指出<sup id="fnref:14:2" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">9</a></sup>：</p>

<blockquote>
  <p><em>“Any compiler or language that likes to hide things like memory allocations behind your back just isn’t a good choice for a kernel.”</em></p>

  <p>“任何喜欢在背后隐藏内存分配等操作的编译器或语言，都不是内核开发的好选择。”</p>
</blockquote>

<p><strong>3. 抽象导致的效率问题</strong></p>

<p>Linus在2007年指出<sup id="fnref:15:1" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">10</a></sup>：</p>

<blockquote>
  <p><em>“C++ leads to really really bad design choices. You invariably start using the ‘nice’ library features of the language like STL and Boost and other total and utter crap, that may ‘help’ you program, but causes… inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.”</em></p>

  <p>“C++导致真正糟糕的设计选择。你不可避免地会开始使用STL和Boost等’优雅的’库特性…这会导致低效的抽象编程模型，两年后你会发现某些抽象效率不高，但现在你所有的代码都依赖于这些精美的对象模型，除非重写应用，否则无法修复。”</p>
</blockquote>

<p><strong>4. C语言足以实现面向对象</strong></p>

<p>Linus在2004年指出<sup id="fnref:14:3" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">9</a></sup>：</p>

<blockquote>
  <p><em>“You can write object-oriented code (useful for filesystems etc) in C, _without_ the crap that is C++.”</em></p>

  <p>“你可以用C编写面向对象的代码（对文件系统等很有用），而不需要C++中的那些垃圾。”</p>
</blockquote>

<p>Linux内核用C语言的结构体和函数指针实现了充分的面向对象设计。</p>

<p>这些观点揭示了Linux内核对于编程语言的核心要求：<strong>透明性、可控性和确定性</strong>。C++虽然功能强大，但其隐式行为和复杂的抽象机制与内核开发的哲学背道而驰。</p>

<p>相比之下，Rust通过所有权系统在编译期强制执行安全规则，没有运行时开销，且所有的内存管理都是显式的。Zig则更进一步，完全消除了隐式行为。这两种语言都比C++更符合内核开发的需求。</p>

<h2 id="-zig的现状与角色">💡 Zig的现状与角色</h2>

<p>尽管没能成为内核的”第二语言”，Zig在Linux生态中正找到一个独特的切入点。Zig凭借其出色的<strong>交叉编译能力</strong>和<strong>精细的内存控制</strong>，正成为优化系统工具和基础设施的有力选择<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">13</a></sup>。其内置的构建系统和工具链，即使在传统的C/C++项目中也展现出显著的优势。</p>

<h2 id="深入理解什么是raii">深入理解：什么是RAII？</h2>

<p>在讨论Rust的优势时，RAII是一个绕不开的话题。</p>

<p>RAII是<strong>R</strong>esource <strong>A</strong>cquisition <strong>I</strong>s <strong>I</strong>nitialization（资源获取即初始化）的缩写。它在C++中普及，并被Rust等语言继承和发展，是管理内存、文件句柄、锁等系统资源的核心范式。</p>

<p>核心思想是：<strong>将资源的生命周期，与对象的生命周期严格绑定</strong>。</p>

<h3 id="工作原理">工作原理</h3>

<p>简单来说，RAII通过构造函数和析构函数这对”钩子”，实现了资源的自动管理：</p>

<ul>
  <li><strong>获取（初始化时）</strong>：当你创建一个对象时，它的构造函数会自动获取资源（如分配内存、打开文件）</li>
  <li><strong>释放（销毁时）</strong>：当对象离开作用域被销毁时，它的析构函数会自动释放资源</li>
</ul>

<p>这确保了资源绝不会泄漏，即使发生异常，只要对象被销毁，析构函数就一定会被调用，实现<strong>异常安全</strong>。</p>

<p><strong>Rust中的自动释放机制</strong></p>

<p>Rust通过<code class="language-plaintext highlighter-rouge">Drop</code> trait实现析构函数<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">14</a></sup>。当变量离开作用域时，Rust编译器会自动调用该类型的<code class="language-plaintext highlighter-rouge">drop</code>方法。以自旋锁为例：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 简化的SpinLockGuard实现</span>
<span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="nb">Drop</span> <span class="k">for</span> <span class="n">SpinLockGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// 当guard被销毁时，这个方法会自动调用</span>
        <span class="k">self</span><span class="py">.lock</span><span class="nf">.unlock</span><span class="p">();</span> <span class="c1">// 释放锁</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>关键机制包括：</p>

<ol>
  <li><strong>作用域规则</strong>：变量在离开其作用域（通常由花括号<code class="language-plaintext highlighter-rouge">{}</code>界定）时被销毁<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">15</a></sup></li>
  <li><strong>自动调用</strong>：编译器在编译时就确定在哪里插入<code class="language-plaintext highlighter-rouge">drop()</code>调用，这是零成本抽象</li>
  <li><strong>异常安全</strong>：即使发生<code class="language-plaintext highlighter-rouge">panic</code>或提前返回，<code class="language-plaintext highlighter-rouge">drop</code>也会被调用，确保资源释放</li>
</ol>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">guard</span> <span class="o">=</span> <span class="n">spinlock</span><span class="nf">.lock</span><span class="p">();</span> <span class="c1">// 获取锁</span>

    <span class="k">if</span> <span class="n">error_condition</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span> <span class="c1">// 提前返回</span>
        <span class="c1">// guard在此离开作用域，drop被自动调用，锁被释放</span>
    <span class="p">}</span>

    <span class="nf">do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">guard</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="c1">// 如果出错</span>
    <span class="c1">// guard在此离开作用域，drop被自动调用，锁被释放</span>

<span class="p">}</span> <span class="c1">// 正常情况下，guard在此离开作用域，锁被释放</span>
</code></pre></div></div>

<p>这就是为什么说”开发者无法忘记解锁” - 不是靠记忆力或代码审查，而是<strong>编译器强制保证</strong>的。</p>

<h3 id="在内核开发中的价值">在内核开发中的价值</h3>

<p>对于Linux内核这样的底层系统，RAII的价值巨大。传统C语言使用<code class="language-plaintext highlighter-rouge">goto</code>语句集中处理错误，容易遗漏。而RAII可以彻底解决这个痛点。</p>

<p>以Rust代码为例，它展示了如何安全地管理一个内核自旋锁：</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 解锁动作被自动"绑定"到了guard对象上</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">guard</span> <span class="o">=</span> <span class="n">spinlock</span><span class="nf">.lock</span><span class="p">();</span> <span class="c1">// `lock()`获取锁，返回一个guard对象</span>
<span class="nf">do_something</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">guard</span><span class="p">);</span>         <span class="c1">// 通过guard访问数据</span>
<span class="c1">// guard在此处被销毁，锁被自动释放</span>
</code></pre></div></div>

<p>开发者无法忘记解锁，即使在<code class="language-plaintext highlighter-rouge">do_something</code>中发生错误，锁也会被正确释放。这对于构建高可靠的驱动和内核模块至关重要。</p>

<h3 id="rust的raii与所有权">Rust的RAII与所有权</h3>

<p>相比C++，Rust将RAII提升到了语言核心位置。通过<strong>所有权（Ownership）</strong>机制，Rust强制要求每个资源有唯一的所有者。当所有者离开作用域，资源被自动释放，从根本上杜绝了悬空指针和重复释放的问题。</p>

<h3 id="zig的资源管理方式">Zig的资源管理方式</h3>

<p>Zig采取了不同的设计哲学。虽然Zig提供了<code class="language-plaintext highlighter-rouge">defer</code>关键字来简化资源清理（类似Go），但它强调”零隐式行为”<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">7</a></sup>，资源释放需要开发者显式编写，由编译器在编译期验证控制流：</p>

<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">file</span> <span class="o">=</span> <span class="k">try</span> <span class="n">std</span><span class="p">.</span><span class="py">fs</span><span class="p">.</span><span class="nf">cwd</span><span class="p">().</span><span class="nf">openFile</span><span class="p">(</span><span class="s">"file.txt"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">defer</span> <span class="n">file</span><span class="p">.</span><span class="nf">close</span><span class="p">();</span> <span class="c">// 必须显式写defer</span>
</code></pre></div></div>

<p>这种方式给了开发者最大的控制权和可预测性，但在规模庞大、错误路径复杂的Linux内核中，需要人工确保每个分支都正确处理资源释放，审查负担相对较大。</p>

<p>相比之下，Rust的RAII通过类型系统和编译器强制保证资源释放，提供了”自动、安全、无法遗忘”的资源管理能力，更符合内核对安全性的极致要求。</p>

<h2 id="深入理解zig相比c的实质性提升">深入理解：Zig相比C的实质性提升</h2>

<p>有人可能会认为”Zig相比C提升不大”，这个说法<strong>并不准确</strong>。如果Zig相比C提升不大，它不会在系统编程社区获得越来越多的关注。</p>

<p>更准确的表述是：<strong>Zig在”显式控制”路径上做到了极致，而Rust在”安全抽象”路径上做到了极致</strong>。两者都远超C，只是方向不同。</p>

<h3 id="1-编译期执行comptime">1. 编译期执行（Comptime）</h3>

<p>这是Zig最革命性的特性，C完全没有：</p>

<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// 泛型数据结构 - C需要void*或宏，极其别扭</span>
<span class="k">fn</span> <span class="n">List</span><span class="p">(</span><span class="k">comptime</span> <span class="n">T</span><span class="p">:</span> <span class="k">type</span><span class="p">)</span> <span class="k">type</span> <span class="p">{</span>
    <span class="k">return</span> <span class="k">struct</span> <span class="p">{</span>
        <span class="n">items</span><span class="p">:</span> <span class="p">[]</span><span class="n">T</span><span class="p">,</span>
        <span class="n">len</span><span class="p">:</span> <span class="kt">usize</span><span class="p">,</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="c">// 使用</span>
<span class="k">var</span> <span class="n">int_list</span> <span class="o">=</span> <span class="n">List</span><span class="p">(</span><span class="kt">i32</span><span class="p">){};</span>
<span class="k">var</span> <span class="n">string_list</span> <span class="o">=</span> <span class="n">List</span><span class="p">([]</span><span class="kt">u8</span><span class="p">){};</span>
</code></pre></div></div>

<p>在C语言中，这要么用宏写出难以调试的代码，要么用<code class="language-plaintext highlighter-rouge">void*</code>牺牲类型安全。</p>

<h3 id="2-真正的错误处理">2. 真正的错误处理</h3>

<p>C的错误处理靠返回值+<code class="language-plaintext highlighter-rouge">errno</code>，极易被忽略：</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// C - 容易忘记检查返回值</span>
<span class="kt">FILE</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">"file.txt"</span><span class="p">,</span> <span class="s">"r"</span><span class="p">);</span>
<span class="n">fread</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>  <span class="c1">// 如果fopen失败？崩溃！</span>
</code></pre></div></div>

<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Zig - 错误必须处理</span>
<span class="k">const</span> <span class="n">file</span> <span class="o">=</span> <span class="k">try</span> <span class="n">std</span><span class="p">.</span><span class="py">fs</span><span class="p">.</span><span class="nf">cwd</span><span class="p">().</span><span class="nf">openFile</span><span class="p">(</span><span class="s">"file.txt"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="c">// 如果openFile失败，try会向上传播错误，不会默默继续</span>
<span class="k">defer</span> <span class="n">file</span><span class="p">.</span><span class="nf">close</span><span class="p">();</span>
</code></pre></div></div>

<p>Zig通过语言机制强制处理错误，但又不像Java的异常那样有运行时开销。</p>

<h3 id="3-真正的无未定义行为">3. 真正的无未定义行为</h3>

<p>C语言充满了未定义行为：有符号整数溢出、空指针解引用、缓冲区溢出等。编译器会基于”未定义行为不会发生”做激进优化，导致隐蔽的bug。</p>

<p>Zig定义了所有操作的语义：</p>
<ul>
  <li>有符号整数溢出是<strong>明确定义的wrapping行为</strong>（或可以通过<code class="language-plaintext highlighter-rouge">@addWithOverflow</code>检查）</li>
  <li>数组访问有<strong>边界检查</strong>（release快速模式下可关闭）</li>
  <li>整数转换是<strong>显式的</strong>，不会隐式截断</li>
</ul>

<h3 id="4-交叉编译是一等公民">4. 交叉编译是一等公民</h3>

<p>C的交叉编译是噩梦：需要配置工具链、头文件路径、库路径等。</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Zig - 直接指定目标</span>
zig build-exe <span class="nt">--target</span> riscv64-linux-gnu myapp.zig
<span class="c"># 无需安装任何东西，Zig内置了目标平台的libc</span>
</code></pre></div></div>

<h3 id="5-构建系统内置告别make">5. 构建系统内置，告别make</h3>

<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// build.zig - 这是Zig代码，不是DSL</span>
<span class="k">const</span> <span class="n">exe</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="nf">addExecutable</span><span class="p">(</span><span class="s">"myapp"</span><span class="p">,</span> <span class="s">"src/main.zig"</span><span class="p">);</span>
<span class="n">exe</span><span class="p">.</span><span class="nf">linkLibC</span><span class="p">();</span>
<span class="n">exe</span><span class="p">.</span><span class="nf">addIncludePath</span><span class="p">(</span><span class="s">"/usr/include"</span><span class="p">);</span>
</code></pre></div></div>

<p>C语言从诞生至今都没有语言层面的标准构建系统，依然依赖于<code class="language-plaintext highlighter-rouge">make</code>、<code class="language-plaintext highlighter-rouge">cmake</code>、<code class="language-plaintext highlighter-rouge">autotools</code>等第三方工具。</p>

<h3 id="为什么说提升不大的错觉存在">为什么说”提升不大”的错觉存在？</h3>

<p>这种印象主要来自<strong>内存安全</strong>这个最受关注的维度：</p>

<table>
  <thead>
    <tr>
      <th>方面</th>
      <th>C</th>
      <th>Zig</th>
      <th>Rust</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>内存安全</td>
      <td>❌ 全靠人工</td>
      <td>⚠️ 更好的工具（可选检查、显式控制）</td>
      <td>✅ 编译器强制保证</td>
    </tr>
    <tr>
      <td>错误处理</td>
      <td>❌ 易忽略</td>
      <td>✅ 语言级强制</td>
      <td>✅ 语言级强制</td>
    </tr>
    <tr>
      <td>泛型编程</td>
      <td>⚠️ 宏/<code class="language-plaintext highlighter-rouge">void*</code></td>
      <td>✅ comptime</td>
      <td>✅ 泛型+trait</td>
    </tr>
    <tr>
      <td>元编程</td>
      <td>⚠️ 宏预处理器</td>
      <td>✅ comptime</td>
      <td>✅ 宏</td>
    </tr>
    <tr>
      <td>学习曲线</td>
      <td>低</td>
      <td>中等</td>
      <td>高</td>
    </tr>
    <tr>
      <td>对现有C代码</td>
      <td>-</td>
      <td>✅ 良好兼容</td>
      <td>⚠️ 需要FFI绑定</td>
    </tr>
  </tbody>
</table>

<p><strong>关键区别</strong>：Rust说”我替你管，你别操心”，Zig说”我给你最好的工具，你来管”。</p>

<h3 id="linux内核场景的结论">Linux内核场景的结论</h3>

<p>回到最初的问题：Linux内核为什么没选Zig？</p>

<p>不是Zig不够好，而是<strong>内核的需求更匹配Rust的安全哲学</strong>：</p>

<ol>
  <li><strong>内核的代价不同</strong>：用户态程序的内存漏洞可能导致进程崩溃；内核的内存漏洞则可能导致权限提升、系统崩溃等严重安全问题</li>
  <li><strong>C代码的常见缺陷</strong>：内核维护者指出，大量bug源于”C语言中那些愚蠢的小陷阱”，包括内存覆写、错误路径清理遗漏、忘记检查错误值和use-after-free错误<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">16</a></sup>，而这些在Rust中完全不存在</li>
  <li><strong>审查负担</strong>：Rust让编译器承担了大部分内存安全审查工作<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">17</a></sup>；而Zig虽然提供了更好的工具，但仍需要人工审查每一处潜在的内存安全问题</li>
</ol>

<p>Zig相比C的<strong>提升很大</strong>，只是在”内存安全”这个特定维度上，它选择了和C类似的路径——给开发者强大的工具，但不强制安全。这让它成为：</p>
<ul>
  <li>需要精细控制嵌入式系统的理想选择</li>
  <li>C代码库渐进式改进的绝佳桥梁</li>
  <li>工具链、构建系统等基础设施的重写利器</li>
</ul>

<p>但在Linux内核这种对<strong>绝对安全</strong>有极致要求的场景，Rust的强制保证确实更胜一筹。</p>

<h2 id="总结">总结</h2>

<p>Linux内核选择Rust而非Zig，是在那个时间点上，对<strong>安全性、成熟度和生态</strong>的综合考量。Rust的编译期内存安全保证、成熟的工具链和庞大的社区，使其成为内核”第二语言”的最佳选择。</p>

<p>而Zig虽然没有进入内核核心，但也凭借其在<strong>资源效率和C互操作性</strong>上的优势，在Linux生态的外围找到了用武之地。两种语言都在推动系统编程的发展，只是选择了不同的路径。</p>

<h3 id="延伸阅读">延伸阅读</h3>

<ul>
  <li><a href="https://docs.kernel.org/rust/index.html">The Linux Kernel - Rust Documentation</a> - Linux内核官方Rust文档</li>
  <li><a href="https://rust-for-linux.com/rust-kernel-policy">Rust Kernel Policy</a> - Rust在Linux内核中的集成政策</li>
  <li><a href="https://www.usenix.org/system/files/atc24-li-hongyu.pdf">An Empirical Study of Rust-for-Linux</a> - USENIX ATC 2024论文，对Rust-for-Linux的实证研究</li>
  <li><a href="https://arxiv.org/html/2407.18431v1">Rusty Linux: Advances in Rust for Linux Kernel Development</a> - arXiv论文，深入分析Rust在Linux内核开发中的进展</li>
</ul>

<h2 id="参考资料">参考资料</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://www.phoronix.com/news/Rust-Is-Merged-Linux-6.1">The Initial Rust Infrastructure Has Been Merged Into Linux 6.1</a> - Phoronix, 2022年10月报道Rust合并到Linux 6.1 <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p><a href="https://www.theregister.com/2022/12/11/linux_6_1/">Linus Torvalds reveals Linux kernel 6.1</a> - The Register, 2022年12月11日报道Linux 6.1发布 <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p><a href="https://www.infoq.com/news/2022/12/linux-6-1-rust/">Linux 6.1 Officially Adds Support for Rust in the Kernel</a> - InfoQ关于Linux 6.1正式添加Rust支持的详细报道 <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://rust-for-linux.com/">Rust for Linux</a> - Rust for Linux项目官方网站 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.webpronews.com/linux-kernel-adopts-rust-as-permanent-core-language-in-2025/">Linux Kernel Adopts Rust as Permanent Core Language in 2025</a> - WebProNews, 2025年12月报道 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://mars-research.github.io/doc/2024-acsac-rfl.pdf">Rust for Linux: Understanding the Security Impact of Rust in the Linux Kernel</a> - 研究论文，分析了Rust在Linux内核中的安全影响 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://ziglang.org/learn/why_zig_rust_d_cpp/">Why Zig When There is Already C++, D, and Rust?</a> - Zig官方文档对比分析 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://thenewstack.io/rust-integration-in-linux-kernel-faces-challenges-but-shows-progress/">Rust Integration in Linux Kernel Faces Challenges but Shows Progress</a> - The New Stack关于Rust在Linux内核中的进展报道 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p><a href="https://harmful.cat-v.org/software/c++/linus">Re: Compiling C++ kernel module + Makefile</a> - Linus Torvalds, 2004年1月19日在Linux内核邮件列表的回复 <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:14:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:14:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:14:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:15" role="doc-endnote">
      <p><a href="https://lwn.net/Articles/249460/">Re: [RFC] Convert builtin-mailinfo.c to use The Better String Library</a> - Linus Torvalds, 2007年9月6日在Git邮件列表关于C++的完整论述 <a href="#fnref:15" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:15:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:16" role="doc-endnote">
      <p><a href="https://www.research.ed.ac.uk/files/78829292/low_cost_deterministic_C_exceptions_for_embedded_systems.pdf">Low-cost deterministic C++ exceptions for embedded systems</a> - University of Edinburgh, 2019年ACM编译器构造国际会议论文 <a href="#fnref:16" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:17" role="doc-endnote">
      <p><a href="https://doi.org/10.1145/3764860.3768332">Propagating C++ exceptions across the user/kernel boundary</a> - Voronetskiy &amp; Spink, University of St Andrews, PLOS 2025 <a href="#fnref:17" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://blog.logrocket.com/comparing-rust-vs-zig-performance-safety-more/">Comparing Rust vs. Zig: Performance, safety, and more</a> - LogRocket技术博客深度对比 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://doc.rust-lang.org/book/ch15-03-drop.html">Running Code on Cleanup with the Drop Trait</a> - Rust官方文档，详细介绍Drop trait的工作原理 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://doc.rust-lang.org/rust-by-example/scope/raii.html">RAII - Rust By Example</a> - Rust官方示例，解释RAII模式 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://www.apriorit.com/dev-blog/rust-for-linux-driver">Linux Driver Development with Rust</a> - Apriorit关于Rust驱动开发的分析，引用内核维护者的观点 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://www.linuxjournal.com/content/how-rusts-debut-linux-kernel-shoring-system-stability">How Rust’s Debut in the Linux Kernel is Shoring Up System Stability</a> - Linux Journal关于Rust如何提升内核稳定性 <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[2022年12月，Linux 6.1正式发布，首次将Rust作为内核的第二种编程语言。本文深入分析了Linux内核选择Rust而非Zig的核心原因，包括时机、语言特性差异、社区生态等多个维度，并探讨了两种语言在系统编程领域的不同定位。]]></summary></entry><entry><title type="html">AI Reshaping Software Development Workflow: From Code Writer to AI Conductor</title><link href="https://weinan.io/2026/02/10/ai-reshaping-software-development-workflow.html" rel="alternate" type="text/html" title="AI Reshaping Software Development Workflow: From Code Writer to AI Conductor" /><published>2026-02-10T00:00:00+00:00</published><updated>2026-02-10T00:00:00+00:00</updated><id>https://weinan.io/2026/02/10/ai-reshaping-software-development-workflow</id><content type="html" xml:base="https://weinan.io/2026/02/10/ai-reshaping-software-development-workflow.html"><![CDATA[<style>
/* Mermaid diagram container */
.mermaid-container {
  position: relative;
  display: block;
  cursor: pointer;
  transition: opacity 0.2s;
  max-width: 100%;
  margin: 20px 0;
  overflow-x: auto;
}

.mermaid-container:hover {
  opacity: 0.8;
}

.mermaid-container svg {
  max-width: 100%;
  height: auto;
  display: block;
}

/* Modal overlay */
.mermaid-modal {
  display: none;
  position: fixed;
  z-index: 9999;
  left: 0;
  top: 0;
  width: 100%;
  height: 100%;
  background-color: rgba(0, 0, 0, 0.9);
  animation: fadeIn 0.3s;
}

.mermaid-modal.active {
  display: flex;
  align-items: center;
  justify-content: center;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

/* Modal content */
.mermaid-modal-content {
  position: relative;
  width: 90vw;
  height: 90vh;
  overflow: hidden;
  background: white;
  padding: 20px;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
  display: flex;
  align-items: center;
  justify-content: center;
}

.mermaid-modal-diagram {
  transform-origin: center center;
  transition: transform 0.2s ease;
  display: inline-block;
  min-width: 100%;
  cursor: grab;
  user-select: none;
}

.mermaid-modal-diagram.dragging {
  cursor: grabbing;
  transition: none;
}

.mermaid-modal-diagram svg {
  width: 100%;
  height: auto;
  display: block;
  pointer-events: none;
}

/* Control buttons */
.mermaid-controls {
  position: absolute;
  top: 10px;
  right: 10px;
  display: flex;
  gap: 8px;
  z-index: 10000;
}

.mermaid-btn {
  background: rgba(255, 255, 255, 0.9);
  border: 1px solid #ddd;
  border-radius: 4px;
  padding: 8px 12px;
  cursor: pointer;
  font-size: 14px;
  transition: background 0.2s;
  color: #333;
  font-weight: 500;
}

.mermaid-btn:hover {
  background: white;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Close button */
.mermaid-close {
  background: #f44336;
  color: white;
  border: none;
}

.mermaid-close:hover {
  background: #d32f2f;
}

/* Zoom indicator */
.mermaid-zoom-level {
  position: absolute;
  bottom: 20px;
  left: 20px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 6px 12px;
  border-radius: 4px;
  font-size: 14px;
  z-index: 10000;
}
</style>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';

  mermaid.initialize({
    startOnLoad: false,
    theme: 'default',
    securityLevel: 'loose',
    htmlLabels: true,
    themeVariables: {
      fontSize: '14px'
    }
  });

  let currentZoom = 1;
  let currentModal = null;
  let isDragging = false;
  let startX = 0;
  let startY = 0;
  let translateX = 0;
  let translateY = 0;

  // Create modal HTML
  function createModal() {
    const modal = document.createElement('div');
    modal.className = 'mermaid-modal';
    modal.innerHTML = `
      <div class="mermaid-controls">
        <button class="mermaid-btn zoom-in">放大 +</button>
        <button class="mermaid-btn zoom-out">缩小 -</button>
        <button class="mermaid-btn zoom-reset">重置</button>
        <button class="mermaid-btn mermaid-close">关闭 ✕</button>
      </div>
      <div class="mermaid-modal-content">
        <div class="mermaid-modal-diagram"></div>
      </div>
      <div class="mermaid-zoom-level">100%</div>
    `;
    document.body.appendChild(modal);
    return modal;
  }

  // Show modal with diagram
  function showModal(diagramContent) {
    if (!currentModal) {
      currentModal = createModal();
      setupModalEvents();
    }

    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.innerHTML = diagramContent;

    // Remove any width/height attributes from SVG to make it responsive
    const svg = modalDiagram.querySelector('svg');
    if (svg) {
      svg.removeAttribute('width');
      svg.removeAttribute('height');
      svg.style.width = '100%';
      svg.style.height = 'auto';
    }

    // Setup drag functionality
    setupDrag(modalDiagram);

    currentModal.classList.add('active');
    currentZoom = 1;
    resetPosition();
    updateZoom();
    document.body.style.overflow = 'hidden';
  }

  // Hide modal
  function hideModal() {
    if (currentModal) {
      currentModal.classList.remove('active');
      document.body.style.overflow = '';
    }
  }

  // Update zoom level
  function updateZoom() {
    if (!currentModal) return;
    const diagram = currentModal.querySelector('.mermaid-modal-diagram');
    const zoomLevel = currentModal.querySelector('.mermaid-zoom-level');
    diagram.style.transform = `translate(${translateX}px, ${translateY}px) scale(${currentZoom})`;
    zoomLevel.textContent = `${Math.round(currentZoom * 100)}%`;
  }

  // Reset position when zoom changes
  function resetPosition() {
    translateX = 0;
    translateY = 0;
  }

  // Setup drag functionality
  function setupDrag(element) {
    element.addEventListener('mousedown', startDrag);
    element.addEventListener('touchstart', startDrag);
  }

  function startDrag(e) {
    if (e.type === 'mousedown' && e.button !== 0) return; // Only left click

    isDragging = true;
    const modalDiagram = currentModal.querySelector('.mermaid-modal-diagram');
    modalDiagram.classList.add('dragging');

    if (e.type === 'touchstart') {
      startX = e.touches[0].clientX - translateX;
      startY = e.touches[0].clientY - translateY;
    } else {
      startX = e.clientX - translateX;
      startY = e.clientY - translateY;
    }

    document.addEventListener('mousemove', drag);
    document.addEventListener('touchmove', drag);
    document.addEventListener('mouseup', stopDrag);
    document.addEventListener('touchend', stopDrag);
  }

  function drag(e) {
    if (!isDragging) return;
    e.preventDefault();

    if (e.type === 'touchmove') {
      translateX = e.touches[0].clientX - startX;
      translateY = e.touches[0].clientY - startY;
    } else {
      translateX = e.clientX - startX;
      translateY = e.clientY - startY;
    }

    updateZoom();
  }

  function stopDrag() {
    isDragging = false;
    const modalDiagram = currentModal?.querySelector('.mermaid-modal-diagram');
    if (modalDiagram) {
      modalDiagram.classList.remove('dragging');
    }
    document.removeEventListener('mousemove', drag);
    document.removeEventListener('touchmove', drag);
    document.removeEventListener('mouseup', stopDrag);
    document.removeEventListener('touchend', stopDrag);
  }

  // Setup modal event listeners
  function setupModalEvents() {
    if (!currentModal) return;

    // Close button
    currentModal.querySelector('.mermaid-close').addEventListener('click', hideModal);

    // Zoom buttons
    currentModal.querySelector('.zoom-in').addEventListener('click', () => {
      currentZoom = Math.min(currentZoom + 0.25, 3);
      updateZoom();
    });

    currentModal.querySelector('.zoom-out').addEventListener('click', () => {
      currentZoom = Math.max(currentZoom - 0.25, 0.5);
      updateZoom();
    });

    currentModal.querySelector('.zoom-reset').addEventListener('click', () => {
      currentZoom = 1;
      resetPosition();
      updateZoom();
    });

    // Close on background click
    currentModal.addEventListener('click', (e) => {
      if (e.target === currentModal) {
        hideModal();
      }
    });

    // Close on ESC key
    document.addEventListener('keydown', (e) => {
      if (e.key === 'Escape' && currentModal.classList.contains('active')) {
        hideModal();
      }
    });
  }

  // Convert Jekyll-rendered code blocks to mermaid divs
  document.addEventListener('DOMContentLoaded', async function() {
    const codeBlocks = document.querySelectorAll('code.language-mermaid');

    for (const codeBlock of codeBlocks) {
      const pre = codeBlock.parentElement;
      const container = document.createElement('div');
      container.className = 'mermaid-container';

      const mermaidDiv = document.createElement('div');
      mermaidDiv.className = 'mermaid';
      mermaidDiv.textContent = codeBlock.textContent;

      container.appendChild(mermaidDiv);
      pre.replaceWith(container);
    }

    // Render all mermaid diagrams
    try {
      await mermaid.run({
        querySelector: '.mermaid'
      });
      console.log('Mermaid diagrams rendered successfully');
    } catch (error) {
      console.error('Mermaid rendering error:', error);
    }

    // Add click handlers to rendered diagrams
    document.querySelectorAll('.mermaid-container').forEach((container, index) => {
      // Find the rendered SVG inside the container
      const svg = container.querySelector('svg');
      if (!svg) {
        console.warn(`No SVG found in container ${index}`);
        return;
      }

      // Make the container clickable
      container.style.cursor = 'pointer';
      container.title = '点击查看大图';

      container.addEventListener('click', function(e) {
        e.preventDefault();
        e.stopPropagation();

        // Clone the SVG for the modal
        const svgClone = svg.cloneNode(true);
        const tempDiv = document.createElement('div');
        tempDiv.appendChild(svgClone);

        console.log('Opening modal for diagram', index);
        showModal(tempDiv.innerHTML);
      });

      console.log(`Click handler added to diagram ${index}`);
    });
  });
</script>

<p><strong>Abstract:</strong> AI coding assistants such as GitHub Copilot, Claude, and ChatGPT are evolving from mere auxiliary tools into core participants in our workflows. This report argues that the transformation is not simply about “efficiency gains,” but a systemic restructuring of developer roles, work focus, and team collaboration models. The core value of developers is shifting upward from “writing code” to “architectural design, requirements analysis, and quality control,” driving the entire R&amp;D process toward greater automation and intelligence.</p>

<hr />

<h4 id="1-core-transformation-from-code-writer-to-ai-conductor-and-quality-commander"><strong>1. Core Transformation: From “Code Writer” to “AI Conductor and Quality Commander”</strong></h4>

<p>The deep integration of AI tools has led to a significant shift in how developers allocate their time, fundamentally changing their roles:</p>

<h5 id="11-work-focus-shift">1.1 Work Focus Shift</h5>

<ul>
  <li><strong>Decreased time on:</strong>
    <ul>
      <li>Manually writing detailed implementation code</li>
      <li>Creating basic boilerplate files</li>
      <li>Looking up basic API documentation</li>
    </ul>
  </li>
  <li><strong>Increased time on:</strong>
    <ul>
      <li><strong>Deep analysis and decomposition:</strong> Greater focus on understanding complex business logic and precisely breaking down macro requirements into fine-grained tasks (Issues/Prompts) that AI can understand and execute</li>
      <li><strong>Learning and prompt engineering:</strong> Learning how to collaborate effectively with AI, including writing clear prompts, providing effective context, and iteratively optimizing instructions</li>
      <li><strong>Review and integration:</strong> Core work becomes <strong>reviewing AI-submitted code (PRs)</strong>, judging its correctness, security, performance, and fit with the overall architecture</li>
      <li><strong>System design and planning:</strong> More energy invested in higher-level architectural design, technology selection, and long-term technical debt management</li>
    </ul>
  </li>
</ul>

<h5 id="12-evolution-of-required-capabilities">1.2 Evolution of Required Capabilities</h5>

<ul>
  <li><strong>Extremely high demand for “holistic grasp capability”:</strong> Developers must have a clearer understanding of the system overview, inter-module relationships, and data flow to effectively guide AI and judge its output. <strong>“Knowing what to build” is more important than “knowing how to write it.”</strong></li>
  <li><strong>Critical thinking and discernment become key:</strong> Must possess sharp judgment to quickly identify potential logical flaws, security risks, performance bottlenecks, or “eloquent nonsense” in AI-generated code</li>
  <li><strong>Communication and definition capabilities are amplified:</strong> The ability to communicate with AI (and through AI with the team)—precisely defining problem boundaries and acceptance criteria—directly determines output quality</li>
</ul>

<h4 id="2-direct-impact-leap-in-efficiency-density-and-automation-level"><strong>2. Direct Impact: Leap in Efficiency, Density, and Automation Level</strong></h4>

<h5 id="21-significantly-faster-development-efficiency-and-progress">2.1 Significantly Faster Development Efficiency and Progress</h5>

<ul>
  <li><strong>Shortened coding cycles:</strong> Repetitive, pattern-based coding work is greatly compressed, accelerating feature implementation</li>
  <li><strong>Accelerated learning curve:</strong> AI serves as a real-time tutor, quickly answering technical questions and providing examples, helping developers rapidly master new languages and frameworks, thereby increasing learning intensity and effectiveness</li>
</ul>

<h5 id="22-increased-work-density-and-output-expectations">2.2 Increased Work Density and Output Expectations</h5>

<p>Within the same time unit, as basic coding accelerates, individuals are expected to handle more complex logic, complete more functional modules, or be responsible for broader domains. This brings higher <strong>cognitive work density</strong>.</p>

<h5 id="23-triggering-enhanced-rd-process-automation">2.3 Triggering Enhanced R&amp;D Process Automation</h5>

<p>AI introduction catalyzes the idealized “fully automated pipeline” vision closer to reality:</p>

<ul>
  <li><strong>Starting point:</strong> User or developer submits a structured Issue (serving as a natural language instruction)</li>
  <li><strong>AI execution:</strong> AI agent understands the task, writes code, and automatically submits a PR</li>
  <li><strong>Automated quality gates:</strong> Triggers automated testing (unit, integration), code quality scanning, security detection</li>
  <li><strong>Automated delivery:</strong> After tests pass, code is automatically merged and deployed to the test environment, triggering more complex end-to-end automated tests</li>
  <li><strong>Automated feedback:</strong> Test reports are automatically generated and submitted</li>
</ul>

<p><strong>In this process, the core responsibility of developers is to design and maintain this automation pipeline and handle exceptions and critical decision points requiring human wisdom.</strong></p>

<h4 id="3-potential-challenges-and-future-outlook"><strong>3. Potential Challenges and Future Outlook</strong></h4>

<h5 id="31-challenges-and-risks">3.1 Challenges and Risks</h5>

<ul>
  <li><strong>Over-reliance and skill degradation risk:</strong> Need to guard against potential “use it or lose it” in basic coding ability, debugging depth, and understanding of underlying principles</li>
  <li><strong>Code quality and consistency governance:</strong> AI-generated code may have inconsistent styles and hidden defects, requiring stronger code review culture and automated quality gates</li>
  <li><strong>New security and compliance topics:</strong> AI may introduce code with security vulnerabilities or copyright-contaminated code, requiring new detection tools and audit processes</li>
  <li><strong>Team collaboration model adjustment:</strong> Issue descriptions need extreme precision; code review standards and processes need redefinition to adapt to the new scenario of “humans reviewing AI code”</li>
</ul>

<h5 id="32-future-outlook">3.2 Future Outlook</h5>

<ul>
  <li><strong>Increased developer stratification:</strong> “Commander-type” developers who are good at leveraging AI, possess global vision, and strong critical thinking will become more valuable. Workflows may further stratify, with some focusing on business and architecture definition, and others on AI orchestration and result optimization</li>
  <li><strong>Birth of “AI-native” workflows:</strong> Future development tools and project management platforms will integrate AI agents from the design phase, enabling more seamless and intelligent connections from requirements documentation to production deployment</li>
  <li><strong>Lowered innovation barriers, unleashed creativity:</strong> Developers can be freed from heavy implementation details, investing more time and intellect in genuine innovation, user experience optimization, and solving complex business problems</li>
</ul>

<h4 id="conclusion"><strong>Conclusion</strong></h4>

<p>The introduction of AI tools is not merely a simple tool upgrade, but a <strong>deep restructuring of the software development workflow</strong>. It is liberating developers from the traditional “code monkey” role, pushing them upstream in the value chain—to become <strong>system designers, AI trainers and orchestrators, and ultimate quality owners</strong>. Organizations and individuals who successfully adapt to this transformation will achieve a dual leap in productivity and innovation capability, building more powerful and automated intelligent R&amp;D systems. The core of this process lies in: <strong>humans focusing wisdom on defining “what to do” and “why,” while increasingly delegating the specific execution of “how to do it” to AI for completion and optimization.</strong></p>

<h4 id="4-beyond-the-horizon-when-ai-becomes-fully-autonomous"><strong>4. Beyond the Horizon: When AI Becomes Fully Autonomous</strong></h4>

<p>The current workflow paradigm still maintains human leadership—humans define requirements, guide AI execution, and make final decisions. However, looking toward a more distant future, what if AI could autonomously generate requirements, organize and prioritize them, completely take over testing, and achieve self-iteration? In such a scenario, the entire development cycle might operate without human intervention.</p>

<p>This possibility raises profound questions that transcend technical considerations:</p>

<p><strong>4.1 Human-Centricity of AI-Generated Requirements</strong></p>

<p>If AI autonomously creates product requirements and feature roadmaps, can we ensure these requirements genuinely serve human needs and center around human values? Without human participation in the requirements generation phase, there is a risk that AI might optimize for metrics that appear rational but deviate from authentic human needs—pursuing efficiency, scalability, or algorithmic elegance while overlooking nuances of human experience, emotional needs, or cultural context.</p>

<p><strong>4.2 Alignment of AI’s World Model with Human Understanding</strong></p>

<p>Does AI’s understanding of the world align with human understanding and goals? Current AI systems learn from human-generated data and exhibit pattern-matching capabilities, but they lack genuine comprehension of meaning, context, and human intentionality. If AI systems were to operate with full autonomy, would their model of “what is valuable,” “what is correct,” and “what is desirable” converge with humanity’s collective values and long-term interests?</p>

<p><strong>4.3 Current Reality: The Absence of AI Self-Awareness</strong></p>

<p>Importantly, we currently see no evidence of AI possessing self-awareness or autonomous consciousness. Today’s AI systems, regardless of their sophistication, remain fundamentally tools—powerful pattern recognizers and generators that operate within the boundaries of their training and programming. They do not possess desires, intentions, or self-directed goals. This distinction is crucial: the scenarios described above remain speculative, contingent on breakthroughs in AI capabilities that may or may not occur, and that would raise entirely new categories of philosophical, ethical, and governance challenges.</p>

<p><strong>The Critical Imperative:</strong></p>

<p>As we advance along the path of AI-augmented development, maintaining human agency, judgment, and ethical oversight remains not merely advisable but essential. The “human-in-the-loop” is not a limitation to be overcome, but a safeguard ensuring that technology serves humanity’s authentic interests and reflects our values, priorities, and collective wisdom.</p>

<hr />

<h4 id="modern-software-development-workflow-enhanced-by-ai"><strong>Modern Software Development Workflow Enhanced by AI</strong></h4>

<pre><code class="language-mermaid">flowchart TD
    subgraph A [Traditional Workflow Comparison]
        A1[Requirements Analysis] --&gt; A2[Design and Planning]
        A2 --&gt; A3[Manual Coding]
        A3 --&gt; A4[Manual Testing]
        A4 --&gt; A5[Code Review]
        A5 --&gt; A6[Manual Deployment]
        A6 --&gt; A7[Production Testing]
    end

    subgraph B [AI-Enhanced Modern Workflow]
        direction TB
        B1[Deep Requirements Analysis and Decomposition] --&gt; B2[Write Precise Issue/Prompt]
        B2 --&gt; B3{AI Agent Execution}

        B3 --&gt; B4[AI Writes Code and Submits PR]

        subgraph B5 [Pre-Merge Quality Gates&lt;br/&gt;Pre-Merge Validation]
            direction LR
            B5a[⏱️ Automated Unit Tests] --&gt; B5b[🔍 Code Quality Scan&lt;br/&gt;SonarQube etc]
            B5b --&gt; B5c[🛡️ Security Scan&lt;br/&gt;SAST/SCA]
            B5c --&gt; B5d[✅ Basic Integration Tests]
        end

        B4 --&gt; B5
        B5 --&gt; B6{Pre-Merge Pass?}

        B6 -- ✅ Yes --&gt; B7[Auto-Merge to Main Branch]
        B6 -- ❌ No --&gt; B8[Developer/Reviewer Intervenes]
        B8 --&gt; B9[Modify Prompt/Code or Close PR]
        B9 --&gt; B2

        B7 --&gt; B10[Post-Merge Auto-Trigger]

        subgraph B11 [Post-Merge Validation&lt;br/&gt;Post-Merge Verification &amp; Delivery]
            direction LR
            B11a[🚀 Auto-Deploy to Test Env] --&gt; B11b[🧪 Automated E2E Tests]
            B11b --&gt; B11c[📊 Performance Testing]
            B11c --&gt; B11d[🎯 Automated UAT]
        end

        B10 --&gt; B11
        B11 --&gt; B12[Auto-Generate Test Report]
        B12 --&gt; B13[Notify Stakeholders&lt;br/&gt;Ready for Production]
    end

    subgraph C [Key Role &amp; Process Changes]
        C1[Pre-Merge Gatekeeper&lt;br/&gt;Reviewers ensure code quality baseline]
        C2[Post-Merge Validator&lt;br/&gt;Verify system integration &amp; behavior]
        C3[Human Responsibilities Focus&lt;br/&gt;Design/Decision/Exception Handling]

        C1 -- Quality Defense Forward --&gt; C2
        C3 -- Supervise Both Ends --&gt; C1
        C3 -- Focus on Results --&gt; C2
    end

    A -- Workflow Intelligence Restructuring --&gt; B
    A3 -. Manual Coding Reduced .-&gt; B3
    B5 -. Requires Precise Prompts and Context .-&gt; B2
    B6 -. Core Human Decision Point .-&gt; C3
    B12 -. Increased Automation Level .-&gt; C2
</code></pre>

<hr />

<h3 id="分析报告ai工具引入对软件研发工作流的重构与影响"><strong>分析报告：AI工具引入对软件研发工作流的重构与影响</strong></h3>

<p><strong>报告摘要：</strong> 以GitHub Copilot、Claude、ChatGPT等为代表的AI编码助手，正从辅助工具演变为工作流的核心参与者。本报告分析指出，其带来的并非简单的”效率提升”，而是一次对开发者角色、工作重心和团队协作模式的系统性重构。开发者的核心价值正从”编写代码”上移至”架构设计、需求分析与质量把控”，并推动研发全流程向更自动化、更智能化的方向演进。</p>

<hr />

<h4 id="一-核心转变从代码编写者到ai调度与质量指挥官"><strong>一、 核心转变：从”代码编写者”到”AI调度与质量指挥官”</strong></h4>

<p>AI工具的深度集成，直接导致了开发者时间分配的显著转移，其角色发生了根本性变化：</p>

<h5 id="11-工作重心转移">1.1 工作重心转移</h5>

<ul>
  <li><strong>减少：</strong> 直接手写具体实现代码、编写基础样板文件、查阅基础API文档的时间</li>
  <li><strong>增加：</strong>
    <ul>
      <li><strong>深度分析与拆解：</strong> 更专注于理解复杂业务逻辑，并将宏观需求精准拆解为AI可理解、可执行的细颗粒度任务（Issue/Prompt）</li>
      <li><strong>学习与提示工程：</strong> 学习如何高效与AI协作，包括编写清晰的Prompt、提供有效的上下文、迭代优化指令</li>
      <li><strong>审核与集成：</strong> 核心工作变为<strong>审核AI提交的代码（PR）</strong>，判断其正确性、安全性、性能及与整体架构的契合度</li>
      <li><strong>系统设计与规划：</strong> 有更多精力投入到更高层次的架构设计、技术选型和长期技术债务管理</li>
    </ul>
  </li>
</ul>

<h5 id="12-能力要求演变">1.2 能力要求演变</h5>

<ul>
  <li><strong>对”整体把握能力”要求极高：</strong> 开发者必须对系统全貌、模块间关系、数据流有更清晰的认识，才能有效指导AI和判断其产出。<strong>“知道要什么”比”知道怎么写”更重要。</strong></li>
  <li><strong>批判性思维与甄别能力成为关键：</strong> 必须具备火眼金睛，能快速识别AI代码中潜在的逻辑漏洞、安全风险、性能瓶颈或”一本正经的胡说八道”</li>
  <li><strong>沟通与定义能力被放大：</strong> 与AI（以及通过AI与团队）的沟通能力——即精准定义问题边界和验收标准的能力——直接决定产出质量</li>
</ul>

<h4 id="二-直接影响效率密度与自动化水平的跃升"><strong>二、 直接影响：效率、密度与自动化水平的跃升</strong></h4>

<h5 id="21-开发效率与进度显著加快">2.1 开发效率与进度显著加快</h5>

<ul>
  <li><strong>缩短编码周期：</strong> 重复性、模式化的编码工作被极大压缩，功能实现速度提升</li>
  <li><strong>加速学习曲线：</strong> AI作为实时导师，能快速解答技术疑问、提供示例，帮助开发者快速掌握新语言、新框架，从而提升学习强度与效果</li>
</ul>

<h5 id="22-工作密度与产出期望提升">2.2 工作密度与产出期望提升</h5>

<p>在单位时间内，由于基础编码加速，个体被期望能处理更复杂的逻辑、完成更多的功能模块或负责更广的领域。这带来了更高的<strong>认知工作密度</strong>。</p>

<h5 id="23-触发研发全流程自动化增强">2.3 触发研发全流程自动化增强</h5>

<p>AI的引入成为催化剂，推动了理想化的”全自动流水线”愿景更接近现实：</p>

<ul>
  <li><strong>起点：</strong> 用户或开发者提交结构化的Issue（可视为自然语言指令）</li>
  <li><strong>AI执行：</strong> AI代理（Agent）理解任务，编写代码，自动提交PR</li>
  <li><strong>自动化质量关卡：</strong> 触发自动化测试（单元、集成）、代码质量扫描、安全检测</li>
  <li><strong>自动交付：</strong> 测试通过后，自动合并代码，自动部署至测试环境，并触发更复杂的端到端自动化测试</li>
  <li><strong>自动反馈：</strong> 测试报告自动生成并提交</li>
</ul>

<p><strong>在这一流程中，开发者的核心职责是设计和维护这条自动化流水线，并处理其中需要人类智慧介入的异常与关键决策点。</strong></p>

<h4 id="三-潜在挑战与未来展望"><strong>三、 潜在挑战与未来展望</strong></h4>

<h5 id="31-挑战与风险">3.1 挑战与风险</h5>

<ul>
  <li><strong>过度依赖与技能退化风险：</strong> 需警惕在基础编码能力、调试深度和底层原理理解上可能出现的”用进废退”</li>
  <li><strong>代码质量与一致性的治理：</strong> AI生成的代码可能风格不一、存在隐藏缺陷，需要更强的代码审查文化和自动化质量门禁</li>
  <li><strong>安全与合规新课题：</strong> AI可能引入存在安全漏洞的代码或受版权污染的代码，需要新的检测工具和审计流程</li>
  <li><strong>团队协作模式调整：</strong> Issue的描述需要极度精确，代码审核的标准和流程需要重新定义，以适配”人审AI码”的新场景</li>
</ul>

<h5 id="32-未来展望">3.2 未来展望</h5>

<ul>
  <li><strong>开发者分层加剧：</strong> 善于利用AI、具备全局视野和强大批判性思维的”指挥官型”开发者价值将更加凸显。工作流可能进一步分层，一部分人专注业务与架构定义，另一部分人专注AI调度与结果优化</li>
  <li><strong>“AI原生”工作流诞生：</strong> 未来的开发工具和项目管理平台将从设计之初就融入AI智能体，实现从需求文档到上线部署的更无缝、更智能的衔接</li>
  <li><strong>创新门槛降低，创造力释放：</strong> 开发者得以从繁重的实现细节中解脱，将更多时间和智力投入真正的创新、用户体验优化和解决复杂业务难题上</li>
</ul>

<h4 id="结论"><strong>结论</strong></h4>

<p>AI工具的引入，绝非一次简单的工具升级，而是一次<strong>对软件研发工作流的深度重构</strong>。它正将开发者从传统的”码农”角色中解放出来，推向价值链条的更上游——成为<strong>系统的设计者、AI的培训师与调度员、以及最终质量的责任人</strong>。成功适应这一变革的组织与个人，将能实现生产效率与创新能力的双重跃迁，构建起更强大、更自动化的智能研发体系。这一进程的核心在于：<strong>人类将智慧专注于定义”做什么”和”为什么”，而将”如何做”的具体执行，increasingly，委托给AI去完成和优化。</strong></p>

<h4 id="四-更远的地平线当ai走向完全自主"><strong>四、 更远的地平线：当AI走向完全自主</strong></h4>

<p>目前的工作流范式仍然保持人类主导——人类定义需求、引导AI执行、做出最终决策。然而，展望更遥远的未来，如果AI能够自主创造需求、整理和排列优先级、完全接管测试、实现自我迭代，会怎样？在这样的场景下，整个开发周期可能无需人类介入即可运转。</p>

<p>这种可能性引发了超越技术层面的深刻问题：</p>

<p><strong>4.1 AI生成需求的人本中心性</strong></p>

<p>如果AI自主创建产品需求和功能路线图，我们能否确保这些需求真正服务于人类需要、以人类价值为中心？缺少人类参与需求生成阶段，存在这样的风险：AI可能会优化那些表面上看起来合理、但偏离真实人类需求的指标——追求效率、可扩展性或算法优雅性，却忽略人类体验的细微差别、情感需求或文化语境。</p>

<p><strong>4.2 AI世界模型与人类理解的对齐</strong></p>

<p>AI对世界的理解是否与人类的理解和目标一致？当前的AI系统从人类生成的数据中学习，展现出模式匹配能力，但它们缺乏对意义、语境和人类意图的真正理解。如果AI系统完全自主运作，它们关于”什么是有价值的”、”什么是正确的”、”什么是值得追求的”的模型，是否会与人类的集体价值观和长远利益趋同？</p>

<p><strong>4.3 当下现实：AI自主意识的缺失</strong></p>

<p>重要的是，我们目前没有看到任何AI拥有自我意识或自主意识的证据。今天的AI系统，无论多么复杂，本质上仍然是工具——在其训练和编程边界内运作的强大模式识别器和生成器。它们不具备欲望、意图或自主目标。这个区别至关重要：上述场景仍然是推测性的，依赖于AI能力的突破——这些突破可能发生也可能不发生，并且会引发全新类别的哲学、伦理和治理挑战。</p>

<p><strong>关键要务：</strong></p>

<p>随着我们沿着AI增强开发的道路前进，保持人类的主体性、判断力和伦理监督不仅仅是明智之举，而是至关重要的。”人在回路中”（human-in-the-loop）不是需要克服的限制，而是确保技术服务于人类真实利益、反映我们的价值观、优先事项和集体智慧的保障机制。</p>

<hr />

<h4 id="ai增强的现代软件研发工作流"><strong>AI增强的现代软件研发工作流</strong></h4>

<pre><code class="language-mermaid">flowchart TD
    subgraph A [传统工作流（对比）]
        A1[需求分析] --&gt; A2[设计与规划]
        A2 --&gt; A3[手动编码]
        A3 --&gt; A4[手动测试]
        A4 --&gt; A5[代码审查]
        A5 --&gt; A6[手动部署]
        A6 --&gt; A7[生产测试]
    end

    subgraph B [AI增强现代工作流]
        direction TB
        B1[深度需求分析与拆解] --&gt; B2[撰写精准Issue/Prompt]
        B2 --&gt; B3{AI代理执行}

        B3 --&gt; B4[AI编写代码并提交PR]

        subgraph B5 [Pre-Merge质量门禁&lt;br/&gt;合并前验证]
            direction LR
            B5a[⏱️ 自动化单元测试] --&gt; B5b[🔍 代码质量扫描&lt;br/&gt;SonarQube等]
            B5b --&gt; B5c[🛡️ 安全扫描&lt;br/&gt;SAST/SCA]
            B5c --&gt; B5d[✅ 基础集成测试]
        end

        B4 --&gt; B5
        B5 --&gt; B6{Pre-Merge通过?}

        B6 -- ✅ 是 --&gt; B7[自动合并至主分支]
        B6 -- ❌ 否 --&gt; B8[开发者/审核者介入]
        B8 --&gt; B9[修改Prompt/代码或关闭PR]
        B9 --&gt; B2

        B7 --&gt; B10[Post-Merge自动触发]

        subgraph B11 [Post-Merge验证&lt;br/&gt;合并后验证与交付]
            direction LR
            B11a[🚀 自动部署至测试环境] --&gt; B11b[🧪 自动化端到端测试]
            B11b --&gt; B11c[📊 性能测试]
            B11c --&gt; B11d[🎯 用户验收测试自动化]
        end

        B10 --&gt; B11
        B11 --&gt; B12[自动生成综合测试报告]
        B12 --&gt; B13[通知相关人员&lt;br/&gt;部署就绪可上线]
    end

    subgraph C [角色与流程关键变化]
        C1[Pre-Merge Gatekeeper&lt;br/&gt;审核者确保代码质量底线]
        C2[Post-Merge Validator&lt;br/&gt;验证系统集成与行为]
        C3[人类职责聚焦&lt;br/&gt;设计/决策/异常处理]

        C1 -- 质量防线前移 --&gt; C2
        C3 -- 监督两端 --&gt; C1
        C3 -- 关注结果 --&gt; C2
    end

    A -- 工作流智能化重构 --&gt; B
    A3 -. 手动编码减少 .-&gt; B3
    B5 -. 要求：精准Prompt与上下文 .-&gt; B2
    B6 -. 核心人工决策点 .-&gt; C3
    B12 -. 自动化程度提升 .-&gt; C2
</code></pre>

<hr />]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">One Year of AI-Assisted Programming: Insights, Practices, and Reflections</title><link href="https://weinan.io/2026/01/31/one-year-with-ai.html" rel="alternate" type="text/html" title="One Year of AI-Assisted Programming: Insights, Practices, and Reflections" /><published>2026-01-31T00:00:00+00:00</published><updated>2026-01-31T00:00:00+00:00</updated><id>https://weinan.io/2026/01/31/one-year-with-ai</id><content type="html" xml:base="https://weinan.io/2026/01/31/one-year-with-ai.html"><![CDATA[<p><strong>Abstract:</strong> Over the past year, my journey with AI in programming has evolved from viewing it as a novel tool to deeply integrating it into my daily development workflow. This report systematically summarizes the key insights gained, explains how AI truly augments development capabilities, and clarifies the current boundaries between human and AI roles. The core conclusion is: <strong>personal expertise remains the foundation for unlocking AI’s value; AI is a powerful force multiplier, not a substitute for wisdom; and adapting to a new, high-intensity, iterative workflow is crucial for maximizing productivity.</strong></p>

<hr />

<h4 id="1-core-insights-from-understanding-the-tool-to-defining-the-partnership"><strong>1. Core Insights: From Understanding the Tool to Defining the Partnership</strong></h4>

<h5 id="11-the-key-driver-personal-knowledge-determines-the-ceiling-of-ai-tools">1.1 The Key Driver: Personal Knowledge Determines the Ceiling of AI Tools</h5>
<p>AI functions like a highly capable but intentionless “intern.” The quality of its output is directly governed by the clarity, technical accuracy, and structure of my instructions (prompts). My knowledge base—understanding of the business, grasp of architecture, and familiarity with design patterns—forms the “language” I use to direct AI. The more proficient I am, the more precisely I can leverage and combine AI’s capabilities to deliver value. <strong>The focus of learning has shifted from “memorizing syntax” to “understanding patterns and principles,” as the latter constitutes the meta-skills for effective human-AI collaboration.</strong></p>

<h5 id="12-the-fundamental-limitation-ai-cannot-autonomously-leap-beyond-established-human-knowledge">1.2 The Fundamental Limitation: AI Cannot Autonomously Leap Beyond Established Human Knowledge</h5>
<p>I maintain a clear understanding that current mainstream AI is based on pattern recombination and generation from existing data. While it excels within the known solution space and provides excellent “reference answers,” it often falls short or produces fundamentally flawed outputs when faced with <strong>truly original architectural design from first principles, disruptive algorithmic innovation, or problems requiring deep, subtle logical reasoning.</strong> Therefore, in creative work like technical decision-making and solution design, I remain the ultimate decision-maker, positioning AI as a “consultant” for inspiration and reference.</p>

<h5 id="13-redefinition-ai-as-a-next-generation-cognitive-acceleration-engine">1.3 Redefinition: AI as a Next-Generation “Cognitive Acceleration Engine”</h5>
<p>AI transcends traditional search engines, becoming a powerful tool for analysis, summarization, and structuring. It liberates me from time-consuming “information gathering and sorting” tasks, allowing me to jump directly into the high-value stages of “comparison, judgment, and decision-making.” Whether quickly comparing technical options, summarizing lengthy documentation, or translating vague requirements into technical specifications, AI dramatically compresses the initial phase of the cognitive loop.</p>

<h4 id="2-development-practices-leveraging-strengths-and-adapting-to-new-patterns"><strong>2. Development Practices: Leveraging Strengths and Adapting to New Patterns</strong></h4>

<h5 id="21-unique-advantage-source-code-as-super-context">2.1 Unique Advantage: Source Code as “Super Context”</h5>
<p>Programming is currently one of the fields most empowered by AI, primarily because AI can “understand” code. This transforms it into:</p>
<ul>
  <li><strong>A real-time code reviewer:</strong> Quickly identifying potential bugs, style inconsistencies, and security vulnerabilities.</li>
  <li><strong>An interactive documenter and explainer:</strong> Generating comments for complex logic or explaining unfamiliar code blocks.</li>
  <li><strong>A precise code editor:</strong> Making intent-aligned modifications within a specified context.</li>
  <li><strong>A technical debt analysis assistant:</strong> Highlighting code duplication and highly coupled modules.
<strong>The Key Practice:</strong> Providing <strong>precise, relevant, and complete</strong> context is the prerequisite for high-quality responses. This has honed my ability to rapidly locate core code segments.</li>
</ul>

<h5 id="22-project-reality-the-human-ai-iteration-model-in-large-scale-complex-projects">2.2 Project Reality: The “Human-AI Iteration” Model in Large-Scale, Complex Projects</h5>
<p>While AI can quickly produce usable code for small tasks or isolated modules, the dynamic changes fundamentally in <strong>large-scale, complex projects</strong>:</p>
<ul>
  <li><strong>AI excels at “local optima,”</strong> performing well on a function or a class level.</li>
  <li><strong>Humans must own the “global” view:</strong> This includes system architecture, module boundaries, data flow, state management, and external dependencies—areas where AI lacks holistic project awareness.</li>
  <li><strong>“Hundreds of iterations become the norm”:</strong> This is not a sign of inefficiency but a manifestation of the new workflow. I must decompose macro objectives into a series of micro-tasks that AI can reliably execute, constantly aligning, correcting, and refining through sustained dialogue. This demands greater skills in <strong>task decomposition, progress management, and patience.</strong></li>
</ul>

<h4 id="3-impact-and-adaptation-the-new-balance-of-efficiency-and-intensity"><strong>3. Impact and Adaptation: The New Balance of Efficiency and Intensity</strong></h4>

<h5 id="31-the-dual-effect-concurrent-surge-in-efficiency-and-intensity">3.1 The Dual Effect: Concurrent Surge in Efficiency and Intensity</h5>
<ul>
  <li><strong>Efficiency gains</strong> are evident in: rapid prototyping of code drafts, automation of tedious tasks, and instant query resolution, significantly accelerating the development of “proofs-of-concept.”</li>
  <li><strong>Intensity increases</strong> because: lowered technical barriers lead to higher expectations and more ambitious attempts. Deep refactoring that might have been avoided in the past now becomes feasible. The proliferation of decision points results in a <strong>sharp rise in the density of thinking and review work.</strong></li>
  <li><strong>Adapting to the new rhythm</strong> is crucial: The key lies in establishing new workflows (e.g., Conceive -&gt; AI Generate -&gt; Rigorously Review/Test -&gt; Iterate) and learning to switch flexibly between “letting AI experiment quickly” and “engaging in deep personal thought.” Protecting valuable periods of focused work is essential to avoid getting trapped in endless, low-cost micro-iterations.</li>
</ul>

<h4 id="4-future-outlook-and-action-plan"><strong>4. Future Outlook and Action Plan</strong></h4>

<p>Based on these insights, I have outlined the following focal points for my future practice:</p>

<ol>
  <li><strong>Systematize and Optimize the “AI-Augmented Workflow”:</strong> Formalize and toolify the insights above, creating standard operating procedures for different tasks (e.g., bug fixing, feature development, code refactoring) to enhance the stability and efficiency of collaboration.</li>
  <li><strong>Deepen “Prompt Engineering” and “Critical Thinking”:</strong> Consciously improve prompt engineering skills while developing a muscle-memory level habit of critically reviewing AI output, cultivating a sharp intuition for spotting “AI hallucinations” and logical flaws.</li>
  <li><strong>Strategically Focus on High-Value Activities:</strong> Proactively shift personal effort towards <strong>requirements analysis, architectural design, complex problem decomposition, and code quality governance</strong>, creating a tighter integration between AI’s “execution” capabilities and my own “decision-making” abilities.</li>
  <li><strong>Maintain Independent Tracking of Technological Evolution:</strong> AI cannot predict the future. I will continue independent learning and judgment regarding foundational technologies, emerging frameworks, and industry trends. This serves as the fundamental compass for directing AI to explore uncharted territories and create differentiated value.</li>
</ol>

<p><strong>Conclusion:</strong> Over the past year, I have transitioned from being a “tool user” to a “human-AI collaboration architect.” I have come to understand deeply that <strong>AI is not a replacement, but a “capability multiplier” that infinitely amplifies my professional judgment and creativity.</strong> Harnessing it requires more solid foundational knowledge, clearer thinking, and stronger control over the work rhythm. Moving forward, I will continue to explore optimization points along this dynamic boundary, striving for a higher state of human-AI synergy.</p>

<hr />

<h3 id="ai辅助编程一周年认知实践与反思"><strong>AI辅助编程一周年：认知、实践与反思</strong></h3>

<p><strong>摘要</strong>：在过去一年中，我从将AI视为新奇工具，到将其深度融入日常编程工作流，经历了一个认知不断迭代深化的过程。本报告旨在系统性地总结这一年的核心心得，阐述AI如何真正赋能开发工作，并清晰地界定人与AI在当前技术阶段的角色边界。核心结论是：<strong>个人专业素养是AI发挥价值的基石；AI是强大的能力放大器，而非智慧替代品；适应“高强度、高迭代”的新工作节奏，是提升整体产效的关键。</strong></p>

<hr />

<h4 id="一核心认知从工具理解到角色定位"><strong>一、核心认知：从工具理解到角色定位</strong></h4>

<h5 id="1-核心驱动力个人知识储备决定ai工具的上限">1. 核心驱动力：个人知识储备决定AI工具的上限</h5>
<p>AI如同一名能力超群但缺乏意图的“实习生”。它的能力边界由我的指令（Prompt）的清晰度、技术准确性和结构性所决定。我的知识储备——包括对业务的理解、架构的把握、设计模式的认知——构成了指挥AI的“语言”。我越精通，就越能精准调用并组合AI的能力，将其潜力转化为实际价值。<strong>学习的方向从“记忆知识”转向了“理解模式与原则”，因为后者正是与AI高效协作的元能力。</strong></p>

<h5 id="2-根本局限性ai无法自主跨越人类既有知识边界">2. 根本局限性：AI无法自主跨越人类既有知识边界</h5>
<p>我清醒地认识到，当前主流的AI是基于已有数据的模式重组与生成。它在已知解空间内表现卓越，能提供优秀的“参考答案”，但在面对<strong>从零到一的原创性架构设计、颠覆性算法创新或涉及复杂、隐蔽逻辑推理的问题</strong>时，其输出往往流于表面或存在根本性错误。因此，在技术决策、方案选型等创造性工作中，我始终保持最终决策者的角色，将AI定位为提供灵感和参考的“顾问”。</p>

<h5 id="3-重新定义ai是新一代的认知加速引擎">3. 重新定义：AI是新一代的“认知加速引擎”</h5>
<p>AI超越了传统搜索引擎，成为一个强大的分析、总结与结构化工具。它能将我从“信息收集与整理”的耗时工作中解放出来，直接进入“对比、判断、决策”的高价值阶段。无论是快速对比技术方案、总结长篇文档，还是将模糊需求转化为技术要点，AI都极大地压缩了认知闭环的前期时间。</p>

<h4 id="二编程实践优势聚焦与模式转变"><strong>二、编程实践：优势聚焦与模式转变</strong></h4>

<h5 id="4-独特优势源代码作为超级上下文">4. 独特优势：源代码作为“超级上下文”</h5>
<p>编程是AI目前赋能最深的领域之一，核心在于它能“理解”代码。这使得AI成为：</p>
<ul>
  <li><strong>实时代码审查员</strong>：快速定位潜在缺陷、风格问题。</li>
  <li><strong>交互式文档与解释器</strong>：为复杂逻辑生成注释，或解释陌生代码块。</li>
  <li><strong>精准的代码编辑工具</strong>：在指定上下文中进行符合意图的修改。</li>
  <li><strong>技术债分析助手</strong>：识别重复代码、高耦合模块。
<strong>实践关键</strong>：提供<strong>精准、相关、完整</strong>的上下文，是获得高质量回应的前提。这锻炼了我快速定位核心代码的能力。</li>
</ul>

<h5 id="5-项目现实大规模复杂项目中的人机迭代模式">5. 项目现实：大规模复杂项目中的“人机迭代”模式</h5>
<p>在小型任务或独立模块中，AI能快速产出可用代码。然而，在<strong>大规模复杂项目</strong>中，情况发生根本变化：</p>
<ul>
  <li><strong>AI擅长“局部最优”</strong>，能出色完成一个函数、一个类。</li>
  <li><strong>人类必须把握“全局”</strong>：包括系统架构、模块边界、数据流、状态管理与外部依赖。AI缺乏对项目全景的认知。</li>
  <li><strong>“上百轮迭代成为常态”</strong>：这并非效率低下，而是新工作模式的体现。我必须将宏观目标拆解为一系列AI可可靠执行的微观任务，并在持续对话中不断对齐、修正和细化。这对我<strong>任务分解、进度把控和耐心</strong>提出了更高要求。</li>
</ul>

<h4 id="三影响与适应效率与强度的新平衡"><strong>三、影响与适应：效率与强度的新平衡</strong></h4>

<h5 id="6-双重效应效率提升与强度提升并存">6. 双重效应：效率提升与强度提升并存</h5>
<ul>
  <li><strong>效率提升</strong>体现在：快速生成代码草稿、自动化繁琐任务、即时解答疑问，开发“原型”速度显著加快。</li>
  <li><strong>强度提升</strong>源于：技术门槛的降低带来了更高的预期和更复杂的尝试。过去可能规避的深度重构现在变得可行，决策点大大增加，导致<strong>思考与评审的密度急剧上升</strong>。</li>
  <li><strong>新节奏的适应</strong>：关键在于建立新的工作流（如：构思 -&gt; AI生成 -&gt; 严格审查/测试 -&gt; 迭代），并学会在“让AI快速尝试”与“自己深入思考”之间灵活切换，保护宝贵的深度工作时段，避免陷入无限低成本的微迭代漩涡。</li>
</ul>

<h4 id="四未来展望与行动方向"><strong>四、未来展望与行动方向</strong></h4>

<p>基于以上认知，我规划了下一步的实践重点：</p>

<ol>
  <li><strong>固化与优化“AI增强工作流”</strong>：将上述心得模式化、工具化，形成针对不同任务（如bug修复、功能开发、代码重构）的标准操作流程，进一步提升协作的稳定性和效率。</li>
  <li><strong>深耕“提问工程”与“批判性思维”</strong>：有意识地提升Prompt工程技巧，同时将AI输出审查培养为肌肉记忆，培养一眼识别“AI幻觉”和逻辑漏洞的敏锐直觉。</li>
  <li><strong>战略聚焦高价值活动</strong>：主动将个人精力更多投向<strong>需求分析、架构设计、复杂问题拆解和代码质量管控</strong>，将AI的“执行”能力与我个人的“决策”能力更紧密地结合。</li>
  <li><strong>保持独立的技术演进跟踪</strong>：AI无法预测未来。我将继续保持对底层技术、新兴框架和行业趋势的独立学习与判断，这是我指挥AI探索未知领域、创造差异化价值的根本罗盘。</li>
</ol>

<p><strong>结论</strong>：过去一年，我完成了从“工具使用者”到“人机协作架构师”的思维转变。我深刻认识到，<strong>AI不是替代者，而是将我的专业判断与创造力无限放大的“能力乘子”</strong>。驾驭它，需要更扎实的功底、更清晰的思维和更强的节奏把控力。未来，我将继续探索这一动态边界的优化点，追求人与AI协同的更高境界。</p>

<hr />]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[Abstract: Over the past year, my journey with AI in programming has evolved from viewing it as a novel tool to deeply integrating it into my daily development workflow. This report systematically summarizes the key insights gained, explains how AI truly augments development capabilities, and clarifies the current boundaries between human and AI roles. The core conclusion is: personal expertise remains the foundation for unlocking AI’s value; AI is a powerful force multiplier, not a substitute for wisdom; and adapting to a new, high-intensity, iterative workflow is crucial for maximizing productivity.]]></summary></entry><entry><title type="html">OpenShift Disconnected Cluster安装步骤与实践</title><link href="https://weinan.io/2025/07/06/openshift-bootstrap.html" rel="alternate" type="text/html" title="OpenShift Disconnected Cluster安装步骤与实践" /><published>2025-07-06T00:00:00+00:00</published><updated>2025-07-06T00:00:00+00:00</updated><id>https://weinan.io/2025/07/06/openshift-bootstrap</id><content type="html" xml:base="https://weinan.io/2025/07/06/openshift-bootstrap.html"><![CDATA[<p>本文总结OpenShift断连集群（disconnected cluster，无法直接访问公网的集群）在AWS上的安装步骤，适合有OpenShift或Kubernetes基础的读者。</p>

<ol>
  <li>
    <p><strong>配置VPC以支持断连集群</strong><br />
public和private subnet通过NAT Gateway隔离，确保安全性。手工创建IAM用户以分配最小权限（学习总结：<a href="https://github.com/liweinan/deepseek-answers/blob/main/files/oc-disconnected-cluster.md">https://github.com/liweinan/deepseek-answers/blob/main/files/oc-disconnected-cluster.md</a>）。</p>
  </li>
  <li>
    <p><strong>创建VPC endpoints以访问AWS服务</strong><br />
VPC需创建endpoints（如S3、EC2 API）以确保bootstrap节点在private subnet中访问AWS服务，使用CloudFormation模板自动化配置（模板示例：<a href="https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-6218dc7ba3aaacccc7d0328827fbefef33e253d6ca2460e0d8f2d353a0ffaf3bR133">https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-6218dc7ba3aaacccc7d0328827fbefef33e253d6ca2460e0d8f2d353a0ffaf3bR133</a>）。</p>
  </li>
  <li>
    <p><strong>配置bootstrap节点访问mirror registry</strong><br />
bootstrap节点需通过VPC路由表访问bastion主机的mirror registry，添加指向registry的路由规则（样例：<a href="https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-24a44acdcecfb902f56d79c8bcf9580e288b96ee0c092d2508e114200d74c7d3R10">https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-24a44acdcecfb902f56d79c8bcf9580e288b96ee0c092d2508e114200d74c7d3R10</a>）。</p>
  </li>
  <li>
    <p><strong>生成安装配置文件与点火文件</strong><br />
OpenShift安装从config文件生成manifests文件，再转换为Ignition点火文件，用于节点初始化。</p>
    <ol>
      <li><strong><code class="language-plaintext highlighter-rouge">openshift-install</code>定制安装</strong><br />
通过<code class="language-plaintext highlighter-rouge">MachineConfig</code>定义节点配置，生成<code class="language-plaintext highlighter-rouge">bootstrap.ign</code>等文件（<code class="language-plaintext highlighter-rouge">openshift-install create ignition-configs</code>，关键行167-170：<a href="https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-cfb8d6ddbeb137bdd82a3f15ec3b5d3e5470f6bfd7446d774aafb103c34c70efR167">https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-cfb8d6ddbeb137bdd82a3f15ec3b5d3e5470f6bfd7446d774aafb103c34c70efR167</a>）。</li>
      <li><strong>点火文件与脚本</strong><br />
点火文件包含节点初始化脚本，解码bootstrap脚本以便调试（核心功能：配置镜像仓库，执行<code class="language-plaintext highlighter-rouge">bootkube.sh</code>）：<a href="https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-eca50a42b09cea58a45168d832418b81d5365465ba99512bd82e927d6085f754">https://github.com/liweinan/ocp-aws-vpc-ipi-examples/pull/1/files#diff-eca50a42b09cea58a45168d832418b81d5365465ba99512bd82e927d6085f754</a>。</li>
      <li><strong>学习<code class="language-plaintext highlighter-rouge">bootkube.sh</code></strong><br />
<code class="language-plaintext highlighter-rouge">bootkube.sh</code>启动Kubernetes控制平面，初始化etcd和API Server（关键步骤：<a href="https://github.com/liweinan/deepseek-answers/blob/main/files/oc-bootstrap.md#bootkube">https://github.com/liweinan/deepseek-answers/blob/main/files/oc-bootstrap.md#bootkube</a>）。</li>
    </ol>
  </li>
</ol>

<p><img src="https://raw.githubusercontent.com/liweinan/blogpics2025/main/0706/01.jpg" alt="" /></p>

<p><img src="https://raw.githubusercontent.com/liweinan/blogpics2025/main/0706/02.png" alt="" /></p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[本文总结OpenShift断连集群（disconnected cluster，无法直接访问公网的集群）在AWS上的安装步骤，适合有OpenShift或Kubernetes基础的读者。]]></summary></entry><entry><title type="html">My Blog Posts Summary 2025</title><link href="https://weinan.io/2025/06/16/blog-summary.html" rel="alternate" type="text/html" title="My Blog Posts Summary 2025" /><published>2025-06-16T00:00:00+00:00</published><updated>2025-06-16T00:00:00+00:00</updated><id>https://weinan.io/2025/06/16/blog-summary</id><content type="html" xml:base="https://weinan.io/2025/06/16/blog-summary.html"><![CDATA[<p>This document provides a comprehensive analysis of all blog posts from 2017 to 2025, organized by topic and including detailed insights into key articles.</p>

<h2 id="overview">Overview</h2>

<p>Total Posts: 475
Time Span: 2017-2025
Most Active Years: 2018 (135 posts), 2020 (98 posts), 2019 (94 posts)</p>

<h2 id="major-topics">Major Topics</h2>

<h3 id="1-java-enterprise--middleware-2017-2024">1. Java Enterprise &amp; Middleware (2017-2024)</h3>
<ul>
  <li><strong>WildFly &amp; JBoss</strong>
    <ul>
      <li><a href="https://weinan.io/2024/06/15/build-wildfly.html">Build WildFly from Source (2024)</a> - Detailed guide on building WildFly from source code</li>
      <li><a href="https://weinan.io/2023/06/07/wildfly-k8s.html">WildFly Kubernetes Integration (2023)</a> - Deploying WildFly on Kubernetes</li>
      <li><a href="https://weinan.io/2017/05/05/wildfly-src.html">WildFly Source Code Analysis (2017)</a> - Deep dive into WildFly’s architecture</li>
    </ul>
  </li>
  <li><strong>Spring Framework</strong>
    <ul>
      <li><a href="https://weinan.io/2024/02/17/spring-beans.html">Spring Bean Lifecycle (2024)</a> - Comprehensive analysis of Spring bean creation process</li>
      <li><a href="https://weinan.io/2019/05/24/spring-security.html">Spring Security Integration (2019)</a> - Security implementation in Spring applications</li>
    </ul>
  </li>
</ul>

<h3 id="2-cloud--containerization-2018-2024">2. Cloud &amp; Containerization (2018-2024)</h3>
<ul>
  <li><strong>Docker &amp; Containerization</strong>
    <ul>
      <li><a href="https://weinan.io/2023/09/15/docker-macos.html">Docker on macOS (2023)</a> - Docker setup and optimization for macOS</li>
      <li><a href="https://weinan.io/2018/01/04/docker.html">Docker Networking (2018)</a> - Container networking concepts and practices</li>
    </ul>
  </li>
  <li><strong>Kubernetes &amp; Cloud Native</strong>
    <ul>
      <li><a href="https://weinan.io/2023/06/23/jkube.html">Kubernetes Deployment (2023)</a> - Deploying applications to Kubernetes</li>
      <li><a href="https://weinan.io/2023/08/27/minikube.html">Minikube Setup (2023)</a> - Local Kubernetes development environment</li>
    </ul>
  </li>
</ul>

<h3 id="3-development-tools--practices-2017-2025">3. Development Tools &amp; Practices (2017-2025)</h3>
<ul>
  <li><strong>Build Tools</strong>
    <ul>
      <li><a href="https://weinan.io/2024/09/29/maven-plugin-info.html">Maven Plugin Development (2024)</a> - Creating and using Maven plugins</li>
      <li><a href="https://weinan.io/2018/01/01/gradle.html">Gradle Configuration (2018)</a> - Advanced Gradle build configurations</li>
    </ul>
  </li>
  <li><strong>Version Control &amp; CI/CD</strong>
    <ul>
      <li><a href="https://weinan.io/2023/12/28/github-ci-cross-build.html">GitHub Actions (2023)</a> - Cross-platform build automation</li>
      <li><a href="https://weinan.io/2019/11/14/git.html">Git Best Practices (2019)</a> - Advanced Git workflows and techniques</li>
    </ul>
  </li>
</ul>

<h3 id="4-programming-languages--frameworks-2017-2025">4. Programming Languages &amp; Frameworks (2017-2025)</h3>
<ul>
  <li><strong>Java &amp; JVM</strong>
    <ul>
      <li><a href="https://weinan.io/2017/05/11/jdbc-part4.html">JDBC Implementation Series (2017)</a> - Deep dive into JDBC internals</li>
      <li><a href="https://weinan.io/2017/12/22/concurrency.html">Java Concurrency (2017)</a> - Advanced concurrency patterns</li>
    </ul>
  </li>
  <li><strong>Web Development</strong>
    <ul>
      <li><a href="https://weinan.io/2017/08/07/jersey-extend-wadl-support.html">RESTEasy Implementation (2017)</a> - REST API development with RESTEasy</li>
      <li><a href="https://weinan.io/2020/07/23/vue.html">Vue.js Integration (2020)</a> - Modern frontend development</li>
    </ul>
  </li>
</ul>

<h3 id="5-system--devops-2017-2025">5. System &amp; DevOps (2017-2025)</h3>
<ul>
  <li><strong>System Administration</strong>
    <ul>
      <li><a href="https://weinan.io/2017/12/13/linux-driver.html">Linux Driver Development (2017)</a> - Kernel module development</li>
      <li><a href="https://weinan.io/2019/03/26/supervisord.html">System Monitoring (2019)</a> - Process management and monitoring</li>
    </ul>
  </li>
  <li><strong>Networking &amp; Security</strong>
    <ul>
      <li><a href="https://weinan.io/2020/02/17/ssl.html">SSL/TLS Configuration (2020)</a> - Security best practices</li>
      <li><a href="https://weinan.io/2023/06/20/wireshark.html">Network Analysis (2023)</a> - Network troubleshooting tools</li>
    </ul>
  </li>
</ul>

<h3 id="6-ai--machine-learning-2021-2025">6. AI &amp; Machine Learning (2021-2025)</h3>
<ul>
  <li><strong>Machine Learning</strong>
    <ul>
      <li><a href="https://weinan.io/2025/02/17/tensor-for-mac.html">TensorFlow on Mac (2025)</a> - ML development environment setup</li>
      <li><a href="https://weinan.io/2025/02/20/langchain-deepseek.html">LangChain Integration (2025)</a> - AI application development</li>
    </ul>
  </li>
</ul>

<h2 id="notable-series">Notable Series</h2>

<ol>
  <li><strong>JDBC Implementation Series (2017)</strong>
    <ul>
      <li>Part 1-8: Comprehensive coverage of JDBC internals</li>
      <li>Key articles: <a href="https://weinan.io/2017/05/11/jdbc-part4.html">Part 4</a>, <a href="https://weinan.io/2017/05/28/jdbc-part8.html">Part 8</a></li>
    </ul>
  </li>
  <li><strong>RESTEasy &amp; Jersey Comparison (2017)</strong>
    <ul>
      <li>Detailed analysis of JAX-RS implementations</li>
      <li>Key article: <a href="https://weinan.io/2017/08/07/jersey-extend-wadl-support.html">Extended WADL Support</a></li>
    </ul>
  </li>
  <li><strong>Spring Framework Deep Dive (2024)</strong>
    <ul>
      <li>Bean lifecycle and dependency injection</li>
      <li>Key article: <a href="https://weinan.io/2024/02/17/spring-beans.html">Spring Beans</a></li>
    </ul>
  </li>
</ol>

<h2 id="evolution-of-topics">Evolution of Topics</h2>

<ol>
  <li><strong>2017-2018</strong>: Focus on core Java technologies and system programming
    <ul>
      <li>Enterprise middleware (WildFly, JBoss)</li>
      <li>System-level programming (Linux drivers, networking)</li>
      <li>Build tools and development practices</li>
    </ul>
  </li>
  <li><strong>2019-2020</strong>: Shift towards cloud and modern development
    <ul>
      <li>Containerization and Kubernetes</li>
      <li>Microservices architecture</li>
      <li>CI/CD and automation</li>
    </ul>
  </li>
  <li><strong>2021-2025</strong>: Emphasis on modern technologies
    <ul>
      <li>AI and machine learning</li>
      <li>Cloud-native development</li>
      <li>Modern development practices</li>
    </ul>
  </li>
</ol>

<h2 id="statistics">Statistics</h2>

<ul>
  <li><strong>Most Active Topics</strong>:
    <ol>
      <li>Java Enterprise &amp; Middleware (120+ posts)</li>
      <li>Cloud &amp; Containerization (90+ posts)</li>
      <li>Development Tools &amp; Practices (80+ posts)</li>
      <li>Programming Languages &amp; Frameworks (70+ posts)</li>
      <li>System &amp; DevOps (60+ posts)</li>
      <li>AI &amp; Machine Learning (20+ posts)</li>
    </ol>
  </li>
  <li><strong>Average Post Length</strong>:
    <ul>
      <li>Technical Deep Dives: 2000-4000 words</li>
      <li>Tutorials &amp; How-tos: 1000-2000 words</li>
      <li>Quick Tips &amp; Notes: 500-1000 words</li>
    </ul>
  </li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>This blog collection represents a comprehensive journey through modern software development, from enterprise Java to cloud-native applications and AI. The content is particularly strong in:</p>

<ol>
  <li>Enterprise Java development and middleware</li>
  <li>Cloud and containerization technologies</li>
  <li>System-level programming and DevOps</li>
  <li>Modern development practices and tools</li>
</ol>

<p>The evolution of topics reflects the changing landscape of software development, with a clear progression from traditional enterprise development to modern cloud-native and AI-focused approaches.</p>

<p>Note: This summary is based on the analysis of all 475 posts, with particular attention to longer, more detailed articles that provide deeper technical insights. Links to original markdown files are provided for each major article.</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[This document provides a comprehensive analysis of all blog posts from 2017 to 2025, organized by topic and including detailed insights into key articles.]]></summary></entry><entry><title type="html">Enable SSH login on Ubuntu</title><link href="https://weinan.io/2025/03/29/enable-ubuntu-login.html" rel="alternate" type="text/html" title="Enable SSH login on Ubuntu" /><published>2025-03-29T00:00:00+00:00</published><updated>2025-03-29T00:00:00+00:00</updated><id>https://weinan.io/2025/03/29/enable-ubuntu-login</id><content type="html" xml:base="https://weinan.io/2025/03/29/enable-ubuntu-login.html"><![CDATA[<p><img src="https://raw.githubusercontent.com/liweinan/blogpics2025/main/0329/01.jpg" alt="" /></p>

<p>I followed DeepSeek’s instruction to enable SSH login on Ubuntu:</p>

<ul>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/enable-ssh-login-in-ubuntu.md">Enable SSH login on Ubuntu</a></li>
</ul>

<p>There are several notes about the SSH daemon configuration on Ubuntu. Firstly, better to disable the <code class="language-plaintext highlighter-rouge">ufw</code> firewall for testing environment. In addition, better to disable the <code class="language-plaintext highlighter-rouge">gcr-ssh-agent</code> like this:</p>

<ul>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/what-is-gcr-ssh-agent.md">What is gcr-ssh-agent?</a></li>
</ul>

<p>And if I need to debug the <code class="language-plaintext highlighter-rouge">sshd</code>, here is the reference:</p>

<ul>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/ssh-in-debug-mode.md">How to Run sshd in Debug Mode</a></li>
</ul>

<p>Most importantly, the configurations are required in <code class="language-plaintext highlighter-rouge">/etc/ssh/sshd_config</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PubkeyAuthentication yes
AllowUsers anan
</code></pre></div></div>

<p>Some more troubleshooting info:</p>

<ul>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/what-does-type-51-mean-in-ssh.md">What Does “receive packet: type 51” Mean in SSH?</a></li>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/fix-ssd-dir-error.md">How to Fix “Missing Privilege Separation Directory: /run/sshd” Error in SSH</a></li>
  <li><a href="https://github.com/liweinan/deepseek-answers/blob/main/disable-selinux.md">Disable SELinux On Ubuntu</a></li>
</ul>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">The solution to the Hackerrank Bomberman Quiz.</title><link href="https://weinan.io/2025/03/25/hackerrank-bomberman.html" rel="alternate" type="text/html" title="The solution to the Hackerrank Bomberman Quiz." /><published>2025-03-25T00:00:00+00:00</published><updated>2025-03-25T00:00:00+00:00</updated><id>https://weinan.io/2025/03/25/hackerrank-bomberman</id><content type="html" xml:base="https://weinan.io/2025/03/25/hackerrank-bomberman.html"><![CDATA[<p>I have put the solution of the Hackerrank Bomberman solution here:</p>

<p><a href="https://github.com/liweinan/java-snippets/blob/master/src/main/java/io/weli/hackerrank/BomberMan.java">https://github.com/liweinan/java-snippets/blob/master/src/main/java/io/weli/hackerrank/BomberMan.java</a></p>

<p>The idea is quite straightforward.</p>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[I have put the solution of the Hackerrank Bomberman solution here:]]></summary></entry><entry><title type="html">The usage of const component in Vue</title><link href="https://weinan.io/2025/02/22/vue.html" rel="alternate" type="text/html" title="The usage of const component in Vue" /><published>2025-02-22T00:00:00+00:00</published><updated>2025-02-22T00:00:00+00:00</updated><id>https://weinan.io/2025/02/22/vue</id><content type="html" xml:base="https://weinan.io/2025/02/22/vue.html"><![CDATA[<p>Here is an example showing the usage of the <code class="language-plaintext highlighter-rouge">const component</code> usage of Vue:</p>

<ul>
  <li><a href="https://github.com/liweinan/play-vue/pull/2">const comp</a></li>
</ul>

<p>Please note that the <code class="language-plaintext highlighter-rouge">vue.esm-bundler.js</code> must be added into dependency for the compilation:</p>

<ul>
  <li><a href="https://github.com/liweinan/play-vue/pull/2/commits/7907120a98c8b8321c76c8e7102b99bc5c1831bf">vue.esm-bundler.js</a></li>
</ul>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[Here is an example showing the usage of the const component usage of Vue:]]></summary></entry><entry><title type="html">Using LangChain4j to connect with locally deployed DeepSeek</title><link href="https://weinan.io/2025/02/20/langchain-deepseek.html" rel="alternate" type="text/html" title="Using LangChain4j to connect with locally deployed DeepSeek" /><published>2025-02-20T00:00:00+00:00</published><updated>2025-02-20T00:00:00+00:00</updated><id>https://weinan.io/2025/02/20/langchain-deepseek</id><content type="html" xml:base="https://weinan.io/2025/02/20/langchain-deepseek.html"><![CDATA[<p>I have put an example showing the usage of LangChain4j to connect with locally deployed DeepSeek to answer questions:</p>

<ul>
  <li><a href="https://github.com/liweinan/java-snippets/blob/master/src/main/java/io/weli/ai/PlayWithLangchain.java">https://github.com/liweinan/java-snippets/blob/master/src/main/java/io/weli/ai/PlayWithLangchain.java</a></li>
</ul>

<p>To learn how to deploy a DeepSeek locally, you can check this blog post I have written:</p>

<ul>
  <li><a href="https://weinan.io/2025/02/06/install-deepseek-on-arm-based-apple.html">Install DeepSeek locally on an Apple M4 Pro chip based computer.</a></li>
</ul>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[I have put an example showing the usage of LangChain4j to connect with locally deployed DeepSeek to answer questions:]]></summary></entry><entry><title type="html">Installing Tensorflow on MacOS.</title><link href="https://weinan.io/2025/02/17/tensor-for-mac.html" rel="alternate" type="text/html" title="Installing Tensorflow on MacOS." /><published>2025-02-17T00:00:00+00:00</published><updated>2025-02-17T00:00:00+00:00</updated><id>https://weinan.io/2025/02/17/tensor-for-mac</id><content type="html" xml:base="https://weinan.io/2025/02/17/tensor-for-mac.html"><![CDATA[<p>I have put and example project here to demonstrate how to install the tensorflow on MacOS:</p>

<ul>
  <li><a href="https://github.com/liweinan/tensor_for_macos">https://github.com/liweinan/tensor_for_macos</a></li>
</ul>]]></content><author><name>阿男</name></author><summary type="html"><![CDATA[I have put and example project here to demonstrate how to install the tensorflow on MacOS:]]></summary></entry></feed>