<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM on Devlog in the SKY</title>
    <link>https://skyoo2003.github.io/ko/tags/llm/</link>
    <description>Recent content in LLM on Devlog in the SKY</description>
    <generator>Hugo</generator>
    <language>ko</language>
    <lastBuildDate>Sun, 19 Apr 2026 00:00:00 +0900</lastBuildDate>
    <atom:link href="https://skyoo2003.github.io/ko/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Three-Layer Sandwich Architecture for Running a 3.2B LLM on an M4 MacBook</title>
      <link>https://skyoo2003.github.io/ko/posts/2026/04/19/three-layer-sandwich-llm/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0900</pubDate>
      <guid>https://skyoo2003.github.io/ko/posts/2026/04/19/three-layer-sandwich-llm/</guid>
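      <description>&lt;h2 id=&#34;들어가며&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The MacBook Air M4 comes with 16 GB of unified memory. Train a 3B model on it with PyTorch and the fans spin up within minutes; on a fanless model, thermal throttling kicks in instead. &lt;a href=&#34;https://github.com/skyoo2003/bit-axon&#34;&gt;Bit-Axon&lt;/a&gt; is a 3.2B-parameter hybrid language model that addresses this constraint at the architecture level.&lt;/p&gt;
&lt;p&gt;The core idea is a &lt;strong&gt;three-layer sandwich structure&lt;/strong&gt;: the 24 layers are split into three segments, and each segment uses a different computation scheme.&lt;/p&gt;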



&lt;div class=&#34;goat svg-container &#34;&gt;
  
    &lt;svg
      xmlns=&#34;http://www.w3.org/2000/svg&#34;
      font-family=&#34;Menlo,Lucida Console,monospace&#34;
      
        viewBox=&#34;0 0 592 57&#34;
      &gt;
      &lt;g transform=&#39;translate(8,16)&#39;&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;0&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;L&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;0&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;L&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;0&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;L&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;8&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;a&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;8&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;a&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;8&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;a&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;16&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;y&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;16&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;y&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;16&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;y&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;24&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;e&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;24&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;e&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;24&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;e&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;32&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;r&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;32&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;r&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;32&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;r&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;48&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;1&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;56&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;1&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;56&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;9&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;56&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;7&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;64&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;-&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;64&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;-&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;64&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;-&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;72&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;8&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;72&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;1&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;72&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;2&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;80&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;:&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;80&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;6&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;80&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;4&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;88&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;:&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;88&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;:&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;104&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;104&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;104&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;112&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;112&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;112&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;120&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;120&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;120&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;128&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;128&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;128&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;136&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;136&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;136&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;144&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;144&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;144&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;152&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;152&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;152&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;160&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;160&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;160&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;168&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;168&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;168&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;176&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;176&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;176&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;184&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;184&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;184&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;192&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;192&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;192&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;200&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;200&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;200&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;208&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;208&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;208&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;216&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;216&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;216&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;224&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;224&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;224&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;232&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;232&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;232&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;240&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;240&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;240&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;248&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;248&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;256&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;█&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;256&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;S&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;264&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;S&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;264&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;S&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;272&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;P&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;272&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;W&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;272&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;M&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;280&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;u&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;280&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;A&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;288&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;r&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;288&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;+&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;296&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;e&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;296&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;+&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;304&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;M&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;312&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;A&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;312&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;M&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;312&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;o&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;320&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;x&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;320&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;o&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;320&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;E&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;328&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;o&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;328&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;E&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;336&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;n&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;344&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;-&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;352&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;S&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;360&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;S&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;368&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;M&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;440&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;→&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;440&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;→&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;440&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;→&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;456&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;문&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;456&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;심&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;456&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;출&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;464&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;맥&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;464&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;층&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;464&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;력&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;480&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;흡&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;480&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;추&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;480&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;합&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;488&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;수&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;488&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;론&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;488&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;성&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;504&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;(&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;504&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;(&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;504&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;(&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;512&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;O&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;512&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;O&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;512&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;선&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;520&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;(&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;520&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;(&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;520&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;형&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;528&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;1&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;528&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;n&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;536&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;)&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;536&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;)&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;536&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;+&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;552&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;메&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;552&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;어&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;552&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;희&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;560&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;모&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;560&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;텐&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;560&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;소&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;568&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;리&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;568&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;션&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;568&#39; y=&#39;36&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;)&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;576&#39; y=&#39;4&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;)&lt;/text&gt;
&lt;text text-anchor=&#39;middle&#39; x=&#39;576&#39; y=&#39;20&#39; fill=&#39;currentColor&#39; style=&#39;font-size:1em&#39;&gt;)&lt;/text&gt;
&lt;/g&gt;

    &lt;/svg&gt;
  
&lt;/div&gt;
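&lt;p&gt;This structure is more than an intuitive partition. The Transformer architecture faces three fundamental limits, &lt;strong&gt;quadratic complexity, memory blow-up, and compute density&lt;/strong&gt;, and each segment offers a different answer to them. This post walks through the full design for running an LLM on a MacBook, from the mathematical foundations of each layer group to MLX framework optimizations and thermal-aware training.&lt;/p&gt;</description>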
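      <content:encoded><![CDATA[<h2 id="들어가며">Introduction</h2>
<p>The MacBook Air M4 comes with 16 GB of unified memory. Train a 3B model on it with PyTorch and the fans spin up within minutes; on a fanless model, thermal throttling kicks in instead. <a href="https://github.com/skyoo2003/bit-axon">Bit-Axon</a> is a 3.2B-parameter hybrid language model that addresses this constraint at the architecture level.</p>
<p>The core idea is a <strong>three-layer sandwich structure</strong>: the 24 layers are split into three segments, and each segment uses a different computation scheme.</p>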



<div class="goat svg-container ">
  
    <svg
      xmlns="http://www.w3.org/2000/svg"
      font-family="Menlo,Lucida Console,monospace"
      
        viewBox="0 0 592 57"
      >
      <g transform='translate(8,16)'>
<text text-anchor='middle' x='0' y='4' fill='currentColor' style='font-size:1em'>L</text>
<text text-anchor='middle' x='0' y='20' fill='currentColor' style='font-size:1em'>L</text>
<text text-anchor='middle' x='0' y='36' fill='currentColor' style='font-size:1em'>L</text>
<text text-anchor='middle' x='8' y='4' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='8' y='20' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='8' y='36' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='16' y='4' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='16' y='20' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='16' y='36' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='24' y='4' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='24' y='20' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='24' y='36' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='32' y='4' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='32' y='20' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='32' y='36' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='48' y='36' fill='currentColor' style='font-size:1em'>1</text>
<text text-anchor='middle' x='56' y='4' fill='currentColor' style='font-size:1em'>1</text>
<text text-anchor='middle' x='56' y='20' fill='currentColor' style='font-size:1em'>9</text>
<text text-anchor='middle' x='56' y='36' fill='currentColor' style='font-size:1em'>7</text>
<text text-anchor='middle' x='64' y='4' fill='currentColor' style='font-size:1em'>-</text>
<text text-anchor='middle' x='64' y='20' fill='currentColor' style='font-size:1em'>-</text>
<text text-anchor='middle' x='64' y='36' fill='currentColor' style='font-size:1em'>-</text>
<text text-anchor='middle' x='72' y='4' fill='currentColor' style='font-size:1em'>8</text>
<text text-anchor='middle' x='72' y='20' fill='currentColor' style='font-size:1em'>1</text>
<text text-anchor='middle' x='72' y='36' fill='currentColor' style='font-size:1em'>2</text>
<text text-anchor='middle' x='80' y='4' fill='currentColor' style='font-size:1em'>:</text>
<text text-anchor='middle' x='80' y='20' fill='currentColor' style='font-size:1em'>6</text>
<text text-anchor='middle' x='80' y='36' fill='currentColor' style='font-size:1em'>4</text>
<text text-anchor='middle' x='88' y='20' fill='currentColor' style='font-size:1em'>:</text>
<text text-anchor='middle' x='88' y='36' fill='currentColor' style='font-size:1em'>:</text>
<text text-anchor='middle' x='104' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='104' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='104' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='112' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='112' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='112' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='120' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='120' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='120' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='128' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='128' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='128' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='136' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='136' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='136' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='144' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='144' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='144' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='152' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='152' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='152' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='160' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='160' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='160' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='168' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='168' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='168' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='176' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='176' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='176' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='184' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='184' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='184' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='192' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='192' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='192' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='200' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='200' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='200' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='208' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='208' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='208' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='216' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='216' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='216' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='224' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='224' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='224' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='232' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='232' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='232' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='240' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='240' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='240' y='36' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='248' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='248' y='20' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='256' y='4' fill='currentColor' style='font-size:1em'>█</text>
<text text-anchor='middle' x='256' y='36' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='264' y='20' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='264' y='36' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='272' y='4' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='272' y='20' fill='currentColor' style='font-size:1em'>W</text>
<text text-anchor='middle' x='272' y='36' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='280' y='4' fill='currentColor' style='font-size:1em'>u</text>
<text text-anchor='middle' x='280' y='20' fill='currentColor' style='font-size:1em'>A</text>
<text text-anchor='middle' x='288' y='4' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='288' y='36' fill='currentColor' style='font-size:1em'>+</text>
<text text-anchor='middle' x='296' y='4' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='296' y='20' fill='currentColor' style='font-size:1em'>+</text>
<text text-anchor='middle' x='304' y='36' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='312' y='4' fill='currentColor' style='font-size:1em'>A</text>
<text text-anchor='middle' x='312' y='20' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='312' y='36' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='320' y='4' fill='currentColor' style='font-size:1em'>x</text>
<text text-anchor='middle' x='320' y='20' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='320' y='36' fill='currentColor' style='font-size:1em'>E</text>
<text text-anchor='middle' x='328' y='4' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='328' y='20' fill='currentColor' style='font-size:1em'>E</text>
<text text-anchor='middle' x='336' y='4' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='344' y='4' fill='currentColor' style='font-size:1em'>-</text>
<text text-anchor='middle' x='352' y='4' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='360' y='4' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='368' y='4' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='440' y='4' fill='currentColor' style='font-size:1em'>→</text>
<text text-anchor='middle' x='440' y='20' fill='currentColor' style='font-size:1em'>→</text>
<text text-anchor='middle' x='440' y='36' fill='currentColor' style='font-size:1em'>→</text>
<text text-anchor='middle' x='456' y='4' fill='currentColor' style='font-size:1em'>문</text>
<text text-anchor='middle' x='456' y='20' fill='currentColor' style='font-size:1em'>심</text>
<text text-anchor='middle' x='456' y='36' fill='currentColor' style='font-size:1em'>출</text>
<text text-anchor='middle' x='464' y='4' fill='currentColor' style='font-size:1em'>맥</text>
<text text-anchor='middle' x='464' y='20' fill='currentColor' style='font-size:1em'>층</text>
<text text-anchor='middle' x='464' y='36' fill='currentColor' style='font-size:1em'>력</text>
<text text-anchor='middle' x='480' y='4' fill='currentColor' style='font-size:1em'>흡</text>
<text text-anchor='middle' x='480' y='20' fill='currentColor' style='font-size:1em'>추</text>
<text text-anchor='middle' x='480' y='36' fill='currentColor' style='font-size:1em'>합</text>
<text text-anchor='middle' x='488' y='4' fill='currentColor' style='font-size:1em'>수</text>
<text text-anchor='middle' x='488' y='20' fill='currentColor' style='font-size:1em'>론</text>
<text text-anchor='middle' x='488' y='36' fill='currentColor' style='font-size:1em'>성</text>
<text text-anchor='middle' x='504' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='504' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='504' y='36' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='512' y='4' fill='currentColor' style='font-size:1em'>O</text>
<text text-anchor='middle' x='512' y='20' fill='currentColor' style='font-size:1em'>O</text>
<text text-anchor='middle' x='512' y='36' fill='currentColor' style='font-size:1em'>선</text>
<text text-anchor='middle' x='520' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='520' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='520' y='36' fill='currentColor' style='font-size:1em'>형</text>
<text text-anchor='middle' x='528' y='4' fill='currentColor' style='font-size:1em'>1</text>
<text text-anchor='middle' x='528' y='20' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='536' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='536' y='20' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='536' y='36' fill='currentColor' style='font-size:1em'>+</text>
<text text-anchor='middle' x='552' y='4' fill='currentColor' style='font-size:1em'>메</text>
<text text-anchor='middle' x='552' y='20' fill='currentColor' style='font-size:1em'>어</text>
<text text-anchor='middle' x='552' y='36' fill='currentColor' style='font-size:1em'>희</text>
<text text-anchor='middle' x='560' y='4' fill='currentColor' style='font-size:1em'>모</text>
<text text-anchor='middle' x='560' y='20' fill='currentColor' style='font-size:1em'>텐</text>
<text text-anchor='middle' x='560' y='36' fill='currentColor' style='font-size:1em'>소</text>
<text text-anchor='middle' x='568' y='4' fill='currentColor' style='font-size:1em'>리</text>
<text text-anchor='middle' x='568' y='20' fill='currentColor' style='font-size:1em'>션</text>
<text text-anchor='middle' x='568' y='36' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='576' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='576' y='20' fill='currentColor' style='font-size:1em'>)</text>
</g>

    </svg>
  
</div>
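<p>This split is not just an intuitive partition. Each section offers a different answer to one of three fundamental limits of the Transformer architecture: <strong>quadratic complexity, memory blow-up, and compute density</strong>. This post walks through the full design for running an LLM on a MacBook, from the mathematical basis of each layer group to MLX framework optimizations and thermal-aware training.</p>
<h2 id="왜-pytorch가-아닌-mlx인가">Why MLX instead of PyTorch?</h2>
<p>The reason for choosing MLX on Apple Silicon is simple: of the two frameworks compared here, it is <strong>the only one designed around unified memory</strong>.</p>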
<table>
  <thead>
      <tr>
          <th>Feature</th>
          <th>PyTorch (MPS)</th>
          <th>MLX</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Memory placement</td>
          <td>CPU ↔ GPU copies required</td>
          <td>Zero-copy unified memory</td>
      </tr>
      <tr>
          <td>Compilation</td>
          <td><code>torch.compile</code> (beta)</td>
          <td><code>@mx.compile</code> (stable)</td>
      </tr>
      <tr>
          <td>Apple Silicon optimization</td>
          <td>General-purpose backend</td>
          <td>Native optimization</td>
      </tr>
      <tr>
          <td>SwiftUI integration</td>
          <td>Not available</td>
          <td>Native apps possible</td>
      </tr>
  </tbody>
</table>
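<p>PyTorch's MPS backend supports the Apple Silicon GPU, but memory copies between the CPU and the GPU still occur. On a MacBook Air with 16GB of unified memory this copy overhead is costly: every time a tensor is copied from the CPU to the GPU it consumes memory bandwidth and adds inference latency.</p>
<p>MLX, by contrast, was designed directly around Apple's unified memory architecture. The CPU and GPU physically share the same memory, so tensors do not need to be moved. The <code>@mx.compile</code> decorator compiles performance-critical kernels natively for the Apple Silicon GPU, delivering consistently faster performance than the PyTorch MPS backend.</p>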



<div class="goat svg-container ">
  
    <svg
      xmlns="http://www.w3.org/2000/svg"
      font-family="Menlo,Lucida Console,monospace"
      
        viewBox="0 0 280 153"
      >
      <g transform='translate(8,16)'>
<text text-anchor='middle' x='0' y='4' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='0' y='20' fill='currentColor' style='font-size:1em'>┌</text>
<text text-anchor='middle' x='0' y='36' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='0' y='52' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='0' y='68' fill='currentColor' style='font-size:1em'>└</text>
<text text-anchor='middle' x='0' y='84' fill='currentColor' style='font-size:1em'>┌</text>
<text text-anchor='middle' x='0' y='100' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='0' y='116' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='0' y='132' fill='currentColor' style='font-size:1em'>└</text>
<text text-anchor='middle' x='8' y='4' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='8' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='8' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='8' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='8' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='16' y='4' fill='currentColor' style='font-size:1em'>T</text>
<text text-anchor='middle' x='16' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='16' y='52' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='16' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='16' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='16' y='116' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='16' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='24' y='4' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='24' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='24' y='36' fill='currentColor' style='font-size:1em'>C</text>
<text text-anchor='middle' x='24' y='52' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='24' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='24' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='24' y='100' fill='currentColor' style='font-size:1em'>G</text>
<text text-anchor='middle' x='24' y='116' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='24' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='32' y='4' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='32' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='32' y='36' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='32' y='52' fill='currentColor' style='font-size:1em'>m</text>
<text text-anchor='middle' x='32' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='32' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='32' y='100' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='32' y='116' fill='currentColor' style='font-size:1em'>m</text>
<text text-anchor='middle' x='32' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='40' y='4' fill='currentColor' style='font-size:1em'>c</text>
<text text-anchor='middle' x='40' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='40' y='36' fill='currentColor' style='font-size:1em'>U</text>
<text text-anchor='middle' x='40' y='52' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='40' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='40' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='40' y='100' fill='currentColor' style='font-size:1em'>U</text>
<text text-anchor='middle' x='40' y='116' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='40' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='48' y='4' fill='currentColor' style='font-size:1em'>h</text>
<text text-anchor='middle' x='48' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='48' y='52' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='48' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='48' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='48' y='116' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='48' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='56' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='56' y='52' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='56' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='56' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='56' y='116' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='56' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='64' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='64' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='64' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='64' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='64' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='72' y='4' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='72' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='72' y='68' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='72' y='84' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='72' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='80' y='4' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='80' y='20' fill='currentColor' style='font-size:1em'>┐</text>
<text text-anchor='middle' x='80' y='36' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='80' y='52' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='80' y='68' fill='currentColor' style='font-size:1em'>┘</text>
<text text-anchor='middle' x='80' y='84' fill='currentColor' style='font-size:1em'>┐</text>
<text text-anchor='middle' x='80' y='100' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='80' y='116' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='80' y='132' fill='currentColor' style='font-size:1em'>┘</text>
<text text-anchor='middle' x='88' y='4' fill='currentColor' style='font-size:1em'>S</text>
<text text-anchor='middle' x='96' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='96' y='36' fill='currentColor' style='font-size:1em'>←</text>
<text text-anchor='middle' x='96' y='100' fill='currentColor' style='font-size:1em'>←</text>
<text text-anchor='middle' x='104' y='4' fill='currentColor' style='font-size:1em'>:</text>
<text text-anchor='middle' x='112' y='36' fill='currentColor' style='font-size:1em'>c</text>
<text text-anchor='middle' x='112' y='100' fill='currentColor' style='font-size:1em'>c</text>
<text text-anchor='middle' x='120' y='36' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='120' y='100' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='128' y='36' fill='currentColor' style='font-size:1em'>p</text>
<text text-anchor='middle' x='128' y='100' fill='currentColor' style='font-size:1em'>p</text>
<text text-anchor='middle' x='136' y='36' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='136' y='100' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='152' y='36' fill='currentColor' style='font-size:1em'>→</text>
<text text-anchor='middle' x='152' y='100' fill='currentColor' style='font-size:1em'>→</text>
<text text-anchor='middle' x='176' y='36' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='176' y='100' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='184' y='4' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='184' y='20' fill='currentColor' style='font-size:1em'>┌</text>
<text text-anchor='middle' x='184' y='52' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='184' y='68' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='184' y='84' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='184' y='116' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='184' y='132' fill='currentColor' style='font-size:1em'>└</text>
<text text-anchor='middle' x='192' y='4' fill='currentColor' style='font-size:1em'>L</text>
<text text-anchor='middle' x='192' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='192' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='200' y='4' fill='currentColor' style='font-size:1em'>X</text>
<text text-anchor='middle' x='200' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='200' y='36' fill='currentColor' style='font-size:1em'>C</text>
<text text-anchor='middle' x='200' y='68' fill='currentColor' style='font-size:1em'>U</text>
<text text-anchor='middle' x='200' y='84' fill='currentColor' style='font-size:1em'>M</text>
<text text-anchor='middle' x='200' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='208' y='4' fill='currentColor' style='font-size:1em'>:</text>
<text text-anchor='middle' x='208' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='208' y='36' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='208' y='68' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='208' y='84' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='208' y='116' fill='currentColor' style='font-size:1em'>G</text>
<text text-anchor='middle' x='208' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='216' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='216' y='36' fill='currentColor' style='font-size:1em'>U</text>
<text text-anchor='middle' x='216' y='68' fill='currentColor' style='font-size:1em'>i</text>
<text text-anchor='middle' x='216' y='84' fill='currentColor' style='font-size:1em'>m</text>
<text text-anchor='middle' x='216' y='116' fill='currentColor' style='font-size:1em'>P</text>
<text text-anchor='middle' x='216' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='224' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='224' y='68' fill='currentColor' style='font-size:1em'>f</text>
<text text-anchor='middle' x='224' y='84' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='224' y='116' fill='currentColor' style='font-size:1em'>U</text>
<text text-anchor='middle' x='224' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='232' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='232' y='68' fill='currentColor' style='font-size:1em'>i</text>
<text text-anchor='middle' x='232' y='84' fill='currentColor' style='font-size:1em'>r</text>
<text text-anchor='middle' x='232' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='240' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='240' y='68' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='240' y='84' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='240' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='248' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='248' y='68' fill='currentColor' style='font-size:1em'>d</text>
<text text-anchor='middle' x='248' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='256' y='20' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='256' y='36' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='256' y='100' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='256' y='132' fill='currentColor' style='font-size:1em'>─</text>
<text text-anchor='middle' x='264' y='20' fill='currentColor' style='font-size:1em'>┐</text>
<text text-anchor='middle' x='264' y='52' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='264' y='68' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='264' y='84' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='264' y='116' fill='currentColor' style='font-size:1em'>│</text>
<text text-anchor='middle' x='264' y='132' fill='currentColor' style='font-size:1em'>┘</text>
</g>

    </svg>
  
</div>
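<p>The difference shows up dramatically with a 4-bit quantized 3.2B model. When PyTorch loads the model weights, it first places them in CPU memory and then copies them to the GPU, so it momentarily needs twice the memory. MLX allocates once and is done.</p>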
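<p>As a rough back-of-the-envelope check of that claim, the sketch below estimates the weight footprint of a 4-bit, 3.2B-parameter model. It counts only the packed weights and ignores quantization scales, activations, and framework overhead, so the figures are illustrative assumptions rather than measurements. <code>weight_bytes</code> is a hypothetical helper, not part of Bit-Axon.</p>

```python
def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the packed weights alone (no scales, no activations)."""
    return num_params * bits_per_weight / 8


GB = 1e9  # decimal gigabytes, for readability

# One allocation, as with unified memory.
single_copy = weight_bytes(3.2e9, 4)
# Transient peak if the weights are staged in one buffer and then copied to another.
staged_copy = 2 * single_copy

print(f"single allocation:    {single_copy / GB:.1f} GB")
print(f"staged + copied peak: {staged_copy / GB:.1f} GB")
```

<p>Under these assumptions the packed weights take about 1.6 GB, and a staged load briefly holds about 3.2 GB, which is a noticeable share of a 16GB machine.</p>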
<h2 id="삼층-아키텍처-설계-철학">Three-Layer Architecture: Design Philosophy</h2>
<p>To understand the sandwich architecture, you first need to understand <strong>why it is split this way</strong>.</p>
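<p>The core problem of the Transformer is the O(n²) complexity of attention. When the sequence length grows from 4K to 64K, a 16x increase, the attention compute grows by 16² = 256x. A State Space Model (SSM) brings this down to O(n), with the drawback that it cannot model dependencies as complex as those attention can capture.</p>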
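<p>The 256x figure can be checked with a few lines of arithmetic. The sketch below counts only the factor that depends on sequence length (query-key pairs for attention, recurrence steps for a linear scan) and leaves out constants such as hidden size and head count. Both function names are hypothetical.</p>

```python
def attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention: n * n."""
    return seq_len * seq_len


def linear_steps(seq_len: int) -> int:
    """Recurrence steps performed by a linear-time scan: n."""
    return seq_len


short_ctx, long_ctx = 4 * 1024, 64 * 1024

attn_growth = attention_pairs(long_ctx) // attention_pairs(short_ctx)
scan_growth = linear_steps(long_ctx) // linear_steps(short_ctx)

print(f"attention grows {attn_growth}x, linear scan grows {scan_growth}x")
```

<p>Going from 4K to 64K multiplies the attention term by 256 but the linear term by only 16.</p>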
<p>Bit-Axon's approach is to <strong>combine the strengths of both paradigms hierarchically</strong>:</p>
<ul>
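<li><strong>Context absorption (SSM)</strong>: Linear complexity is essential when reading in 64K tokens. Processing 64K tokens with full attention is not feasible in 16GB of memory.</li>
<li><strong>Deep reasoning (SWA + MoE)</strong>: Semantic relations, causal reasoning, and complex pattern matching need attention, but a local window is enough; the full sequence is not required.</li>
<li><strong>Output synthesis (SSM + MoE)</strong>: Final token generation synthesizes representations for which the reasoning is already done, so the linear computation of an SSM is sufficient. MoE raises quality by selectively applying expert knowledge.</li>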
</ul>
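<p>To make the "local window" idea concrete, here is a minimal sketch of a causal sliding-window attention mask, assuming SWA stands for sliding-window attention. The sequence length of 8 and window of 3 are toy values chosen for readability, not Bit-Axon's configuration, and the helper names are hypothetical.</p>

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j.

    A pair is allowed when it is causal (j <= i) and local (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the query-key pairs the mask allows."""
    return sum(sum(row) for row in mask)


mask = sliding_window_mask(seq_len=8, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# A window as long as the sequence degenerates to ordinary causal attention.
full_causal = attended_pairs(sliding_window_mask(8, 8))
local = attended_pairs(mask)
print(f"full causal pairs: {full_causal}, sliding-window pairs: {local}")
```

<p>Each row keeps at most <code>window</code> entries, so the number of scored pairs grows roughly as n × window instead of n².</p>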
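<p>This design follows the principle of assigning each layer group <strong>the minimum complexity it needs</strong>. Attention is placed only where it is required, and everything else is handled by the lighter SSM.</p>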
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>@staticmethod
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">_get_layer_type</span>(layer_idx: <span style="color:#8be9fd;font-style:italic">int</span>, total_layers: <span style="color:#8be9fd;font-style:italic">int</span>) <span style="color:#ff79c6">-&gt;</span> <span style="color:#8be9fd;font-style:italic">str</span>:
</span></span><span style="display:flex;"><span>    third <span style="color:#ff79c6">=</span> total_layers <span style="color:#ff79c6">//</span> <span style="color:#bd93f9">3</span>  <span style="color:#6272a4"># 8 layers each (24 total)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">if</span> layer_idx <span style="color:#ff79c6">&lt;</span> third:           <span style="color:#6272a4"># Layer 0-7: pure SSM</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#f1fa8c">&#34;ssm&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">elif</span> layer_idx <span style="color:#ff79c6">&lt;</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">*</span> third:     <span style="color:#6272a4"># Layer 8-15: SWA + MoE</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#f1fa8c">&#34;swa_moe&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">else</span>:                           <span style="color:#6272a4"># Layer 16-23: SSM + MoE</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> <span style="color:#f1fa8c">&#34;ssm_moe&#34;</span>
</span></span></code></pre>
</div>
</div><h2 id="layer-1-8-pure-axon-ssm-문맥-흡수">Layer 1-8: Pure Axon-SSM (Context Absorption)</h2>
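<p>The first 8 layers (indices 0-7 in the code above) are a pure Mamba-style <strong>State Space Model (SSM)</strong>. Because there is no attention, no KV cache is needed, and the memory required per generated token stays fixed at O(1) no matter how long the sequence is. This is what makes a 64K context tractable.</p>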
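<p>The contrast between a growing KV cache and a fixed SSM state can be sketched numerically. Every dimension below (<code>KV_DIM</code>, <code>D_INNER</code>, <code>D_STATE</code>, 16-bit values) is a hypothetical value picked for illustration, not Bit-Axon's actual configuration, and the SSM estimate ignores the small convolution buffer that Mamba-style layers also keep.</p>

```python
def kv_cache_bytes(seq_len: int, num_layers: int, kv_dim: int, bytes_per_value: int = 2) -> int:
    """Keys and values stored for every past token in every attention layer."""
    return 2 * num_layers * seq_len * kv_dim * bytes_per_value


def ssm_state_bytes(num_layers: int, d_inner: int, d_state: int, bytes_per_value: int = 2) -> int:
    """One fixed-size recurrent state per SSM layer, independent of sequence length."""
    return num_layers * d_inner * d_state * bytes_per_value


# Hypothetical dimensions chosen only for illustration.
NUM_LAYERS, KV_DIM, D_INNER, D_STATE = 8, 1024, 4096, 16
MIB = 2 ** 20

ssm = ssm_state_bytes(NUM_LAYERS, D_INNER, D_STATE)
for n in (4 * 1024, 64 * 1024):
    kv = kv_cache_bytes(n, NUM_LAYERS, KV_DIM)
    print(f"n={n:>6}: KV cache {kv / MIB:7.1f} MiB, SSM state {ssm / MIB:.1f} MiB")
```

<p>With these toy numbers the KV cache grows 16x between 4K and 64K tokens while the SSM state does not change at all.</p>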
<h3 id="ssm의-수학적-기초">Mathematical Basis of SSMs</h3>
<p>An SSM starts from a continuous-time state space model:</p>



<div class="goat svg-container ">
  
    <svg
      xmlns="http://www.w3.org/2000/svg"
      font-family="Menlo,Lucida Console,monospace"
      
        viewBox="0 0 344 41"
      >
      <g transform='translate(8,16)'>
<text text-anchor='middle' x='0' y='4' fill='currentColor' style='font-size:1em'>h</text>
<text text-anchor='middle' x='0' y='20' fill='currentColor' style='font-size:1em'>y</text>
<text text-anchor='middle' x='8' y='4' fill='currentColor' style='font-size:1em'>'</text>
<text text-anchor='middle' x='8' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='16' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='16' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='24' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='24' y='20' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='32' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='48' y='4' fill='currentColor' style='font-size:1em'>=</text>
<text text-anchor='middle' x='48' y='20' fill='currentColor' style='font-size:1em'>=</text>
<text text-anchor='middle' x='64' y='4' fill='currentColor' style='font-size:1em'>A</text>
<text text-anchor='middle' x='64' y='20' fill='currentColor' style='font-size:1em'>C</text>
<text text-anchor='middle' x='72' y='4' fill='currentColor' style='font-size:1em'>h</text>
<text text-anchor='middle' x='72' y='20' fill='currentColor' style='font-size:1em'>h</text>
<text text-anchor='middle' x='80' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='80' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='88' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='88' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='96' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='96' y='20' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='112' y='4' fill='currentColor' style='font-size:1em'>+</text>
<text text-anchor='middle' x='112' y='20' fill='currentColor' style='font-size:1em'>+</text>
<text text-anchor='middle' x='128' y='4' fill='currentColor' style='font-size:1em'>B</text>
<text text-anchor='middle' x='128' y='20' fill='currentColor' style='font-size:1em'>D</text>
<text text-anchor='middle' x='136' y='4' fill='currentColor' style='font-size:1em'>x</text>
<text text-anchor='middle' x='136' y='20' fill='currentColor' style='font-size:1em'>x</text>
<text text-anchor='middle' x='144' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='144' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='152' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='152' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='160' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='160' y='20' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='200' y='4' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='200' y='20' fill='currentColor' style='font-size:1em'>(</text>
<text text-anchor='middle' x='208' y='4' fill='currentColor' style='font-size:1em'>s</text>
<text text-anchor='middle' x='208' y='20' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='216' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='216' y='20' fill='currentColor' style='font-size:1em'>u</text>
<text text-anchor='middle' x='224' y='4' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='224' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='232' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='232' y='20' fill='currentColor' style='font-size:1em'>p</text>
<text text-anchor='middle' x='240' y='4' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='240' y='20' fill='currentColor' style='font-size:1em'>u</text>
<text text-anchor='middle' x='248' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='256' y='4' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='264' y='4' fill='currentColor' style='font-size:1em'>q</text>
<text text-anchor='middle' x='264' y='20' fill='currentColor' style='font-size:1em'>e</text>
<text text-anchor='middle' x='272' y='4' fill='currentColor' style='font-size:1em'>u</text>
<text text-anchor='middle' x='272' y='20' fill='currentColor' style='font-size:1em'>q</text>
<text text-anchor='middle' x='280' y='4' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='280' y='20' fill='currentColor' style='font-size:1em'>u</text>
<text text-anchor='middle' x='288' y='4' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='288' y='20' fill='currentColor' style='font-size:1em'>a</text>
<text text-anchor='middle' x='296' y='4' fill='currentColor' style='font-size:1em'>i</text>
<text text-anchor='middle' x='296' y='20' fill='currentColor' style='font-size:1em'>t</text>
<text text-anchor='middle' x='304' y='4' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='304' y='20' fill='currentColor' style='font-size:1em'>i</text>
<text text-anchor='middle' x='312' y='4' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='312' y='20' fill='currentColor' style='font-size:1em'>o</text>
<text text-anchor='middle' x='320' y='4' fill='currentColor' style='font-size:1em'>)</text>
<text text-anchor='middle' x='320' y='20' fill='currentColor' style='font-size:1em'>n</text>
<text text-anchor='middle' x='328' y='20' fill='currentColor' style='font-size:1em'>)</text>
</g>

    </svg>
  
</div>
<p>Here <code>x(t)</code> is the input, <code>h(t)</code> is the state vector, <code>y(t)</code> is the output, and <code>A/B/C/D</code> are learnable parameter matrices. Discretizing the continuous-time model gives:</p>



<pre><code>h_t = Āh_{t-1} + B̄x_t
y_t = Ch_t + Dx_t</code></pre>
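<p>Discretization uses the <strong>Zero-Order Hold (ZOH)</strong> method, and the step size <code>dt</code> is a learnable parameter. The key innovation of Mamba is that <code>dt</code> can take a different value for each token, so the rate at which the state is updated adapts to the input.</p>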
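<p>A minimal sketch of the discretization step may help here. It assumes a single scalar state with diagonal entry <code>a</code>, so the matrix exponential reduces to <code>exp</code>, and it uses the textbook ZOH formulas <code>Ā = exp(dt·a)</code> and <code>B̄ = (Ā - 1)/a · b</code>. The helper names <code>zoh_discretize</code> and <code>run</code> are hypothetical and are not taken from Bit-Axon, which may use a simplified form of <code>B̄</code>.</p>

```python
import math


def zoh_discretize(a: float, b: float, dt: float) -> tuple[float, float]:
    """Zero-order-hold discretization of the scalar system h'(t) = a*h(t) + b*x(t)."""
    a_bar = math.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b  # exact ZOH form; requires a != 0
    return a_bar, b_bar


def run(xs, dts, a=-1.0, b=1.0):
    """Apply h_t = a_bar*h_{t-1} + b_bar*x_t with a separate step size per token."""
    h, states = 0.0, []
    for x, dt in zip(xs, dts):
        a_bar, b_bar = zoh_discretize(a, b, dt)
        h = a_bar * h + b_bar * x
        states.append(h)
    return states


# Same input (one impulse, then silence) under a small and a large step size.
small = run([1.0, 0.0, 0.0], [0.01, 0.01, 0.01])
large = run([1.0, 0.0, 0.0], [2.0, 2.0, 2.0])
```

<p>With a small <code>dt</code> the state absorbs little of the input and forgets slowly. With a large <code>dt</code> it absorbs most of the input and decays quickly, which is the sense in which the step size controls the update rate.</p>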
<h3 id="axonssm-구현-상세">AxonSSM Implementation Details</h3>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">AxonSSM</span>(nn<span style="color:#ff79c6">.</span>Module):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __init__(self, config: BitAxonConfig):
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>in_proj <span style="color:#ff79c6">=</span> nn<span style="color:#ff79c6">.</span>Linear(D, <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">*</span> E, bias<span style="color:#ff79c6">=</span><span style="color:#ff79c6">False</span>)          <span style="color:#6272a4"># split the input into x and z branches</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>conv1d <span style="color:#ff79c6">=</span> nn<span style="color:#ff79c6">.</span>Conv1d(E, E, kernel_size<span style="color:#ff79c6">=</span>d_conv, groups<span style="color:#ff79c6">=</span>E)  <span style="color:#6272a4"># depthwise causal convolution</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>x_proj <span style="color:#ff79c6">=</span> nn<span style="color:#ff79c6">.</span>Linear(E, d_state <span style="color:#ff79c6">*</span> <span style="color:#bd93f9">2</span> <span style="color:#ff79c6">+</span> <span style="color:#bd93f9">1</span>, bias<span style="color:#ff79c6">=</span><span style="color:#ff79c6">False</span>)   <span style="color:#6272a4"># project to the B, C, dt parameters</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>dt_proj <span style="color:#ff79c6">=</span> nn<span style="color:#ff79c6">.</span>Linear(<span style="color:#bd93f9">1</span>, E, bias<span style="color:#ff79c6">=</span><span style="color:#ff79c6">True</span>)                <span style="color:#6272a4"># per-channel step size</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>out_proj <span style="color:#ff79c6">=</span> nn<span style="color:#ff79c6">.</span>Linear(E, D, bias<span style="color:#ff79c6">=</span><span style="color:#ff79c6">False</span>)              <span style="color:#6272a4"># output projection</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>A_log <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>log(mx<span style="color:#ff79c6">.</span>arange(<span style="color:#bd93f9">1</span>, d_state <span style="color:#ff79c6">+</span> <span style="color:#bd93f9">1</span>))            <span style="color:#6272a4"># diagonal SSM state matrix (log space)</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>D <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>ones((E,))                                    <span style="color:#6272a4"># skip-connection parameter</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Key component design decisions:</p>
<ul>
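<li><strong><code>A_log</code> initialization</strong>: It is initialized to <code>log(1), log(2), ..., log(d_state)</code>, so <code>A = -exp(A_log)</code> is a negative diagonal matrix. This guarantees that the state decays exponentially over time, which provides numerical stability.</li>
<li><strong>Causal convolution (<code>conv1d</code>)</strong>: A 1D convolution with kernel size 4 extracts local context first. This matches the intuition of &ldquo;look at the pattern of the last 4 tokens first, then fold it into the SSM state.&rdquo;</li>
<li><strong>Gating</strong>: The <code>z</code> branch controls information flow through a SiLU activation. In the form <code>y = SiLU(z) * SSM(x)</code>, it selectively weights the SSM output.</li>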
</ul>
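<p>The first and third decisions can be checked numerically. The sketch below is plain Python rather than MLX, uses a hypothetical <code>d_state = 4</code>, and all helper names (<code>retention</code>, <code>silu</code>, <code>gated_output</code>) are illustrative and not part of Bit-Axon.</p>

```python
import math

d_state = 4  # hypothetical size, chosen only for illustration

# A_log = log(1..d_state), so A = -exp(A_log) is approximately -1, -2, ..., -d_state.
A_log = [math.log(n) for n in range(1, d_state + 1)]
A = [-math.exp(v) for v in A_log]


def retention(dt: float) -> list[float]:
    """Fraction of each state dimension that survives one step of size dt."""
    return [math.exp(dt * a) for a in A]


def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))


def gated_output(z: float, ssm_out: float) -> float:
    """y = SiLU(z) * SSM(x) for a single channel."""
    return silu(z) * ssm_out


factors = retention(0.1)
```

<p>Every entry of <code>A</code> is negative, so each retention factor lies strictly between 0 and 1 and higher state dimensions forget faster. A strongly negative <code>z</code> drives the gated output toward zero, which is how the gate suppresses a channel.</p>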
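<h3 id="병렬-스캔-알고리즘">Parallel Scan Algorithm</h3>
<p>The sequential recurrence <code>h_t = Āh_{t-1} + B̄x_t</code> costs O(n), but each step depends on the previous one, so it looks impossible to parallelize. Mamba's key innovation is to parallelize it with an <strong>associative scan</strong>.</p>
<p>Bit-Axon implements this in chunks:</p>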
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">_ssm_scan_parallel</span>(self, x, dt, B_in, C_in):
</span></span><span style="display:flex;"><span>    step <span style="color:#ff79c6">=</span> config<span style="color:#ff79c6">.</span>ssm_scan_step  <span style="color:#6272a4"># default: 64</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">for</span> j <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">range</span>(d_state):     <span style="color:#6272a4"># process each state dimension independently</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">for</span> i <span style="color:#ff79c6">in</span> <span style="color:#8be9fd;font-style:italic">range</span>(<span style="color:#bd93f9">0</span>, L, step):
</span></span><span style="display:flex;"><span>            S <span style="color:#ff79c6">=</span> <span style="color:#8be9fd;font-style:italic">min</span>(step, L <span style="color:#ff79c6">-</span> i)
</span></span><span style="display:flex;"><span>            dtA_chunk <span style="color:#ff79c6">=</span> dtA[:, i : i <span style="color:#ff79c6">+</span> S, :]
</span></span><span style="display:flex;"><span>            dtx_chunk <span style="color:#ff79c6">=</span> dtx[:, i : i <span style="color:#ff79c6">+</span> S, :]
</span></span><span style="display:flex;"><span>            B_chunk <span style="color:#ff79c6">=</span> B_in[:, i : i <span style="color:#ff79c6">+</span> S, j]
</span></span><span style="display:flex;"><span>            C_chunk <span style="color:#ff79c6">=</span> C_in[:, i : i <span style="color:#ff79c6">+</span> S, j]
</span></span></code></pre></td></tr></table>
</div>
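</div><p>The chunk size of 64 is tuned to the Apple Silicon GPU's warp size (SIMD-group width in Metal terms) and to the balance between memory and compute. If chunks are too small, kernel launch overhead dominates; if they are too large, memory usage grows.</p>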
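<p>Why the recurrence can be parallelized at all is easier to see in a small example. Each step is an affine map <code>h → a·h + b</code>, and composing affine maps is associative, so prefixes can be combined in any grouping. The sketch below uses a Hillis-Steele style inclusive scan in plain Python. That is one textbook scan, chosen for brevity, and not necessarily the scheme used by Bit-Axon or by the Mamba kernels. The function names are hypothetical.</p>

```python
def combine(left, right):
    """Compose two affine updates h -> a*h + b (apply left first, then right)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t with h_0 = 0, one step at a time."""
    h, states = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        states.append(h)
    return states


def associative_scan(a, b):
    """Inclusive scan in ceil(log2(n)) rounds; each round is independent per index."""
    elems = list(zip(a, b))
    n, shift = len(elems), 1
    while shift < n:
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(n)
        ]
        shift *= 2
    # With a zero initial state, the offset term of each prefix map is h_t.
    return [offset for _, offset in elems]


a = [0.9, 0.5, 0.8, 0.7, 0.6]
b = [1.0, 2.0, -1.0, 0.5, 3.0]
seq = sequential_scan(a, b)
par = associative_scan(a, b)
```

<p>Both functions produce the same states up to floating-point rounding. The list comprehension inside each round has no dependency between indices, which is the part a GPU can execute in parallel.</p>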
<h3 id="세그먼트-합-최적화">Segment Sum Optimization</h3>
<p>The segment sum (<code>segsum</code>), the core operation of the parallel scan, is compiled natively by MLX:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">segsum</span>(x: mx<span style="color:#ff79c6">.</span>array) <span style="color:#ff79c6">-&gt;</span> mx<span style="color:#ff79c6">.</span>array:
</span></span><span style="display:flex;"><span>    <span style="color:#f1fa8c">&#34;&#34;&#34;Hardware-efficient parallel segment sum.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    seq_len <span style="color:#ff79c6">=</span> x<span style="color:#ff79c6">.</span>shape[<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>]
</span></span><span style="display:flex;"><span>    cs <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>cumsum(x, axis<span style="color:#ff79c6">=-</span><span style="color:#bd93f9">1</span>)
</span></span><span style="display:flex;"><span>    diff <span style="color:#ff79c6">=</span> cs[<span style="color:#ff79c6">...</span>, :, <span style="color:#ff79c6">None</span>] <span style="color:#ff79c6">-</span> cs[<span style="color:#ff79c6">...</span>, <span style="color:#ff79c6">None</span>, :]
</span></span><span style="display:flex;"><span>    mask <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>tril(mx<span style="color:#ff79c6">.</span>ones((seq_len, seq_len), dtype<span style="color:#ff79c6">=</span>diff<span style="color:#ff79c6">.</span>dtype), <span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> diff <span style="color:#ff79c6">*</span> mask
</span></span></code></pre></td></tr></table>
</div>
</div><p>This operation is compiled with <code>@mx.compile</code> and runs natively on the Apple Silicon GPU.</p>
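<p>For readers without MLX at hand, the same computation can be written as a plain-Python reference for a 1D input. This is a sketch that mirrors the cumulative-sum difference and the strictly lower-triangular mask of <code>segsum</code>; the name <code>segsum_reference</code> is hypothetical.</p>

```python
from itertools import accumulate


def segsum_reference(x):
    """out[i][j] = x[j+1] + ... + x[i] for i > j, and 0 elsewhere."""
    cs = list(accumulate(x))  # inclusive cumulative sum
    n = len(x)
    return [
        [cs[i] - cs[j] if i > j else 0.0 for j in range(n)]
        for i in range(n)
    ]


out = segsum_reference([1.0, 2.0, 3.0, 4.0])
```

<p>Entry <code>out[3][0]</code> is the sum of the elements after position 0 up to position 3, and everything on or above the diagonal is zero.</p>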
<h2 id="layer-9-16-swa--moe-심층-추론">Layer 9-16: SWA + MoE (Deep Reasoning)</h2>
<p>The middle 8 layers combine <strong>Sliding Window Attention (SWA)</strong> with <strong>Mixture of Experts (MoE)</strong>. This section is responsible for the model's <strong>reasoning ability</strong>.</p>
<h3 id="sliding-window-attention">Sliding Window Attention</h3>
<p>Standard attention computes a dot product for every pair of tokens, which gives O(n²) complexity. SWA restricts each token to the previous <code>window_size</code> tokens, reducing the cost to O(n × window_size).</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">_make_sliding_window_mask</span>(self, seq_len: <span style="color:#8be9fd;font-style:italic">int</span>, kv_len: <span style="color:#8be9fd;font-style:italic">int</span>, q_offset: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">0</span>):
</span></span><span style="display:flex;"><span>    q_pos <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>arange(q_offset, q_offset <span style="color:#ff79c6">+</span> seq_len)
</span></span><span style="display:flex;"><span>    k_pos <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>arange(kv_len)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Causal constraint: future tokens are not visible</span>
</span></span><span style="display:flex;"><span>    causal_mask <span style="color:#ff79c6">=</span> k_pos[<span style="color:#ff79c6">None</span>, :] <span style="color:#ff79c6">&lt;=</span> (q_pos[:, <span style="color:#ff79c6">None</span>] <span style="color:#ff79c6">+</span> causal_offset)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Window constraint: limited attention range</span>
</span></span><span style="display:flex;"><span>    window_mask <span style="color:#ff79c6">=</span> (q_pos[:, <span style="color:#ff79c6">None</span>] <span style="color:#ff79c6">+</span> causal_offset) <span style="color:#ff79c6">-</span> k_pos[<span style="color:#ff79c6">None</span>, :] <span style="color:#ff79c6">&lt;</span> self<span style="color:#ff79c6">.</span>window_size
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Combine: positions outside the causal window get -inf</span>
</span></span><span style="display:flex;"><span>    mask <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>where(causal_mask <span style="color:#ff79c6">&amp;</span> window_mask, <span style="color:#bd93f9">0.0</span>, <span style="color:#ff79c6">-</span>mx<span style="color:#ff79c6">.</span>inf)
</span></span></code></pre></td></tr></table>
</div>
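</div><p>The window size of 4096 is a deliberate choice. Most natural-language dependencies resolve within 4K tokens, and longer-range dependencies have already been handled by the SSM in Layers 1-8. SWA therefore plays the role of &ldquo;local refinement on top of the context the SSM has absorbed.&rdquo;</p>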
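<p>The mask logic is easy to verify on a tiny case. The sketch below is plain Python with a window of 3 instead of 4096, and it assumes <code>causal_offset</code> is zero because the excerpt above does not show how that value is derived. The function name <code>sliding_window_mask</code> is hypothetical.</p>

```python
NEG_INF = float("-inf")


def sliding_window_mask(seq_len, kv_len, window_size, q_offset=0):
    """Additive mask: 0.0 where a query may attend to a key, -inf elsewhere."""
    mask = []
    for q in range(q_offset, q_offset + seq_len):
        row = []
        for k in range(kv_len):
            causal = k <= q                  # no access to future tokens
            in_window = q - k < window_size  # only the most recent tokens
            row.append(0.0 if causal and in_window else NEG_INF)
        mask.append(row)
    return mask


mask = sliding_window_mask(seq_len=5, kv_len=5, window_size=3)
visible_per_row = [sum(1 for v in row if v == 0.0) for row in mask]
```

<p>Each query sees at most <code>window_size</code> keys regardless of sequence length, which is where the O(n × window_size) cost comes from.</p>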
<h3 id="kv-캐시-트리밍">KV Cache Trimming</h3>
<p>The key memory optimization in SWA is KV cache trimming:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">KVCache</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __init__(self, window_size: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">|</span> <span style="color:#ff79c6">None</span> <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">None</span>):
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>window_size <span style="color:#ff79c6">=</span> window_size
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">update_and_fetch</span>(self, xk: mx<span style="color:#ff79c6">.</span>array, xv: mx<span style="color:#ff79c6">.</span>array) <span style="color:#ff79c6">-&gt;</span> <span style="color:#8be9fd;font-style:italic">tuple</span>[mx<span style="color:#ff79c6">.</span>array, mx<span style="color:#ff79c6">.</span>array]:
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>k <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>concatenate([self<span style="color:#ff79c6">.</span>k, xk], axis<span style="color:#ff79c6">=</span><span style="color:#bd93f9">2</span>)
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>v <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>concatenate([self<span style="color:#ff79c6">.</span>v, xv], axis<span style="color:#ff79c6">=</span><span style="color:#bd93f9">2</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> self<span style="color:#ff79c6">.</span>window_size <span style="color:#ff79c6">is</span> <span style="color:#ff79c6">not</span> <span style="color:#ff79c6">None</span>:
</span></span><span style="display:flex;"><span>            self<span style="color:#ff79c6">.</span>k <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>k[:, :, <span style="color:#ff79c6">-</span>self<span style="color:#ff79c6">.</span>window_size:]  <span style="color:#6272a4"># trim to the window</span>
</span></span><span style="display:flex;"><span>            self<span style="color:#ff79c6">.</span>v <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>v[:, :, <span style="color:#ff79c6">-</span>self<span style="color:#ff79c6">.</span>window_size:]
</span></span></code></pre></td></tr></table>
</div>
</div><p>This is why memory stays bounded at O(window_size) even while processing a 64K sequence. KV entries outside the window are discarded, and because SWA never attends to them, no information is lost.</p>
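<p>The bounded-memory behaviour can be demonstrated with a list-based stand-in for the cache. This sketch assumes the cache starts empty and treats the sequence axis as a flat Python list instead of an MLX array. The class name <code>ListKVCache</code> is hypothetical.</p>

```python
class ListKVCache:
    """List-based stand-in for a windowed KV cache (illustration only)."""

    def __init__(self, window_size=None):
        self.window_size = window_size
        self.k, self.v = [], []

    def update_and_fetch(self, xk, xv):
        # Append the new entries, then keep only the most recent window.
        self.k.extend(xk)
        self.v.extend(xv)
        if self.window_size is not None:
            self.k = self.k[-self.window_size:]
            self.v = self.v[-self.window_size:]
        return self.k, self.v


windowed = ListKVCache(window_size=4)
unbounded = ListKVCache()
for t in range(10):
    k, v = windowed.update_and_fetch([t], [t * 10])
    k_all, v_all = unbounded.update_and_fetch([t], [t * 10])
```

<p>After ten decoding steps the windowed cache holds only the last four entries, while the unbounded one has grown with the sequence.</p>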
<h3 id="mixture-of-experts-구현">Mixture of Experts Implementation</h3>
<p>MoE makes computation sparse by dynamically selecting experts for each token:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">SharedExpertMoE</span>(nn<span style="color:#ff79c6">.</span>Module):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __call__(self, x: mx<span style="color:#ff79c6">.</span>array) <span style="color:#ff79c6">-&gt;</span> mx<span style="color:#ff79c6">.</span>array:
</span></span><span style="display:flex;"><span>        gates <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>gate(x)                      <span style="color:#6272a4"># (batch, seq_len, num_experts)</span>
</span></span><span style="display:flex;"><span>        gates <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>softmax(gates, axis<span style="color:#ff79c6">=-</span><span style="color:#bd93f9">1</span>)        <span style="color:#6272a4"># softmax over experts</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Select the top-k experts</span>
</span></span><span style="display:flex;"><span>        inds <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>stop_gradient(mx<span style="color:#ff79c6">.</span>argpartition(<span style="color:#ff79c6">-</span>gates, kth<span style="color:#ff79c6">=</span>k<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>, axis<span style="color:#ff79c6">=-</span><span style="color:#bd93f9">1</span>)[<span style="color:#ff79c6">...</span>, :k])
</span></span><span style="display:flex;"><span>        scores <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>take_along_axis(gates, inds, axis<span style="color:#ff79c6">=-</span><span style="color:#bd93f9">1</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Run the selected experts</span>
</span></span><span style="display:flex;"><span>        y <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>switch_mlp(x, inds)
</span></span><span style="display:flex;"><span>        y <span style="color:#ff79c6">=</span> (y <span style="color:#ff79c6">*</span> scores[<span style="color:#ff79c6">...</span>, <span style="color:#ff79c6">None</span>])<span style="color:#ff79c6">.</span>sum(axis<span style="color:#ff79c6">=-</span><span style="color:#bd93f9">2</span>)  <span style="color:#6272a4"># weighted sum over experts</span>
</span></span></code></pre></td></tr></table>
</div>
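</div><p>The <strong><code>gather_mm</code> optimization</strong> is the key. Mathematically, expert routing is &ldquo;multiply each token by the weight matrix of its selected expert,&rdquo; but a naive implementation of that has to load every expert's weights into memory.</p>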
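<p>The idea of gathering by index before multiplying can be sketched for a single token in plain Python. Only the <code>k</code> selected expert matrices are read, and the unselected ones are never touched. As in the routing code above, the gate scores are taken from the full softmax without renormalizing over the selected experts. All names here (<code>route_token</code>, <code>top_k</code>, <code>matvec</code>, the toy <code>experts</code>) are hypothetical, and the sketch makes no claim about how <code>mx.gather_mm</code> is implemented internally.</p>

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Indices of the k largest gate probabilities, largest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def matvec(w, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def route_token(x, gate_logits, expert_weights, k):
    """Weighted sum over the k selected experts; other experts are skipped."""
    probs = softmax(gate_logits)
    inds = top_k(probs, k)
    out = [0.0] * len(expert_weights[0])
    for i in inds:
        y = matvec(expert_weights[i], x)  # gather expert i, then multiply
        out = [o + probs[i] * y_j for o, y_j in zip(out, y)]
    return out, inds


# Three toy experts that scale the input by 1, 2 and 3.
experts = [[[s, 0.0], [0.0, s]] for s in (1.0, 2.0, 3.0)]
out, inds = route_token([1.0, 1.0], [0.0, 5.0, 1.0], experts, k=2)
```

<p>A batched kernel applies the same pattern to all tokens at once, using the flattened index array to pick the weight matrix for each row.</p>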
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">SwitchLinear</span>(nn<span style="color:#ff79c6">.</span>Module):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __call__(self, x: mx<span style="color:#ff79c6">.</span>array, indices: mx<span style="color:#ff79c6">.</span>array) <span style="color:#ff79c6">-&gt;</span> mx<span style="color:#ff79c6">.</span>array:
</span></span><span style="display:flex;"><span>        B, L, K <span style="color:#ff79c6">=</span> indices<span style="color:#ff79c6">.</span>shape
</span></span><span style="display:flex;"><span>        flat_idx <span style="color:#ff79c6">=</span> indices<span style="color:#ff79c6">.</span>reshape(<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>)
</span></span><span style="display:flex;"><span>        x_flat <span style="color:#ff79c6">=</span> x<span style="color:#ff79c6">.</span>reshape(<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>, <span style="color:#bd93f9">1</span>, x<span style="color:#ff79c6">.</span>shape[<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>])
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        w_t <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>weight<span style="color:#ff79c6">.</span>swapaxes(<span style="color:#ff79c6">-</span><span style="color:#bd93f9">1</span>, <span style="color:#ff79c6">-</span><span style="color:#bd93f9">2</span>)
</span></span><span style="display:flex;"><span>        out <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>gather_mm(x_flat, w_t, rhs_indices<span style="color:#ff79c6">=</span>flat_idx,
</span></span><span style="display:flex;"><span>                          sorted_indices<span style="color:#ff79c6">=</span>sorted_indices)
</span></span></code></pre></td></tr></table>
</div>
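</div><p><code>mx.gather_mm</code> is a native MLX operation that <strong>gathers only the slices of the weight tensor selected by the indices and multiplies by them</strong>. Instead of iterating over every expert's weights, it computes only with the expert assigned to each token. Passing sorted indices (<code>sorted_indices</code>) makes the memory access pattern more contiguous, which can improve cache efficiency.</p>
<p>The <strong>shared expert</strong> is an additional MLP applied to every token:</p>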
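<p>To make the gather-then-multiply idea concrete, here is a minimal pure-Python reference sketch. It is not MLX's implementation: <code>gather_matmul</code>, <code>matvec</code>, and <code>sort_by_expert</code> are hypothetical helper names, and the tiny 2x2 expert matrices are made up for illustration.</p>

```python
from typing import List

Matrix = List[List[float]]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Multiply a row vector x (length D) by a D x H matrix w."""
    return [sum(x[d] * w[d][h] for d in range(len(x))) for h in range(len(w[0]))]


def gather_matmul(tokens: List[List[float]], experts: List[Matrix],
                  indices: List[int]) -> List[List[float]]:
    """For token i, multiply only by experts[indices[i]]; other experts are never touched."""
    return [matvec(x, experts[e]) for x, e in zip(tokens, indices)]


def sort_by_expert(indices: List[int]) -> List[int]:
    """Return a token ordering that groups tokens routed to the same expert."""
    return sorted(range(len(indices)), key=lambda i: indices[i])


# Three illustrative experts (hypothetical values).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: doubles the input
    [[0.0, 1.0], [1.0, 0.0]],  # expert 2: swaps the two features
]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices = [2, 0, 1]  # token 0 -> expert 2, token 1 -> expert 0, token 2 -> expert 1

out = gather_matmul(tokens, experts, indices)
order = sort_by_expert(indices)
```

<p>Grouping tokens by expert, as <code>sort_by_expert</code> does, is the plain-Python analogue of passing sorted indices: consecutive tokens then read the same weight matrix.</p>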
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">4
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6272a4"># Shared expert gating</span>
</span></span><span style="display:flex;"><span>shared_out <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>shared_expert(x)
</span></span><span style="display:flex;"><span>gate <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>sigmoid(self<span style="color:#ff79c6">.</span>shared_expert_gate(x))
</span></span><span style="display:flex;"><span>output <span style="color:#ff79c6">=</span> gated_expert_output <span style="color:#ff79c6">+</span> gate <span style="color:#ff79c6">*</span> shared_out
</span></span></code></pre></td></tr></table>
</div>
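</div><p>The shared expert exists to <strong>guarantee common knowledge</strong> that Top-2 routing might miss. Things like &ldquo;the basic grammar of natural language&rdquo; or &ldquo;general world knowledge&rdquo; should always be applied, without depending on expert routing.</p>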
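<p>The combination of Top-2 routing and a sigmoid-gated shared expert can be sketched on scalars in plain Python. This is only an illustration under stated assumptions: <code>top_k</code>, <code>moe_with_shared</code>, and <code>scale_by</code> are hypothetical names, the experts are toy scaling functions, and the router scores are used as given without renormalization, mirroring the snippet earlier in the post.</p>

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def top_k(gates, k):
    """Return the indices and scores of the k largest router scores."""
    idx = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    return idx, [gates[i] for i in idx]


def moe_with_shared(x, gates, experts, shared_expert, shared_gate_logit, k=2):
    """Weighted sum of the top-k routed experts plus a gated shared expert."""
    inds, scores = top_k(gates, k)
    routed = sum(s * experts[i](x) for i, s in zip(inds, scores))
    gate = sigmoid(shared_gate_logit)  # how much of the shared expert to mix in
    return routed + gate * shared_expert(x)


def scale_by(m):
    """Toy expert: multiply the input by a constant (hypothetical)."""
    return lambda x: m * x


experts = [scale_by(1.0), scale_by(2.0), scale_by(3.0), scale_by(4.0)]
gates = [0.1, 0.5, 0.1, 0.3]  # router scores for one token

inds, scores = top_k(gates, 2)
y = moe_with_shared(1.0, gates, experts, scale_by(10.0), shared_gate_logit=0.0)
```

<p>Here experts 1 and 3 are selected, and the shared expert contributes regardless of which experts the router picked.</p>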
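<h3 id="파라미터-활성화-효율">Parameter Activation Efficiency</h3>
<p>Since only the top 2 of 8 experts are activated, only 25% of the MoE FFN parameters take part in the computation. Even including the shared expert, about 1.4B parameters are active per token, just 44% of the full 3.2B.</p>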
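<p>The two ratios quoted above follow from simple arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative.</p>

```python
num_experts = 8
active_experts = 2  # Top-2 routing

# Fraction of routed expert parameters used per token.
routed_fraction = active_experts / num_experts

total_params = 3.2e9
active_params = 1.4e9  # per-token figure quoted in the text, including the shared expert

# Fraction of the whole model that is active per token.
active_fraction = active_params / total_params

print(f"routed experts active: {routed_fraction:.0%}")
print(f"model active per token: {active_fraction:.1%}")
```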
<h2 id="layer-17-24-ssm--moe-출력-합성">Layer 17-24: SSM + MoE (Output Synthesis)</h2>
<p>The last 8 layers drop attention entirely and consist of SSM + MoE. They perform <strong>fast output generation</strong> using only linear recurrence and sparse experts.</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 4
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 5
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 6
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 7
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 8
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 9
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">10
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">11
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">12
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">13
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">AxonSSMMoEBlock</span>(nn<span style="color:#ff79c6">.</span>Module):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __call__(self, x, cache<span style="color:#ff79c6">=</span><span style="color:#ff79c6">None</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># SSM with residual</span>
</span></span><span style="display:flex;"><span>        residual <span style="color:#ff79c6">=</span> x
</span></span><span style="display:flex;"><span>        x <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>input_norm(x)
</span></span><span style="display:flex;"><span>        ssm_out, ssm_cache <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>ssm(x, cache<span style="color:#ff79c6">=</span>cache)
</span></span><span style="display:flex;"><span>        x <span style="color:#ff79c6">=</span> residual <span style="color:#ff79c6">+</span> ssm_out
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># MoE with residual</span>
</span></span><span style="display:flex;"><span>        residual <span style="color:#ff79c6">=</span> x
</span></span><span style="display:flex;"><span>        x <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>post_ssm_norm(x)
</span></span><span style="display:flex;"><span>        x <span style="color:#ff79c6">=</span> residual <span style="color:#ff79c6">+</span> self<span style="color:#ff79c6">.</span>moe(x)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> x, ssm_cache
</span></span></code></pre></td></tr></table>
</div>
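</div><p>Why does the last section have no attention? In autoregressive generation, what matters is the <strong>representation of the last token</strong>. By this point the SWA in Layers 9-16 has already completed the reasoning, and Layers 17-24 are a synthesis stage that turns that result into the final token distribution. Synthesis does not need attention's global context: the SSM's linear operations and the MoE's expert knowledge are sufficient.</p>
<h2 id="메모리-예산-상세-분석">Detailed Memory Budget Analysis</h2>
<p>Running the model on a MacBook Air M4 (16 GB unified memory) requires careful memory management. macOS takes roughly 6-8 GB for the system, which leaves about 8 GB for the model.</p>
<h3 id="가중치-메모리">Weight Memory</h3>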
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Parameters</th>
          <th>Memory (FP16)</th>
          <th>Memory (4-bit)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Full model</td>
          <td>3.2B</td>
          <td>~6,400 MB</td>
          <td>~1,600 MB</td>
      </tr>
      <tr>
          <td>SSM layers (8)</td>
          <td>~0.8B</td>
          <td>~1,600 MB</td>
          <td>~400 MB</td>
      </tr>
      <tr>
          <td>SWA+MoE layers (8)</td>
          <td>~1.6B</td>
          <td>~3,200 MB</td>
          <td>~800 MB</td>
      </tr>
      <tr>
          <td>SSM+MoE layers (8)</td>
          <td>~0.8B</td>
          <td>~1,600 MB</td>
          <td>~400 MB</td>
      </tr>
  </tbody>
</table>
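<p>The table entries are consistent with a simple "parameters times bits per parameter" estimate. The sketch below assumes 1 MB = 10<sup>6</sup> bytes and ignores per-group quantization metadata (scales and biases), which would add a little to the 4-bit column; <code>weight_memory_mb</code> is a hypothetical helper.</p>

```python
def weight_memory_mb(num_params: int, bits_per_param: float) -> float:
    """Raw weight storage in MB (1 MB = 1e6 bytes), ignoring quantization metadata."""
    return num_params * bits_per_param / 8 / 1e6


# Parameter counts taken from the table above.
sections = {
    "Full model": 3_200_000_000,
    "SSM layers (8)": 800_000_000,
    "SWA+MoE layers (8)": 1_600_000_000,
    "SSM+MoE layers (8)": 800_000_000,
}

table = {
    name: (weight_memory_mb(n, 16), weight_memory_mb(n, 4))
    for name, n in sections.items()
}

for name, (fp16_mb, q4_mb) in table.items():
    print(f"{name:<20} FP16 ~{fp16_mb:,.0f} MB   4-bit ~{q4_mb:,.0f} MB")
```

<p>As a rough guide, if every group of 64 weights also stored a 16-bit scale and a 16-bit bias, the effective cost would be about 4.5 bits per weight rather than 4.</p>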
<h3 id="추론-메모리-kv-캐시--활성화">Inference Memory (KV Cache + Activations)</h3>
<table>
  <thead>
      <tr>
          <th>Sequence length</th>
          <th>KV cache (8 SWA layers)</th>
          <th>Activation memory</th>
          <th>Total inference memory</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>4K</td>
          <td>~200 MB</td>
          <td>~400 MB</td>
          <td>~600 MB</td>
      </tr>
      <tr>
          <td>16K</td>
          <td>~200 MB</td>
          <td>~600 MB</td>
          <td>~800 MB</td>
      </tr>
      <tr>
          <td>64K</td>
          <td>~200 MB</td>
          <td>~1,200 MB</td>
          <td>~1,400 MB</td>
      </tr>
  </tbody>
</table>
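<p>The SWA KV cache does not grow with sequence length because of <code>window_size</code> trimming. Only the most recent 4,096 positions are kept in the cache, so the cache for a 64K sequence is the same size as for a 4K one.</p>
<h3 id="4-bit-nf4-양자화">4-bit NF4 Quantization</h3>
<p>Quantization is the key technique that shrinks the model by about 4x relative to FP16. NF4 (NormalFloat 4) is a 4-bit data format optimized for normally distributed values, and it typically loses less information than ordinary int4 quantization.</p>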
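<p>The effect of window trimming on cache size can be sketched as follows. The layer count and window size come from the text; <code>kv_dim=1536</code> and FP16 storage are assumptions chosen only so that the result lands near the table's ~200 MB, and <code>kv_cache_mb</code> is a hypothetical helper, not the project's code.</p>

```python
def kv_cache_mb(seq_len: int, window_size: int = 4096, num_layers: int = 8,
                kv_dim: int = 1536, bytes_per_value: int = 2) -> float:
    """KV cache size in MB when only the last `window_size` positions are kept.

    kv_dim (total key/value width per layer) is a hypothetical value.
    """
    cached_tokens = min(seq_len, window_size)  # trimming caps the cached positions
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * cached_tokens * kv_dim * bytes_per_value / 1e6


for seq_len in (4_096, 16_384, 65_536):
    windowed = kv_cache_mb(seq_len)
    untrimmed = kv_cache_mb(seq_len, window_size=seq_len)  # what full attention would need
    print(f"{seq_len:>6} tokens: windowed ~{windowed:.0f} MB, untrimmed ~{untrimmed:.0f} MB")
```

<p>With the window fixed, the cache stays flat beyond 4K tokens, whereas an untrimmed cache under the same assumptions would grow linearly with sequence length.</p>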
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 4
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 5
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 6
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 7
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 8
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f"> 9
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">10
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">11
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">12
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">13
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">QuantizedSwitchLinear</span>(nn<span style="color:#ff79c6">.</span>Module):
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __init__(self, input_dims, output_dims, num_experts,
</span></span><span style="display:flex;"><span>                 group_size<span style="color:#ff79c6">=</span><span style="color:#bd93f9">64</span>, bits<span style="color:#ff79c6">=</span><span style="color:#bd93f9">4</span>):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Group-wise quantization: one scale factor and bias per 64 elements</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>weight, self<span style="color:#ff79c6">.</span>scales, self<span style="color:#ff79c6">.</span>biases_quant <span style="color:#ff79c6">=</span> \
</span></span><span style="display:flex;"><span>            mx<span style="color:#ff79c6">.</span>quantize(weight, group_size<span style="color:#ff79c6">=</span>group_size, bits<span style="color:#ff79c6">=</span>bits)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __call__(self, x: mx<span style="color:#ff79c6">.</span>array, indices: mx<span style="color:#ff79c6">.</span>array):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Single operation combining quantized weights with the gather</span>
</span></span><span style="display:flex;"><span>        out <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>gather_qmm(
</span></span><span style="display:flex;"><span>            x_flat, self<span style="color:#ff79c6">.</span>weight, self<span style="color:#ff79c6">.</span>scales, self<span style="color:#ff79c6">.</span>biases_quant,
</span></span><span style="display:flex;"><span>            rhs_indices<span style="color:#ff79c6">=</span>flat_idx, group_size<span style="color:#ff79c6">=</span>self<span style="color:#ff79c6">.</span>group_size, bits<span style="color:#ff79c6">=</span>self<span style="color:#ff79c6">.</span>bits
</span></span><span style="display:flex;"><span>        )
</span></span></code></pre></td></tr></table>
</div>
</div><p><code>mx.gather_qmm</code> combines dequantization and the gather into a single fused operation. Because it uses the quantized weights directly, with no separate decoding step, it saves memory bandwidth.</p>
<h3 id="최종-메모리-구성">Final Memory Layout</h3>
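<p>The <code>group_size</code>, <code>scales</code>, and bias arguments in the snippet above suggest a group-wise scheme in which each group of weights shares one scale and one offset. The pure-Python sketch below illustrates that general idea with uniform 4-bit levels. It is an assumption-laden illustration, not MLX's kernel and not an NF4 codebook; all function names are hypothetical.</p>

```python
import math


def quantize_group(values, bits=4):
    """Map one group of floats to integer codes with a shared scale and bias."""
    levels = 2 ** bits - 1            # 15 steps for 4 bits, codes 0..15
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo           # lo plays the role of the per-group bias


def dequantize_group(codes, scale, bias):
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=4):
    """Quantize a flat weight list group by group."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Deterministic toy weights (hypothetical data).
weights = [math.sin(i) for i in range(128)]
groups = quantize(weights)

restored = [v for codes, scale, bias in groups
            for v in dequantize_group(codes, scale, bias)]
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
largest_scale = max(scale for _, scale, _ in groups)
```

<p>The reconstruction error of each weight is bounded by half a quantization step of its group, which is why smaller groups (more scales) trade extra metadata for accuracy.</p>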



<div class="goat svg-container ">
  
    <svg
      xmlns="http://www.w3.org/2000/svg"
      font-family="Menlo,Lucida Console,monospace"
      
        viewBox="0 0 440 105"
      >
      <g transform='translate(8,16)'>
<text text-anchor='start' x='0' y='4' fill='currentColor' style='font-size:1em'>Total available memory: ~8,000 MB</text>
<text text-anchor='start' x='0' y='20' fill='currentColor' style='font-size:1em'>├─ Model weights (4-bit): ~1,600 MB</text>
<text text-anchor='start' x='0' y='36' fill='currentColor' style='font-size:1em'>├─ KV cache (fixed): ~200 MB</text>
<text text-anchor='start' x='0' y='52' fill='currentColor' style='font-size:1em'>├─ Activations (4K ctx): ~400 MB</text>
<text text-anchor='start' x='0' y='68' fill='currentColor' style='font-size:1em'>├─ OS reserve: ~1,000 MB</text>
<text text-anchor='start' x='0' y='84' fill='currentColor' style='font-size:1em'>└─ Free space: ~4,800 MB (available for other work)</text>
</g>

    </svg>
  
</div>
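<p>With 4-bit quantization alone, the model uses about 2.2 GB at a 4K context and about 3.0 GB at a 64K context (1,600 MB of weights plus the inference memory from the table above), so it runs comfortably on a 16 GB MacBook.</p>
<h2 id="서멀-인식-학습">Thermal-Aware Training</h2>
<p>Bit-Axon's most practical innovation is its <strong>thermal-aware training pipeline</strong>. It applies a three-stage thermal policy so that training can be sustained on a fanless MacBook Air.</p>
<h3 id="서멀-정책-구현">Thermal Policy Implementation</h3>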
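<p>As a quick consistency check, the budget can be added up from the figures in this section. The sketch below only sums numbers already given in the tables and the breakdown; the dictionary keys are illustrative labels.</p>

```python
# Breakdown of the ~8,000 MB budget at a 4K context (values in MB).
budget_4k = {
    "model weights (4-bit)": 1600,
    "KV cache (fixed)": 200,
    "activations (4K ctx)": 400,
    "OS reserve": 1000,
    "free space": 4800,
}
total_budget_mb = sum(budget_4k.values())

# Model footprint per context length: 4-bit weights + total inference memory.
weights_4bit_mb = 1600
inference_mb = {"4K": 600, "16K": 800, "64K": 1400}
footprint_mb = {ctx: weights_4bit_mb + mb for ctx, mb in inference_mb.items()}

for ctx, mb in footprint_mb.items():
    print(f"{ctx:>3} context: ~{mb / 1000:.1f} GB")
```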
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">4
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">5
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">6
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>@dataclass
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">ThermalPolicy</span>:
</span></span><span style="display:flex;"><span>    max_speed_temp: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">75.0</span>    <span style="color:#6272a4"># At or below this temperature: train at full speed</span>
</span></span><span style="display:flex;"><span>    pause_temp: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">85.0</span>        <span style="color:#6272a4"># At or above this temperature: pause training</span>
</span></span><span style="display:flex;"><span>    stop_temp: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">95.0</span>         <span style="color:#6272a4"># At or above this temperature: stop training</span>
</span></span><span style="display:flex;"><span>    pause_duration: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">0.5</span>      <span style="color:#6272a4"># Cooling wait time (seconds)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
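<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> time
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">CoolingScheduler</span>: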
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __init__(self, monitor, policy: ThermalPolicy <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">None</span>):
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_monitor <span style="color:#ff79c6">=</span> monitor
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_policy <span style="color:#ff79c6">=</span> policy <span style="color:#ff79c6">or</span> ThermalPolicy()
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_total_pause_time: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">0.0</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">check_before_step</span>(self, step: <span style="color:#8be9fd;font-style:italic">int</span>) <span style="color:#ff79c6">-&gt;</span> <span style="color:#ff79c6">None</span>:
</span></span><span style="display:flex;"><span>        temp <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>_monitor<span style="color:#ff79c6">.</span>temperature
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> temp <span style="color:#ff79c6">&gt;=</span> self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>stop_temp:  <span style="color:#6272a4"># 95°C stop threshold</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">raise</span> ThermalShutdownError(
</span></span><span style="display:flex;"><span>                <span style="color:#f1fa8c">f</span><span style="color:#f1fa8c">&#34;SoC temperature </span><span style="color:#f1fa8c">{</span>temp<span style="color:#f1fa8c">:</span><span style="color:#f1fa8c">.1f</span><span style="color:#f1fa8c">}</span><span style="color:#f1fa8c">C exceeds stop threshold&#34;</span>)
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">while</span> temp <span style="color:#ff79c6">&gt;=</span> self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>pause_temp:  <span style="color:#6272a4"># 85°C pause threshold</span>
</span></span><span style="display:flex;"><span>            time<span style="color:#ff79c6">.</span>sleep(self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>pause_duration)  <span style="color:#6272a4"># wait 0.5 s</span>
</span></span><span style="display:flex;"><span>            self<span style="color:#ff79c6">.</span>_total_pause_time <span style="color:#ff79c6">+=</span> self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>pause_duration
</span></span><span style="display:flex;"><span>            temp <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>_monitor<span style="color:#ff79c6">.</span>temperature  <span style="color:#6272a4"># re-read, otherwise the loop never exits</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">should_reduce_batch</span>(self) <span style="color:#ff79c6">-&gt;</span> <span style="color:#8be9fd;font-style:italic">bool</span>:
</span></span><span style="display:flex;"><span>        temp <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>_monitor<span style="color:#ff79c6">.</span>temperature
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">return</span> self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>max_speed_temp <span style="color:#ff79c6">&lt;=</span> temp <span style="color:#ff79c6">&lt;</span> self<span style="color:#ff79c6">.</span>_policy<span style="color:#ff79c6">.</span>pause_temp
</span></span></code></pre></td></tr></table>
</div>
</div>


<ul>
<li>Temperature &lt; 75°C → normal: full-speed training</li>
<li>Temperature 75-85°C → warning: automatic batch-size reduction (<code>should_reduce_batch</code>)</li>
<li>Temperature 85-95°C → danger: pause in 0.5-second increments, then resume</li>
<li>Temperature ≥ 95°C → critical: training stops (<code>ThermalShutdownError</code>)</li>
</ul>
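<p>The four temperature bands can be sketched as a small pure function. This is a minimal illustration, not code from Bit-Axon: <code>ThermalAction</code> and <code>classify_temperature</code> are hypothetical names, and the default thresholds (75, 85 and 95°C) are assumed from the band edges listed above.</p>

```python
from enum import Enum


class ThermalAction(Enum):
    """Hypothetical labels for the four temperature bands."""
    FULL_SPEED = "full_speed"
    REDUCE_BATCH = "reduce_batch"
    PAUSE = "pause"
    STOP = "stop"


def classify_temperature(
    temp_c: float,
    max_speed_temp: float = 75.0,  # assumed default, from the 75°C band edge
    pause_temp: float = 85.0,      # assumed default, from the 85°C band edge
    stop_temp: float = 95.0,       # assumed default, from the 95°C band edge
) -> ThermalAction:
    """Map an SoC temperature in °C to the action for its band."""
    # Check the hottest band first so each test only needs a lower bound.
    if temp_c >= stop_temp:
        return ThermalAction.STOP
    if temp_c >= pause_temp:
        return ThermalAction.PAUSE
    if temp_c >= max_speed_temp:
        return ThermalAction.REDUCE_BATCH
    return ThermalAction.FULL_SPEED
```

<p>Each band is closed at its lower edge, so a reading of exactly 85°C counts as the pause band, which matches the <code>&gt;=</code> comparisons in <code>CoolingScheduler</code>.</p>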
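<h3 id="온도-모니터링">Temperature monitoring</h3>
<p>The real-time temperature of the Apple Silicon SoC is read through macOS's <code>powermetrics</code>. This command-line tool, which must be run with root privileges, also reports fan speed, power consumption, and thermal-throttling state. On fanless models thermal throttling starts at around 100°C, so stopping training at 95°C leaves a safety margin before throttling sets in.</p>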
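<h3 id="배치-사이즈-동적-조절">Dynamic batch-size adjustment</h3>
<p>When <code>should_reduce_batch()</code> returns <code>True</code>, the training loop halves the batch size. A smaller batch means less GPU work per step and therefore less heat. Once the temperature falls below 75°C, the original batch size is restored.</p>
<p>This mechanism balances training speed against thermal safety automatically. Nobody has to adjust the batch size by hand; the system keeps training as fast as the thermal limits allow.</p>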
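<p>One way a training loop might apply this rule is sketched below. It is an illustration under stated assumptions rather than Bit-Axon's actual loop: <code>next_batch_size</code> is a hypothetical helper, the reduced size is half of the original (never below 1), and the 75°C default mirrors the restore threshold described above.</p>

```python
def next_batch_size(base_batch_size: int, temp_c: float,
                    max_speed_temp: float = 75.0) -> int:
    """Return the batch size to use for the next step.

    Hypothetical helper: at or above max_speed_temp the batch is halved
    (never below 1); below it the original size is used again.
    """
    if temp_c >= max_speed_temp:
        return max(1, base_batch_size // 2)
    return base_batch_size


# Simulated temperature trace in °C (made-up values for illustration).
trace = [70.0, 78.0, 82.0, 74.0]
sizes = [next_batch_size(8, t) for t in trace]
print(sizes)
```

<p>Because the helper always derives its result from the original size, repeated warm readings do not shrink the batch further. A loop that halves cumulatively would need to track the current size instead.</p>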
<h2 id="시퀀스-패킹과-학습-효율">Sequence packing and training efficiency</h2>
<p><strong>Sequence packing</strong> is used to maximize GPU utilization:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">SequencePacker</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> __init__(self, max_seq_len: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">2048</span>, eos_token_id: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">151645</span>):
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>max_seq_len <span style="color:#ff79c6">=</span> max_seq_len
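</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>eos_token_id <span style="color:#ff79c6">=</span> eos_token_id
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_buffer_ids: <span style="color:#8be9fd;font-style:italic">list</span>[<span style="color:#8be9fd;font-style:italic">int</span>] <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_buffer_mask: <span style="color:#8be9fd;font-style:italic">list</span>[<span style="color:#8be9fd;font-style:italic">int</span>] <span style="color:#ff79c6">=</span> []
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">def</span> <span style="color:#50fa7b">add_example</span>(self, token_ids: <span style="color:#8be9fd;font-style:italic">list</span>[<span style="color:#8be9fd;font-style:italic">int</span>], loss_mask: <span style="color:#8be9fd;font-style:italic">list</span>[<span style="color:#8be9fd;font-style:italic">int</span>]):
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Insert an EOS separator if the buffer is not empty</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">if</span> self<span style="color:#ff79c6">.</span>_buffer_ids:
</span></span><span style="display:flex;"><span>            self<span style="color:#ff79c6">.</span>_buffer_ids<span style="color:#ff79c6">.</span>append(self<span style="color:#ff79c6">.</span>eos_token_id)
</span></span><span style="display:flex;"><span>            self<span style="color:#ff79c6">.</span>_buffer_mask<span style="color:#ff79c6">.</span>append(<span style="color:#bd93f9">0</span>)  <span style="color:#6272a4"># no loss is computed on the separator</span>
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_buffer_ids<span style="color:#ff79c6">.</span>extend(token_ids)
</span></span><span style="display:flex;"><span>        self<span style="color:#ff79c6">.</span>_buffer_mask<span style="color:#ff79c6">.</span>extend(loss_mask)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6272a4"># Emit a packed batch whenever the buffer is full</span>
</span></span><span style="display:flex;"><span>        <span style="color:#ff79c6">while</span> <span style="color:#8be9fd;font-style:italic">len</span>(self<span style="color:#ff79c6">.</span>_buffer_ids) <span style="color:#ff79c6">&gt;=</span> self<span style="color:#ff79c6">.</span>max_seq_len:
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">yield</span> PackedBatch(
</span></span><span style="display:flex;"><span>                token_ids<span style="color:#ff79c6">=</span>self<span style="color:#ff79c6">.</span>_buffer_ids[:self<span style="color:#ff79c6">.</span>max_seq_len],
</span></span><span style="display:flex;"><span>                loss_mask<span style="color:#ff79c6">=</span>self<span style="color:#ff79c6">.</span>_buffer_mask[:self<span style="color:#ff79c6">.</span>max_seq_len]
</span></span><span style="display:flex;"><span>            )
</span></span><span style="display:flex;"><span>            <span style="color:#6272a4"># Drop the emitted tokens so the loop terminates</span>
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">del</span> self<span style="color:#ff79c6">.</span>_buffer_ids[:self<span style="color:#ff79c6">.</span>max_seq_len]
</span></span><span style="display:flex;"><span>            <span style="color:#ff79c6">del</span> self<span style="color:#ff79c6">.</span>_buffer_mask[:self<span style="color:#ff79c6">.</span>max_seq_len]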
</span></span></code></pre></td></tr></table>
</div>
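</div><p>Sequence packing concatenates several training examples into a single sequence so that no GPU memory or compute is spent on padding. For example, four examples of roughly 512 tokens each fill one 2048-token sequence with no padding; because each EOS separator also occupies a position, four examples of exactly 512 tokens spill three tokens into the next sequence. The EOS token acts as the separator between examples, and its <code>loss_mask</code> entry is set to 0 so that no loss is computed on it.</p>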
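<p>The benefit can be estimated with a little arithmetic. The sketch below compares the fraction of sequence positions that hold real tokens when each example is padded on its own versus when examples are packed. The helper names and the example lengths are made up for illustration, and the packed estimate assumes one EOS separator between neighbouring examples and a padded final window.</p>

```python
import math


def padded_utilization(lengths: list[int], max_seq_len: int) -> float:
    """Fraction of positions holding real tokens when every example is padded
    to max_seq_len on its own (assumes each length is at most max_seq_len)."""
    return sum(lengths) / (len(lengths) * max_seq_len)


def packed_utilization(lengths: list[int], max_seq_len: int) -> float:
    """Same fraction when examples are concatenated with one EOS separator
    between neighbours and cut into max_seq_len windows (last window padded)."""
    separators = max(len(lengths) - 1, 0)
    windows = math.ceil((sum(lengths) + separators) / max_seq_len)
    return sum(lengths) / (windows * max_seq_len)


lengths = [500, 500, 500, 500]  # made-up example lengths
print(f"padded: {padded_utilization(lengths, 2048):.3f}")
print(f"packed: {packed_utilization(lengths, 2048):.3f}")
```

<p>With these lengths, padding each example separately fills about 24% of the positions with real tokens, while packing fills about 98%.</p>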
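<h2 id="orpo-학습-sft와-선호도-정렬을-동시에">ORPO training: SFT and preference alignment at once</h2>
<p>Bit-Axon supports <strong>ORPO (Odds Ratio Preference Optimization)</strong>. ORPO's key advantage is that it needs no separate reference model: SFT and preference alignment are performed simultaneously on a single model.</p>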
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">import</span> mlx.core <span style="color:#ff79c6">as</span> mx
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">import</span> mlx.nn <span style="color:#ff79c6">as</span> nn
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">orpo_loss</span>(chosen_logps, rejected_logps, beta<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.1</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Log odds ratio: log(odds(chosen) / odds(rejected))</span>
</span></span><span style="display:flex;"><span>    log_odds <span style="color:#ff79c6">=</span> (chosen_logps <span style="color:#ff79c6">-</span> rejected_logps) <span style="color:#ff79c6">-</span> \
</span></span><span style="display:flex;"><span>               (log1mexp(chosen_logps) <span style="color:#ff79c6">-</span> log1mexp(rejected_logps))
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Log-sigmoid penalty on the scaled log odds ratio</span>
</span></span><span style="display:flex;"><span>    loss <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">-</span>mx<span style="color:#ff79c6">.</span>mean(nn<span style="color:#ff79c6">.</span>log_sigmoid(beta <span style="color:#ff79c6">*</span> log_odds))
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> loss
</span></span></code></pre></td></tr></table>
</div>
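</div><p>The total ORPO loss has two components:</p>
<ol>
<li><strong>NLL loss</strong>: the cross-entropy loss on the chosen sequence (ordinary SFT)</li>
<li><strong>Odds-ratio penalty</strong>: a penalty that grows as the odds of the chosen sequence fall relative to the odds of the rejected sequence</li>
</ol>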
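<p>To make the odds-ratio penalty concrete, here is a plain-Python scalar sketch that uses only the standard library. It is not the MLX implementation: the function names are illustrative, the probabilities are made up, and placing <code>beta</code> inside the sigmoid simply follows the <code>orpo_loss</code> function shown earlier.</p>

```python
import math


def log1mexp(x: float) -> float:
    """Scalar log(1 - exp(x)) for negative x, split at -ln 2."""
    if x >= 0.0:
        raise ValueError("log1mexp needs a negative argument")
    if x > -math.log(2.0):
        return math.log(-math.expm1(x))   # accurate when x is close to 0
    return math.log1p(-math.exp(x))       # accurate when x is very negative


def log_odds_ratio(chosen_logp: float, rejected_logp: float) -> float:
    """log(odds(chosen) / odds(rejected)) with odds(p) = p / (1 - p)."""
    return (chosen_logp - rejected_logp) - (
        log1mexp(chosen_logp) - log1mexp(rejected_logp)
    )


def odds_ratio_penalty(chosen_logp: float, rejected_logp: float,
                       beta: float = 0.1) -> float:
    """-log(sigmoid(beta * log_odds_ratio)), written as a stable softplus."""
    z = beta * log_odds_ratio(chosen_logp, rejected_logp)
    # -log(sigmoid(z)) = softplus(-z) = max(-z, 0) + log1p(exp(-|z|))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Made-up probabilities: chosen 0.5, rejected 0.2, so the odds ratio is 1 / 0.25 = 4.
print(log_odds_ratio(math.log(0.5), math.log(0.2)))
print(odds_ratio_penalty(math.log(0.5), math.log(0.2)))
```

<p>When both sequences are equally likely, the log odds ratio is 0 and the penalty equals ln 2. It drops below ln 2 once the chosen sequence has the higher odds and rises above it in the opposite case. Note that the ORPO paper applies its weight λ to the penalty term outside the log-sigmoid, so the two formulations are not interchangeable.</p>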
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">compute_orpo_loss</span>(model, chosen_ids, chosen_labels,
</span></span><span style="display:flex;"><span>                      rejected_ids, rejected_labels, beta<span style="color:#ff79c6">=</span><span style="color:#bd93f9">0.1</span>):
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Two forward passes through the same model (no reference model needed)</span>
</span></span><span style="display:flex;"><span>    logits_chosen <span style="color:#ff79c6">=</span> model(chosen_ids)
</span></span><span style="display:flex;"><span>    logits_rejected <span style="color:#ff79c6">=</span> model(rejected_ids)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># NLL loss on the chosen sequence</span>
</span></span><span style="display:flex;"><span>    nll_loss <span style="color:#ff79c6">=</span> cross_entropy_loss(logits_chosen, chosen_labels)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Log-probabilities for the preference comparison</span>
</span></span><span style="display:flex;"><span>    chosen_logps <span style="color:#ff79c6">=</span> get_logps(logits_chosen, chosen_labels)
</span></span><span style="display:flex;"><span>    rejected_logps <span style="color:#ff79c6">=</span> get_logps(logits_rejected, rejected_labels)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Combined objective</span>
</span></span><span style="display:flex;"><span>    orpo_penalty <span style="color:#ff79c6">=</span> orpo_loss(chosen_logps, rejected_logps, beta)
</span></span><span style="display:flex;"><span>    total_loss <span style="color:#ff79c6">=</span> nll_loss <span style="color:#ff79c6">+</span> orpo_penalty
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> total_loss
</span></span></code></pre></td></tr></table>
</div>
</div><h3 id="수치적-안정성">Numerical stability</h3>
<p>The <code>log1mexp</code> function computes <code>log(1 - exp(x))</code> for negative <code>x</code> in a numerically stable way:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> <span style="color:#50fa7b">log1mexp</span>(x: mx<span style="color:#ff79c6">.</span>array) <span style="color:#ff79c6">-&gt;</span> mx<span style="color:#ff79c6">.</span>array:
</span></span><span style="display:flex;"><span>    threshold <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>array(<span style="color:#ff79c6">-</span>_LN2)  <span style="color:#6272a4"># -ln(2)</span>
</span></span><span style="display:flex;"><span>    use_branch1 <span style="color:#ff79c6">=</span> x <span style="color:#ff79c6">&lt;</span> threshold
</span></span><span style="display:flex;"><span>    x_branch1 <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>where(use_branch1, x, mx<span style="color:#ff79c6">.</span>zeros_like(x))
</span></span><span style="display:flex;"><span>    x_branch2 <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>where(<span style="color:#ff79c6">~</span>use_branch1, x, mx<span style="color:#ff79c6">.</span>zeros_like(x))
</span></span><span style="display:flex;"><span>    branch1 <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>log1p(<span style="color:#ff79c6">-</span>mx<span style="color:#ff79c6">.</span>exp(x_branch1))          <span style="color:#6272a4"># x &lt; -ln(2): exp(x) is small, so log1p stays accurate</span>
</span></span><span style="display:flex;"><span>    branch2 <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>log(<span style="color:#ff79c6">-</span>mx<span style="color:#ff79c6">.</span>expm1(x_branch2))         <span style="color:#6272a4"># x &gt;= -ln(2): expm1 avoids cancellation near 0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#ff79c6">return</span> mx<span style="color:#ff79c6">.</span>where(use_branch1, branch1, branch2)
</span></span></code></pre></td></tr></table>
</div>
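</div><p>As x approaches 0, <code>exp(x)</code> approaches 1 and <code>1 - exp(x)</code> loses significant digits to cancellation; for strongly negative x, <code>exp(x)</code> is so small that the subtraction rounds it away. Splitting the computation into two branches at <code>-ln(2)</code> avoids both problems.</p>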
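<p>To see why the split matters, the following is a minimal pure-Python sketch (the standard <code>math</code> module instead of MLX, scalar inputs only) that compares the naive formula with a two-branch form. It assumes the usual convention of <code>expm1</code> for arguments near zero and <code>log1p</code> for strongly negative ones; the names <code>naive_log1mexp</code> and <code>stable_log1mexp</code> are illustrative and not part of Bit-Axon.</p>

```python
import math

LN2 = math.log(2.0)


def naive_log1mexp(x):
    """Direct evaluation of log(1 - exp(x)); loses precision at both ends."""
    return math.log(1.0 - math.exp(x))


def stable_log1mexp(x):
    """Two-branch evaluation of log(1 - exp(x)) for negative x."""
    if x >= 0.0:
        raise ValueError("log(1 - exp(x)) is only defined for negative x")
    if x >= -LN2:
        # Near zero: expm1 computes exp(x) - 1 without cancellation.
        return math.log(-math.expm1(x))
    # Far below zero: exp(x) is tiny, so log1p keeps its contribution.
    return math.log1p(-math.exp(x))


if __name__ == "__main__":
    for x in (-1e-12, -1.0, -40.0):
        print(x, naive_log1mexp(x), stable_log1mexp(x))
```

<p>At <code>x = -40</code> the naive form returns exactly 0.0 because <code>1 - exp(x)</code> rounds to 1, while the two-branch form keeps the small negative value.</p>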
<h3 id="qlora와-dora">QLoRA and DoRA</h3>
<p>Training uses <strong>QLoRA</strong> (Quantized Low-Rank Adaptation): the base weights are quantized to 4 bits and frozen, and only the low-rank adapters are trained.</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">4
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">5
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">6
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">7
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">8
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>@dataclass
</span></span><span style="display:flex;"><span><span style="color:#ff79c6">class</span> <span style="color:#50fa7b">TrainingConfig</span>:
</span></span><span style="display:flex;"><span>    quantize_bits: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">4</span>
</span></span><span style="display:flex;"><span>    quantize_group_size: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">64</span>
</span></span><span style="display:flex;"><span>    lora_rank: <span style="color:#8be9fd;font-style:italic">int</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">8</span>
</span></span><span style="display:flex;"><span>    lora_dropout: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">0.0</span>
</span></span><span style="display:flex;"><span>    lora_scale: <span style="color:#8be9fd;font-style:italic">float</span> <span style="color:#ff79c6">=</span> <span style="color:#bd93f9">20.0</span>
</span></span><span style="display:flex;"><span>    use_dora: <span style="color:#8be9fd;font-style:italic">bool</span> <span style="color:#ff79c6">=</span> <span style="color:#ff79c6">True</span>  <span style="color:#6272a4"># Weight-Decomposed LoRA</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>DoRA (Weight-Decomposed Low-Rank Adaptation)</strong> is a variant of LoRA that decomposes each weight into a magnitude and a direction:</p>
<div class="highlight"><div style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<table style="border-spacing:0;padding:0;margin:0;border:0;"><tr><td style="vertical-align:top;padding:0;margin:0;border:0;">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">1
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">2
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">3
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">4
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">5
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">6
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">7
</span><span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#7f7f7f">8
</span></code></pre></td>
<td style="vertical-align:top;padding:0;margin:0;border:0;;width:100%">
<pre tabindex="0" style="color:#f8f8f2;background-color:#282a36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff79c6">def</span> __call__(self, x):
</span></span><span style="display:flex;"><span>    y <span style="color:#ff79c6">=</span> self<span style="color:#ff79c6">.</span>linear(x)
</span></span><span style="display:flex;"><span>    z <span style="color:#ff79c6">=</span> (self<span style="color:#ff79c6">.</span>dropout(x) <span style="color:#ff79c6">@</span> self<span style="color:#ff79c6">.</span>lora_a) <span style="color:#ff79c6">@</span> self<span style="color:#ff79c6">.</span>lora_b
</span></span><span style="display:flex;"><span>    out <span style="color:#ff79c6">=</span> y <span style="color:#ff79c6">+</span> (self<span style="color:#ff79c6">.</span>scale <span style="color:#ff79c6">*</span> z)<span style="color:#ff79c6">.</span>astype(x<span style="color:#ff79c6">.</span>dtype)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6272a4"># Preserve the original magnitude (the core of DoRA); cross and d_sq are computed in code not shown here</span>
</span></span><span style="display:flex;"><span>    denom <span style="color:#ff79c6">=</span> mx<span style="color:#ff79c6">.</span>sqrt(self<span style="color:#ff79c6">.</span>_dora_w_sq_norm <span style="color:#ff79c6">+</span> cross <span style="color:#ff79c6">+</span> d_sq)
</span></span><span style="display:flex;"><span>    out <span style="color:#ff79c6">=</span> (self<span style="color:#ff79c6">.</span>m <span style="color:#ff79c6">/</span> denom)<span style="color:#ff79c6">.</span>astype(x<span style="color:#ff79c6">.</span>dtype) <span style="color:#ff79c6">*</span> out
</span></span></code></pre></td></tr></table>
</div>
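</div><p>DoRA's advantage over plain LoRA is that it <strong>decouples a weight's magnitude from its direction during training</strong>. In plain LoRA the adapter is added to the weight, so the weight's magnitude can drift along with its direction. DoRA explicitly normalizes the adapted weight and rescales it by a separate magnitude term (<code>self.m</code> above), which improves training stability.</p>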
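<p>The normalization can be illustrated with a small pure-Python sketch. It assumes that magnitudes are taken per output row and that <code>cross</code> and <code>d_sq</code> in the snippet above stand for the cross term and the squared norm of the scaled low-rank update, so that the denominator equals the norm of the adapted row. This is one plausible reading, not a confirmed description of Bit-Axon's code, and the function names <code>dora_rows</code> and <code>row_norms</code> are hypothetical.</p>

```python
import math


def dora_rows(w, delta, m):
    """Rescale each row of (w + delta) so that its norm equals the entry of m.

    w     : frozen base weight, a list of rows
    delta : scaled low-rank update with the same shape as w
    m     : one magnitude per row
    """
    out = []
    for w_row, d_row, m_i in zip(w, delta, m):
        w_sq = sum(a * a for a in w_row)                        # squared norm of the base row
        cross = 2.0 * sum(a * b for a, b in zip(w_row, d_row))  # 2 * dot(w_row, d_row)
        d_sq = sum(b * b for b in d_row)                        # squared norm of the update row
        denom = math.sqrt(w_sq + cross + d_sq)                  # norm of (w_row + d_row)
        out.append([m_i * (a + b) / denom for a, b in zip(w_row, d_row)])
    return out


def row_norms(rows):
    """Euclidean norm of every row."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


if __name__ == "__main__":
    w = [[3.0, 4.0], [1.0, 0.0]]
    delta = [[0.5, -0.5], [0.0, 2.0]]
    m = [5.0, 1.0]
    print(row_norms(dora_rows(w, delta, m)))
```

<p>Whatever the update does to a row's direction, the resulting row norm stays at the value in <code>m</code>, so under these assumptions the magnitude is controlled only through <code>m</code>.</p>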
<h2 id="모델-구성-요약">Model configuration summary</h2>
<table>
  <thead>
      <tr>
          <th>Parameter</th>
          <th>Value</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Total parameters</td>
          <td>3.2B</td>
          <td>All parameters, including MoE experts</td>
      </tr>
      <tr>
          <td>Active parameters</td>
          <td>~1.4B</td>
          <td>With top-2 routing</td>
      </tr>
      <tr>
          <td>vocab_size</td>
          <td>32,000</td>
          <td>BPE vocabulary size</td>
      </tr>
      <tr>
          <td>hidden_dim</td>
          <td>2,560</td>
          <td>Model hidden dimension</td>
      </tr>
      <tr>
          <td>num_layers</td>
          <td>24</td>
          <td>3 sections × 8 layers</td>
      </tr>
      <tr>
          <td>num_heads</td>
          <td>32</td>
          <td>Number of heads (head_dim=80)</td>
      </tr>
      <tr>
          <td>ssm_d_state</td>
          <td>16</td>
          <td>SSM state vector dimension</td>
      </tr>
      <tr>
          <td>ssm_d_conv</td>
          <td>4</td>
          <td>SSM 1D convolution kernel size</td>
      </tr>
      <tr>
          <td>ssm_scan_step</td>
          <td>64</td>
          <td>Parallel scan chunk size</td>
      </tr>
      <tr>
          <td>swa_window_size</td>
          <td>4,096</td>
          <td>Sliding window size</td>
      </tr>
      <tr>
          <td>moe_num_experts</td>
          <td>8</td>
          <td>Number of experts</td>
      </tr>
      <tr>
          <td>moe_top_k</td>
          <td>2</td>
          <td>Number of active experts</td>
      </tr>
      <tr>
          <td>moe_shared_expert</td>
          <td>true</td>
          <td>Uses a shared expert</td>
      </tr>
      <tr>
          <td>max_seq_len</td>
          <td>65,536</td>
          <td>Maximum sequence length</td>
      </tr>
      <tr>
          <td>Quantization</td>
          <td>4-bit NF4</td>
          <td>Group size 64</td>
      </tr>
  </tbody>
</table>
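<p>A few of these values can be cross-checked with simple arithmetic. The sketch below is a rough estimate, not Bit-Axon code: the class name <code>ModelSummary</code> is hypothetical, and it assumes that every parameter is quantized and that each group of 64 weights carries 32 bits of metadata (for example a 16-bit scale and a 16-bit offset).</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSummary:
    total_params: float = 3.2e9
    hidden_dim: int = 2560
    num_layers: int = 24
    num_sections: int = 3
    num_heads: int = 32
    quantize_bits: int = 4
    quantize_group_size: int = 64

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads

    @property
    def layers_per_section(self) -> int:
        return self.num_layers // self.num_sections

    def weight_bytes(self, meta_bits_per_group: int = 32) -> float:
        # Assumption: per-group metadata is amortized over the group's weights.
        bits_per_weight = self.quantize_bits + meta_bits_per_group / self.quantize_group_size
        return self.total_params * bits_per_weight / 8


if __name__ == "__main__":
    summary = ModelSummary()
    print(summary.head_dim, summary.layers_per_section)
    print(summary.weight_bytes() / 1e9, "GB of quantized weights (estimate)")
```

<p>Under these assumptions the quantized weights come to roughly 1.8 GB, before activations, adapter weights, and optimizer state are counted.</p>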
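<h2 id="핵심-인사이트">Key insights</h2>
<h3 id="1-아키텍처로-하드웨어-제약을-해결하라">1. Solve hardware constraints through architecture</h3>
<p>The thermal limits of a fanless laptop cannot be tuned away in software alone. The SSM's linear complexity cuts the amount of compute, the MoE's sparse activation saves memory bandwidth, and the thermal scheduler adjusts training speed dynamically. Only the combination of all three makes sustained training on a fanless MacBook possible.</p>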
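<h3 id="2-프레임워크-선택이-하드웨어와-맞아야-한다">2. The framework has to match the hardware</h3>
<p>MLX's zero-copy unified memory is the decisive factor that makes it possible to run the model on a 16GB MacBook. With PyTorch, copies between GPU and CPU memory can require up to twice the memory on the same hardware. Choosing a framework that fits the hardware is the first step of optimization.</p>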
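<h3 id="3-각-레이어-구간에-최소-복잡도를-할당하라">3. Give each layer section the minimum complexity it needs</h3>
<p>SSM (O(n)) for context absorption, SWA (O(n × w)) for reasoning, and SSM+MoE (linear and sparse) for output. Attention appears in only 8 of the 24 layers. Each section gets only the minimum compute it needs, which keeps overall complexity in check. This is why the design is more efficient than &ldquo;putting attention in every layer&rdquo;.</p>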
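<p>The gap between full attention and sliding-window attention can be made concrete by counting the query-key pairs that a single causal attention layer scores. This is a back-of-the-envelope sketch with hypothetical function names; it assumes causal masking and ignores heads, batch size, and constant factors.</p>

```python
def full_attention_pairs(n):
    """Query-key pairs in causal full attention over n tokens."""
    return n * (n + 1) // 2


def sliding_window_pairs(n, w):
    """Query-key pairs when each token attends to at most the last w tokens, itself included."""
    if w >= n:
        return full_attention_pairs(n)
    # The first w tokens see a growing prefix; every later token sees exactly w keys.
    return full_attention_pairs(w) + (n - w) * w


if __name__ == "__main__":
    n, w = 65_536, 4_096  # max_seq_len and swa_window_size from the summary table
    full = full_attention_pairs(n)
    swa = sliding_window_pairs(n, w)
    print(full, swa, round(full / swa, 2))
```

<p>At the maximum sequence length the window reduces the pair count by roughly a factor of eight per attention layer, and the count grows linearly rather than quadratically as the sequence gets longer.</p>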
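<h3 id="4-참조-모델-없는-정렬이-엣지-디바이스의-필수다">4. Reference-free alignment is essential on edge devices</h3>
<p>ORPO needs no reference model, so preference alignment fits in 16GB of memory. PPO and DPO must also hold a reference model in memory, which makes them impractical on a memory-constrained MacBook. Edge-device constraints directly shape the choice of algorithm.</p>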
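<p>The reason no reference model is needed is visible in the shape of the penalty: it depends only on the policy's own log-probabilities for the chosen and rejected responses. The following scalar sketch follows the odds-ratio formulation of ORPO; the function names and the use of length-averaged log-probabilities as inputs are assumptions for illustration and may differ from Bit-Axon's <code>orpo_loss</code>.</p>

```python
import math


def log_odds(logp):
    """log(p / (1 - p)) computed from log p, for logp strictly below 0."""
    if logp >= -math.log(2.0):
        log_one_minus_p = math.log(-math.expm1(logp))
    else:
        log_one_minus_p = math.log1p(-math.exp(logp))
    return logp - log_one_minus_p


def orpo_penalty(chosen_logp, rejected_logp, beta):
    """beta * -log(sigmoid(log-odds ratio)); only policy log-probs are involved."""
    ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(r)) equals softplus(-r); this form avoids overflow for large |r|.
    return beta * (max(-ratio, 0.0) + math.log1p(math.exp(-abs(ratio))))


if __name__ == "__main__":
    print(orpo_penalty(-0.5, -1.5, beta=0.1))  # chosen response more likely
    print(orpo_penalty(-1.5, -0.5, beta=0.1))  # rejected response more likely
```

<p>The penalty shrinks as the chosen response becomes more likely than the rejected one, and no second set of model weights is involved in computing it.</p>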
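<h2 id="마치며">Closing thoughts</h2>
<p>Bit-Axon is one experiment in running LLMs on edge devices. The three-layer sandwich architecture assigns compute that fits the hardware constraints, MLX makes full use of unified memory, and thermal-aware training keeps training sustainable within physical limits.</p>
<p>Together, these three make it practical to run a 3.2B model even on a fanless MacBook. 16GB of unified memory, 4-bit quantization, and Apple Silicon's efficient GPU: this hardware combination is opening up new possibilities for running LLMs on consumer devices.</p>
<p>The full source code is available at <a href="https://github.com/skyoo2003/bit-axon">github.com/skyoo2003/bit-axon</a>, and the model is on <a href="https://huggingface.co/skyoo2003/bit-axon">HuggingFace</a>.</p>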
]]></content:encoded>
    </item>
  </channel>
</rss>
