<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Complete Skeptic]]></title><description><![CDATA[Sane takes in an insane world]]></description><link>https://www.completeskeptic.com</link><image><url>https://www.completeskeptic.com/img/substack.png</url><title>Complete Skeptic</title><link>https://www.completeskeptic.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 04 Jul 2026 18:31:13 GMT</lastBuildDate><atom:link href="https://www.completeskeptic.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Diogo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[completeskeptic@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[completeskeptic@substack.com]]></itunes:email><itunes:name><![CDATA[Diogo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Diogo]]></itunes:author><googleplay:owner><![CDATA[completeskeptic@substack.com]]></googleplay:owner><googleplay:email><![CDATA[completeskeptic@substack.com]]></googleplay:email><googleplay:author><![CDATA[Diogo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Scaling Laws, Honestly]]></title><description><![CDATA[TL;DR: The original scaling laws were wrong due to a bug]]></description><link>https://www.completeskeptic.com/p/scaling-laws-honestly</link><guid isPermaLink="false">https://www.completeskeptic.com/p/scaling-laws-honestly</guid><dc:creator><![CDATA[Diogo]]></dc:creator><pubDate>Sat, 04 Jul 2026 05:22:37 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!Fmw7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><span>Background</span></h3><p><span>Scaling laws were one of OpenAI&#8217;s most important results, both technically and philosophically (so much so that being </span><em><span>scaling-pilled</span></em><span> became a thing). They allow us to predict results for ever larger language model runs, and also allow for debugging models as we use exponentially more resources. All of this led to the era of LLMs we&#8217;re in today, but the craziest part was&#8230; the original Kaplan et al scaling laws were wrong.</span></p><p><span>Recently, Lilian Weng posted another awesome (and highly recommended) </span><a href="https://lilianweng.github.io/posts/2026-06-24-scaling-laws/"><span>blog post on scaling laws</span></a><span>. I was extra excited about the section &#8220;Reconciling Kaplan and Chinchilla&#8221;, the former being </span><a href="https://arxiv.org/abs/2001.08361"><span>OpenAI&#8217;s original scaling laws</span></a><span> and the latter being </span><a href="https://arxiv.org/abs/2203.15556"><span>DeepMind&#8217;s follow-up</span></a><span> with completely different scaling laws.</span></p>
<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fmw7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fmw7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 424w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 848w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fmw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png" 
width="1456" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fmw7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 424w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 848w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!Fmw7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d826564-0046-440d-8c70-93b6eb88396f_1640x1048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Figure 1 from Chinchilla. The black dotted line shows the original scaling laws, and the cyan star shows that significantly smaller models should be used.</span></p><p><span>Lilian&#8217;s article goes into the mainstream explanation of the difference between them from </span><a href="https://arxiv.org/abs/2406.12907"><span>follow-up research</span></a><span> (namely that it&#8217;s about how they counted the total number of parameters). 
Unfortunately, that follow-up research is inaccurate, though through no fault of its authors.</span></p><p><span>The real difference between the original scaling laws and Chinchilla&#8217;s is that the former had a bug!</span></p><h3><span>The bug: 3 ingredients</span></h3><h4><span>Non-researcher summary</span></h4><ul><li><p><span>The 2 scaling laws (original and Chinchilla) give different &#8220;scaling recipes&#8221; for how to efficiently train large language models</span></p></li><li><p><span>The former was incorrect because its authors:</span></p><ul><li><p><span>Did not train on enough data (Step 1)</span></p></li><li><p><span>Gradually decreased the impact of data, which made it look like more data wasn&#8217;t needed (Step 2)</span></p></li><li><p><span>Claimed that the gradual decrease was unimportant (Step 3)</span></p></li></ul></li><li><p><span>Thus, for a few years, people trained models that were much too large on too little data</span></p></li></ul><h4><span>Clue: Data scales with size.</span></h4><p><span>The bug is easier to identify when working backwards: both scaling laws predict that data should scale with model size. The handwavy explanation is that bigger models have more capacity to soak up that data. Thus the amount of data is a </span><strong><span>very important parameter.</span></strong></p><h4><span>Step 1: Use a fixed amount of data.</span></h4><p><span>The Chinchilla paper points out the root issue, stating that the authors of the original Kaplan et al. paper &#8220;use a fixed number of training tokens and learning rate schedule for all models&#8221;. 
When every model is trained on the same fixed amount of data, the tiny model trained on ~130B tokens is getting way more training relative to its size than a giant model trained on the same ~130B tokens.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LNqg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LNqg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 424w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 848w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 1272w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LNqg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png" width="1394" height="734" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1394,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LNqg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 424w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 848w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 1272w, https://substackcdn.com/image/fetch/$s_!LNqg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ab3c0bf-7742-4645-bdd2-5b8a136f562f_1394x734.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Relevant quote from Chinchilla&#8217;s related work section.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wvqa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wvqa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 424w, https://substackcdn.com/image/fetch/$s_!Wvqa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 848w, 
https://substackcdn.com/image/fetch/$s_!Wvqa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 1272w, https://substackcdn.com/image/fetch/$s_!Wvqa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wvqa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png" width="754" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:754,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wvqa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 424w, https://substackcdn.com/image/fetch/$s_!Wvqa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 848w, 
https://substackcdn.com/image/fetch/$s_!Wvqa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 1272w, https://substackcdn.com/image/fetch/$s_!Wvqa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0469cd2f-40ad-45d4-8330-f20ec3196ed0_754x760.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Figure 2 from Kaplan et al. 
showing all model sizes trained to the same ~130B tokens.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AOKq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AOKq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 424w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 848w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 1272w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AOKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png" width="1456" height="1166" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1166,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AOKq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 424w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 848w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 1272w, https://substackcdn.com/image/fetch/$s_!AOKq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25f8862a-e1b3-4539-a65d-cbbfe6796a84_1840x1474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Figure 2 from Chinchilla with a pink arrow added to show roughly where the training curve would have been cut off had the model been trained to only ~130B tokens. It would have been obvious that training ended before reaching the scaling laws&#8217; Pareto frontier.</span></p><p><span>Keeping the amount of data fixed would have been sufficient to get incorrect scaling laws, but if that had been the only mistake, the results would have looked obviously incorrect. Unless you also&#8230;</span></p><h4><span>Step 2: Use a learning rate schedule with cosine decay to zero.</span></h4><p><span>This learning rate schedule caused learning to slow as training approached the target number of tokens. Performance naturally plateaued, making it appear that training had saturated. 
We now know that large models </span><em><span>would have </span></em><span>kept improving with more data and a different learning rate schedule, but the learning rate schedule artificially constrained results, making it appear that more data would not help.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!blN3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!blN3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 424w, https://substackcdn.com/image/fetch/$s_!blN3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 848w, https://substackcdn.com/image/fetch/$s_!blN3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 1272w, https://substackcdn.com/image/fetch/$s_!blN3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!blN3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png" width="1400" height="579" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:579,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!blN3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 424w, https://substackcdn.com/image/fetch/$s_!blN3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 848w, https://substackcdn.com/image/fetch/$s_!blN3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 1272w, https://substackcdn.com/image/fetch/$s_!blN3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3e20d86-5052-4dcb-9dad-9f4a86af1308_1400x579.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Visualization of a cosine learning rate decay with a warmup (</span><a href="https://scorrea92.medium.com/cosine-learning-rate-decay-e8b50aa455b"><span>source</span></a><span>). Note the smooth decay to lr=0, where learning stops entirely.</span></p><p><span>The fixed amount of data and the learning rate schedule lead to scaling laws that are both incorrect and hard to debug, and they become </span><em><span>even</span></em><span> harder to debug if you&#8230;</span></p><h4><span>Step 3: Claim that results were &#8220;largely independent of learning rate schedule&#8221;.</span></h4><p><span>For a fixed maximum number of tokens, their conclusion is entirely accurate, but it doesn&#8217;t apply to the infinite-data limit that scaling laws aim to model.</span></p><p><span>Aside: I also </span><a href="https://arxiv.org/abs/2106.00958"><span>worked on LLM optimization</span></a><span> at OpenAI at the time and missed the bug too. 
&#128517; The learning rate schedule seemed so obviously an important hyperparameter that it looked intentionally set.</span></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Nmj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Nmj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 424w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 848w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 1272w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Nmj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png" width="1456" height="313" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:313,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Nmj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 424w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 848w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 1272w, https://substackcdn.com/image/fetch/$s_!1Nmj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477a7509-31cb-4ed7-995d-b2d57faf9b1c_1536x330.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p style="text-align: center;"><span>Section 2.2 of Kaplan et al., describing how the models were trained. Green box shows the calculation of the number of training tokens, which is constant across model sizes. 
Red box shows the learning rate schedule.</span></p><h4><span>Result: Models were undertrained and too large.</span></h4><p><span>You can see how the difference in learning rate schedule shows up: Chinchilla ended up with a model less than half the size of GPT-3, trained on over 4x more tokens. They could not have achieved this result if the learning rate had decayed to 0 at just 300B tokens. &#128579;</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mQ3a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mQ3a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 424w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 848w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mQ3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png" width="1456" 
height="588" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mQ3a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 424w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 848w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5c9dfa1-8845-48cc-a902-56420504e995_1638x662.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><span>Table 1 from the Chinchilla paper, showing how GPT-3 was both undertrained and oversized.</span></p><h3><span>Conclusion</span></h3><p><span>Eventually, the bug was discovered but, as far as I know, never explicitly acknowledged. By now, every big AI lab has long known this.</span></p><p><span>For future non-big-lab researchers: don&#8217;t waste your time on this question. 
Chinchilla&#8217;s scaling laws are the correct ones.</span></p><p><span>For whoever can amend the original scaling laws paper, it would be great to add a note that there was a bug.</span></p><p><em><span>Big thanks to Ke Deng, Sasha Sheng, Erik Gafni, David Dohan, and Sander Dieleman for helping me write/review this post.</span></em></p>]]></content:encoded></item></channel></rss>