<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>5L Labs Blog</title>
        <link>https://www.5l-labs.com/frontier-research</link>
        <description>5L Labs Blog</description>
        <lastBuildDate>Mon, 23 Feb 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Learning about learning?]]></title>
            <link>https://www.5l-labs.com/frontier-research/learning-again</link>
            <guid>https://www.5l-labs.com/frontier-research/learning-again</guid>
            <pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Exploring the limits of Transformer architectures, the Platonic Representation Hypothesis, and the role of curiosity-driven learning in the next generation of AI.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="continuing-beyond-llmstransformers">[Continuing] Beyond LLMs/Transformers<a href="https://www.5l-labs.com/frontier-research/learning-again#continuing-beyond-llmstransformers" class="hash-link" aria-label="Direct link to [Continuing] Beyond LLMs/Transformers" title="Direct link to [Continuing] Beyond LLMs/Transformers" translate="no">​</a></h2>
<p>The power-waste, inefficiencies, and general limits of throwing large corpora of labeled data into a blender to create probability distributions of next-token are becoming apparent to the broader world.</p>
<p>Useful to create once, and part of the solution, but not the answer.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="connected-but-disconnected-thoughts">Connected, but disconnected thoughts:<a href="https://www.5l-labs.com/frontier-research/learning-again#connected-but-disconnected-thoughts" class="hash-link" aria-label="Direct link to Connected, but disconnected thoughts:" title="Direct link to Connected, but disconnected thoughts:" translate="no">​</a></h3>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="its-the-data-stupid">It's the data, stupid!<a href="https://www.5l-labs.com/frontier-research/learning-again#its-the-data-stupid" class="hash-link" aria-label="Direct link to It's the data, stupid!" title="Direct link to It's the data, stupid!" translate="no">​</a></h2>
<p>Multiple papers are coming to interesting observations:</p>
<ul>
<li class=""><a href="https://arxiv.org/abs/2405.07987" target="_blank" rel="noopener noreferrer" class="">The Platonic Representation Hypothesis</a></li>
<li class=""><a href="https://arxiv.org/abs/2505.12540" target="_blank" rel="noopener noreferrer" class="">Harnessing the Universal Geometry of Embeddings</a></li>
<li class=""><a href="https://arxiv.org/abs/2512.03750" target="_blank" rel="noopener noreferrer" class="">Universally Converging Representations of Matter Across Scientific Foundation Models</a></li>
</ul>
<p>My takeaway: if you feed in the same underlying data, regardless of which blender, the models are going to encode/decode into similar spaces. This has fun implications:</p>
<ul>
<li class="">Open-source data that is less detailed and lower quality can still generalize to the same embedding concepts as proprietary labeled data.</li>
<li class="">We can (and should) be able to translate between same-dimension embedding spaces as they evolve, without much loss of the original text.</li>
<li class="">We can use <a href="https://arxiv.org/abs/2205.13147" target="_blank" rel="noopener noreferrer" class="">Matryoshka Representation Learning</a> to focus on "what's shared" between embedding spaces.</li>
</ul>
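<p>To make the Matryoshka idea concrete, here is a minimal sketch in pure Python. The vectors are made-up toy values, not real model output, and the property only truly holds for MRL-trained embeddings; the sketch just shows the mechanics: the first <em>k</em> dimensions form a usable low-dimensional embedding once re-normalized, so two spaces can be compared on their shared prefix.</p>

```python
import math

def truncate(vec, k):
    """Keep the first k (Matryoshka) dimensions and re-normalize."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# Two toy "full" embeddings (8-dim stand-ins for 2048-dim vectors);
# values are illustrative only.
e1 = [0.9, 0.1, 0.05, 0.02, 0.01, 0.01, 0.0, 0.0]
e2 = [0.88, 0.12, 0.04, 0.03, 0.01, 0.0, 0.01, 0.0]

full_sim = cosine(truncate(e1, 8), truncate(e2, 8))
head_sim = cosine(truncate(e1, 2), truncate(e2, 2))  # "what's shared"
```

<p>For MRL-trained models the truncated similarity tracks the full similarity closely, which is exactly the "focus on what's shared" lever.</p>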
<p>See:</p>
<ol>
<li class=""><a href="https://huggingface.co/papers/2409.17146" target="_blank" rel="noopener noreferrer" class="">Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models</a></li>
<li class=""><a href="https://youtu.be/mQOK0Mfyrkk?si=-HcyNzyOl67fTLkS&amp;t=2918" target="_blank" rel="noopener noreferrer" class="">Stanford CS231N Deep Learning for Computer Vision | Spring 2025 | Lecture 16: Vision and Language</a></li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="out-of-distribution">Out-of-Distribution<a href="https://www.5l-labs.com/frontier-research/learning-again#out-of-distribution" class="hash-link" aria-label="Direct link to Out-of-Distribution" title="Direct link to Out-of-Distribution" translate="no">​</a></h2>
<p>"Slop" is the natural consequence of sampling outputs from a fitted probability distribution. More parameters can, of course, make that distribution more varied, but in the end it's probability and stats.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="its-a-blessing">It's a blessing<a href="https://www.5l-labs.com/frontier-research/learning-again#its-a-blessing" class="hash-link" aria-label="Direct link to It's a blessing" title="Direct link to It's a blessing" translate="no">​</a></h3>
<p>Just as most early middle-school kids want to "fit in," we can feel confident wandering into unfamiliar territory and landing near the median of the curve.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="its-a-curse">It's a curse<a href="https://www.5l-labs.com/frontier-research/learning-again#its-a-curse" class="hash-link" aria-label="Direct link to It's a curse" title="Direct link to It's a curse" translate="no">​</a></h3>
<p>Outside the median's boilerplate, the edges of language are where real ideas live. <strong>How can we spend our energies at the edge?</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="graphql-is-not-the-answer">GraphQL is not the answer.<a href="https://www.5l-labs.com/frontier-research/learning-again#graphql-is-not-the-answer" class="hash-link" aria-label="Direct link to GraphQL is not the answer." title="Direct link to GraphQL is not the answer." translate="no">​</a></h3>
<p>While I do love talking with the technologists behind GraphQL / SurrealDB, they are not the answer, as they require manual relationship mapping. We know from papers like <a href="https://arxiv.org/abs/2507.18546" target="_blank" rel="noopener noreferrer" class="">GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface</a> that we should be able to use "stand-ins" for unique concepts with lower effort at scale. How does that change embedding-space distributions and other models?</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="on-success-outside-the-straight-line">On success outside the straight line<a href="https://www.5l-labs.com/frontier-research/learning-again#on-success-outside-the-straight-line" class="hash-link" aria-label="Direct link to On success outside the straight line" title="Direct link to On success outside the straight line" translate="no">​</a></h2>
<p>One of the more interesting books that I've come across came from the folks at <a href="https://sakana.ai/" target="_blank" rel="noopener noreferrer" class="">Sakana.ai</a>, entitled <a href="https://www.goodreads.com/book/show/25670869-why-greatness-cannot-be-planned" target="_blank" rel="noopener noreferrer" class="">"Why Greatness Cannot Be Planned: The Myth of the Objective"</a>. Similar to my academic paper backlog, it's slowly being iterated through but is indeed quite interesting, especially when you think about machine learning beyond the transformer architecture.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="on-curiosity">On Curiosity<a href="https://www.5l-labs.com/frontier-research/learning-again#on-curiosity" class="hash-link" aria-label="Direct link to On Curiosity" title="Direct link to On Curiosity" translate="no">​</a></h2>
<p>Along similarly delightful lines, I came across this video in my YouTube backlog: <a href="https://www.youtube.com/watch?v=N2nIie7K7nU" target="_blank" rel="noopener noreferrer" class="">Pierre-Yves Oudeyer on Curiosity Driven Learning</a>.</p>
<p>Fun new word:</p>
<ul>
<li class=""><strong>Autotelic</strong>: From the Greek <em>autos</em> (self) and <em>telos</em> (goal). In the context of curiosity-driven learning, an autotelic agent is one that sets its own goals and finds intrinsic reward in the process of learning itself, rather than just optimizing for an external objective.</li>
</ul>
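<p>A toy sketch of what "autotelic" can mean operationally, loosely in the spirit of Oudeyer-style intrinsic motivation: the agent prefers whichever self-set goal shows the highest recent learning progress (drop in prediction error). All class and goal names here are my own illustrations, not from any paper's codebase.</p>

```python
import random

class AutotelicSampler:
    """Toy curiosity loop: exploit the self-set goal whose prediction
    error has dropped the most recently (highest learning progress),
    with occasional random exploration. Illustrative only."""

    def __init__(self, goals):
        # one error history per goal, seeded at maximal error
        self.errors = {g: [1.0] for g in goals}

    def record(self, goal, error):
        """Log the latest prediction error after practicing a goal."""
        self.errors[goal].append(error)

    def progress(self, goal):
        """Learning progress over a short window: old error minus new."""
        hist = self.errors[goal][-5:]
        return hist[0] - hist[-1]

    def pick_goal(self):
        """Mostly exploit the fastest-improving goal; sometimes explore."""
        if random.random() < 0.2:
            return random.choice(list(self.errors))
        return max(self.errors, key=self.progress)
```

<p>The interesting design choice is rewarding the <em>derivative</em> of competence rather than competence itself: goals that are already mastered (or hopeless) show flat error and stop attracting attention.</p>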
<p>Explore more at the <a href="https://flowers.inria.fr/" target="_blank" rel="noopener noreferrer" class="">Flowers Inria</a> project.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-am-i-going-to-prove-it">How am I going to prove it?<a href="https://www.5l-labs.com/frontier-research/learning-again#how-am-i-going-to-prove-it" class="hash-link" aria-label="Direct link to How am I going to prove it?" title="Direct link to How am I going to prove it?" translate="no">​</a></h2>
<p>By building an "Autotelic Agent"—one that doesn't just respond to prompts but actively explores its environment (via the GarageCam/HomeKit mesh) to build its own internal model of reality. This requires moving beyond the "next-token prediction" blender and into true structured, curiosity-driven exploration.</p>]]></content:encoded>
            <category>blog</category>
            <category>ml</category>
            <category>embedding_models</category>
            <category>learning</category>
            <category>sakana-ai</category>
            <category>curiosity</category>
        </item>
        <item>
            <title><![CDATA[Time Decay of Information]]></title>
            <link>https://www.5l-labs.com/frontier-research/information-aging-out</link>
            <guid>https://www.5l-labs.com/frontier-research/information-aging-out</guid>
            <pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Exploring the "Time Decay" of information in AI models—how embeddings and weights handle the aging of explicit, implicit, and undated knowledge.]]></description>
            <content:encoded><![CDATA[<p>The below is a snapshot in time of my evolving thought process on how to deal with information aging out in learning models. I may periodically refresh that thinking in place or extend to a new post.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="textual-information">(Textual) Information<a href="https://www.5l-labs.com/frontier-research/information-aging-out#textual-information" class="hash-link" aria-label="Direct link to (Textual) Information" title="Direct link to (Textual) Information" translate="no">​</a></h2>
<p>Information aging out for learning (human and machine) requires thinking about the problem space from at least two different angles:</p>
<ol>
<li class="">Embeddings</li>
<li class="">Weights / Models</li>
</ol>
<p>Of which there are three scenarios:</p>
<ol>
<li class="">Explicitly dated information</li>
<li class="">Implicitly dated information</li>
<li class="">Undated information</li>
</ol>
<p>With sources falling along a separate trust axis:</p>
<ol>
<li class="">Mostly trusted</li>
<li class="">Untrusted</li>
</ol>
<p>These arrive from multiple sources including, but not limited to:</p>
<ol>
<li class="">Books (our oldest form of information) - Permanent form of information</li>
<li class="">Articles (news, blogs, journals) - Semipermanent form of information</li>
<li class="">Social Media (the most ephemeral form of information)</li>
</ol>
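<p>The taxonomy above can be sketched as a toy data model. The enum names, and especially the half-life numbers, are my own placeholder assumptions, not measured values:</p>

```python
from dataclasses import dataclass
from enum import Enum, auto

class Dating(Enum):
    EXPLICIT = auto()   # carries its own timestamp
    IMPLICIT = auto()   # date inferable from context
    UNDATED = auto()

class Trust(Enum):
    MOSTLY_TRUSTED = auto()
    UNTRUSTED = auto()

class Source(Enum):
    BOOK = auto()          # permanent
    ARTICLE = auto()       # semipermanent
    SOCIAL_MEDIA = auto()  # ephemeral

@dataclass
class InfoItem:
    text: str
    dating: Dating
    trust: Trust
    source: Source

    def base_half_life_days(self) -> float:
        """Toy half-lives: more permanent sources decay more slowly."""
        return {
            Source.BOOK: 3650.0,
            Source.ARTICLE: 365.0,
            Source.SOCIAL_MEDIA: 7.0,
        }[self.source]
```

<p>Even a crude schema like this makes the later questions tractable: decay policy becomes a function of (dating, trust, source) rather than a single global knob.</p>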
<p>In addition, we need to consider whether the model or human is aiming for general or deep knowledge of the topic at hand.</p>
<p>Deep knowledge may have less stickiness over time, while general knowledge may be more resilient to time decay.</p>
<p>Finally, we need to look at how that knowledge could decay over time. Could an initially 2048-dim embedding decay into a 128-dim embedding? This "semantic evaporation" could be a mechanism for long-term memory management, where detailed nuances are pruned while the core concept (the "centroid") remains.</p>
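<p>One way to sketch that "semantic evaporation" schedule: halve the retained Matryoshka prefix every half-life, clamping to a floor that preserves the centroid. The half-life and floor values are arbitrary assumptions for illustration:</p>

```python
import math

def dims_after_age(full_dim, age_days, half_life_days=180.0, floor_dim=128):
    """Toy 'semantic evaporation': the retained embedding prefix halves
    every half-life, rounded down to a power of two (Matryoshka-style
    nesting), never dropping below a floor that keeps the core concept."""
    kept = full_dim * 0.5 ** (age_days / half_life_days)
    kept = 2 ** int(math.log2(max(kept, floor_dim)))
    return max(int(kept), floor_dim)
```

<p>So under these made-up constants, a fresh 2048-dim memory stays whole, shrinks to 1024 dims after one half-life, and eventually settles at the 128-dim floor instead of vanishing.</p>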
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="multimodal-information">Multimodal Information?<a href="https://www.5l-labs.com/frontier-research/information-aging-out#multimodal-information" class="hash-link" aria-label="Direct link to Multimodal Information?" title="Direct link to Multimodal Information?" translate="no">​</a></h2>
<p>A lot of the public work that I've read has gone into single-modality learning, whereas humans do not learn (well) from text alone. Even if one is "book smart," the structures that we use to retain and process information rely on mapping to other concepts.</p>
<p>Textual Models like <a href="https://github.com/urchade/GLiNER" target="_blank" rel="noopener noreferrer" class="">GLiNER</a> (Generalist Named Entity Recognition) offer hints as to how that linkage might be established. Lead author <a href="https://urchade.github.io/" target="_blank" rel="noopener noreferrer" class="">Urchade Zaratiana</a> and the team are pushing this into <strong>GLiNER2</strong>, which aims for unified, schema-driven information extraction across multiple tasks like NER, classification, and structured data extraction.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="future-work-proving-out-the-ideas">Future Work: Proving out the ideas<a href="https://www.5l-labs.com/frontier-research/information-aging-out#future-work-proving-out-the-ideas" class="hash-link" aria-label="Direct link to Future Work: Proving out the ideas" title="Direct link to Future Work: Proving out the ideas" translate="no">​</a></h2>
<p>To prove out these ideas, we're looking at "Temporal Embeddings"—vector spaces that incorporate a time-decay function directly into the similarity calculation. This prioritizes recent information without entirely discarding historical context, much like how human memory functions. We also need to explore GLiNER's viability in zero-shot aspect-based sentiment analysis as a way to track changing sentiments over time.</p>]]></content:encoded>
            <category>blog</category>
            <category>frontier</category>
            <category>embedding_models</category>
            <category>time</category>
        </item>
    </channel>
</rss>