<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://www.5l-labs.com/applied-ai-engineering</id>
    <title>5L Labs Blog</title>
    <updated>2025-04-19T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://www.5l-labs.com/applied-ai-engineering"/>
    <subtitle>5L Labs Blog</subtitle>
    <icon>https://www.5l-labs.com/img/favicon.svg</icon>
    <entry>
        <title type="html"><![CDATA[Private Agents - Pim Particles (Embeddings)]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles"/>
        <updated>2025-04-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Exploring the role of embeddings (Pim Particles) in private agent architectures and how federated learning fits into the local-first AI model.]]></summary>
        <content type="html"><![CDATA[<p>The monetization strategy for private AI is evolving. Do we focus on hosting secure re-training servers, or is the value in providing 'Private Models' as a service? Alternatively, perhaps a data-privacy-first exchange allows for anonymized datasets to be contributed back to the collective model in exchange for reduced costs.</p>
<p>How do frameworks like <a href="https://flower.ai/" target="_blank" rel="noopener noreferrer" class="">Flower</a> fit into this? Flower allows for easy federated learning, enabling local agents to contribute to a larger model without sharing raw, sensitive data. This fits perfectly with the vision of private agency, where personal data stays local while the global model improves.</p>
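<p>To make the federated piece concrete: the averaging step that frameworks like Flower orchestrate can be sketched in a few lines. This is not Flower's API, just the core idea of federated averaging (FedAvg), with made-up client weights and sample counts:</p>

```python
# Sketch of federated averaging (FedAvg), the mechanic frameworks like
# Flower build on. Each client trains locally and shares only weight
# updates -- never raw data -- and the server takes a weighted average.

def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client model weights (lists of floats)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * (size / total)
    return averaged

# Three "homes", each having trained locally on different amounts of private data.
local_updates = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
samples_per_home = [10, 10, 20]
global_weights = federated_average(local_updates, samples_per_home)
print(global_weights)  # the server only ever sees weights, not the data behind them
```

<p>Real deployments add secure aggregation and differential privacy on top, but the data-stays-local property is already visible here.</p>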
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="pim-particles-the-dimensions-of-semantic-meaning">Pim Particles: The Dimensions of Semantic Meaning<a href="https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles#pim-particles-the-dimensions-of-semantic-meaning" class="hash-link" aria-label="Direct link to Pim Particles: The Dimensions of Semantic Meaning" title="Direct link to Pim Particles: The Dimensions of Semantic Meaning" translate="no">​</a></h3>
<p>The "Pim Particles" metaphor describes how embeddings compress high-dimensional semantic space into a manageable vector format. Just as Pim Particles allow objects to shrink and grow while maintaining their fundamental properties, embeddings map complex human language into a lower-dimensional space that AI models can efficiently process.</p>
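<p>A toy sketch of what those compressed vectors buy us: once meaning is a vector, similarity of meaning becomes an angle between vectors. The four-dimensional vectors below are invented for illustration; real embedding models use hundreds or thousands of dimensions.</p>

```python
import math

# Toy illustration: each text is "shrunk" to a small vector, and semantic
# similarity becomes cosine similarity between vectors. These 4-dim
# vectors are made up; real models produce much larger ones.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat    = [0.9, 0.1, 0.3, 0.0]    # pretend embedding of "cat"
kitten = [0.8, 0.2, 0.35, 0.05]  # pretend embedding of "kitten"
car    = [0.1, 0.9, 0.0, 0.4]    # pretend embedding of "car"

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # lower: unrelated meanings
```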
<p>Different Dimensions - what does this mean? In embeddings, "dimensions" refer to the number of numerical features used to represent a piece of text. A 1536-dimensional vector (common for models like OpenAI's <code>text-embedding-3-small</code>) captures 1536 different semantic "facets" of the data, allowing for highly nuanced similarity searches.</p>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="mcp" term="mcp"/>
        <category label="python" term="python"/>
        <category label="gemini" term="gemini"/>
        <category label="claude" term="claude"/>
        <category label="warp" term="warp"/>
        <category label="embeddings" term="embeddings"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[NVIDIA GTC Recap]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap"/>
        <updated>2025-03-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A recap of NVIDIA GTC 2025 focusing on private agency, home-based ML, and the technical ingredients for secure, local AI systems.]]></summary>
        <content type="html"><![CDATA[<p>There is a lot of new information to assimilate. This first of many posts focuses on private-agency-related thoughts, including putting the puzzle together for Private Agency for My House.</p>
<p>Soumith Chintala (co-creator of PyTorch) provided insights into the evolution of local inference and distributed training, which are foundational for home-based ML.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ingredients-to-the-bake-private-agency-in-the-home">Ingredients to bake Private Agency in the home?<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#ingredients-to-the-bake-private-agency-in-the-home" class="hash-link" aria-label="Direct link to Ingredients to bake Private Agency in the home?" title="Direct link to Ingredients to bake Private Agency in the home?" translate="no">​</a></h2>
<p>Ignorance is dangerous, so let's take a look at the known knowns, known unknowns, and unknown unknowns.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="inference">Inference<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#inference" class="hash-link" aria-label="Direct link to Inference" title="Direct link to Inference" translate="no">​</a></h3>
<p>Generalized local language models, including Specialized Language Models (SLMs)</p>
<ul>
<li class="">Likely a mixture of experts (MoE) working together in my house—at its most extreme form, it's one-per tool (though this may be an over-optimization).</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="training">Training<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#training" class="hash-link" aria-label="Direct link to Training" title="Direct link to Training" translate="no">​</a></h3>
<p>Federated learning participation for local retraining on private data</p>
<ul>
<li class="">Where is that data stored and in what format(s)? Parquet, Vector DB, or raw JSON?</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="verification">Verification<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#verification" class="hash-link" aria-label="Direct link to Verification" title="Direct link to Verification" translate="no">​</a></h3>
<p>How do we validate that data is not being leaked via imported models? Exploring Zero-Knowledge Proofs for model weights.</p>
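<p>A real answer here likely needs zero-knowledge machinery, but a far simpler building block, a hash commitment over the weights, already catches tampering between publication and local import. A sketch (the weight dicts are made up):</p>

```python
import hashlib
import json

# NOT a zero-knowledge proof -- just a hash commitment over model weights.
# It lets you detect whether an imported model was altered between the
# publisher's release and your local deployment.

def weight_fingerprint(weights):
    """Deterministic SHA-256 digest of a model's weights."""
    canonical = json.dumps(weights, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

published  = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.3]})
downloaded = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.3]})
tampered   = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.30001]})

print(published == downloaded)  # untouched model matches its commitment
print(published == tampered)    # any weight change is visible
```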
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="data-tagging-by-sensitivity">Data Tagging by Sensitivity:<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#data-tagging-by-sensitivity" class="hash-link" aria-label="Direct link to Data Tagging by Sensitivity:" title="Direct link to Data Tagging by Sensitivity:" translate="no">​</a></h3>
<p>Implementing metadata layers that classify data (e.g., Public, Internal, Confidential) to dictate which models can process it.</p>
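<p>A minimal sketch of such a metadata layer, with made-up labels and model names: each datum carries a sensitivity tag, each model declares a clearance, and a gate decides routing.</p>

```python
from enum import IntEnum

# Sketch of a sensitivity-tagging layer. The labels and model names are
# illustrative; the point is that routing is decided by comparing a
# datum's tag against each model's declared clearance.

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

MODEL_CLEARANCE = {
    "cloud-frontier-model": Sensitivity.PUBLIC,  # data would leave the house
    "local-slm": Sensitivity.CONFIDENTIAL,       # data never leaves the house
}

def allowed_models(label: Sensitivity) -> list[str]:
    """Models permitted to process data tagged with `label`."""
    return [m for m, cleared in MODEL_CLEARANCE.items() if cleared >= label]

print(allowed_models(Sensitivity.PUBLIC))        # both models qualify
print(allowed_models(Sensitivity.CONFIDENTIAL))  # only the local model
```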
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="interoperability">Interoperability<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#interoperability" class="hash-link" aria-label="Direct link to Interoperability" title="Direct link to Interoperability" translate="no">​</a></h3>
<p>Embeddings are model-specific, so what does a transformation space between embeddings look like? Can an open standard (or a set of base embedding features) be created to allow translation between embedding spaces?</p>
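<p>One candidate transformation is a learned linear map: embed the same anchor texts with both models and solve for a matrix that carries one space into the other. The 2-D exact-anchor toy below keeps the algebra in plain Python; real spaces would use least squares over many anchor pairs and far more dimensions, and the vectors here are invented.</p>

```python
# Toy "transformation space" between two embedding models: solve for a
# 2x2 matrix W with W @ a_i = b_i from two independent anchor pairs,
# i.e. W = B @ A^-1. Real systems fit W by least squares over many pairs.

def solve_linear_map(anchors_a, anchors_b):
    """2x2 map W with W @ a_i = b_i, given two independent anchor pairs."""
    (a11, a21), (a12, a22) = anchors_a          # anchors as columns of A
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det],
           [-a21 / det, a11 / det]]             # A^-1 by the 2x2 formula
    B = [[anchors_b[0][0], anchors_b[1][0]],    # anchors as columns of B
         [anchors_b[0][1], anchors_b[1][1]]]
    return [[sum(B[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]                  # W = B @ A^-1

def apply_map(W, v):
    return [W[0][0] * v[0] + W[0][1] * v[1],
            W[1][0] * v[0] + W[1][1] * v[1]]

# The same two texts embedded by "model A" and by "model B" (made-up vectors).
W = solve_linear_map([(1.0, 0.0), (0.0, 1.0)], [(2.0, 1.0), (0.0, 3.0)])
print(apply_map(W, [1.0, 1.0]))  # a model-A vector expressed in model B's space
```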
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="open-major-questions">Open major questions<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#open-major-questions" class="hash-link" aria-label="Direct link to Open major questions" title="Direct link to Open major questions" translate="no">​</a></h2>
<ol>
<li class="">In addition to RL/SFT/LoRA/FP Quantization for inference, does anyone see a future without private retraining of models on a regular basis?</li>
</ol>
<ul>
<li class="">On Hardware:
<ul>
<li class="">What are the chances of data mixing in an RDMA GPU mesh? What does a GPU Enclave look like?</li>
<li class="">Private Enclaves exist for Motherboard HBM, Disk and CPU, and <a href="https://developer.nvidia.com/blog/confidential-computing-on-h100-gpus-for-secure-and-trustworthy-ai/" target="_blank" rel="noopener noreferrer" class="">NVIDIA H100 Confidential Computing</a>.</li>
</ul>
</li>
</ul>
<ol start="2">
<li class="">Can the performance impact of segregation at scale become cheap enough to offset the need for local high-end hardware?</li>
<li class="">How do we add factual knowledge to an LLM (and suppress old knowledge) at scale?</li>
<li class="">How can we move embeddings from one model to another model at scale?</li>
<li class="">If the Data Center (rebranded the "AI Factory") is moving to 600 kVA racks and 6-megawatt hubs, what does the edge look like?</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="testing-private-agency">Testing Private Agency<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#testing-private-agency" class="hash-link" aria-label="Direct link to Testing Private Agency" title="Direct link to Testing Private Agency" translate="no">​</a></h2>
<p>Testing Private Agency involves benchmarking local inference speed against cloud-based alternatives while verifying zero-leakage through network monitoring and traffic analysis.</p>
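<p>The benchmarking half can start as simply as timing repeated calls and comparing latency percentiles; the <code>run_inference</code> stub below is a stand-in for a real local or cloud endpoint.</p>

```python
import time

# Sketch of latency benchmarking for private-agency testing: time
# repeated calls to an inference callable and report percentiles, so a
# local model and a cloud endpoint can be compared side by side.

def benchmark(run_inference, prompt, runs=20):
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": latencies[len(latencies) // 2] * 1000,
        "p95_ms": latencies[int(len(latencies) * 0.95) - 1] * 1000,
    }

# Stand-in "model": swap in a call to your local or cloud endpoint.
def run_inference(prompt):
    return prompt.upper()

stats = benchmark(run_inference, "hello from the edge")
print(stats)
```

<p>The zero-leakage half is separate: run the same workload while capturing traffic (e.g. with tcpdump) and confirm nothing crosses the network boundary.</p>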
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="detailed-links-training">Detailed Links Training<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#detailed-links-training" class="hash-link" aria-label="Direct link to Detailed Links Training" title="Direct link to Detailed Links Training" translate="no">​</a></h3>
<p>Hints:</p>
<ol>
<li class=""><a href="https://developer.nvidia.com/gpudirect" target="_blank" rel="noopener noreferrer" class="">https://developer.nvidia.com/gpudirect</a></li>
<li class=""><a href="https://github.com/facebookincubator/gloo" target="_blank" rel="noopener noreferrer" class="">https://github.com/facebookincubator/gloo</a></li>
<li class=""><a href="https://github.com/horovod/horovod" target="_blank" rel="noopener noreferrer" class="">https://github.com/horovod/horovod</a></li>
<li class=""><a href="https://security.apple.com/blog/private-cloud-compute/" target="_blank" rel="noopener noreferrer" class="">https://security.apple.com/blog/private-cloud-compute/</a></li>
<li class=""><a href="https://www.microsoft.com/en-us/research/blog/secure-training-of-machine-learning-models-on-azure/" target="_blank" rel="noopener noreferrer" class="">https://www.microsoft.com/en-us/research/blog/secure-training-of-machine-learning-models-on-azure/</a></li>
<li class=""><a href="https://developer.nvidia.com/blog/confidential-computing-on-h100-gpus-for-secure-and-trustworthy-ai/" target="_blank" rel="noopener noreferrer" class="">NVIDIA H100 Confidential Computing</a></li>
</ol>
<h2>Players in this space?</h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hosting--">Hosting -<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#hosting--" class="hash-link" aria-label="Direct link to Hosting -" title="Direct link to Hosting -" translate="no">​</a></h3>
<ol>
<li class=""><a href="https://www.atlantic.net/gpu-server-hosting/hipaa-gpu-hosting/" target="_blank" rel="noopener noreferrer" class="">https://www.atlantic.net/gpu-server-hosting/hipaa-gpu-hosting/</a></li>
</ol>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="nvidia" term="nvidia"/>
        <category label="ai" term="ai"/>
        <category label="gpu" term="gpu"/>
        <category label="deep-learning" term="deep-learning"/>
        <category label="machine-learning" term="machine-learning"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Off to Nvidia GTC]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc"/>
        <updated>2025-03-16T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Exploring the privacy landscape at NVIDIA GTC 2025, with a focus on Distributed Training and Private LLMs.]]></summary>
        <content type="html"><![CDATA[<p>Going to California for NVIDIA GTC 2025, what are the areas of focus for privacy?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="focus-areas">Focus Areas<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc#focus-areas" class="hash-link" aria-label="Direct link to Focus Areas" title="Direct link to Focus Areas" translate="no">​</a></h3>
<ul>
<li class=""><strong>Distributed Training</strong>: How can we train large models across multiple nodes while maintaining data isolation?</li>
<li class=""><strong>Private LLMs</strong>: Exploring the latest in quantization (FP8/FP4) and local inference to keep enterprise data on-premise.</li>
<li class=""><strong>NVIDIA Confidential Computing</strong>: Investigating H100/B200 support for hardware-level isolation.</li>
</ul>
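<p>For a feel of what quantization buys, here is the idea in miniature using int8. Real FP8/FP4 are floating-point formats, so this is an analogy for the shrink-and-restore round trip, not the actual encoding:</p>

```python
# Quantization in miniature: map float weights to low-precision integers
# plus a scale factor, then restore. Storage drops roughly 4x vs FP32
# at the cost of a small, bounded reconstruction error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(error, 4))  # integer codes plus a small reconstruction error
```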
<p>We'll be looking for sessions that bridge the gap between massive compute and strict data sovereignty. Stay tuned for the recap!</p>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="nvidia" term="nvidia"/>
        <category label="ai" term="ai"/>
        <category label="gpu" term="gpu"/>
        <category label="deep-learning" term="deep-learning"/>
        <category label="machine-learning" term="machine-learning"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[ai.engineer summit nyc]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc"/>
        <updated>2025-02-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Key takeaways from the AI Engineering Summit in NYC, focusing on private agents, federated systems, and the Model Context Protocol (MCP).]]></summary>
        <content type="html"><![CDATA[<p>The <a href="https://www.ai.engineer/" target="_blank" rel="noopener noreferrer" class="">AI Engineering Summit</a> was a definite eye-opener to the speed with which IT is transforming mundane "busy" work and lowering the startup cost for exploring new ideas.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="take-aways-for-a-private-agent">Takeaways for a private agent:<a href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc#take-aways-for-a-private-agent" class="hash-link" aria-label="Direct link to Takeaways for a private agent:" title="Direct link to Takeaways for a private agent:" translate="no">​</a></h3>
<ul>
<li class="">Do Agents need to be local to be private?</li>
<li class="">Locally on a MacBook Pro (using frameworks like Ollama or llama.cpp)</li>
<li class="">Hosted in a secure enclave on AWS / Azure / GCP using <strong>Trusted Execution Environments (TEEs)</strong>—hardware-isolated areas of a processor that ensure data and code are protected from the host operating system or cloud provider during computation.</li>
<li class="">Or can some sort of formal proof be done to leave a multi-tenant agent in the cloud with data privacy? (e.g., exploring Zero-Knowledge Proofs or Fully Homomorphic Encryption)</li>
<li class="">For either of the above, how do we manage the cost trade-off?</li>
<li class="">How are we protecting state?</li>
<li class="">Where are we storing state?</li>
</ul>
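<p>For the local option above, a sketch of how little plumbing is involved: Ollama serves an HTTP API on <code>localhost:11434</code> by default, so the prompt never leaves the machine. The model name is illustrative, and actually calling <code>generate_local</code> assumes a running <code>ollama serve</code> with that model pulled.</p>

```python
import json
import urllib.request

# Sketch of local, private inference against Ollama's default HTTP API.
# Everything targets localhost -- nothing is sent to a cloud provider.

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate_local(model, prompt):
    """Requires a running `ollama serve` with `model` pulled."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

req = build_request("llama3.2", "Summarize my notes without uploading them.")
print(req.full_url)  # the only endpoint involved is localhost
```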
<p>Base Research for a federated system of local agents:</p>
<ul>
<li class=""><a href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener noreferrer" class="">Model Context Protocol (MCP)</a></li>
<li class="">See AI Entourage</li>
<li class="">MCP provides a standardized way for agents to access local data sources and tools without exposing the entire system.</li>
<li class="">FP32-&gt;FP8 (FP4 for MoE) quantization and LoRA to shrink model size, improve inference speed, reduce latency, increase throughput, and reduce cost</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="network-architecture--details">Network Architecture &amp; Details<a href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc#network-architecture--details" class="hash-link" aria-label="Direct link to Network Architecture &amp; Details" title="Direct link to Network Architecture &amp; Details" translate="no">​</a></h3>
<p>The network architecture relies on a hub-and-spoke model where a central orchestrator communicates with federated local agents via MCP. This ensures that sensitive data remains within the local "spoke" while still allowing the "hub" to coordinate complex tasks.</p>
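<p>The hub-and-spoke contract can be sketched in a few lines: spokes keep raw records private and return only derived answers, so the hub coordinates without ever seeing the data. The agent names and records below are made up.</p>

```python
# Minimal sketch of the hub-and-spoke pattern: each spoke holds its own
# data and answers queries about it; the hub only ever receives derived
# results (here, match counts), never the raw records.

class Spoke:
    def __init__(self, name, private_records):
        self.name = name
        self._records = private_records  # never leaves this object

    def handle(self, query):
        # Only an aggregate result crosses the boundary back to the hub.
        return sum(1 for r in self._records if query in r)

class Hub:
    def __init__(self, spokes):
        self.spokes = spokes

    def ask(self, query):
        return {s.name: s.handle(query) for s in self.spokes}

hub = Hub([
    Spoke("kitchen-agent", ["grocery list: milk", "recipe: soup"]),
    Spoke("office-agent", ["meeting notes", "grocery budget"]),
])
print(hub.ask("grocery"))  # counts only; the raw notes stay in each spoke
```

<p>An MCP-based version replaces the direct method call with a standardized tool invocation, but the privacy boundary sits in the same place.</p>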
<ul>
<li class="">Standardized API specifications for agent-to-agent communication.</li>
<li class="">Deployment guides for TEE-based secure enclaves on major cloud providers.</li>
</ul>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
    </entry>
</feed>