<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://www.5l-labs.com/applied-ai-engineering</id>
    <title>5L Labs Blog</title>
    <updated>2025-04-19T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://www.5l-labs.com/applied-ai-engineering"/>
    <subtitle>5L Labs Blog</subtitle>
    <icon>https://www.5l-labs.com/img/favicon.svg</icon>
    <entry>
        <title type="html"><![CDATA[Private Agents - Pim Particles (Embeddings)]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles"/>
        <updated>2025-04-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Exploring the role of embeddings (Pim Particles) in private agent architectures and how federated learning fits into the local-first AI model.]]></summary>
        <content type="html"><![CDATA[<p>The monetization strategy for private AI is evolving. Do we focus on hosting secure re-training servers, or is the value in providing 'Private Models' as a service? Alternatively, perhaps a data-privacy-first exchange allows for anonymized datasets to be contributed back to the collective model in exchange for reduced costs.</p>
<p>How do frameworks like <a href="https://flower.ai/" target="_blank" rel="noopener noreferrer" class="">Flower</a> fit into this? Flower allows for easy federated learning, enabling local agents to contribute to a larger model without sharing raw, sensitive data. This fits perfectly with the vision of private agency, where personal data stays local while the global model improves.</p>
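<p>To make the federated piece concrete: the averaging step that frameworks like Flower orchestrate can be sketched in a few lines. This is not Flower's API, just the core idea of federated averaging (FedAvg), with made-up client weights and sample counts:</p>

```python
# Sketch of federated averaging (FedAvg), the mechanic frameworks like
# Flower build on. Each client trains locally and shares only weight
# updates -- never raw data -- and the server takes a weighted average.

def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client model weights (lists of floats)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * (size / total)
    return averaged

# Three "homes", each having trained locally on different amounts of private data.
local_updates = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
samples_per_home = [10, 10, 20]
global_weights = federated_average(local_updates, samples_per_home)
print(global_weights)  # the server only ever sees weights, not the data behind them
```

<p>Real deployments add secure aggregation and differential privacy on top, but the data-stays-local property is already visible here.</p>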
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="pim-particles-the-dimensions-of-semantic-meaning">Pim Particles: The Dimensions of Semantic Meaning<a href="https://www.5l-labs.com/applied-ai-engineering/embeddings-pim-particles#pim-particles-the-dimensions-of-semantic-meaning" class="hash-link" aria-label="Direct link to Pim Particles: The Dimensions of Semantic Meaning" title="Direct link to Pim Particles: The Dimensions of Semantic Meaning" translate="no">​</a></h3>
<p>The "Pim Particles" metaphor describes how embeddings compress high-dimensional semantic space into a manageable vector format. Just as Pim Particles allow objects to shrink and grow while maintaining their fundamental properties, embeddings map complex human language into a lower-dimensional space that AI models can efficiently process.</p>
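<p>A toy sketch of what those compressed vectors buy us: once meaning is a vector, similarity of meaning becomes an angle between vectors. The four-dimensional vectors below are invented for illustration; real embedding models use hundreds or thousands of dimensions.</p>

```python
import math

# Toy illustration: each text is "shrunk" to a small vector, and semantic
# similarity becomes cosine similarity between vectors. These 4-dim
# vectors are made up; real models produce much larger ones.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat    = [0.9, 0.1, 0.3, 0.0]    # pretend embedding of "cat"
kitten = [0.8, 0.2, 0.35, 0.05]  # pretend embedding of "kitten"
car    = [0.1, 0.9, 0.0, 0.4]    # pretend embedding of "car"

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # lower: unrelated meanings
```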
<p>Different Dimensions - what does this mean? In embeddings, "dimensions" refer to the number of numerical features used to represent a piece of text. A 1536-dimensional vector (common for models like OpenAI's <code>text-embedding-3-small</code>) captures 1536 different semantic "facets" of the data, allowing for highly nuanced similarity searches.</p>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="mcp" term="mcp"/>
        <category label="python" term="python"/>
        <category label="gemini" term="gemini"/>
        <category label="claude" term="claude"/>
        <category label="warp" term="warp"/>
        <category label="embeddings" term="embeddings"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[NVIDIA GTC Recap]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap"/>
        <updated>2025-03-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A recap of NVIDIA GTC 2025 focusing on private agency, home-based ML, and the technical ingredients for secure, local AI systems.]]></summary>
        <content type="html"><![CDATA[<p>There is a lot of new information to assimilate. This first of many posts focuses on private-agency-related thoughts, including putting the puzzle together for Private Agency for My House.</p>
<p>Soumith Chintala (co-creator of PyTorch) provided insights into the evolution of local inference and distributed training, which are foundational for home-based ML.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ingredients-to-the-bake-private-agency-in-the-home">Ingredients to bake Private Agency in the home?<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#ingredients-to-the-bake-private-agency-in-the-home" class="hash-link" aria-label="Direct link to Ingredients to bake Private Agency in the home?" title="Direct link to Ingredients to bake Private Agency in the home?" translate="no">​</a></h2>
<p>Ignorance is dangerous, so let's take a look at the known knowns, known unknowns, and unknown unknowns.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="inference">Inference<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#inference" class="hash-link" aria-label="Direct link to Inference" title="Direct link to Inference" translate="no">​</a></h3>
<p>Generalized local language models, including Specialized Language Models (SLMs)</p>
<ul>
<li class="">Likely a mixture of experts (MoE) working together in my house—at its most extreme form, it's one-per tool (though this may be an over-optimization).</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="training">Training<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#training" class="hash-link" aria-label="Direct link to Training" title="Direct link to Training" translate="no">​</a></h3>
<p>Federated learning participation for local retraining on private data</p>
<ul>
<li class="">Where is that data stored and in what format(s)? Parquet, Vector DB, or raw JSON?</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="verification">Verification<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#verification" class="hash-link" aria-label="Direct link to Verification" title="Direct link to Verification" translate="no">​</a></h3>
<p>How do we validate that data is not being leaked via imported models? Exploring Zero-Knowledge Proofs for model weights.</p>
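<p>A real answer here likely needs zero-knowledge machinery, but a far simpler building block, a hash commitment over the weights, already catches tampering between publication and local import. A sketch (the weight dicts are made up):</p>

```python
import hashlib
import json

# NOT a zero-knowledge proof -- just a hash commitment over model weights.
# It lets you detect whether an imported model was altered between the
# publisher's release and your local deployment.

def weight_fingerprint(weights):
    """Deterministic SHA-256 digest of a model's weights."""
    canonical = json.dumps(weights, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

published  = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.3]})
downloaded = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.3]})
tampered   = weight_fingerprint({"layer1": [0.1, 0.2], "layer2": [0.30001]})

print(published == downloaded)  # untouched model matches its commitment
print(published == tampered)    # any weight change is visible
```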
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="data-tagging-by-sensitivity">Data Tagging by Sensitivity:<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#data-tagging-by-sensitivity" class="hash-link" aria-label="Direct link to Data Tagging by Sensitivity:" title="Direct link to Data Tagging by Sensitivity:" translate="no">​</a></h3>
<p>Implementing metadata layers that classify data (e.g., Public, Internal, Confidential) to dictate which models can process it.</p>
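<p>A minimal sketch of such a metadata layer, with made-up labels and model names: each datum carries a sensitivity tag, each model declares a clearance, and a gate decides routing.</p>

```python
from enum import IntEnum

# Sketch of a sensitivity-tagging layer. The labels and model names are
# illustrative; the point is that routing is decided by comparing a
# datum's tag against each model's declared clearance.

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

MODEL_CLEARANCE = {
    "cloud-frontier-model": Sensitivity.PUBLIC,  # data would leave the house
    "local-slm": Sensitivity.CONFIDENTIAL,       # data never leaves the house
}

def allowed_models(label: Sensitivity) -> list[str]:
    """Models permitted to process data tagged with `label`."""
    return [m for m, cleared in MODEL_CLEARANCE.items() if cleared >= label]

print(allowed_models(Sensitivity.PUBLIC))        # both models qualify
print(allowed_models(Sensitivity.CONFIDENTIAL))  # only the local model
```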
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="interoperability">Interoperability<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#interoperability" class="hash-link" aria-label="Direct link to Interoperability" title="Direct link to Interoperability" translate="no">​</a></h3>
<p>Embeddings are model-specific, so what does a transformation space between embeddings look like? Can an open standard (or a set of base embedding features) be created to allow translation between embedding spaces?</p>
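<p>One candidate transformation is a learned linear map: embed the same anchor texts with both models and solve for a matrix that carries one space into the other. The 2-D exact-anchor toy below keeps the algebra in plain Python; real spaces would use least squares over many anchor pairs and far more dimensions, and the vectors here are invented.</p>

```python
# Toy "transformation space" between two embedding models: solve for a
# 2x2 matrix W with W @ a_i = b_i from two independent anchor pairs,
# i.e. W = B @ A^-1. Real systems fit W by least squares over many pairs.

def solve_linear_map(anchors_a, anchors_b):
    """2x2 map W with W @ a_i = b_i, given two independent anchor pairs."""
    (a11, a21), (a12, a22) = anchors_a          # anchors as columns of A
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det],
           [-a21 / det, a11 / det]]             # A^-1 by the 2x2 formula
    B = [[anchors_b[0][0], anchors_b[1][0]],    # anchors as columns of B
         [anchors_b[0][1], anchors_b[1][1]]]
    return [[sum(B[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]                  # W = B @ A^-1

def apply_map(W, v):
    return [W[0][0] * v[0] + W[0][1] * v[1],
            W[1][0] * v[0] + W[1][1] * v[1]]

# The same two texts embedded by "model A" and by "model B" (made-up vectors).
W = solve_linear_map([(1.0, 0.0), (0.0, 1.0)], [(2.0, 1.0), (0.0, 3.0)])
print(apply_map(W, [1.0, 1.0]))  # a model-A vector expressed in model B's space
```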
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="open-major-questions">Open major questions<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#open-major-questions" class="hash-link" aria-label="Direct link to Open major questions" title="Direct link to Open major questions" translate="no">​</a></h2>
<ol>
<li class="">In addition to RL/SFT/LoRA/FP Quantization for inference, does anyone see a future without private retraining of models on a regular basis?</li>
</ol>
<ul>
<li class="">On Hardware:
<ul>
<li class="">What are the chances of data mixing in an RDMA GPU mesh? What does a GPU Enclave look like?</li>
<li class="">Private Enclaves exist for Motherboard HBM, Disk and CPU, and <a href="https://developer.nvidia.com/blog/confidential-computing-on-h100-gpus-for-secure-and-trustworthy-ai/" target="_blank" rel="noopener noreferrer" class="">NVIDIA H100 Confidential Computing</a>.</li>
</ul>
</li>
</ul>
<ol start="2">
<li class="">Can the performance impact of segregation at scale become cheap enough to offset the need for local high-end hardware?</li>
<li class="">How do we add factual knowledge to an LLM (and suppress old knowledge) at scale?</li>
<li class="">How can we move embeddings from one model to another model at scale?</li>
<li class="">If the Data Center (rebranded the "AI Factory") is moving to 600 kVA racks and 6-megawatt hubs, what does the edge look like?</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="testing-private-agency">Testing Private Agency<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#testing-private-agency" class="hash-link" aria-label="Direct link to Testing Private Agency" title="Direct link to Testing Private Agency" translate="no">​</a></h2>
<p>Testing Private Agency involves benchmarking local inference speed against cloud-based alternatives while verifying zero-leakage through network monitoring and traffic analysis.</p>
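<p>The benchmarking half can start as simply as timing repeated calls and comparing latency percentiles; the <code>run_inference</code> stub below is a stand-in for a real local or cloud endpoint.</p>

```python
import time

# Sketch of latency benchmarking for private-agency testing: time
# repeated calls to an inference callable and report percentiles, so a
# local model and a cloud endpoint can be compared side by side.

def benchmark(run_inference, prompt, runs=20):
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": latencies[len(latencies) // 2] * 1000,
        "p95_ms": latencies[int(len(latencies) * 0.95) - 1] * 1000,
    }

# Stand-in "model": swap in a call to your local or cloud endpoint.
def run_inference(prompt):
    return prompt.upper()

stats = benchmark(run_inference, "hello from the edge")
print(stats)
```

<p>The zero-leakage half is separate: run the same workload while capturing traffic (e.g. with tcpdump) and confirm nothing crosses the network boundary.</p>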
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="detailed-links-training">Detailed Links Training<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#detailed-links-training" class="hash-link" aria-label="Direct link to Detailed Links Training" title="Direct link to Detailed Links Training" translate="no">​</a></h3>
<p>Hints:</p>
<ol>
<li class=""><a href="https://developer.nvidia.com/gpudirect" target="_blank" rel="noopener noreferrer" class="">https://developer.nvidia.com/gpudirect</a></li>
<li class=""><a href="https://github.com/facebookincubator/gloo" target="_blank" rel="noopener noreferrer" class="">https://github.com/facebookincubator/gloo</a></li>
<li class=""><a href="https://github.com/horovod/horovod" target="_blank" rel="noopener noreferrer" class="">https://github.com/horovod/horovod</a></li>
<li class=""><a href="https://security.apple.com/blog/private-cloud-compute/" target="_blank" rel="noopener noreferrer" class="">https://security.apple.com/blog/private-cloud-compute/</a></li>
<li class=""><a href="https://www.microsoft.com/en-us/research/blog/secure-training-of-machine-learning-models-on-azure/" target="_blank" rel="noopener noreferrer" class="">https://www.microsoft.com/en-us/research/blog/secure-training-of-machine-learning-models-on-azure/</a></li>
<li class=""><a href="https://developer.nvidia.com/blog/confidential-computing-on-h100-gpus-for-secure-and-trustworthy-ai/" target="_blank" rel="noopener noreferrer" class="">NVIDIA H100 Confidential Computing</a></li>
</ol>
<h2>Players in this space?</h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hosting--">Hosting -<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc-recap#hosting--" class="hash-link" aria-label="Direct link to Hosting -" title="Direct link to Hosting -" translate="no">​</a></h3>
<ol>
<li class=""><a href="https://www.atlantic.net/gpu-server-hosting/hipaa-gpu-hosting/" target="_blank" rel="noopener noreferrer" class="">https://www.atlantic.net/gpu-server-hosting/hipaa-gpu-hosting/</a></li>
</ol>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="nvidia" term="nvidia"/>
        <category label="ai" term="ai"/>
        <category label="gpu" term="gpu"/>
        <category label="deep-learning" term="deep-learning"/>
        <category label="machine-learning" term="machine-learning"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Off to Nvidia GTC]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc"/>
        <updated>2025-03-16T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Exploring the privacy landscape at NVIDIA GTC 2025, with a focus on Distributed Training and Private LLMs.]]></summary>
        <content type="html"><![CDATA[<p>Going to California for NVIDIA GTC 2025, what are the areas of focus for privacy?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="focus-areas">Focus Areas<a href="https://www.5l-labs.com/applied-ai-engineering/nvdia-gtc#focus-areas" class="hash-link" aria-label="Direct link to Focus Areas" title="Direct link to Focus Areas" translate="no">​</a></h3>
<ul>
<li class=""><strong>Distributed Training</strong>: How can we train large models across multiple nodes while maintaining data isolation?</li>
<li class=""><strong>Private LLMs</strong>: Exploring the latest in quantization (FP8/FP4) and local inference to keep enterprise data on-premise.</li>
<li class=""><strong>NVIDIA Confidential Computing</strong>: Investigating H100/B200 support for hardware-level isolation.</li>
</ul>
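<p>For a feel of what quantization buys, here is the idea in miniature using int8. Real FP8/FP4 are floating-point formats, so this is an analogy for the shrink-and-restore round trip, not the actual encoding:</p>

```python
# Quantization in miniature: map float weights to low-precision integers
# plus a scale factor, then restore. Storage drops roughly 4x vs FP32
# at the cost of a small, bounded reconstruction error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(error, 4))  # integer codes plus a small reconstruction error
```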
<p>We'll be looking for sessions that bridge the gap between massive compute and strict data sovereignty. Stay tuned for the recap!</p>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
        <category label="nvidia" term="nvidia"/>
        <category label="ai" term="ai"/>
        <category label="gpu" term="gpu"/>
        <category label="deep-learning" term="deep-learning"/>
        <category label="machine-learning" term="machine-learning"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[ai.engineer summit nyc]]></title>
        <id>https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc</id>
        <link href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc"/>
        <updated>2025-02-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Key takeaways from the AI Engineering Summit in NYC, focusing on private agents, federated systems, and the Model Context Protocol (MCP).]]></summary>
        <content type="html"><![CDATA[<p>The <a href="https://www.ai.engineer/" target="_blank" rel="noopener noreferrer" class="">AI Engineering Summit</a> was a definite eye-opener to the speed with which IT is transforming mundane "busy" work and lowering the startup cost for exploring new ideas.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="take-aways-for-a-private-agent">Takeaways for a private agent:<a href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc#take-aways-for-a-private-agent" class="hash-link" aria-label="Direct link to Takeaways for a private agent:" title="Direct link to Takeaways for a private agent:" translate="no">​</a></h3>
<ul>
<li class="">Do Agents need to be local to be private?</li>
<li class="">Locally on a MacBook Pro (using frameworks like Ollama or llama.cpp)</li>
<li class="">Hosted in a secure enclave on AWS / Azure / GCP using <strong>Trusted Execution Environments (TEEs)</strong>—hardware-isolated areas of a processor that ensure data and code are protected from the host operating system or cloud provider during computation.</li>
<li class="">Or can some sort of formal proof be done to leave a multi-tenant agent in the cloud with data privacy? (e.g., exploring Zero-Knowledge Proofs or Fully Homomorphic Encryption)</li>
<li class="">For either of the above, how do we manage the cost trade-off?</li>
<li class="">How are we protecting state?</li>
<li class="">Where are we storing state?</li>
</ul>
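<p>For the local option above, a sketch of how little plumbing is involved: Ollama serves an HTTP API on <code>localhost:11434</code> by default, so the prompt never leaves the machine. The model name is illustrative, and actually calling <code>generate_local</code> assumes a running <code>ollama serve</code> with that model pulled.</p>

```python
import json
import urllib.request

# Sketch of local, private inference against Ollama's default HTTP API.
# Everything targets localhost -- nothing is sent to a cloud provider.

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate_local(model, prompt):
    """Requires a running `ollama serve` with `model` pulled."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

req = build_request("llama3.2", "Summarize my notes without uploading them.")
print(req.full_url)  # the only endpoint involved is localhost
```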
<p>Base Research for a federated system of local agents:</p>
<ul>
<li class=""><a href="https://www.anthropic.com/news/model-context-protocol" target="_blank" rel="noopener noreferrer" class="">Model Context Protocol (MCP)</a></li>
<li class="">See AI Entourage</li>
<li class="">MCP provides a standardized way for agents to access local data sources and tools without exposing the entire system.</li>
<li class="">FP32-&gt;FP8 (FP4 for MoE) quantization and LoRA to shrink model size, improve inference speed, reduce latency, increase throughput, and reduce cost</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="network-architecture--details">Network Architecture &amp; Details<a href="https://www.5l-labs.com/applied-ai-engineering/ai-dot-engineer-summit-nyc#network-architecture--details" class="hash-link" aria-label="Direct link to Network Architecture &amp; Details" title="Direct link to Network Architecture &amp; Details" translate="no">​</a></h3>
<p>The network architecture relies on a hub-and-spoke model where a central orchestrator communicates with federated local agents via MCP. This ensures that sensitive data remains within the local "spoke" while still allowing the "hub" to coordinate complex tasks.</p>
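<p>The hub-and-spoke contract can be sketched in a few lines: spokes keep raw records private and return only derived answers, so the hub coordinates without ever seeing the data. The agent names and records below are made up.</p>

```python
# Minimal sketch of the hub-and-spoke pattern: each spoke holds its own
# data and answers queries about it; the hub only ever receives derived
# results (here, match counts), never the raw records.

class Spoke:
    def __init__(self, name, private_records):
        self.name = name
        self._records = private_records  # never leaves this object

    def handle(self, query):
        # Only an aggregate result crosses the boundary back to the hub.
        return sum(1 for r in self._records if query in r)

class Hub:
    def __init__(self, spokes):
        self.spokes = spokes

    def ask(self, query):
        return {s.name: s.handle(query) for s in self.spokes}

hub = Hub([
    Spoke("kitchen-agent", ["grocery list: milk", "recipe: soup"]),
    Spoke("office-agent", ["meeting notes", "grocery budget"]),
])
print(hub.ask("grocery"))  # counts only; the raw notes stay in each spoke
```

<p>An MCP-based version replaces the direct method call with a standardized tool invocation, but the privacy boundary sits in the same place.</p>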
<ul>
<li class="">Standardized API specifications for agent-to-agent communication.</li>
<li class="">Deployment guides for TEE-based secure enclaves on major cloud providers.</li>
</ul>]]></content>
        <author>
            <name>Nick Lange</name>
            <uri>https://github.com/NickJLange</uri>
        </author>
        <category label="blog" term="blog"/>
    </entry>
</feed>