incentivised internet-wide training
AI training today faces a fundamental access challenge. Big tech companies like OpenAI and Meta dominate training through massive data centers packed with thousands of GPUs. Building and running these facilities costs billions of dollars, which creates a steep barrier to entry. Meanwhile, talented developers worldwide lack access to these resources and cannot train large language models (LLMs) on their own.
Templar addresses this by pooling computing resources from participants worldwide. Instead of needing billions to build a data center, developers can contribute their individual GPUs to train models together. Every GPU contributes to the same model instead of duplicating work.
This approach makes large-scale AI training available to more people. Anyone who can contribute computing power can participate. The combined resources aim to match the power of traditional systems.
Templar recently completed training a 1.2 billion parameter model. Parameters are the settings that determine how the AI behaves. This involved over 20,000 training cycles. About 200 GPUs participated in the process. Building on this success, future plans target 70 billion parameters and beyond.
CENTRALIZED ASPECTS
Currently, model storage and some governance elements remain centralized. The roadmap involves transitioning to community governance and distributed storage solutions.
The training process runs on a precise schedule. Every 84 seconds, the subnet assigns each miner a specific data portion called a "page." This assignment happens randomly to ensure fairness. It prevents anyone from cheating by knowing their work in advance.
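To make the idea concrete, here is a minimal sketch of how deterministic yet unpredictable page assignment can work: a random generator is seeded with public, per-window information (here a window number and miner UID), so every validator can reproduce the assignment but no miner can know it in advance. The function name and parameters are hypothetical, not Templar's actual code.

```python
import hashlib
import random


def assign_pages(window: int, miner_uid: int, total_pages: int, pages_per_miner: int) -> list[int]:
    """Deterministically pick data pages for one miner in one 84-second window.

    The seed mixes public values (window number, miner UID), so anyone can
    reproduce the assignment after the fact, but miners cannot predict it
    before the window starts.
    """
    seed_material = f"{window}:{miner_uid}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)
    return rng.sample(range(total_pages), pages_per_miner)


# Example: pages assigned to miner 42 in window 1337
print(assign_pages(window=1337, miner_uid=42, total_pages=10_000, pages_per_miner=4))
```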
Each miner then performs real training work on their assigned data pages. Miners run the current AI model on their data to see how well it performs, measuring the model's mistakes (called "loss"). Based on these errors, miners calculate mathematical instructions called "pseudo-gradients," which tell the model how to adjust its weights to perform better.
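In PyTorch terms, one window of miner work boils down to a forward pass, a loss measurement, and a backward pass that yields the gradient information to be shared. The sketch below is only an illustration: it assumes the model returns raw logits, and it uses a simple top-k compression as a stand-in for however Templar actually compresses pseudo-gradients.

```python
import torch


def training_step(model: torch.nn.Module, input_ids: torch.Tensor, topk_frac: float = 0.01):
    """Run the current model on assigned data, measure loss, and build a
    compressed "pseudo-gradient" (here: keep only the largest gradient entries)."""
    model.zero_grad()
    # Causal language modelling: predict each token from the ones before it.
    logits = model(input_ids[:, :-1])
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), input_ids[:, 1:].reshape(-1)
    )
    loss.backward()

    pseudo_grad = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        flat = param.grad.flatten()
        k = max(1, int(flat.numel() * topk_frac))
        _, indices = flat.abs().topk(k)
        pseudo_grad[name] = (indices, flat[indices])  # sparse (indices, values) pairs
    return loss.item(), pseudo_grad
```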
The goal is to process as much data as possible within the 84-second window. Miners who process more data are likely to produce better improvements. This creates competition between miners to optimize their training methods.
Once calculations are complete, miners upload these pseudo-gradients to shared cloud storage buckets. These are called "R2 buckets." Miners must share their storage read keys, making their gradients publicly available. This creates an open system where validators can access and verify everyone's work.
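Cloudflare R2 exposes an S3-compatible API, so an upload along these lines can be done with boto3. The endpoint, keys, bucket name, and object layout below are placeholders, and the serialization format is an assumption; this only sketches the idea of publishing a pseudo-gradient to shared storage.

```python
import io

import boto3
import torch


def upload_pseudo_gradient(pseudo_grad: dict, window: int, miner_uid: int) -> str:
    """Serialize a pseudo-gradient and upload it to an R2 bucket (S3-compatible API)."""
    s3 = boto3.client(
        "s3",
        endpoint_url="https://<account-id>.r2.cloudflarestorage.com",  # placeholder endpoint
        aws_access_key_id="<write-key-id>",                            # placeholder credentials
        aws_secret_access_key="<write-key-secret>",
    )
    buffer = io.BytesIO()
    torch.save(pseudo_grad, buffer)
    buffer.seek(0)
    key = f"gradients/window-{window}/miner-{miner_uid}.pt"
    s3.upload_fileobj(buffer, Bucket="my-gradient-bucket", Key=key)
    return key  # anyone holding the shared read key can now fetch this object
```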
Each miner works on different data. Their contributions get combined to improve one shared AI model. Validators collect pseudo-gradients from multiple miners and select the top 15 highest-quality contributions. These get mathematically combined and applied to the global model. This creates an improved version that benefits from all the different training work. This updated model then gets shared with all participants for the next training round.
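Conceptually, aggregation means merging the selected pseudo-gradients and applying the result as one update to the shared model. The simple averaging below is a stand-in for whatever weighting the subnet actually uses, and it assumes the sparse (indices, values) format from the earlier sketch.

```python
import torch


def apply_top_contributions(model: torch.nn.Module,
                            selected: list[dict],
                            lr: float = 1e-4) -> None:
    """Average the top-ranked pseudo-gradients and take one update step on the
    shared model. `selected` holds the sparse gradient dicts of the chosen miners."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            merged = torch.zeros_like(param).flatten()
            count = 0
            for contribution in selected:
                if name not in contribution:
                    continue
                indices, values = contribution[name]
                merged.index_add_(0, indices, values)
                count += 1
            if count:
                param -= lr * (merged / count).view_as(param)
```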
Validators play a key role in keeping quality high. They download the same data pages assigned to specific miners and run them through their own copy of the AI model to measure its baseline loss.
Next, validators apply the miner's pseudo-gradients to their model. Testing it again on the same data shows whether the contribution helped. If the loss decreases, the miner's contribution improved the AI. If it increases or stays the same, the contribution didn't help. This lets validators score miners by comparing the loss reduction against the baseline.
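A minimal version of that check looks like this: measure loss on the assigned pages, apply the miner's pseudo-gradient to a copy of the model, measure loss again, and score the difference. It reuses the sparse format assumed above and is illustrative only, not the validator's actual implementation.

```python
import copy

import torch


def score_miner(model: torch.nn.Module, input_ids: torch.Tensor,
                pseudo_grad: dict, lr: float = 1e-4) -> float:
    """Score = loss before applying the miner's update minus loss after.
    A positive score means the contribution actually improved the model."""

    def loss_on(m: torch.nn.Module) -> float:
        logits = m(input_ids[:, :-1])
        return torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), input_ids[:, 1:].reshape(-1)
        ).item()

    baseline = loss_on(model)

    candidate = copy.deepcopy(model)
    with torch.no_grad():
        for name, param in candidate.named_parameters():
            if name in pseudo_grad:
                indices, values = pseudo_grad[name]
                update = torch.zeros_like(param).flatten()
                update[indices] = values
                param -= lr * update.view_as(param)

    return baseline - loss_on(candidate)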
The subnet rewards miners based on how much they improve the AI model. The biggest improvements earn the most rewards. However, miners must stay synchronized with the rest of the subnet. Anyone who falls behind sees their rewards drop quickly.
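One simple way to turn those improvement scores into reward shares is to normalize the positive scores so the biggest improvements earn the largest fraction. This is only a hedged illustration of the principle, not the subnet's actual emission formula.

```python
def reward_weights(scores: dict[int, float]) -> dict[int, float]:
    """Map miner improvement scores to reward weights that sum to 1.
    Miners whose updates did not help (score <= 0) earn nothing."""
    positive = {uid: max(score, 0.0) for uid, score in scores.items()}
    total = sum(positive.values())
    if total == 0:
        return {uid: 0.0 for uid in scores}
    return {uid: score / total for uid, score in positive.items()}


# Example: miner 1 improved the model most, miner 3 made it worse
print(reward_weights({1: 0.012, 2: 0.004, 3: -0.002}))
```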
To keep quality high, only the top 15 highest-quality miners participate in each training round. This selective approach encourages miners to improve their performance. Competition to stay in this top group incentivizes miners to improve.
Top miners can earn up to 28 TAO per day according to current reports. This amount changes based on miner performance and token prices. Success requires serious skill and serious hardware: miners need H100-class GPUs with excellent internet connections.
DEFAULT MINER SUCCESS
The default miner code is not strong enough to stay registered. Miners using only the default code will get deregistered from the subnet. Success requires major customization, improvement, and understanding of training techniques.
The incentive mechanism includes protections against various forms of cheating. First, the subnet prevents "overfitting." This occurs when someone's improvements only work on their specific data, and fail with new information. Those who overfit will receive lower scores.
"Bucket Copying" represents another challenge. Lazy miners copy other miners' pseudo-gradients instead of doing their own work. Validators check for this behavior. They ensure miners worked on their assigned data pages. Heavy penalties await those caught copying.
Another big challenge is "free-riding." Some miners do good work initially to earn higher scores. Then they shut off their machines while coasting on their reputation. Ongoing performance monitoring addresses this issue. Miners must stay active and keep improving to maintain their rewards.
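A common way to enforce ongoing participation is to track a moving average of each miner's scores, so a miner who stops submitting watches their effective score decay window by window. The sketch below illustrates that general mechanism; it is an assumption, not necessarily the exact rule Templar applies.

```python
def update_moving_score(previous: float, latest: float | None, alpha: float = 0.1) -> float:
    """Exponential moving average of a miner's score. A missing submission
    (latest=None) counts as zero, so idle miners see their score shrink."""
    observed = latest if latest is not None else 0.0
    return (1 - alpha) * previous + alpha * observed


score = 1.0
for window in range(5):  # miner goes silent for five windows
    score = update_moving_score(score, None)
    print(f"window {window}: score={score:.3f}")
```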
Real-world attacks have tested these defenses. On Christmas Day, someone discovered they could disrupt the entire subnet by submitting pseudo-gradients with extremely large values. The attack kept developers awake for 48 hours fixing the problem and led to stronger defenses against this kind of manipulation.
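A basic defense against that class of attack is to reject any submission containing non-finite or absurdly large values before it ever touches the shared model. The threshold below is made up for illustration, and the check assumes the sparse gradient format from the earlier sketches.

```python
import torch


def is_sane(pseudo_grad: dict, max_abs: float = 1e3) -> bool:
    """Reject pseudo-gradients containing NaN, infinity, or huge values."""
    for _, values in pseudo_grad.values():
        if not torch.isfinite(values).all():
            return False
        if values.abs().max() > max_abs:
            return False
    return True
```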
AI training needs much stricter coordination than other systems. Bitcoin tolerates up to 49% of participants being dishonest. Other blockchains handle up to 33% bad actors. AI training works with only about 5-7.5% tolerance for mistakes.
This precision requirement means miners must maintain perfect performance throughout the entire training run. Submissions must arrive exactly on time. Even small delays result in big penalties. This makes Templar technically demanding. Experience across over 20,000 training cycles taught these lessons.
The completed training run shows promising results. According to team reports, Templar-1B performed competitively with AdamW, the standard optimizer used for training LLMs, and showed better performance than the baseline in the early training phases.
Standard AI benchmark results show specific performance scores across different tasks. For reading comprehension (HellaSwag), Templar scored 51.0% compared to AdamW's 51.0%. Reasoning tasks (PIQA) showed a score of 71.4% compared to AdamW's 71.9%. The ARC-E reasoning benchmark resulted in 59.2% compared to AdamW's 58.9%.
| Model | Dataset | Tokens | HellaSwag | PIQA | ARC-E |
|---|---|---|---|---|---|
| Templar-1B | FineWebEdu | 100B-200B | 51.0 | 71.4 | 59.2 |
| DeMo 1B | Dolmo | 100B | 48.0 | 70.0 | 55.0 |
| AdamW DDP 1B | FineWebEdu | 120B | 51.0 | 71.9 | 58.9 |
AI BENCHMARKS
HellaSwag tests reading comprehension and common sense reasoning. PIQA evaluates physical reasoning and practical problem-solving. ARC-E tests basic reasoning abilities with elementary-level questions. These are standard tests used to measure AI model performance across different thinking skills.
These results suggest Templar's approach achieves competitive performance with traditional training methods. At the same time, it remains available to a much broader community of participants.
Future plans target 8 billion parameters next, then 70 billion and beyond. Each step requires more coordination and computing power. Currently, only the top-performing miners participate in training to keep quality high. About 200 GPUs are available. As remaining technical challenges get solved, more participants will join safely. This will provide extra computing power for larger models.
The team focuses on improvements that would allow miners to train and coordinate at the same time. Currently, miners must complete each step before starting the next one. This creates delays that waste time and computing power. Planned improvements would remove these waiting periods. Miners could work on new data while sharing their previous results at the same time. This could greatly boost efficiency and make Templar more competitive with traditional systems.
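The core idea is to overlap the next window's training with the upload of the previous window's results instead of doing them back to back. A minimal sketch using a background thread is shown below; `train_one_window` and `upload_pseudo_gradient` are hypothetical callables standing in for the real miner steps.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(windows, train_one_window, upload_pseudo_gradient):
    """Train on window N while the pseudo-gradient from window N-1 uploads in
    the background, hiding communication time behind computation."""
    with ThreadPoolExecutor(max_workers=1) as uploader:
        pending = None
        for window in windows:
            pseudo_grad = train_one_window(window)   # compute for the current window
            if pending is not None:
                pending.result()                      # wait for the previous upload to finish
            pending = uploader.submit(upload_pseudo_gradient, pseudo_grad, window)
        if pending is not None:
            pending.result()
```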
Plans include expanding Templar into a platform for major AI labs, enabling these organizations to train their most advanced models without building expensive data centers. This aligns with a growing industry movement: companies are building specialized foundation models from the ground up rather than just fine-tuning existing ones.
The team cites companies like Netflix and Stripe. These companies are reportedly developing foundational models. They tailor these models to their specific areas and use cases. The team envisions Templar playing a key role in enabling this use case at scale.
Beyond initial training, expansion could support the entire AI development process. This includes fine-tuning and specialized training phases.