Decentralized Infrastructure: The Future of Open Data in Web3 | 2025


Open data must transition to decentralized infrastructure to realize its full potential and reap the benefits of affordable LLM training, accessible research data sharing, and unstoppable DApp hosting. Currently, open data is a significant contributor to building a global emerging tech economy, with an estimated market value exceeding $350 billion. However, many open data sources rely on centralized infrastructure, which contradicts the core principles of autonomy and censorship resistance that Web3 advocates.
The Need for Decentralized Infrastructure
To truly harness the potential of open data, a shift towards decentralized infrastructure is essential, because it can resolve several vulnerabilities that affect user applications. Its use cases range from hosting decentralized applications (DApps) and trading bots to sharing research data and supporting the training and inference of large language models (LLMs).

Understanding the Shift
Examining each use case reveals why leveraging decentralized infrastructure for open data is more beneficial than relying on centralized systems. The recent market downturn, which wiped out $1 trillion from the US tech markets, underscores the importance of open-source protocols. This serves as a wake-up call to focus on the emerging economy driven by open data.

For instance, the final stage of training DeepSeek R1 cost approximately $5.5 million, a stark contrast to the more than $100 million spent on OpenAI’s GPT-4. Yet this emerging economy still depends on centralized infrastructure platforms such as LLM API providers, which are fundamentally at odds with the innovations stemming from open-source technologies.

Cost-Effective Hosting of Open-Source LLMs
Hosting open-source LLMs such as Llama 2 and DeepSeek R1 is simple and cost-effective. Unlike stateful blockchains, which require constant synchronization, LLMs are stateless and need only periodic updates. The main expense is compute: running inference on open-source models can be costly because node runners need GPUs. Even so, overall costs stay lower because the models do not need real-time updates to remain synchronized.

The Rise of Generalizable Base Models
The emergence of generalizable base models like GPT-4 has paved the way for innovative products built on contextual inference. Centralized companies like OpenAI do not allow arbitrary networks to host or run inference on their trained models. In contrast, decentralized node runners can support the development of open-source LLMs by acting as AI endpoints that provide deterministic data to clients.
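One way a client might check that independent node runners really return deterministic data is to send the same prompt to several of them and accept an answer only when enough nodes agree. The sketch below is illustrative only: the node names, the `query_with_quorum` helper, and the in-process stand-in nodes are all hypothetical, no real network or protocol API is used, and it assumes the nodes decode deterministically so that honest nodes produce byte-identical output.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical type: a node is any callable that maps a prompt to a completion.
Node = Callable[[str], str]


def digest(text: str) -> str:
    """Return a SHA-256 hex digest so responses can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def query_with_quorum(nodes: Dict[str, Node], prompt: str, quorum: int) -> Optional[str]:
    """Send the same prompt to every node and return the response that at
    least `quorum` nodes agree on, or None if no response reaches the quorum."""
    responses = [node(prompt) for node in nodes.values()]
    if not responses:
        return None
    counts = Counter(digest(r) for r in responses)
    winning_digest, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None
    # Return the first response whose digest matches the majority digest.
    return next(r for r in responses if digest(r) == winning_digest)


# Stand-in nodes for illustration (assumption: real nodes would be remote endpoints).
def honest_node(prompt: str) -> str:
    return prompt.upper()


def faulty_node(prompt: str) -> str:
    return "garbage"


nodes = {"node-a": honest_node, "node-b": honest_node, "node-c": faulty_node}
result = query_with_quorum(nodes, "hello", quorum=2)
print(result)
```

Comparing digests instead of full responses keeps the agreement check cheap for long completions. A real deployment would also need timeouts and a policy for nodes that disagree, both of which are left out here.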
Empowering Entrepreneurs with Decentralized Networks
Decentralized networks lower entry barriers by enabling operators to launch their gateways on top of the network. These decentralized infrastructure protocols can handle millions of requests on their permissionless networks by open-sourcing the core gateway and service infrastructure. As a result, any entrepreneur or operator can deploy their gateway and tap into this emerging market.
For example, an individual can train an LLM using decentralized computing resources on the permissionless protocol Akash, which offers customized computing services at 85% lower prices than traditional centralized cloud providers. The AI training and inference market holds immense potential, with AI companies spending approximately $1 million daily on infrastructure maintenance to run LLM inference. That puts the serviceable addressable market (SAM) at roughly $365 million annually.
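The arithmetic behind these figures can be written out as a short sketch. The $1 million daily spend and the 85% discount are the numbers quoted above. The function names and the $10,000 sample bill are hypothetical and chosen only for illustration.

```python
DAILY_INFERENCE_SPEND_USD = 1_000_000  # daily spend quoted in the text
DISCOUNT_PERCENT = 85                  # "85% lower prices", as quoted in the text


def annual_spend(daily_usd: int, days: int = 365) -> int:
    """Scale a daily spend to a yearly figure."""
    return daily_usd * days


def discounted_cost(centralized_usd: float, discount_percent: int) -> float:
    """Cost of the same workload after applying a percentage discount."""
    return centralized_usd * (100 - discount_percent) / 100


annual = annual_spend(DAILY_INFERENCE_SPEND_USD)
print(f"Annual inference spend: ${annual:,}")

# Hypothetical example: a $10,000 centralized cloud bill at an 85% discount.
sample_bill = 10_000
print(f"Discounted bill: ${discounted_cost(sample_bill, DISCOUNT_PERCENT):,.2f}")
```

The yearly figure is simply the daily spend multiplied by 365, and the discount is applied per workload. The sketch does not claim the whole market would move at the discounted price.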

Accelerating Research Through Open Data
In the scientific and research domains, the combination of data sharing, machine learning, and LLMs can potentially accelerate research and enhance human lives. However, access to this data has been restricted by the high-cost journal system, which selectively publishes research based on the preferences of its board. This creates barriers to knowledge dissemination and hinders progress in various fields.
To overcome these challenges, a shift towards decentralized infrastructure is imperative. By democratizing access to data and reducing costs, decentralized systems can foster innovation and collaboration, ultimately benefiting society as a whole.

In conclusion, the transition from centralized to decentralized infrastructure is not just a technical necessity; it is a philosophical imperative that aligns with the core values of Web3. Embracing decentralized infrastructure for open data will unlock new opportunities, drive innovation, and create a more equitable tech economy.

