From Filecoin to Arweave: How Far Is the Road for Decentralized Storage?

Author: @BlazingKevin_, Researcher at Movemaker

Storage was once one of the top narratives in the industry. Filecoin, the leader of the previous bull market, reached a market cap exceeding $10 billion. Arweave, a comparable storage protocol, made permanent storage its selling point and peaked at a market cap of $3.5 billion. However, as the practical usefulness of cold data storage has been called into question, the necessity of permanent storage is also being doubted, and whether the narrative of decentralized storage can succeed remains an open question. The emergence of Walrus has stirred the long-silent storage narrative, and now Aptos has teamed up with Jump Crypto to launch Shelby, aiming to take decentralized storage for hot data to the next level. So, can decentralized storage make a comeback and provide widespread use cases, or is it just another topic for hype? This article analyzes the evolution of the decentralized storage narrative through the development paths of Filecoin, Arweave, Walrus, and Shelby, and attempts to answer the question: how far is the road to the popularization of decentralized storage?

Filecoin: Storage is the surface, mining is the essence

Filecoin is one of the earliest altcoins, and its development direction naturally revolves around decentralization. This is a common characteristic of early altcoins: seeking meaning for decentralization in one traditional sector after another. Filecoin is no exception; it connects storage with decentralization, which naturally points to the drawback of centralized storage: the trust assumption placed in centralized data storage providers. What Filecoin does, therefore, is shift storage from centralized to decentralized. However, certain things sacrificed along the way to decentralization became exactly the pain points that later projects such as Arweave and Walrus aim to address. To understand why Filecoin is essentially just a mining coin, one needs to understand the objective limitations of its underlying technology, IPFS, which is not suited to hot data.

IPFS: A Decentralized Architecture, Yet Stalled by Transmission Bottlenecks

IPFS (InterPlanetary File System) was introduced around 2015 and aims to revolutionize the traditional HTTP protocol through content addressing. The biggest drawback of IPFS is its extremely slow retrieval speed. In an era where traditional data service providers can achieve millisecond-level responses, retrieving a file from IPFS still takes several seconds, making it difficult to deploy in practical applications and explaining why, apart from a few blockchain projects, it has rarely been adopted by traditional industries.

The IPFS underlying P2P protocol is mainly suitable for "cold data", which refers to static content that does not change frequently, such as videos, images, and documents. However, when it comes to handling hot data, such as dynamic web pages, online games, or artificial intelligence applications, the P2P protocol does not offer significant advantages over traditional CDNs.

However, although IPFS itself is not a blockchain, its directed acyclic graph (DAG) design is highly compatible with many public chains and Web3 protocols, making it inherently suitable as a foundational framework for blockchains. Therefore, even without practical value, it is quite sufficient as a foundational framework carrying the blockchain narrative. Early altcoin projects only needed a workable framework to set sail toward grand ambitions, but as Filecoin matured, the inherent flaws inherited from IPFS began to hinder its progress.

The Mining-Coin Logic Beneath the Storage Veneer

The original intention of IPFS was to allow users to store data while also being part of the storage network. However, without economic incentives, it is difficult for users to voluntarily use this system, let alone become active storage nodes. This means that most users will only store files on IPFS but will not contribute their own storage space or store others' files. It is against this backdrop that Filecoin was born.

The token economic model of Filecoin mainly includes three roles: users are responsible for paying fees to store data; storage miners receive token incentives for storing user data; retrieval miners provide data when users need it and receive incentives.

This model leaves room for abuse. Storage miners can fill the space they provision with junk data in order to earn rewards. Since that junk data is never retrieved, losing it never triggers the penalty mechanism, which allows storage miners to delete the junk data and repeat the process. Filecoin's proof-of-replication consensus can only ensure that user data has not been quietly deleted; it cannot prevent miners from padding the network with junk data.
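To make this incentive gap concrete, below is a toy Python model of the dynamic the article describes. The deal, reward, and penalty logic here are hypothetical simplifications for illustration, not Filecoin's actual protocol rules: a sealed junk sector pays as well as a real one, and a fault that nobody ever queries is never punished.

```python
# Toy model of the incentive gap described above.
# The reward/penalty rules are hypothetical, NOT Filecoin's real protocol.

from dataclasses import dataclass

@dataclass
class Deal:
    data_id: str
    is_user_data: bool   # True if a real client paid to store this data
    stored: bool = True  # whether the miner still holds the sectors

def storage_reward(deal: Deal) -> int:
    """Rewards accrue for any sealed sector, whether it holds user data or junk."""
    return 10 if deal.stored else 0

def penalty(deal: Deal, retrieval_requested: bool) -> int:
    """In this toy model, a fault is only punished when someone actually asks for the data."""
    return 100 if retrieval_requested and not deal.stored else 0

user_deal = Deal("client-backup", is_user_data=True)
junk_deal = Deal("self-dealt-junk", is_user_data=False)

print(storage_reward(user_deal), storage_reward(junk_deal))  # 10 10: junk pays just as well

# The miner quietly drops the junk sectors; nobody ever retrieves them,
# so no penalty is observed and the space can be re-filled and re-rewarded.
junk_deal.stored = False
print(penalty(junk_deal, retrieval_requested=False))         # 0
```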

The operation of Filecoin largely depends on the continuous investment of miners in the token economy, rather than on real demand from end users for distributed storage. Although the project is still iterating, at this stage Filecoin's ecosystem fits the definition of "mining coin" logic far more than that of an "application-driven" storage project.

Arweave: Success through long-termism, failure through long-termism

If the design goal of Filecoin is to build an incentivized, verifiable decentralized "data cloud" shell, then Arweave takes another extreme direction in storage: providing the capability for permanent data storage. Arweave does not attempt to construct a distributed computing platform; its entire system revolves around a core assumption—important data should be stored once and remain permanently on the network. This extreme long-termism makes Arweave fundamentally different from Filecoin in terms of mechanisms, incentive models, hardware requirements, and narrative perspectives.

Arweave takes Bitcoin as its model, attempting to continuously optimize its permanent storage network over long periods measured in years. Arweave does not care about marketing, nor about competitors or market trends. It simply keeps iterating on its network architecture, indifferent to whether anyone is paying attention, because that is the essence of the Arweave development team: long-termism. Thanks to long-termism, Arweave was enthusiastically embraced in the last bull market; and because of long-termism, even after falling to the bottom, Arweave may still endure several more cycles of bull and bear. The only question is whether there will be a place for Arweave in the future of decentralized storage; the value of permanent storage can only be proven by time.

From mainnet version 1.5 to the recent version 2.9, Arweave has largely dropped out of market discussion, yet it has been committed to letting a wider range of miners join the network at the lowest possible cost and to incentivizing miners to store as much data as possible, continuously strengthening the robustness of the whole network. Well aware that it does not match market preferences, Arweave has taken a conservative path, focusing on its miner community while its broader ecosystem has stagnated, upgrading the mainnet at minimal cost and continuously lowering hardware thresholds without compromising network security.

Review of the Upgrade Path from 1.5 to 2.9

Arweave version 1.5 exposed a vulnerability whereby miners could rely on GPU stacking rather than real storage to boost their chances of producing blocks. To curb this trend, version 1.7 introduced the RandomX algorithm, limiting the use of specialized computing power and requiring general-purpose CPUs to participate in mining, thereby reducing the centralization of computing power.

In version 2.0, Arweave adopted SPoA (Succinct Proofs of Access), compressing data proofs into a concise Merkle-tree path, and introduced format 2 transactions to reduce synchronization burdens. This architecture relieved network bandwidth pressure and significantly improved node collaboration. However, some miners could still evade the responsibility of actually holding data through centralized high-speed storage pool strategies.
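For intuition about why a Merkle path makes such proofs succinct, here is a minimal Python sketch of a generic Merkle inclusion proof. It illustrates the primitive that SPoA-style proofs build on, not Arweave's actual chunking or proof format: a prover shows it holds one chunk using only logarithmically many sibling hashes rather than the whole dataset.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                         # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_right) pairs from the leaf up to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

chunks = [f"chunk-{i}".encode() for i in range(8)]
root = merkle_root(chunks)
proof = merkle_proof(chunks, 5)
# Proving possession of chunk 5 takes only log2(8) = 3 sibling hashes, not all 8 chunks.
print(verify(chunks[5], proof, root))  # True
```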

To correct this bias, version 2.4 introduced the SPoRA mechanism, which adds a global index and slow-hash random access, requiring miners to genuinely hold data blocks in order to produce valid blocks, thereby weakening the pay-off of stacking compute. As a result, miners began to focus on storage access speed, driving the adoption of SSDs and other high-speed read/write devices. Version 2.6 introduced a hash chain to pace block production, limiting the marginal benefit of high-performance hardware and giving small and medium-sized miners fair room to participate.

Subsequent versions further strengthen network collaboration capabilities and storage diversity: 2.7 adds collaborative mining and mining pool mechanisms to enhance the competitiveness of small miners; 2.8 introduces a composite packaging mechanism that allows large-capacity low-speed devices to participate flexibly; 2.9 introduces a new packaging process in replica_2_9 format, significantly improving efficiency and reducing computational dependencies, completing the closed-loop of data-driven mining models.

Overall, Arweave's upgrade path clearly presents its storage-oriented long-term strategy: while continuously resisting the trend of computational power centralization, it lowers the participation threshold to ensure the possibility of the protocol's long-term operation.

Walrus: Is Embracing Hot Data Hype or Hidden Depths?

From a design perspective, Walrus is completely different from Filecoin and Arweave. Filecoin's starting point is to build a decentralized, verifiable storage system, at the cost of being limited to cold data; Arweave aims to build an on-chain Library of Alexandria that stores data permanently, at the cost of having too few usable scenarios; Walrus seeks to optimize the storage costs of a hot data storage protocol.

A Modified Take on Erasure Codes: Cost Innovation or Old Wine in a New Bottle?

In terms of storage cost design, Walrus believes that the storage overhead of Filecoin and Arweave is unreasonable, as both adopt a fully replicated architecture. Their main advantage lies in the fact that each node holds a complete copy, which provides strong fault tolerance and independence between nodes. This type of architecture ensures that even if some nodes go offline, the network still maintains data availability. However, this also means that the system requires multiple copies for redundancy to maintain robustness, thereby driving up storage costs. Especially in the design of Arweave, the consensus mechanism itself encourages node redundancy for enhanced data security. In contrast, Filecoin is more flexible in cost control, but at the expense of potentially higher data loss risks in some low-cost storage options. Walrus attempts to find a balance between the two, enhancing availability through structured redundancy while controlling replication costs, thus establishing a new compromise path between data availability and cost efficiency.

RedStuff, created by Walrus, is the key technology for reducing node redundancy; it derives from Reed-Solomon (RS) coding. RS coding is a classic error-correction algorithm: by adding redundant fragments (erasure codes) to a dataset, it allows the original data to be reconstructed even when parts of it are lost. From CD-ROMs to satellite communications to QR codes, it is widely used in everyday life.

Erasure coding lets a user take a block, say 1MB in size, and "expand" it to 2MB, where the additional 1MB is special redundancy data, the erasure code. If any bytes in the block are lost, the user can easily recover them through the code; even if up to 1MB of the block is lost, the entire block can still be recovered. The same technology allows a computer to read all of the data on a CD-ROM even when the disc is damaged.

The most commonly used scheme today is RS coding. The implementation starts from k information blocks, constructs the corresponding polynomial, and evaluates it at different x coordinates to obtain the coded blocks. With RS erasure coding, the probability that enough randomly sampled fragments are lost to prevent recovery is very small.

For example: A file is divided into 6 data blocks and 4 parity blocks, totaling 10 pieces. As long as any 6 pieces are retained, the original data can be completely restored.

Advantages: Strong fault tolerance, widely used in CD/DVD, fault-tolerant disk array (RAID), and cloud storage systems (such as Azure Storage, Facebook F4).

Disadvantages: Decoding calculations are complex and the overhead is relatively high; not suitable for data scenarios with frequent changes. Therefore, it is usually used for data recovery and scheduling in off-chain centralized environments.
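To make the "construct a polynomial and evaluate it" description and the 6-data/4-parity example above concrete, here is a minimal Python sketch of the Reed-Solomon idea over a small prime field. It is purely illustrative: production RS codes work over GF(2^8) with far more efficient arithmetic, but the principle is the same, and any 6 surviving shares out of 10 reconstruct the original symbols.

```python
# Illustrative Reed-Solomon sketch over the prime field GF(257).
# Real implementations use GF(2^8) and optimized decoders; this shows the principle only.
P = 257

def lagrange_eval(points, x):
    """Evaluate the unique degree-(k-1) polynomial passing through `points` at x (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

def rs_encode(data, n):
    """Systematic encoding: shares at x = 0..k-1 carry the data, x = k..n-1 are parity."""
    k = len(data)
    points = list(enumerate(data))
    parity = [(x, lagrange_eval(points, x)) for x in range(k, n)]
    return points + parity                       # n shares total; any k reconstruct

def rs_recover(shares, k):
    """Rebuild the original k data symbols from any k surviving shares."""
    basis = shares[:k]
    return [lagrange_eval(basis, x) for x in range(k)]

data = [72, 101, 108, 108, 111, 33]              # 6 data symbols ("Hello!")
shares = rs_encode(data, n=10)                   # 6 data shares + 4 parity shares
survivors = shares[2:8]                          # lose 4 arbitrary shares, keep any 6
print(rs_recover(survivors, k=6) == data)        # True
```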

Under decentralized architectures, Storj and Sia have adapted traditional RS coding to the practical needs of distributed networks. Building on this, Walrus has proposed its own variant, the RedStuff coding algorithm, to achieve a lower-cost, more flexible redundant storage mechanism.

What is the biggest feature of RedStuff? By improving the erasure coding algorithm, Walrus can quickly and robustly encode unstructured data blocks into smaller shards, which are distributed across a network of storage nodes. Even if up to two-thirds of the shards are lost, the original data block can be quickly reconstructed from the remaining shards, while maintaining a replication factor of only 4x to 5x.

Therefore, it is reasonable to define Walrus as a lightweight redundancy and recovery protocol redesigned around decentralized scenarios. Compared with traditional erasure codes (such as Reed-Solomon), RedStuff no longer pursues strict mathematical consistency, but makes pragmatic trade-offs regarding data distribution, storage verification, and computational cost. This model abandons the instantaneous decoding mechanism required by centralized scheduling, and instead verifies whether nodes hold specific data copies through on-chain proofs, thereby adapting to a more dynamic, edge-oriented network structure.

The core design of RedStuff is to split data into two categories: primary slices and secondary slices. Primary slices are used to recover the original data; their generation and distribution are tightly constrained, with a recovery threshold of f+1 and 2f+1 signatures required as an availability endorsement. Secondary slices are generated through simple operations such as XOR combinations and serve to provide elastic fault tolerance and improve the overall robustness of the system. This structure essentially lowers the requirement for data consistency, allowing different nodes to temporarily store different versions of the data and emphasizing the practical path of "eventual consistency." Much like the lenient requirements for retrieving historical blocks in systems such as Arweave, this reduces network burden, but it also weakens the guarantees of data immediacy and integrity.
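The primary/secondary split can be illustrated with a toy Python sketch. The slice counts, layout, and XOR rule below are hypothetical simplifications for intuition, not Walrus's actual RedStuff parameters: primary slices alone can rebuild the blob, while cheap XOR combinations give extra nodes something useful to hold, so a lost primary slice can also be rebuilt from a secondary slice plus a surviving neighbor.

```python
# Toy illustration of "primary slices + XOR-derived secondary slices".
# Layout and parameters are hypothetical, not Walrus's real RedStuff scheme.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

blob = b"decentralized-hot-storage!!!"      # 28 bytes
k = 4
size = len(blob) // k
primary = [blob[i * size:(i + 1) * size] for i in range(k)]   # primaries rebuild the blob

# Secondary slices: cheap XOR combinations handed to additional nodes.
secondary = [xor(primary[i], primary[(i + 1) % k]) for i in range(k)]

# A node holding primary[2] disappears; rebuild it from a secondary slice and a neighbor.
lost = 2
rebuilt = xor(secondary[lost], primary[(lost + 1) % k])       # (p2 ^ p3) ^ p3 == p2
assert rebuilt == primary[lost]
print(b"".join(primary) == blob)   # True: primaries concatenate back to the original blob
```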

It is important to note that RedStuff, while achieving effective storage in low-compute, low-bandwidth environments, is essentially a "variant" of erasure coding systems. It sacrifices a degree of determinism in data reads in exchange for cost control and scalability in a decentralized environment. Whether this architecture can support large-scale, high-frequency interactive data scenarios at the application level remains to be seen. Furthermore, RedStuff has not truly broken through the long-standing computational bottleneck of erasure coding; rather, it sidesteps the tightly coupled points of traditional architectures through structural strategies. Its innovation lies more in engineering-level combinatorial optimization than in a disruption at the level of the underlying algorithm.

Therefore, RedStuff is more like a "reasonable modification" aimed at the current decentralized storage reality. It indeed brings improvements in redundancy costs and operational load, allowing edge devices and non-high-performance nodes to participate in data storage tasks. However, in large-scale applications, general computing adaptation, and business scenarios with higher consistency requirements, its capability boundaries remain quite apparent. This makes Walrus's innovation more of an adaptive transformation of the existing technological system rather than a decisive breakthrough in promoting the migration of the decentralized storage paradigm.

Sui and Walrus: Can High-Performance Public Chains Drive the Practical Use of Storage?

From Walrus's official research article, we can see its target scenario: "The original intention of Walrus's design is to provide a solution for storing large binary files (Blobs), which are the lifeblood of many decentralized applications."

The so-called large blob data usually refers to large, unstructured binary objects, such as videos, audio, images, model files, or software packages.

In the crypto context, this mostly means the images and videos behind NFTs and social media content, which also constitute Walrus's main application direction.

  • Although the article also mentions the potential uses of AI model dataset storage and Data Availability layers (DA), the gradual decline of Web3 AI has left very few related projects, and the number of protocols that will truly adopt Walrus in the future may be very limited.
  • In terms of the DA layer, whether Walrus can serve as a viable alternative still needs to be validated after mainstream projects like Celestia rekindle market interest.

Therefore, the core positioning of Walrus can be understood as a hot storage system for serving content assets such as NFTs, emphasizing dynamic invocation, real-time updates, and version management capabilities.

This also explains why Walrus needs to rely on Sui: by leveraging Sui's high-performance chain for coordination, Walrus can build a high-speed data retrieval network and significantly reduce operating costs without having to develop its own high-performance public chain, thereby avoiding direct competition with traditional cloud storage on unit cost.

According to official data, the storage cost of Walrus is about one-fifth of traditional cloud services. Although it appears to be dozens of times more expensive compared to Filecoin and Arweave, its goal is not to pursue extremely low costs, but to build a decentralized hot storage system that can be used in real business scenarios. Walrus itself operates as a PoS network, with the core responsibility of verifying the honesty of storage nodes, providing the most basic security guarantee for the entire system.

As for whether Sui really needs Walrus, it currently remains more a matter of ecosystem narrative. If financial settlement is the main purpose, Sui does not urgently need off-chain storage support. However, if it hopes to support more complex on-chain scenarios in the future, such as AI applications, content assetization, and composable agents, then a storage layer will be indispensable for providing context and indexing capabilities. High-performance chains can handle complex state models, but those states need to be bound to verifiable data in order to build a trustworthy content network.

Shelby: Dedicated Fiber Network Completely Unleashes Web3 Application Scenarios

Among the biggest technical bottlenecks facing current Web3 applications, "read performance" has always been a difficult shortcoming to overcome.

Whether it's video streaming, RAG systems, real-time collaboration tools, or AI model inference engines, they all rely on low-latency, high-throughput access to hot data. Decentralized storage protocols (from Arweave, Filecoin to Walrus) have made progress in terms of data persistence and trustlessness, but because they operate on the public internet, they are always unable to escape the limitations of high latency, unstable bandwidth, and uncontrollable data scheduling.

Shelby is trying to address this issue at its root.

First, the Paid Reads mechanism directly reshapes the "read operation" dilemma in decentralized storage. In traditional systems, data retrieval is almost free, and the lack of effective incentive mechanisms leads to service nodes generally being lazy in responding and cutting corners, resulting in an actual user experience that is far behind Web2.

Shelby links user experience directly to service node income by introducing a pay-per-use model: the faster and more reliably nodes return data, the more rewards they can earn.

This model is not an incidental economic design, but the core logic of Shelby's performance design: without incentives, there is no reliable performance; with incentives, service quality improves sustainably.
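As a rough illustration of how such an incentive could be wired, the Python sketch below ties a node's payout to both successful delivery and latency. The formula, base price, and weighting are invented for illustration; Shelby's actual reward function is not specified here.

```python
# Hypothetical pay-per-read reward curve: faster, successful responses earn more.
# Base price and latency weighting are illustrative, not Shelby's real parameters.

def read_reward(base_price: float, latency_ms: float, served: bool,
                target_ms: float = 50.0) -> float:
    if not served:
        return 0.0                        # no delivery, no payment
    speed_bonus = min(2.0, target_ms / max(latency_ms, 1.0))
    return base_price * speed_bonus       # beating the latency target earns up to a 2x bonus

print(read_reward(0.001, latency_ms=25, served=True))    # fast node:   0.002
print(read_reward(0.001, latency_ms=400, served=True))   # slow node:   0.000125
print(read_reward(0.001, latency_ms=20, served=False))   # failed read: 0.0
```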

Secondly, one of the biggest technical breakthroughs proposed by Shelby is the introduction of a dedicated fiber network, which is equivalent to building a high-speed rail line for instant reads of hot data in Web3.

This architecture completely bypasses the public transport layer that Web3 systems generally rely on, directly deploying storage nodes and RPC nodes on a high-performance, low-congestion, physically isolated transport backbone. This not only significantly reduces the latency of inter-node communication but also ensures the predictability and stability of the transmission bandwidth. Shelby's underlying network structure is more akin to the dedicated line deployment model between AWS internal data centers, rather than the "upload to a miner node" logic of other Web3 protocols.

![](https://img-cdn.gateio.im/webp-social/moments-464b86a140c07e4f022c580b75016503.webp)

Source: Shelby White Paper

This network-level architectural inversion makes Shelby the first decentralized hot storage protocol capable of genuinely supporting a Web2-level user experience. Users reading a 4K video, calling embedding data from a large language model, or tracing a transaction log on Shelby no longer endure the multi-second latency common in cold-data systems; they get sub-second responses instead. For service nodes, the dedicated network not only improves service efficiency but also significantly reduces bandwidth costs, making the pay-per-read mechanism truly economically viable and encouraging the system to evolve toward higher performance rather than simply greater storage capacity.

It can be said that the introduction of dedicated fiber optic networks is the key support that allows Shelby to "look like AWS, but is fundamentally Web3." It not only breaks the natural opposition between decentralization and performance but also opens up the real possibility for Web3 applications in terms of high-frequency reading, high-bandwidth scheduling, and low-cost edge access.

In addition, between data durability and cost, Shelby adopts an efficient coding scheme built on Clay codes, achieving storage redundancy as low as under 2x through mathematically optimal MSR and MDS coding structures, while still maintaining eleven nines of durability and 99.9% availability. With most Web3 storage protocols still at 5x to 15x redundancy today, Shelby is not only more efficient technically but also more competitive on cost. This also means that for dApp developers who genuinely care about cost optimization and resource scheduling, Shelby offers a "cheap and fast" practical option.
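For a sense of where the sub-2x figure comes from: for any MDS erasure code that splits data into k fragments plus (n - k) parity fragments, the storage overhead is simply n / k while any n - k fragment losses are tolerated. The (k, n) values in this short sketch are illustrative only, not Shelby's published configuration.

```python
# Storage overhead of an MDS erasure code: n/k, tolerating any (n - k) lost fragments.
# The (k, n) values are illustrative, not Shelby's actual parameters.
def overhead(k: int, n: int) -> float:
    return n / k

print(overhead(10, 16))   # 1.6x redundancy, survives any 6 fragment losses
print(overhead(1, 5))     # 5.0x: plain 5-copy replication, typical of older designs
```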

Summary

Looking at the evolution from Filecoin and Arweave through Walrus to Shelby, we can clearly see that the narrative of decentralized storage has gradually shifted from the technological utopianism of "existing is justification enough" to the pragmatism of "usability is what matters." Early Filecoin drove hardware participation through economic incentives while genuine user needs were long marginalized; Arweave chose extreme permanent storage, yet looks increasingly isolated in a quiet application ecosystem; Walrus tries to strike a new balance between cost and performance, but doubts remain about its real-world scenarios and incentive mechanisms. It was not until Shelby appeared that decentralized storage offered a systematic answer to "Web2-level availability": from a dedicated fiber network at the transport layer, to efficient erasure-code design at the compute layer, to a pay-per-read incentive mechanism, capabilities that once belonged to centralized cloud platforms are beginning to be rebuilt in the Web3 world.

The emergence of Shelby does not mean the problems are over; challenges such as the developer ecosystem, permission management, and end-user access still lie ahead. But its significance lies in opening a possible "performance without compromise" path for the decentralized storage industry, breaking the binary trade-off of "either censorship-resistant or user-friendly."

The popularization of decentralized storage will ultimately depend not on conceptual hype or token speculation, but on moving into an application-driven stage of being "usable, integrable, and sustainable." In that stage, whoever first solves users' real pain points will shape the narrative of the next round of infrastructure. From mining-coin logic to usage logic, Shelby's breakthrough may mark the end of one era and the beginning of another.

About Movemaker

Movemaker is the first official community organization authorized by the Aptos Foundation, jointly initiated by Ankaa and BlockBooster, focusing on promoting the construction and development of the Aptos ecosystem in the Chinese-speaking region. As the official representative of Aptos in the Chinese-speaking area, Movemaker is committed to creating a diverse, open, and prosperous Aptos ecosystem by connecting developers, users, capital, and numerous ecological partners.

Disclaimer:

This article/blog is for reference only and represents the author's personal views; it does not reflect the position of Movemaker. This article is not intended to provide: (i) investment advice or investment recommendations; (ii) an offer or solicitation to buy, sell, or hold digital assets; or (iii) financial, accounting, legal, or tax advice. Holding digital assets, including stablecoins and NFTs, is extremely risky; their prices are highly volatile and they may even become worthless. You should carefully consider whether trading or holding digital assets is suitable for you in light of your financial situation. If you have specific questions, please consult your legal, tax, or investment advisor. The information provided in this article (including market data and statistics, if any) is for general reference only. Reasonable care has been taken in compiling this data and these charts, but no responsibility is accepted for any factual errors or omissions contained herein.
