Data Model
Because of Arweave's permanent and immutable nature, traditional file structure operations such as renaming and moving files or folders cannot be accomplished by simply updating on-chain data. ArFS works around this by defining an append-only transaction data model based on the metadata tags found in the Arweave Transaction Headers.
This model uses a bottom-up reference method, which avoids race conditions in file system updates. Each file contains metadata that refers to the parent folder, and each folder contains metadata that refers to its parent drive. A top-down data model would require the parent model (i.e. a folder) to store references to its children.
These defined entities allow the state of the drive to be constructed by a client to look and feel like a file system:
- Drive Entities contain folders and files
- Folder Entities contain other folders or files
- File Entities contain both the file data and metadata
- Snapshot entities contain a state rollups of all entities' (such as drive, folder, file and snapshot) metadata within a drive
Entity Relationships
The following diagram shows the high level relationships between drive, folder, and file entities, and their associated data. More detailed information about each Entity Type can be found in the ArFS specification documentation.
As you can see, each file and folder contains metadata which points to both the parent folder and the parent drive. The drive entity contains metadata about itself, but not the child contents. So clients must build drive states from the lowest level and work their way up.
Metadata Format
Metadata stored in any Arweave transaction tag will be defined in the following manner:
{ "name": "Example-Tag", "value": "example-data" }
Metadata stored in the Transaction Data Payload will follow JSON formatting like below:
{
"exampleField": "exampleData"
}
Fields with a ?
suffix are optional.
{
"name": "My Project",
"description": "This is a sample project.",
"version?": "1.0.0",
"author?": "John Doe"
}
Enumerated field values (those which must adhere to certain values) are defined in the format "value 1 | value 2".
All UUIDs used for Entity-Ids are based on the Universally Unique Identifier standard.
There are no requirements to list ArFS tags in any specific order.
Building Drive State
To construct the current state of a drive, clients must:
- Query for all entities associated with a specific
Drive-Id
- Sort by block height to establish chronological order
- Process entities bottom-up starting with files and folders
- Build the hierarchy by following parent-child relationships
- Handle conflicts by using the most recent entity version
Example Drive State Construction
Entity Lifecycle
Each ArFS entity follows a specific lifecycle pattern:
Creation
- Generate unique UUID for entity
- Create metadata transaction with required tags
- For files: create separate data transaction
- Upload to Arweave network
Updates
- Create new entity with same ID
- Update metadata as needed
- Upload new transaction
- Client processes both versions and uses latest
Deletion
- Mark entity as hidden (
isHidden: true
) - Upload new transaction
- Entity remains in history but hidden from UI
Data Integrity
ArFS ensures data integrity through:
- Immutable transactions - Once uploaded, data cannot be modified
- Cryptographic signatures - All transactions are signed by the owner
- Version tracking - Multiple versions of entities can exist
- Conflict resolution - Clients use block height and timestamps to resolve conflicts
Performance Considerations
For large drives, consider these optimization strategies:
- Use snapshots for quick state reconstruction
- Implement caching for frequently accessed data
- Batch operations when possible
- Query by date ranges to limit data transfer
Next Steps
Now that you understand the ArFS data model, learn how to work with it:
- Privacy & Encryption - Secure your data with private drives
- Creating Drives - Start building with ArFS
- Reading Data - Query and retrieve your data
How is this guide?