Blockchain technology continues to reshape industries—from finance to supply chain management—by enabling decentralized, transparent, and tamper-resistant data systems. As demand for blockchain analytics grows, access to reliable, structured, and up-to-date data becomes essential for researchers, developers, and enterprises. The AWS Public Blockchain Data program delivers exactly that: free, high-quality blockchain datasets hosted on Amazon S3, optimized for analytics and research.
These datasets are part of the Registry of Open Data on AWS, a collaborative initiative that makes public datasets discoverable and accessible to the global community. By transforming raw blockchain data into efficient, query-ready formats, AWS empowers users to conduct in-depth analysis without the overhead of node operation or data processing.
What Is AWS Public Blockchain Data?
The AWS Public Blockchain Data program provides open access to blockchain data from major networks, including Bitcoin, Ethereum, and several Layer 2 and alternative blockchains. The data is stored in Apache Parquet format, a columnar file format known for its compression efficiency and fast query performance—ideal for large-scale analytics.
Each dataset is partitioned by date, allowing users to efficiently query specific time ranges without scanning unnecessary data. This structure significantly reduces processing time and cost when running analytics using tools like Amazon Athena, Apache Spark, or other big data platforms.
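To make the benefit of date partitioning concrete, here is a minimal Python sketch of partition pruning: given a list of object keys, it keeps only those whose `date=` path segment falls inside a requested range. The sample keys, including the `transactions` folder and the file names, are hypothetical and only illustrate the layout. Query engines such as Athena perform this kind of pruning internally, so this is an illustration of the idea rather than something you would need to run yourself.

```python
import re
from datetime import date

# Matches a Hive-style partition segment such as "date=2025-01-05".
_DATE_SEGMENT = re.compile(r"(?:^|/)date=(\d{4}-\d{2}-\d{2})(?:/|$)")


def prune_by_date(keys, start, end):
    """Return keys whose date partition lies in the inclusive range [start, end]."""
    kept = []
    for key in keys:
        match = _DATE_SEGMENT.search(key)
        if match is None:
            continue  # no date partition in this key, so it cannot match
        day = date.fromisoformat(match.group(1))
        if start <= day <= end:
            kept.append(key)
    return kept


# Hypothetical object keys, for illustration only.
sample_keys = [
    "v1.0/eth/transactions/date=2025-01-04/part-0.parquet",
    "v1.0/eth/transactions/date=2025-01-05/part-0.parquet",
    "v1.0/eth/transactions/date=2025-01-06/part-0.parquet",
]

selected = prune_by_date(sample_keys, date(2025, 1, 5), date(2025, 1, 5))
print(selected)
```

Only the files under the matching partition are selected; the other days are never opened, which is where the time and cost savings come from.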
Available Blockchain Datasets
The following blockchains are currently supported under the AWS Public Blockchain Data program:
- Bitcoin – Maintained by AWS
  Path: s3://aws-public-blockchain/v1.0/btc/
- Ethereum – Maintained by AWS
  Path: s3://aws-public-blockchain/v1.0/eth/
- Arbitrum – Maintained by SonarX
  Path: s3://aws-public-blockchain/v1.1/sonarx/arbitrum/
- Aptos – Maintained by SonarX
  Path: s3://aws-public-blockchain/v1.1/sonarx/aptos/
- Base – Maintained by SonarX
  Path: s3://aws-public-blockchain/v1.1/sonarx/base/
- Provenance – Maintained by SonarX
  Path: s3://aws-public-blockchain/v1.1/sonarx/provenance/
- XRP Ledger – Maintained by SonarX
  Path: s3://aws-public-blockchain/v1.1/sonarx/xrp/
Note: While AWS directly maintains the Bitcoin and Ethereum datasets, others are contributed and maintained by SonarX, a blockchain data infrastructure provider. For full datasets with real-time updates and enterprise support, visit SonarX’s official platform.
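For scripting, the dataset list above can be captured in a small lookup table. The sketch below copies the paths and maintainers exactly as listed; the dictionary keys and the helper names `dataset_path` and `maintained_by` are hypothetical choices made for this example.

```python
# Paths and maintainers copied from the dataset list above.
# The lowercase keys are an arbitrary naming choice for this sketch.
DATASETS = {
    "bitcoin": ("AWS", "s3://aws-public-blockchain/v1.0/btc/"),
    "ethereum": ("AWS", "s3://aws-public-blockchain/v1.0/eth/"),
    "arbitrum": ("SonarX", "s3://aws-public-blockchain/v1.1/sonarx/arbitrum/"),
    "aptos": ("SonarX", "s3://aws-public-blockchain/v1.1/sonarx/aptos/"),
    "base": ("SonarX", "s3://aws-public-blockchain/v1.1/sonarx/base/"),
    "provenance": ("SonarX", "s3://aws-public-blockchain/v1.1/sonarx/provenance/"),
    "xrp ledger": ("SonarX", "s3://aws-public-blockchain/v1.1/sonarx/xrp/"),
}


def dataset_path(chain):
    """Look up the S3 path for a chain name (case-insensitive)."""
    try:
        return DATASETS[chain.strip().lower()][1]
    except KeyError:
        raise ValueError(f"unsupported chain: {chain!r}") from None


def maintained_by(maintainer):
    """Return the sorted chain names maintained by the given organization."""
    return sorted(name for name, (who, _) in DATASETS.items() if who == maintainer)


print(dataset_path("Ethereum"))
print(maintained_by("AWS"))
```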
How Often Is the Data Updated?
New blockchain data is delivered daily to the corresponding date-partitioned folders in Parquet format. Because updates arrive once a day, the datasets typically trail the chain by up to about a day, which suits trend analysis, historical research, and batch monitoring rather than real-time applications.
For example, transactions from January 5, 2025, will be available in the /date=2025-01-05/ directory within the respective blockchain bucket. This predictable structure simplifies automation and integration into existing data pipelines.
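Because the folder naming is predictable, a pipeline can compute partition locations instead of discovering them. The following Python sketch builds the `date=YYYY-MM-DD` path for one day or a range of days. Note one assumption: the article does not state whether a per-table folder sits between the blockchain path and the date partition, so the optional `table` argument (and the value `transactions` used in the usage line) is hypothetical and should be checked against the actual bucket layout.

```python
from datetime import date, timedelta


def partition_uri(base_uri, day, table=None):
    """Build the URI of one daily partition under a dataset path.

    `table` is an optional, hypothetical sub-folder name; leave it as None
    if the date partitions sit directly under `base_uri`.
    """
    parts = [base_uri.rstrip("/")]
    if table:
        parts.append(table.strip("/"))
    parts.append(f"date={day.isoformat()}")
    return "/".join(parts) + "/"


def daily_partitions(base_uri, start, end, table=None):
    """Yield one partition URI per day in the inclusive range [start, end]."""
    day = start
    while day <= end:
        yield partition_uri(base_uri, day, table)
        day += timedelta(days=1)


ETH = "s3://aws-public-blockchain/v1.0/eth/"
print(partition_uri(ETH, date(2025, 1, 5)))
for uri in daily_partitions(ETH, date(2025, 1, 31), date(2025, 2, 1), table="transactions"):
    print(uri)
```

Stepping with `timedelta(days=1)` rather than incrementing the day number keeps month and year boundaries correct.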
Why Use AWS Public Blockchain Data?
There are several compelling reasons why developers, researchers, and analysts choose this resource:
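- Free Access: The datasets themselves cost nothing to download or use; standard AWS charges can still apply for query execution or large data transfers.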
- High Performance: Parquet files enable fast queries and reduced storage costs.
- Scalability: Leverage AWS’s global infrastructure for processing petabytes of blockchain data.
- Interoperability: Compatible with popular analytics tools such as Amazon Athena, AWS Glue, Presto, and Spark.
- Cross-Chain Analytics: With both Bitcoin and Ethereum available, users can perform comparative studies across different consensus mechanisms and network behaviors.
How to Access and Use the Data
Accessing the datasets is straightforward:
- Open the AWS Management Console or use the AWS CLI.
- Navigate to the S3 bucket path corresponding to your desired blockchain.
- Query or download the Parquet files using your preferred toolset.
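The CLI route in the steps above can be sketched in Python as follows. The example only builds and prints the command; actually running it requires the AWS CLI to be installed. The `--no-sign-request` flag is a standard AWS CLI option for anonymous requests, but whether this bucket permits anonymous listing is an assumption here, and the helper names `s3_ls_command` and `run` are hypothetical.

```python
import shlex
import subprocess


def s3_ls_command(s3_uri):
    """Build an AWS CLI command that lists an S3 path without credentials."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    # --no-sign-request sends an anonymous request; this assumes the bucket
    # allows public listing.
    return ["aws", "s3", "ls", s3_uri, "--no-sign-request"]


def run(command):
    """Run the command and return its standard output (needs the AWS CLI)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


command = s3_ls_command("s3://aws-public-blockchain/v1.0/eth/")
print(shlex.join(command))
# To execute it for real: print(run(command))
```

Passing the command as a list rather than a shell string avoids quoting problems if a path ever contains unusual characters.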
For example, to analyze Ethereum transactions from a specific date using Amazon Athena:
SELECT * FROM eth_transactions
WHERE block_timestamp >= '2025-01-05'
AND block_timestamp < '2025-01-06';
You can also build automated ETL pipelines using AWS Glue to transform and load this data into data warehouses for dashboarding or machine learning applications.
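For scheduled jobs, the date-window query above can be generated for any day instead of being edited by hand. This is a minimal Python sketch that only renders the SQL text; it does not submit anything to Athena. The table name `eth_transactions` and the column `block_timestamp` are taken from the example query, and whether they match your Athena catalog depends on how the table was registered, so treat them as assumptions.

```python
from datetime import date, timedelta


def daily_transactions_query(day, table="eth_transactions"):
    """Render the example query for one calendar day as a half-open window."""
    # Allow only simple identifiers, since the name is interpolated into SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError("table name must contain only letters, digits, or underscores")
    next_day = day + timedelta(days=1)
    return (
        f"SELECT * FROM {table}\n"
        f"WHERE block_timestamp >= '{day.isoformat()}'\n"
        f"AND block_timestamp < '{next_day.isoformat()}';"
    )


print(daily_transactions_query(date(2025, 1, 5)))
```

The half-open window (start inclusive, next day exclusive) means consecutive daily runs never count the same transaction twice, and computing the upper bound with `timedelta` handles month ends.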
Licensing and Attribution
The data is released under an open-source license hosted on GitHub:
https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE
When using the data in publications or projects, please cite it as follows:
AWS Public Blockchain Data was accessed on DATE from https://registry.opendata.aws/aws-public-blockchain
This attribution helps maintain transparency and supports the open-data ecosystem.
Documentation and Support
Comprehensive documentation is available at:
https://github.com/aws-samples/digital-assets-examples/blob/main/analytics/
It includes code samples, schema definitions, best practices for querying, and integration guides for various AWS services.
For inquiries or feedback, contact the team at [email protected].
Frequently Asked Questions (FAQ)
What blockchains are included in the AWS Public Blockchain Data program?
Currently supported blockchains include Bitcoin, Ethereum, Arbitrum, Aptos, Base, Provenance, and XRP Ledger. Bitcoin and Ethereum datasets are maintained by AWS, while others are managed by SonarX.
Is there a cost to access these datasets?
No. All datasets are freely accessible via Amazon S3. However, standard AWS data retrieval and query execution fees may apply if you use services like Amazon Athena or transfer large volumes of data.
Can I use this data for commercial purposes?
Yes, subject to the terms of the license. The data is open for research, development, and commercial use, provided proper attribution is given.
How is the data structured?
The data is stored in Apache Parquet format and partitioned by date (e.g., /date=2025-01-05/). This structure optimizes performance for time-based queries and large-scale analytics.
What tools work best with these datasets?
Popular tools include Amazon Athena (for SQL queries), Apache Spark (for distributed processing), AWS Glue (for ETL), and Jupyter notebooks with PySpark integration.
Where can I find real-time or enterprise-grade updates?
For real-time streaming data and premium support, visit SonarX’s platform directly. The AWS-hosted version is updated daily and best suited for research and batch analytics.
Final Thoughts
The AWS Public Blockchain Data initiative lowers the barrier to entry for blockchain research and innovation. By offering structured, high-quality datasets at no cost, AWS enables developers and analysts to focus on deriving insights rather than managing infrastructure.
Whether you're studying transaction patterns on Bitcoin, analyzing smart contract behavior on Ethereum, or comparing Layer 2 scaling solutions like Arbitrum and Base, these datasets provide a solid foundation for exploration.
As blockchain adoption accelerates across industries, access to reliable historical and recent chain data will remain a critical asset—and AWS continues to lead in democratizing that access.