HALO Database 2.0
A funding proposal for the development of a novel, decentralized HALO database was submitted on 18 December 2025 to the DFG programme “Information Infrastructure for Research Data”. The envisioned 3-year project aims to develop a working prototype of a decentralized database. A follow-up application for a second 3-year phase, in which advanced functionality will be developed, is planned.
The new database will be based on IPFS:
IPFS (InterPlanetary File System) is a peer-to-peer, content-addressed network designed to store and share data in a decentralized manner. Unlike traditional Uniform Resource Locators (URLs), which point to locations on specific servers, IPFS uses content identifiers (CIDs), derived from cryptographic hashes of the data itself, as addresses. Any content retrieved via its CID can therefore be verified to match the original data, enabling immutable, verifiable storage across a distributed network of nodes ([1]). A CID can be “pinned” on a server, meaning that the server fetches the corresponding data and keeps it until it is “unpinned”. Multiple servers can pin the same CID, creating redundancy. The scalability of the system and its performance for a wide range of applications are well documented (see here, here, and here), including the use of IPFS for large scientific data (see here).
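The content-addressing and pinning principle described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the HALO-DB design and not the IPFS implementation: a plain SHA-256 hex digest stands in for a real CID (real CIDs also encode the CID version, codec, and hash function, and are computed over a chunked DAG of the file), and the `Node` class and `fetch` function are hypothetical names introduced only for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Simplified stand-in for an IPFS CID: hex SHA-256 of the raw bytes.
    # Real CIDs additionally encode version, codec and hash type, and IPFS
    # hashes a chunked DAG of the file, so actual CID strings look different.
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical storage node keeping pinned content keyed by content ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pinned: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # Store data locally; its address is derived from the data itself.
        cid = content_id(data)
        self.pinned[cid] = data
        return cid

    def pin(self, cid: str, network: list["Node"]) -> None:
        # Fetch the content from any peer (verified by hash) and keep a copy.
        self.pinned[cid] = fetch(cid, network)

    def unpin(self, cid: str) -> None:
        # Drop the local copy; other nodes pinning the same CID are unaffected.
        self.pinned.pop(cid, None)


def fetch(cid: str, network: list[Node]) -> bytes:
    # Ask nodes in turn and accept only data whose hash matches the CID.
    for node in network:
        data = node.pinned.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    raise KeyError(f"no node holds verified content for {cid}")


if __name__ == "__main__":
    node_a, node_b = Node("node_a"), Node("node_b")  # hypothetical nodes
    network = [node_a, node_b]
    cid = node_a.add(b"example dataset")
    node_b.pin(cid, network)  # second copy on another node: redundancy
    node_a.unpin(cid)         # data remains retrievable from node_b
    assert fetch(cid, network) == b"example dataset"
```

In this sketch, unpinning on one node does not affect the copy held by another node, which is why pinning the same CID on several nodes creates redundancy. Because `fetch` rehashes whatever it receives, a modified copy is rejected instead of being returned.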
Project goals
Fully functional & usable prototype of the decentralized HALO-DB 2.0
–> Data access, ingestion, searchable website…
Full manuals & descriptions for users
–> User how-tos, storage node setup guide, …
Regulations & rules for storage nodes
–> Clear rules for long-term accessibility of storage nodes & administration (including access rights to data sets)
–> Connection to consortium & WLA for regulations & rules (data safety, accessibility, …)
Roadmap for phase II
–> Follow-up proposal to the same DFG programme, covering 3 more years to
- Implement “embargo data”,
- Merge data files from the current HALO-DB,
- Resolve open issues concerning governance,
- Establish the system within the HALO community
Objectives: Project phase I
(Prototype development)
The proposal for the first phase (prototype development) outlines the following objectives:
Prototypical Development of a Distributed Data Infrastructure for RDM
–> Design, deploy, and test a multi-node system (three institutional nodes + one spare node) using IPFS to demonstrate scalability, resilience, and decentralization.
Implementation of Core Services as Reliable, Scalable APIs
–> Develop standardized RESTful (Representational State Transfer) interfaces and client libraries for metadata ingestion, data upload/download, authentication/authorization, and DOI/hash assignment to ensure reliable, automated data exchange.
IT-Security
–> Conduct regular threat modeling and automated vulnerability testing. Align security measures across all infrastructure components with
- Local infrastructure requirements (data centers, scientific institutions)
- Requirements of the Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik – BSI)
- The General Data Protection Regulation (GDPR)
–> Configure the nodes to replicate only HALO data, not arbitrary data from the wider IPFS network. Deploy automated tests, certificate management, and monitoring.
Provenance and FAIR Compliance
–> Use established metadata schemas (CF conventions, ISO 19115, etc.) to
- Implement automated metadata validation and provenance tracking
- Assign globally persistent identifiers to all data objects to satisfy FAIR principles.
Performance and Resilience Validation
–> Establish performance benchmarks (e.g., bandwidth, latency, uptime)
–> Demonstrate automatic failover and caching across geographically distributed nodes to ensure robust operation under realistic field-campaign conditions.
Community Integration and Training
–> Engage the HALO research community through dedicated workshops, user surveys, and documentation to validate usability, collect feedback, and build capacity for the follow-up proposal.
Conceptualization of Governance and Sustainability
–> Draft a detailed governance model, funding strategy, and operational plan for transitioning the prototype to a long-term operational service, including integration with national and international research data management activities.
Project schedule
The 36-month funding period is organized into four main parts, which will be worked on during the following project months (PMs):
Part 1a – Initial storage infrastructure setup; PMs 1–18: Define requirements and safe operation measures for storage nodes; implement data upload, accounting, and pinning methods. (WP 3, with input from WPs 8, 9)
Part 1b – Core software development & integration; PMs 1–20: Define user requirements for DB use, basic data ingestion, indexing, and FAIR validation; develop UI; successfully upload existing field-campaign data. (WPs 2, 4, 5, 6, 7, with input from WPs 8, 9)
Part 2 – Operation & Maintenance of the prototype; PMs 11–36: Implement FAIR validation, global pinning service, DOI assignment, external search, explore non-public data methods; iterative testing & improvement of core functions. (all WPs)
Part 3 – User engagement & dissemination; PMs 3–36: User needs assessments, workshops, user documentation; continuous feedback loops between the community and the development team (as soon as user access is possible); incorporation of lessons learned from existing solutions. (WP 8, with input from all other WPs)
Part 4 – Governance & sustainability planning; PMs 5–36: Governance model, roadmap to proposal for Phase II. (WP 9, with input from all other WPs)
| Milestone | Description | Project month |
|---|---|---|
| M 1 | Dataset definition | PM 6 |
| M 2 | First storage node & pinning service available for development | PM 10 |
| M 3 | Requirements assessments, governance structures & DB blueprint finalized | PM 10 |
| M 4 | Index definition finalized | PM 11 |
| M 5 | User authentication complete | PM 15 |
| M 6 | First usable Web UI; start of community feedback loop | PM 16 |
| M 7 | Data index creation & ingestion complete: PERCUSION dataset ingestion successfully tested | PM 20 |
| M 8 | First user engagement workshop with HALO-DB demonstration | PM 21 |
| M 9 | Decentralized storage nodes at 3 partner facilities ready | PM 24 |
| M 10 | FAIR Validation & Performance Evaluation: Metadata checks, DOI assignment, bandwidth benchmarks defined | PM 25 |
| M 11 | Web UI public access | PM 26 |
| M 12 | Second Community Workshop | PM 29 |
| M 13 | HALO DB searchable for external services | PM 31 |
| M 14 | Final Documentation & Open-Source Release: All code, documentation, and tutorials publicly available | PM 35 |
| M 15 | Embargo data handling strategy decided | PM 36 |
| M 16 | Governance Concept & Follow-up Proposal Roadmap Finalized: Sustainability plan approved by all partners | PM 36 |
Work packages & leads
WP 1: Project management
Governance, publication coordination, reporting, risk management.
Lead: Universität Leipzig
WP 2: Frontend & APIs
UI prototyping, API design, implementation
Lead: Deutsches Zentrum für Luft- und Raumfahrt, Oberpfaffenhofen
WP 3: Storage infrastructure & Node deployment
Procurement, node development and deployment, performance tests
Lead: Deutsches Klimarechenzentrum, Hamburg

WP 4: Data Structures & Indexing
Schema design (datasets & index), data transformations, index operations
Lead: Max-Planck-Institut für Meteorologie, Hamburg
WP 5: Access Control & “Embargo Data”
AuthN/AuthZ, encryption, security
Lead: Ludwig-Maximilians-Universität, München
WP 6: Data Ingestion & Index Creation
ETL pipelines, index creation, monitoring
Lead: Max-Planck-Institut für Meteorologie, Hamburg
WP 7: Quality Management & FAIR Compliance
Lead: Karlsruher Institut für Technologie
WP 8: User Engagement & Training
Workshops, surveys, documentation
Lead: Karlsruher Institut für Technologie
WP 9: Governance & Sustainability Concept
Governance model, business case, roadmap for phase II
Lead: Deutsches Zentrum für Luft- und Raumfahrt, Oberpfaffenhofen