HALO

HALO Database 2.0

A proposal for funding to support the development of a novel, decentralized HALO database was submitted on 18 December 2025 to the DFG programme "Information Infrastructure for Research Data". The envisioned 3-year project aims to develop a working prototype of a decentralized database. A follow-up application for a second 3-year phase is planned, in which advanced functionality will be developed.

The new database will use IPFS technology:


IPFS (InterPlanetary File System) is a peer-to-peer, content-addressed network designed to store and share data in a decentralized manner. Unlike traditional Uniform Resource Locators (URLs), which point to locations on specific servers, IPFS uses content identifiers (CIDs), derived from cryptographic hashes of the data itself, as addresses. This means that any content retrieved via its CID can be verified against the original data, enabling immutable, verifiable storage across a distributed network of nodes ([1]). CIDs can be "pinned" on a server, meaning that the server fetches the data and keeps it until the CID is "unpinned". Multiple servers can pin the same CID, creating redundancy. The scalability of the system and its performance for a wide range of applications are well documented (see here, here, and here), including the use of IPFS for large scientific data (see here).
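Content addressing and pinning can be illustrated with a minimal Python sketch. It is not part of the proposal. It computes a CIDv1 for a single raw block (sha2-256, base32) and models pinning with an in-memory dictionary. The node names and the helpers `pin`, `unpin`, and `replicas` are hypothetical. Real IPFS nodes usually split larger files into chunks and link them, so the CID of a whole file reported by an IPFS client will generally differ from the single-block value computed here.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Return the CIDv1 (raw codec, sha2-256, base32) of one block of bytes."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CID version 1, 0x55 = "raw" codec,
    # 0x12 = sha2-256 hash function, 0x20 = digest length (32 bytes)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase prefix "b" marks lowercase base32 without padding
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def verify(data: bytes, cid: str) -> bool:
    """Check that retrieved data matches the CID it was requested under."""
    return raw_cid_v1(data) == cid


# Hypothetical storage nodes, each holding the set of CIDs it has pinned
pins: dict[str, set[str]] = {"node-a": set(), "node-b": set(), "node-c": set()}


def pin(node: str, cid: str) -> None:
    pins[node].add(cid)


def unpin(node: str, cid: str) -> None:
    pins[node].discard(cid)


def replicas(cid: str) -> list[str]:
    """List the nodes that currently keep a copy of the content."""
    return sorted(name for name, pinned in pins.items() if cid in pinned)


if __name__ == "__main__":
    block = b"HALO flight data (example payload)"
    cid = raw_cid_v1(block)
    pin("node-a", cid)
    pin("node-b", cid)
    print(cid, replicas(cid), verify(block, cid))
```

Because the address is derived from the content, changing a single byte yields a different CID, so a modified copy cannot be returned under the original identifier without being detected.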



Envisioned project results (phase I)

Project image: HALO Database 2.0
Decentralized data storage structure with functionality to serve the needs of the HALO user community.

Fully functional & usable prototype of the decentralized HALO-DB 2.0
–> Data access, ingestion, searchable website…

Full manuals & descriptions for users
–> User how-tos, storage node set-up guide, …

Regulations & rules for storage nodes
–> Clear rules for long-term accessibility of storage nodes & administration (including access rights to data sets)
–> Connection to consortium & WLA for regulations & rules (data safety, accessibility, …)

Roadmap for phase II
–> Second proposal to same programme at DFG as a follow-up: 3 more years to

  • Implement “embargo data”,
  • Merge data files from current HALO-DB,
  • Resolve open issues concerning governance,
  • Establish system within HALO community

Objectives: Project phase I
(Prototype development)

The proposal for the first phase (prototype development) outlines the following objectives:

Prototypical Development of a Distributed Data Infrastructure for RDM
–> Design, deploy, and test a multi-node system (three institutional nodes + one spare node) using IPFS to demonstrate scalability, resilience, and decentralization.

Implementation of Core Services as Reliable, Scalable APIs
–> Develop standardized RESTful (Representational State Transfer) interfaces and client libraries for metadata ingestion, data upload/download, authentication/authorization, and DOI/hash assignment to ensure reliable, automated data exchange.

IT-Security
–> Conduct regular threat modeling and automated testing for vulnerabilities. Align security measures with

  • Local infrastructure requirements (data centers, scientific institutions)
  • Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik – BSI)
  • General Data Protection Regulation (GDPR) across all infrastructure components.

–> Configure the nodes to replicate HALO data only, not arbitrary data from the wider IPFS network. Deploy automated tests, certificate management, and monitoring.
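One way to picture the "HALO data only" requirement is an allowlist check in front of the pinning step. The following Python sketch is an assumption for illustration, not the project's design. The class `StorageNode`, its fields, and the example CID strings are hypothetical, and a real deployment would enforce this in the node or pinning-service configuration rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node that only pins CIDs known to the HALO index."""

    name: str
    allowed: set[str]  # CIDs registered in the (hypothetical) HALO data index
    pinned: set[str] = field(default_factory=set)

    def request_pin(self, cid: str) -> bool:
        """Pin the CID if it belongs to HALO data; refuse anything else."""
        if cid not in self.allowed:
            return False
        self.pinned.add(cid)
        return True


if __name__ == "__main__":
    node = StorageNode(name="node-a", allowed={"cid-halo-1", "cid-halo-2"})
    print(node.request_pin("cid-halo-1"))   # CID from the HALO index
    print(node.request_pin("cid-foreign"))  # CID from elsewhere in the network
    print(sorted(node.pinned))
```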

Provenance and FAIR Compliance
–> Use established metadata schemas (CF conventions, ISO 19115, etc.) to

  • Implement automated metadata validation and provenance tracking
  • Assign globally persistent identifiers to all data objects to satisfy FAIR principles.
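Automated metadata validation could look roughly like the Python sketch below. It is an illustrative assumption, not the project's validator. It checks a dictionary of global attributes for the `Conventions` attribute and for `title`, `institution`, `source`, and `history`, which the CF conventions recommend for describing file contents. Treating them as mandatory is a policy chosen here for the example, and the function name `validate_global_attributes` is hypothetical.

```python
# Global attributes that this example policy treats as mandatory
REQUIRED_ATTRIBUTES = ("Conventions", "title", "institution", "source", "history")


def validate_global_attributes(attrs: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key in REQUIRED_ATTRIBUTES:
        value = attrs.get(key, "")
        if not value.strip():
            problems.append(f"missing or empty attribute: {key}")
    conventions = attrs.get("Conventions", "")
    if conventions.strip() and not conventions.startswith("CF-"):
        problems.append(f"Conventions does not name a CF version: {conventions!r}")
    return problems


if __name__ == "__main__":
    metadata = {
        "Conventions": "CF-1.8",
        "title": "Example dropsonde profile",
        "institution": "Example institute",
        "source": "airborne observation",
        "history": "",
    }
    for problem in validate_global_attributes(metadata):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets an ingestion service report all issues of a dataset in a single pass.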

Performance and Resilience Validation
–> Establish performance benchmarks (e.g., bandwidth, latency, uptime)
–> Demonstrate automatic failover and caching across geographically distributed nodes to ensure robust operation under realistic field-campaign conditions.
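The failover idea can be sketched in a few lines of Python: try each node in turn, skip nodes that are unreachable, and record how long the successful request took. This is a simplified illustration under stated assumptions. The function `fetch_with_failover`, the `Fetcher` type, and the node names are hypothetical, and the fetchers stand in for real network requests.

```python
import time
from typing import Callable

# A fetcher takes a CID and returns the content, or raises OSError
# (ConnectionError and TimeoutError are subclasses) if the node is unavailable.
Fetcher = Callable[[str], bytes]


def fetch_with_failover(cid: str, nodes: dict[str, Fetcher]) -> tuple[str, bytes, float]:
    """Return (node name, data, latency in seconds) from the first node that answers."""
    errors: dict[str, str] = {}
    for name, fetch in nodes.items():
        start = time.perf_counter()
        try:
            data = fetch(cid)
        except OSError as exc:
            errors[name] = repr(exc)  # remember the failure, try the next node
            continue
        return name, data, time.perf_counter() - start
    raise RuntimeError(f"no node could serve {cid}: {errors}")


if __name__ == "__main__":
    def offline(cid: str) -> bytes:
        raise ConnectionError("node unreachable")

    def online(cid: str) -> bytes:
        return b"example payload"

    name, data, latency = fetch_with_failover(
        "example-cid", {"node-a": offline, "node-b": online}
    )
    print(name, len(data), f"{latency:.6f} s")
```

The measured latency per request is the kind of raw value from which bandwidth and uptime benchmarks could later be aggregated.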

Community Integration and Training
–> Engage the HALO research community through dedicated workshops, user surveys, and documentation, to validate usability, collect feedback, and build capacity for the follow-up proposal.

Conceptualization of Governance and Sustainability
–> Draft a detailed governance model, funding strategy, and operational plan for transitioning the prototype to a long-term operational service, including integration with national and international research data management activities.

Project schedule & Milestones (phase I)

The 36-month funding period is organized into four main parts, which will be carried out during the project months (PMs) specified below:

Part 1a – Initial storage infrastructure setup; PMs 1–18: Define requirements and safe operation measures for storage nodes; implement data upload, accounting, and pinning methods.  (WP 3, with input from WPs 8, 9)

Part 1b – Core software development & integration; PMs 1–20: Define user requirements for DB use, basic data ingestion, indexing, and FAIR validation; develop UI; successfully upload existing field-campaign data. (WPs 2, 4, 5, 6, 7, with input from WPs 8, 9)

Part 2 – Operation & Maintenance of the prototype; PMs 11–36: Implement FAIR validation, global pinning service, DOI assignment, and external search; explore methods for handling non-public data; iterative testing & improvement of core functions. (all WPs)

Part 3 – User engagement & dissemination; PMs 3–36: User needs assessments, workshops, user documentation, and lessons learned from existing solutions. Continuous feedback loops between community and development team (as soon as user access is possible). (WP 8, with input from all other WPs)

Part 4 – Governance & sustainability planning; PMs 5–36: Governance model, road map to proposal for Phase II. (WP 9, with input from all other WPs)

| Milestone | Description | Project month |
| --- | --- | --- |
| M 1 | Dataset definition | PM 6 |
| M 2 | First storage node & pinning service for development available | PM 10 |
| M 3 | Requirements assessments, governance structures & DB blueprint finalized | PM 10 |
| M 4 | Index definition finalized | PM 11 |
| M 5 | User authentication complete | PM 15 |
| M 6 | First usable Web UI >>> start of community feedback loop | PM 16 |
| M 7 | Data index creation & ingestion complete: ingestion of PERCUSION datasets successfully tested | PM 20 |
| M 8 | First user engagement workshop with HALO-DB demonstration | PM 21 |
| M 9 | Decentralized storage nodes at 3 partner facilities ready | PM 24 |
| M 10 | FAIR validation & performance evaluation: metadata checks, DOI assignment, bandwidth benchmarks defined | PM 25 |
| M 11 | Web UI public access | PM 26 |
| M 12 | Second community workshop | PM 29 |
| M 13 | HALO DB searchable for external services | PM 31 |
| M 14 | Final documentation & open-source release: all code, documentation, and tutorials publicly available | PM 35 |
| M 15 | Embargo data handling strategy decided | PM 36 |
| M 16 | Governance concept & follow-up proposal roadmap finalized: sustainability plan approved by all partners | PM 36 |

Project group (phase I)
