NWB is more than just a file format; it defines an ecosystem of tools, methods, and standards for storing, sharing, and analyzing complex neurophysiology data. The following provides a high-level overview of the main components of the NWB:N data standardization ecosystem. For each component we describe the problem it addresses, the approach taken, its function, and further details.
Problem: Definition of neuroscience data standards.
Approach: To support the formal and verifiable specification of neurodata file formats, NWB:N relies on the NWB:N specification language.
Function: The specification language provides a mechanism to formally specify the organization of data.
Description: The specification language is defined in YAML (or JSON). It defines formal structures for describing the organization of complex data using basic concepts, e.g., Groups, Datasets, Attributes, and Links. The specification language is also used to extend the format, which is necessary to store types of data that the format does not currently support.
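As an illustration of how Groups, Datasets, and Attributes combine in a specification, the following minimal sketch builds a specification as a plain Python dictionary and serializes it to JSON. The type name ExampleSeries, the attribute sensor_id, and the helper list_primitives are hypothetical and exist only for this example; the key names (neurodata_type_def, neurodata_type_inc, doc, dtype, dims, shape) are assumed to follow the NWB:N specification language and should be checked against the specification language documentation.

```python
import json

# Hypothetical extension type, written as a plain dictionary.
# Key names are assumed to follow the NWB:N specification language.
spec = {
    "groups": [
        {
            "neurodata_type_def": "ExampleSeries",  # hypothetical new type
            "neurodata_type_inc": "TimeSeries",     # type being extended
            "doc": "A hypothetical time series with one extra attribute.",
            "attributes": [
                {
                    "name": "sensor_id",
                    "dtype": "text",
                    "doc": "Identifier of the recording sensor.",
                }
            ],
            "datasets": [
                {
                    "name": "data",
                    "dtype": "float32",
                    "dims": ["num_times"],
                    "shape": [None],  # None: length is not fixed
                    "doc": "Recorded samples.",
                }
            ],
        }
    ]
}


def list_primitives(group_spec):
    """Return (kind, name) pairs for primitives declared directly in a group."""
    found = []
    for kind in ("attributes", "datasets", "groups", "links"):
        for item in group_spec.get(kind, []):
            found.append((kind, item.get("name")))
    return found


# The specification language also allows JSON, so the dictionary can be
# serialized directly with the standard library.
spec_json = json.dumps(spec, indent=2)
primitives = list_primitives(spec["groups"][0])
print(spec_json)
print(primitives)
```

In practice, extension authors would typically write such a specification in YAML or generate it with the NWB tooling rather than by hand.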
Problem: Efficient interaction with neuroscience data.
Approach: Develop APIs that provide easy-to-use representations of NWB:N neurodata types for programmatic use and enable the mapping of these representations to/from data storage based on the NWB:N format specification.
Function: The role of data API(s) is to facilitate efficient interaction with neuroscience data stored in the NWB:N data format (e.g., for reading, writing, querying, and analyzing neuroscience data). An API provides a stable and usable interface for programmatic use and development of new applications. The API should insulate developers and users from implementation details related to the specification language, format specification, and data storage.
Description: NWB:N currently provides the following APIs:
- PyNWB: Python reference API for NWB:N 2 to read, write, use, extend, and analyze data stored in NWB:N. Documentation. Sources (GitHub)
- MatNWB: MATLAB API for NWB:N. Documentation. Sources (GitHub)
Community Software: In addition to the core APIs developed by the NWB team, there is a growing collection of software tools and libraries that support NWB. See our Analysis and Visualization Tools page for a list of tools that support NWB.
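To make the idea of mapping user-facing representations to and from storage concrete, here is a minimal, library-free sketch. It is not the PyNWB or MatNWB API: the class SimpleTimeSeries and the functions to_storage and from_storage are hypothetical, and nested dictionaries stand in for a real storage backend.

```python
from dataclasses import dataclass


@dataclass
class SimpleTimeSeries:
    """Hypothetical user-facing object (not a PyNWB class)."""
    name: str
    data: list
    unit: str
    rate: float


def to_storage(ts):
    """Map the object to storage primitives: a group with attributes and one dataset."""
    return {
        "attributes": {"neurodata_type": "SimpleTimeSeries", "rate": ts.rate},
        "datasets": {
            "data": {"values": list(ts.data), "attributes": {"unit": ts.unit}},
        },
    }


def from_storage(name, group):
    """Rebuild the user-facing object from the stored group."""
    dataset = group["datasets"]["data"]
    return SimpleTimeSeries(
        name=name,
        data=list(dataset["values"]),
        unit=dataset["attributes"]["unit"],
        rate=group["attributes"]["rate"],
    )


# Users work with the object; the storage layout stays an implementation detail.
original = SimpleTimeSeries(name="lfp", data=[0.1, 0.2, 0.3], unit="mV", rate=1000.0)
stored = to_storage(original)
restored = from_storage("lfp", stored)
print(restored)
```

The point of the design is that code written against SimpleTimeSeries does not change if the layout produced by to_storage changes, which is the kind of insulation the Function paragraph above describes.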
Problem: Storage of large collections of neuroscience data.
Approach: The NWB:N format currently uses the Hierarchical Data Format (HDF5) as its primary storage mechanism.
Function: Data storage maps NWB:N primitives (Groups, Datasets, Attributes, Links, etc.) to storage. In the case of HDF5 this is currently mostly a 1-to-1 mapping, as the NWB:N primitives largely match HDF5 primitives.
Description: HDF5 was selected for the NWB:N format because it meets several key requirements. First, HDF5 is a mature data format standard with libraries available in multiple programming languages. Second, HDF5’s hierarchical structure allows data to be grouped into logical, self-documenting sections. The HDF5 structure is analogous to a file system in which “groups” and “datasets” correspond to directories and files. Groups and datasets can have attributes that provide additional details, such as authorities’ identifiers. Third, the HDF5 linking feature enables data stored in one location to be transparently accessed from multiple locations in the hierarchy. The linked data can be external to the file. Fourth, HDF5 is widely supported across programming languages (e.g., C, C++, Python, MATLAB, and R) and tools; for example, HDFView, a free, cross-platform application, can be used to open a file and browse its data. Fifth, the HDF Group, a nonprofit group, ensures the ongoing accessibility of HDF-stored data.
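The file-system analogy and the linking feature can be sketched with a small in-memory model. This is an illustration of the concepts only, not the HDF5 library: the classes Group, Dataset, and Link and the function resolve are hypothetical names, and external links to other files are not modeled.

```python
class Group:
    """Container of named children, analogous to a directory."""
    def __init__(self, **attributes):
        self.attributes = attributes
        self.children = {}  # name -> Group, Dataset, or Link


class Dataset:
    """Holds values plus descriptive attributes, analogous to a file."""
    def __init__(self, values, **attributes):
        self.values = values
        self.attributes = attributes


class Link:
    """Points to another location in the same hierarchy."""
    def __init__(self, target_path):
        self.target_path = target_path


def resolve(root, path):
    """Walk a slash-separated path, following links transparently."""
    node = root
    for part in (p for p in path.split("/") if p):
        node = node.children[part]
        if isinstance(node, Link):
            node = resolve(root, node.target_path)
    return node


# Build a tiny hierarchy: /acquisition/raw holds the data,
# and /analysis/input is a link to the same dataset.
root = Group()
root.children["acquisition"] = Group(description="raw recordings")
root.children["acquisition"].children["raw"] = Dataset([1, 2, 3], unit="mV")
root.children["analysis"] = Group()
root.children["analysis"].children["input"] = Link("/acquisition/raw")

via_link = resolve(root, "/analysis/input")
direct = resolve(root, "/acquisition/raw")
print(via_link is direct, via_link.values, via_link.attributes)
```

Both paths return the same object, so the data is stored once but reachable from two places in the hierarchy.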
Problem: Organization of complex collections of neuroscience data.
Approach: Organize data hierarchically using easy-to-use primitives, e.g., Groups (similar to Folders), Datasets (n-D Arrays), Attributes (Metadata objects on Groups and Datasets), and Links (links to Groups and Datasets).
Function: The format specification formally specifies the organization of neuroscience data. It provides a verifiable, computer- and human-readable document that governs the NWB:N format. The format specification is therefore central to supporting the development of APIs and codes compliant with the NWB format, as well as extension of the NWB format.
Description: The NWB:N format standard is governed by a formal format specification, the NWB:N schema, which is formally specified using the NWB specification language. A new schema file will be published for each revision of the NWB format standard. Developers can use the schema to validate NWB files or create advanced APIs for NWB data.
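The idea of validating a file against the schema can be sketched as a recursive check of a hierarchy against a specification. This is a simplified illustration under stated assumptions, not the NWB validator: the function validate is hypothetical, nested dictionaries stand in for a file, every declared item is treated as required, and data types, shapes, and quantities are not checked.

```python
def validate(group, spec, path="/"):
    """Return a list of error messages for items the spec declares but the group lacks."""
    errors = []
    for attr in spec.get("attributes", []):
        if attr["name"] not in group.get("attributes", {}):
            errors.append(f"{path}: missing attribute '{attr['name']}'")
    for dataset in spec.get("datasets", []):
        if dataset["name"] not in group.get("datasets", {}):
            errors.append(f"{path}: missing dataset '{dataset['name']}'")
    for sub_spec in spec.get("groups", []):
        child = group.get("groups", {}).get(sub_spec["name"])
        if child is None:
            errors.append(f"{path}: missing group '{sub_spec['name']}'")
        else:
            # Recurse into the subgroup with an extended path for error messages.
            errors.extend(validate(child, sub_spec, path + sub_spec["name"] + "/"))
    return errors


# Hypothetical schema fragment: a root attribute and one subgroup with a dataset.
schema = {
    "attributes": [{"name": "session_description"}],
    "groups": [{"name": "acquisition", "datasets": [{"name": "data"}]}],
}

good_file = {
    "attributes": {"session_description": "demo session"},
    "groups": {"acquisition": {"datasets": {"data": [1, 2, 3]}}},
}
bad_file = {"attributes": {}, "groups": {"acquisition": {"datasets": {}}}}

good_errors = validate(good_file, schema)
bad_errors = validate(bad_file, schema)
print(good_errors)
print(bad_errors)
```

Because the schema is machine-readable, the same document that describes the format to humans can drive checks like this one.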
NWB provides users with a mature software ecosystem that enables them to: 1) efficiently create and use NWB data files via Python and MATLAB APIs, 2) extend the data standard via Neuro Data Extensions (NDX), 3) share and deploy these extensions to the community via the extensions catalog, and 4) explore, convert, and document NWB data via high-level tools and utilities. This software ecosystem is essential to enable efficient use and maintenance of the data standard and to foster the creation of a broad ecosystem of community software tools and methods around NWB. Through its use of state-of-the-art data technologies, NWB improves efficiency and makes most modern high-performance computing capabilities accessible to neurophysiology. We are not aware of any other neurophysiology format that has all of these qualities.
NWB uses a modern, modular software architecture. All our software libraries and tools are open source and available via our GitHub organizations. The developer perspective of the NWB software stack is an extended version of the User View; in addition to the software for using NWB, the developer view also includes all software needed to create and maintain the NWB data standard and software, along with their dependencies.
The NWB Core software defines the core functionality of NWB and is developed and maintained by the core NWB team in collaboration with, and with contributions from, the broader NWB developer community. The NWB core consists of: 1) Data APIs for reading/writing NWB files and creating NWB extensions, 2) Data Modeling software that builds the foundation for creating, maintaining, using, and extending the NWB data standard, 3) Data Standard Specification defining the schema of the NWB data standard and extensions, and 4) Foundational Documents & Definitions.
The Extended NWB Core includes high-level tools and utilities for NWB that allow users to explore, convert, and document NWB data. These NWB-specific tools and utilities are often created and maintained primarily by the members of the NWB community.
The broader NWB software ecosystem also includes a growing body of Community Software for data management, visualization, analysis, processing, and acquisition that supports the use and/or export of data in the NWB format.
NWB technologies are at the heart of the neurodata lifecycle. Data standards are a critical conduit that facilitates the flow of data throughout the data lifecycle as well as the integration of data and software across all of its phases. As such, NWB must support the needs of, and integrate with, technologies across the data lifecycle. In this process, our goal is to work with (not compete with) existing and emerging technologies as much as possible.