Overview
To avoid copying large arrays and duplicating data, AqNWB supports creating soft-links to existing datasets within the same file. To create such a link, pass a LinkArrayDataSetConfig when initializing a type; its constructor requires only the path of the target dataset, which must already exist. As a general rule, any function that accepts a BaseArrayDataSetConfig (e.g., the initialize methods of TimeSeries and Data and their subtypes) should handle either an explicit dataset defined via ArrayDataSetConfig or a linked dataset defined via LinkArrayDataSetConfig.
AqNWB also supports linking to existing Groups (or Datasets) via the createLink method of the I/O object, which we can use, e.g., to create a link to an entire TimeSeries.
Warning: Links must point to targets that comply with the schema requirements; otherwise the resulting NWB file is invalid. For example, a TimeSeries requires that its data has a unit attribute, and a VectorData requires the corresponding neurodata_type and namespace attributes. Therefore, only create links between schema-compatible targets.
Use Case: Time-Aligned TimeSeries
NWB requires that timestamps in a file are aligned with the timestampReferenceTime. However, time-alignment performed online during acquisition may not be as accurate as post-hoc alignment. When performing post-hoc alignment of timestamps we may want to avoid duplicating the recorded signal. In this case we would like to create multiple TimeSeries that share the same data but have different timestamps.
Key benefits of this approach are:
- Storage Efficiency: Data is stored only once, saving disk space
- Consistency: Changes to the original data are automatically reflected in all linked series
- Flexibility: Each linked series can have its own metadata, timestamps, and attributes
- NWB Compliance: Soft-links are a standard HDF5 feature fully supported by NWB
- Best Practices: Aligned data is organized in a ProcessingModule, making the data organization and provenance explicit
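The sharing behind these benefits can be pictured without AqNWB or HDF5. The following is a minimal sketch in plain C++, assuming a hypothetical SeriesView struct (not an AqNWB type) in which a shared pointer plays the role of the soft-link: both series see the same samples, while each owns its timestamps.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TimeSeries (not an AqNWB type): the samples are
// held through a shared pointer, which plays the role of the soft-link, while
// the timestamps are owned by each series.
struct SeriesView
{
  std::shared_ptr<const std::vector<float>> data;
  std::vector<double> timestamps;
};

// Create a second view of the same samples with its own timestamps.
// No sample is copied; only the pointer is shared.
SeriesView makeAlignedView(const SeriesView& original,
                           std::vector<double> alignedTimestamps)
{
  assert(original.data != nullptr);
  assert(alignedTimestamps.size() == original.data->size());
  return SeriesView {original.data, std::move(alignedTimestamps)};
}
```

Calling makeAlignedView(original, alignedTimestamps) yields a second series whose data pointer equals original.data, so the samples exist only once in memory, much as the linked dataset exists only once in the file.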
When multiple TimeSeries need to reference the same data array but with different timestamps or metadata, use LinkArrayDataSetConfig to create soft-links instead of duplicating the data. The following steps illustrate how to create linked TimeSeries.
1. Setup
First, create an NWB file and initialize it:
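std::string path = getTestFilePath("testLinkTimeSeriesExample.nwb");
auto io = std::make_shared<IO::HDF5::HDF5IO>(path);
io->open();
// Assumption: the extracted snippet omits the NWBFile setup; the create and
// initialize calls below are assumed from typical AqNWB usage
auto nwbfile = NWB::NWBFile::create(io);
Status status = nwbfile->initialize(generateUuid());
REQUIRE(status == Status::Success);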
2. Create the Original TimeSeries
Create the first TimeSeries with the actual data during acquisition. We create the series in the /acquisition group following NWB best practices. Since the data is regularly sampled, we use starting_time and rate instead of explicit timestamps:
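// The series lives in /acquisition (path as in the h5ls output under Verification)
std::string originalSeriesPath =
    mergePaths("/acquisition", "original_series");
// Assumption: the creation call is missing from the extracted snippet;
// TimeSeries::create(path, io) is assumed here
auto originalSeries = NWB::TimeSeries::create(originalSeriesPath, io);
REQUIRE(originalSeries != nullptr);
// Assumption: 1000 samples, matching the dataset shape in the h5ls output
const size_t numSamples = 1000;
std::vector<float> data(numSamples);
for (size_t i = 0; i < numSamples; ++i) {
    data[i] = static_cast<float>(i) * 0.1f;
}
// Assumption: dataConfig is not defined in the extracted snippet; an
// extendable 1D float32 dataset is assumed here
IO::ArrayDataSetConfig dataConfig(
    IO::BaseDataType::F32, SizeArray {0}, SizeArray {100});
double startingTime = 0.0;
float samplingRate = 1000.0f;
status = originalSeries->initialize(
    dataConfig,
    "m/s",  // unit
    "Original speed recording of the animal",  // description
    "Coarse aligned with starting time but not aligned to "
    "stimulus events",  // comments
    1.0f,  // conversion
    -1.0f,  // resolution
    0.0f,  // offset
    startingTime,
    samplingRate
);
REQUIRE(status == Status::Success);
// Write all samples in one block, starting at offset 0
status = originalSeries->writeData({numSamples}, {0}, data.data());
REQUIRE(status == Status::Success);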
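Storing starting_time and rate is sufficient here because the timestamps of a regularly sampled series are implied: sample i occurs at starting_time + i / rate, with the rate in Hz and times in seconds. A minimal sketch of that relationship in plain C++, using a hypothetical helper impliedTimestamps that is not part of AqNWB:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Timestamps implied by a regularly sampled series:
//   t[i] = startingTime + i / rate
// (hypothetical helper for illustration, not an AqNWB function)
std::vector<double> impliedTimestamps(double startingTime,
                                      double rate,
                                      std::size_t numSamples)
{
  assert(rate > 0.0);
  std::vector<double> timestamps(numSamples);
  for (std::size_t i = 0; i < numSamples; ++i) {
    timestamps[i] = startingTime + static_cast<double>(i) / rate;
  }
  return timestamps;
}
```

With a starting time of 0.0 and a rate of 1000.0, as in this step, 1000 samples cover 0 s to 0.999 s in 1 ms steps, so two scalars replace an array of 1000 timestamps.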
3. Create a ProcessingModule
Create a ProcessingModule to store the time-aligned data. This follows NWB best practices where processed data generated post-acquisition is organized in processing modules:
auto processingModule = nwbfile->createProcessingModule("time_alignment");
REQUIRE(processingModule != nullptr);
status = processingModule->initialize(
"Time-aligned data relative to stimulus onset");
REQUIRE(status == Status::Success);
4. Create the Linked TimeSeries
Create a second TimeSeries in the ProcessingModule that links to the first one's data. This step is typically done after acquisition is completed. We use ProcessingModule::createNWBDataInterface() to create the TimeSeries and add it to the processing module. The aligned series uses irregular timestamps with small adjustments to demonstrate the time-alignment use case:
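// Assumption: the call on the right-hand side is missing from the extracted
// snippet; the templated ProcessingModule::createNWBDataInterface is assumed
auto linkedSeries =
    processingModule->createNWBDataInterface<NWB::TimeSeries>(
        "aligned_voltage");
REQUIRE(linkedSeries != nullptr);
// Link target: the data dataset of the original series
std::string linkTarget = mergePaths(originalSeriesPath, "data");
// The LinkArrayDataSetConfig constructor only requires the target path
IO::LinkArrayDataSetConfig linkConfig(linkTarget);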
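status = linkedSeries->initialize(
    linkConfig,  // link to the existing data instead of creating a dataset
    originalSeries->readDataUnit()->values().data[0],  // reuse the original unit
    "Time-aligned voltage data",  // description
    "Aligned to stimulus events with irregular timestamps",  // comments
    1.0f,  // conversion
    -1.0f,  // resolution
    0.0f  // offset
);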
REQUIRE(status == Status::Success);
std::vector<double> newTimestamps(numSamples);
for (size_t i = 0; i < numSamples; ++i) {
double baseTime = 5.0 + static_cast<double>(i) * 0.001;
double jitter = static_cast<double>(i % 10) * 0.00001;
newTimestamps[i] = baseTime + jitter;
}
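auto timestampRecorder = linkedSeries->recordTimestamps();
// Assumption: the arguments are truncated in the extracted snippet; the
// shape, offset, data type, and buffer below are assumed
status = timestampRecorder->writeDataBlock(
    {numSamples}, {0}, IO::BaseDataType::F64, newTimestamps.data());
REQUIRE(status == Status::Success);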
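Irregular timestamps are still generally expected to be in ascending order. The sketch below, in plain C++ and independent of AqNWB, rebuilds the jittered timestamps from this step and checks that they are strictly increasing before they would be written. The helper names makeJitteredTimestamps and isStrictlyIncreasing are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// True if every timestamp is strictly greater than its predecessor.
// adjacent_find returns the first pair (a, b) with a >= b, if any.
bool isStrictlyIncreasing(const std::vector<double>& timestamps)
{
  return std::adjacent_find(timestamps.begin(),
                            timestamps.end(),
                            std::greater_equal<double>())
      == timestamps.end();
}

// Same construction as in this step: a 1 ms grid starting at 5 s plus a
// small, repeating jitter of at most 0.09 ms.
std::vector<double> makeJitteredTimestamps(std::size_t numSamples)
{
  std::vector<double> timestamps(numSamples);
  for (std::size_t i = 0; i < numSamples; ++i) {
    const double baseTime = 5.0 + static_cast<double>(i) * 0.001;
    const double jitter = static_cast<double>(i % 10) * 0.00001;
    timestamps[i] = baseTime + jitter;
  }
  // Sanity check before the timestamps are handed to a writer
  assert(isStrictlyIncreasing(timestamps));
  return timestamps;
}
```

The jitter is at most 0.09 ms, well below the 1 ms sample spacing, so the check passes for the 1000 samples used here.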
5. Link to the Original Series
Create a link to the original TimeSeries in the ProcessingModule to explicitly document the relationship and provenance of the processed data. This is not strictly necessary, since the linked data in the aligned TimeSeries already documents the relationship. However, linking the full original TimeSeries makes the provenance more explicit and the data easier to reuse:
std::string referenceLinkPath =
mergePaths(processingModule->getPath(),
"original_series_reference");
status = io->createLink(referenceLinkPath, originalSeries->getPath());
REQUIRE(status == Status::Success);
6. Cleanup
Finalize and close the file:
io->stopRecording();
io->close();
Verification
To verify that the links were created correctly, inspect the file with h5ls:
h5ls -r testLinkTimeSeriesExample.nwb
You should see output similar to:
/acquisition/original_series/data Dataset {1000/Inf}
/processing/time_alignment/aligned_voltage/data Soft Link {/acquisition/original_series/data}
/processing/time_alignment/original_series_reference Soft Link {/acquisition/original_series}
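This check can also be scripted. As a rough sketch, assuming the h5ls -r lines have the form shown above (object path, followed by Soft Link {target}), the following plain C++ helper extracts the link path and target from one output line. The names parseH5lsLine and SoftLinkEntry are hypothetical and not part of AqNWB or HDF5.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result of parsing one line of h5ls -r output (hypothetical helper type).
struct SoftLinkEntry
{
  bool isSoftLink;     // false if the line does not describe a soft-link
  std::string path;    // location of the link in the file
  std::string target;  // path the link points to
};

// Parse a line such as
//   /a/b/data Soft Link {/c/d/data}
// Assumes the target is enclosed in braces directly after "Soft Link ".
SoftLinkEntry parseH5lsLine(const std::string& line)
{
  const std::string marker = "Soft Link {";
  SoftLinkEntry entry {false, "", ""};

  const std::size_t markerPos = line.find(marker);
  if (markerPos == std::string::npos || markerPos == 0) {
    return entry;  // not a soft-link line, or no path before the marker
  }
  const std::size_t targetBegin = markerPos + marker.size();
  const std::size_t closePos = line.find('}', targetBegin);
  if (closePos == std::string::npos) {
    return entry;  // unterminated target
  }
  assert(closePos >= targetBegin);

  // The path is everything before the marker, without trailing whitespace
  const std::size_t pathEnd = line.find_last_not_of(" \t", markerPos - 1);
  if (pathEnd == std::string::npos) {
    return entry;  // only whitespace before the marker
  }

  entry.isSoftLink = true;
  entry.path = line.substr(0, pathEnd + 1);
  entry.target = line.substr(targetBegin, closePos - targetBegin);
  return entry;
}
```

Applied to each line of the output, the helper reports the two soft-links and ignores the Dataset line, so a test can compare each target against the expected path in /acquisition.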