aqnwb 0.3.0
Using Links 🔗

Overview

To avoid copying large arrays and duplicating data, AqNWB supports creating soft-links to datasets that already exist within the same file. This is done by passing a LinkArrayDataSetConfig when initializing a type; its constructor requires only the target path. As a general rule, any function that accepts a BaseArrayDataSetConfig should handle either an explicit dataset defined via ArrayDataSetConfig or a linked dataset defined via LinkArrayDataSetConfig (e.g., the initialize methods of TimeSeries and Data and their subtypes).
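The dispatch rule described above can be pictured with a small self-contained model. This is only a sketch: the three structs are hypothetical stand-ins that borrow the AqNWB class names but not their real definitions, and describeDataset is an invented helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only. The real AqNWB classes live
// in the library and carry more state than shown here.
struct BaseArrayDataSetConfig {
  virtual ~BaseArrayDataSetConfig() = default;
};

// Describes a dataset that will be created (shape and chunking).
struct ArrayDataSetConfig : BaseArrayDataSetConfig {
  ArrayDataSetConfig(std::vector<std::size_t> shapeIn,
                     std::vector<std::size_t> chunkingIn)
      : shape(std::move(shapeIn)), chunking(std::move(chunkingIn)) {}
  std::vector<std::size_t> shape;
  std::vector<std::size_t> chunking;
};

// Describes a soft-link to a dataset that already exists at targetPath.
struct LinkArrayDataSetConfig : BaseArrayDataSetConfig {
  explicit LinkArrayDataSetConfig(std::string target)
      : targetPath(std::move(target)) {}
  std::string targetPath;
};

// Invented helper: one entry point that accepts the base type and handles
// both the "create a dataset" and the "link to a dataset" case.
std::string describeDataset(const BaseArrayDataSetConfig& config) {
  if (const auto* link = dynamic_cast<const LinkArrayDataSetConfig*>(&config)) {
    return "soft-link to " + link->targetPath;
  }
  const auto* array = dynamic_cast<const ArrayDataSetConfig*>(&config);
  assert(array != nullptr && "unknown config type");
  return "new dataset with " + std::to_string(array->shape.size()) +
         " dimension(s)";
}
```

In this toy model a caller passes either config type to the same function, which mirrors how the initialize methods are described as accepting a BaseArrayDataSetConfig.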

AqNWB also supports linking to existing Groups (or Datasets) via the createLink method of the I/O object, which we can use, e.g., to create a link to a TimeSeries.

Warning
Links must point to targets that comply with the schema requirements to ensure valid NWB files. For example, a TimeSeries requires that its data has a unit attribute, and a VectorData requires the corresponding neurodata_type and namespace attributes. We should therefore only create links between schema-compatible targets.
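One way to think about this warning is as a pre-flight check on the link target's attributes. The sketch below is not part of AqNWB: the role labels "TimeSeries.data" and "VectorData" and both function names are made up for illustration, and the attribute lists contain only the attributes named in the warning above, not the full schema.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Attributes the warning names as required on a link target.
// The role strings are labels invented for this sketch.
std::vector<std::string> requiredAttributes(const std::string& role) {
  if (role == "TimeSeries.data") return {"unit"};
  if (role == "VectorData") return {"neurodata_type", "namespace"};
  return {};
}

// Returns the required attributes that the candidate target lacks.
// An empty result means the target passes this (partial) check.
std::vector<std::string> missingAttributes(
    const std::string& role, const std::set<std::string>& present) {
  std::vector<std::string> missing;
  for (const auto& name : requiredAttributes(role)) {
    if (present.count(name) == 0) {
      missing.push_back(name);
    }
  }
  return missing;
}
```

A caller would gather the attribute names of the intended target and refuse to create the link if missingAttributes returns anything.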

Use Case: Time-Aligned TimeSeries

NWB requires that timestamps in a file are aligned with the timestampReferenceTime. However, time-alignment performed online during acquisition may not be as accurate as post-hoc alignment. When performing post-hoc alignment of timestamps, we may want to avoid duplicating the recorded signal. In this case, we would like to create multiple TimeSeries that share the same data but have different timestamps.

Key benefits of this approach are:

  • Storage Efficiency: Data is stored only once, saving disk space
  • Consistency: Changes to the original data are automatically reflected in all linked series
  • Flexibility: Each linked series can have its own metadata, timestamps, and attributes
  • NWB Compliance: Soft-links are a standard HDF5 feature fully supported by NWB
  • Best Practices: Aligned data is organized in a ProcessingModule, making the data organization and provenance explicit

When you have multiple TimeSeries that need to reference the same data array but with different timestamps or metadata, you can use LinkArrayDataSetConfig to create soft-links instead of duplicating the data. The example here illustrates the main steps for creating linked TimeSeries.

1. Setup

First, create an NWB file and initialize it:

// Create an NWB file
std::string path = getTestFilePath("testLinkTimeSeriesExample.nwb");
auto io = std::make_shared<IO::HDF5::HDF5IO>(path);
io->open();
auto nwbfile = NWB::NWBFile::create(io);
auto status = nwbfile->initialize(generateUuid());
REQUIRE(status == Status::Success);

2. Create the Original TimeSeries

Create the first TimeSeries with the actual data during acquisition. We create the series in the /acquisition group following NWB best practices. Since the data is regularly sampled, we use starting_time and rate instead of explicit timestamps:

// Create the original TimeSeries with actual data during acquisition
std::string originalSeriesPath = "/acquisition/original_series";
auto originalSeries = NWB::TimeSeries::create(originalSeriesPath, io);
REQUIRE(originalSeries != nullptr);
// Generate sample data
SizeType numSamples = 1000;
std::vector<float> data(numSamples);
for (size_t i = 0; i < numSamples; ++i) {
data[i] = static_cast<float>(i) * 0.1f; // Mock data
}
// Create configuration for the original data
IO::ArrayDataSetConfig dataConfig(
IO::BaseDataType::F32, // Data type
SizeArray {0}, // Shape: extendable in time dimension
SizeArray {1000}); // Chunking
// Initialize the TimeSeries with a constant sampling rate
double startingTime = 0.0; // Start at 0 seconds
float samplingRate = 1000.0; // 1000 Hz
status = originalSeries->initialize(
dataConfig, // Data configuration
"m/s", // unit for speed of the animal
"Original speed recording of the animal", // description
"Coarse aligned with starting time but not aligned to "
"stimulus events", // comment
1.0f, // conversion
-1.0f, // resolution (not specified)
0.0f, // offset
startingTime, // starting time
samplingRate // sampling rate
);
REQUIRE(status == Status::Success);
// Write data. No timestamps needed since we have regular sampling rate
status = originalSeries->writeData({numSamples}, // dataShape
{0}, // positionOffset
data.data() // dataInput
);
REQUIRE(status == Status::Success);
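For a regularly sampled series, the time of sample i is implied by starting_time and rate as t[i] = starting_time + i / rate, which is why no timestamps dataset is written here. The following standalone sketch makes that relationship concrete; impliedTimestamps is a hypothetical helper written for this page, not an AqNWB function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand a regularly sampled series into explicit timestamps (in seconds):
//   t[i] = startingTime + i / rate
std::vector<double> impliedTimestamps(double startingTime,
                                      double rate,
                                      std::size_t numSamples) {
  assert(rate > 0.0 && "sampling rate must be positive");
  std::vector<double> timestamps(numSamples);
  for (std::size_t i = 0; i < numSamples; ++i) {
    timestamps[i] = startingTime + static_cast<double>(i) / rate;
  }
  return timestamps;
}
```

With the values used in the example (starting time 0.0 s, 1000 Hz, 1000 samples), the implied timestamps run from 0.0 s up to just under 1.0 s in 1 ms steps.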

3. Create a ProcessingModule

Create a ProcessingModule to store the time-aligned data. This follows NWB best practices where processed data generated post-acquisition is organized in processing modules:

// Create a ProcessingModule for time-aligned data
auto processingModule = nwbfile->createProcessingModule("time_alignment");
REQUIRE(processingModule != nullptr);
status = processingModule->initialize(
"Time-aligned data relative to stimulus onset");
REQUIRE(status == Status::Success);

4. Create the Linked TimeSeries

Create a second TimeSeries in the ProcessingModule that links to the first one's data. This step is typically done after acquisition is completed. We use ProcessingModule::createNWBDataInterface() to create the TimeSeries and add it to the processing module. The aligned series uses irregular timestamps with small adjustments to demonstrate the time-alignment use case:

// Create a TimeSeries in the ProcessingModule for the time-aligned data
// The TimeSeries will link to the original data and have its own timestamps
// reflecting the post-hoc alignment to stimulus events
auto linkedSeries =
processingModule->createNWBDataInterface<NWB::TimeSeries>(
"aligned_voltage");
REQUIRE(linkedSeries != nullptr);
// Create link configuration pointing to the original data
std::string linkTarget = mergePaths(originalSeriesPath, "data");
IO::LinkArrayDataSetConfig linkConfig(linkTarget);
// Initialize the linked TimeSeries using the link configuration
// Note: TimeSeries::initialize automatically queries shape and chunking
// from the linked dataset to configure related datasets like timestamps
// accordingly.
status = linkedSeries->initialize(
linkConfig, // Use link instead of creating new data
originalSeries->readDataUnit()
->values()
.data[0], // Same unit as original
"Time-aligned voltage data", // Description
"Aligned to stimulus events with irregular timestamps", // Comment
1.0f, // conversion
-1.0f, // resolution (not specified)
0.0f // offset
// no sampling rate or starting time needed since we use timestamps
);
REQUIRE(status == Status::Success);
// Simulate time alignment with small adjustments to demonstrate irregular
// sampling that would result from aligning to stimulus events
std::vector<double> newTimestamps(numSamples);
for (size_t i = 0; i < numSamples; ++i) {
// Base offset of 5 seconds plus small jitter to make timestamps irregular
double baseTime = 5.0 + static_cast<double>(i) * 0.001;
double jitter = static_cast<double>(i % 10) * 0.00001;
newTimestamps[i] = baseTime + jitter;
}
// Write the adjusted timestamps to the aligned_voltage TimeSeries
auto timestampRecorder = linkedSeries->recordTimestamps();
status = timestampRecorder->writeDataBlock(
{numSamples}, {0}, IO::BaseDataType::F64, newTimestamps.data());
REQUIRE(status == Status::Success);
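Before writing adjusted timestamps it can be worth checking that they still increase monotonically, since the jitter term drops back to zero every tenth sample. The sketch below reproduces the synthetic alignment from the example in standalone form; makeAlignedTimestamps and isStrictlyIncreasing are hypothetical helper names, not part of AqNWB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if every timestamp is strictly greater than its predecessor.
bool isStrictlyIncreasing(const std::vector<double>& timestamps) {
  for (std::size_t i = 1; i < timestamps.size(); ++i) {
    if (timestamps[i] <= timestamps[i - 1]) {
      return false;
    }
  }
  return true;
}

// Same synthetic alignment as in the example: 5 s base offset, 1 ms nominal
// step, plus a jitter of up to 90 microseconds that repeats every 10 samples.
std::vector<double> makeAlignedTimestamps(std::size_t numSamples) {
  std::vector<double> timestamps(numSamples);
  for (std::size_t i = 0; i < numSamples; ++i) {
    const double baseTime = 5.0 + static_cast<double>(i) * 0.001;
    const double jitter = static_cast<double>(i % 10) * 0.00001;
    timestamps[i] = baseTime + jitter;
  }
  // Sanity check: the jitter must never reorder samples.
  assert(isStrictlyIncreasing(timestamps));
  return timestamps;
}
```

The check passes here because the largest backward jump in the jitter (90 microseconds, when i % 10 wraps from 9 to 0) is smaller than the 1 ms nominal step.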

5. Link to the Original Series

Create a link to the original TimeSeries in the ProcessingModule to explicitly document the relationship and provenance of the processed data. This is not strictly necessary, since the relationship is already documented by the data link in the aligned TimeSeries. However, linking the full original TimeSeries makes the provenance more explicit and the data easier to reuse.

// Create a link to the original series in the ProcessingModule to make the
// relationship explicit
std::string referenceLinkPath =
mergePaths(processingModule->getPath(), "original_series_reference");
status = io->createLink(referenceLinkPath, originalSeries->getPath());
REQUIRE(status == Status::Success);

6. Cleanup

Finalize and close the file:

io->stopRecording();
io->close();

Verification

To verify that the links were created correctly, inspect the file with h5ls:

h5ls -r testLinkTimeSeriesExample.nwb

You should see output similar to:

/acquisition/original_series/data Dataset {1000/Inf}
/processing/time_alignment/aligned_voltage/data Soft Link {/acquisition/original_series/data}
/processing/time_alignment/original_series_reference Soft Link {/acquisition/original_series}
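If the check should be automated, for instance in a test that captures the h5ls output as a string, the soft-link lines can be extracted with a few lines of standard C++. This is a minimal sketch that assumes the "path Soft Link {target}" line layout shown above; parseSoftLinks is a hypothetical helper, not part of AqNWB or HDF5.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Collect "link path -> target path" pairs from `h5ls -r` output lines of
// the form:  <path> Soft Link {<target>}
// Lines describing datasets or groups are skipped.
std::map<std::string, std::string> parseSoftLinks(
    const std::string& h5lsOutput) {
  const std::string marker = " Soft Link {";
  std::map<std::string, std::string> links;
  std::istringstream lines(h5lsOutput);
  std::string line;
  while (std::getline(lines, line)) {
    const auto markerPos = line.find(marker);
    const auto closePos = line.rfind('}');
    if (markerPos == std::string::npos || closePos == std::string::npos) {
      continue;
    }
    const auto targetStart = markerPos + marker.size();
    if (closePos < targetStart) {
      continue;
    }
    // Drop any whitespace padding between the path and the marker.
    std::string path = line.substr(0, markerPos);
    path.erase(path.find_last_not_of(" \t") + 1);
    links[path] = line.substr(targetStart, closePos - targetStart);
  }
  return links;
}
```

A test could then assert that the entry for /processing/time_alignment/aligned_voltage/data maps to /acquisition/original_series/data.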