hexsample.fileio: Input/Output#
This module contains all the I/O related facilities, that is, the basic definition of the file format and the proper classes to create and read back data files.
Warning
We have made the provisional decision to implement all the I/O in the HDF5 format, which seems to be widely used and to support all the basic features that we need. That said, there seem to be two actively maintained Python interfaces to the HDF5 standard, implemented according to radically different design principles, namely h5py and pytables.
It is not trivial for a non-expert to really understand which one is better suited to our purposes, and we opted for pytables, but this is a part of the package that might see drastic changes in the future.
Base classes#
The module provides base classes for input and output files, which are then specialized for, e.g., digitized or reconstructed data.
OutputFileBase is the base class for output files. Instantiating an object of this class causes the output file to be opened in write mode, and a header node to be created that can be used to store arbitrary user attributes, typically to keep track of the simulation or reconstruction settings. The file header can be updated via the update_header() call, which loops over the keyword arguments and sets the attributes one at a time.
InputFileBase is the base class for input files. Instantiating an object of this class causes the input file to be opened in read mode and the header information to be rebuilt in the form of a dictionary.
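The header round trip described above can be sketched without PyTables. This is only an illustration of the pattern, not the hexsample implementation: the `SimpleNamespace` object stands in for the header node, the two helper functions are hypothetical, and the attribute names (`num_events`, `noise`, `readout_mode`) are made up for the example.

```python
from types import SimpleNamespace


def update_header(header, **kwargs):
    """Set the keyword arguments on the header node, one attribute at a time."""
    for name, value in kwargs.items():
        setattr(header, name, value)


def header_to_dict(header):
    """Rebuild the header information as a plain Python dictionary."""
    return dict(vars(header))


# Hypothetical stand-in for the header node of an output file.
header = SimpleNamespace()
update_header(header, num_events=1000, noise=40.0, readout_mode="RECTANGULAR")

# On the reading side the attributes come back as a dictionary.
info = header_to_dict(header)
print(info["num_events"])
print(info.get("missing_key", "n/a"))
```

Writing the attributes one at a time keeps each value individually addressable when the file is read back, which is what makes a dictionary a natural representation on the input side.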
Digitized data#
A digitized event contains all the event-by-event data that would ordinarily be written out by the DAQ, i.e., the trigger identifier, the timestamp, and all the quantities that are necessary to uniquely identify the region of interest (ROI):
class DigiDescription(tables.IsDescription):

    """Description of the (flat) digi part of the file format.

    NOTE: This should be eliminated when the above three classes will be fully
    implemented and tested.
    """

    # pylint: disable=too-few-public-methods
    trigger_id = tables.Int32Col(pos=0)
    seconds = tables.Int32Col(pos=1)
    microseconds = tables.Int32Col(pos=2)
    livetime = tables.Int32Col(pos=3)
    min_col = tables.Int16Col(pos=4)
    max_col = tables.Int16Col(pos=5)
    min_row = tables.Int16Col(pos=6)
    max_row = tables.Int16Col(pos=7)
    padding_top = tables.Int8Col(pos=8)
    padding_right = tables.Int8Col(pos=9)
    padding_bottom = tables.Int8Col(pos=10)
    padding_left = tables.Int8Col(pos=11)
In addition, the PHA content of the ROI (which is a variable-length array by its very nature) is encapsulated in a separate VLArray object in the same group holding the digi table.
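The split between fixed-size columns and a variable-length PHA payload can be made concrete with a short sketch based on the standard `struct` module. Two assumptions are made here and are not taken from the package: the fields are packed with no padding between them, and the ROI bounds are inclusive. The function names are hypothetical.

```python
import struct

# Fixed-size digi columns in `pos` order: four Int32, four Int16, four Int8.
# "<" requests little-endian packing with no padding between fields
# (an assumption for this sketch, not a statement about the on-disk layout).
DIGI_ROW_FORMAT = "<4i4h4b"


def pack_digi_row(trigger_id, seconds, microseconds, livetime,
                  min_col, max_col, min_row, max_row,
                  padding_top, padding_right, padding_bottom, padding_left):
    """Pack the twelve fixed-size digi fields into a bytes object."""
    return struct.pack(DIGI_ROW_FORMAT, trigger_id, seconds, microseconds,
                       livetime, min_col, max_col, min_row, max_row,
                       padding_top, padding_right, padding_bottom, padding_left)


def roi_num_pixels(min_col, max_col, min_row, max_row):
    """Number of PHA values in a rectangular ROI, assuming inclusive bounds."""
    return (max_col - min_col + 1) * (max_row - min_row + 1)


row = pack_digi_row(0, 10, 500000, 120, 100, 107, 200, 205, 2, 4, 2, 4)
print(len(row))                            # 28 bytes, whatever the ROI size
print(roi_num_pixels(100, 107, 200, 205))  # 8 columns x 6 rows = 48 values
```

The fixed part has the same size for every event, while the number of PHA values changes with the ROI, which is why the latter does not fit in a flat table column.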
For simulated data, digitized files contain an additional table encapsulating the ground truth information for the event.
class MonteCarloDescription(tables.IsDescription):

    """Description of the Monte Carlo part of the file format.
    """

    # pylint: disable=too-few-public-methods
    timestamp = tables.Float64Col(pos=0)
    energy = tables.Float32Col(pos=1)
    absx = tables.Float32Col(pos=2)
    absy = tables.Float32Col(pos=3)
    absz = tables.Float32Col(pos=4)
    num_pairs = tables.Int32Col(pos=5)
Reconstructed data#
Reconstructed files contain all the reconstruction output, that is: all the quantities in the underlying digi file that are relevant for science analysis (i.e., timestamp and ROI size), and all the additional quantities that are calculated as part of the standard reconstruction (i.e., energy and position).
class ReconDescription(tables.IsDescription):

    """Description of the recon file format. This should be common to all the
    modes of readout, so it is the same aside from the DigiDescription type.
    """

    # pylint: disable=too-few-public-methods
    trigger_id = tables.Int32Col(pos=0)
    timestamp = tables.Float64Col(pos=1)
    livetime = tables.Int32Col(pos=2)
    roi_size = tables.Int32Col(pos=3)
    cluster_size = tables.Int8Col(pos=4)
    energy = tables.Float32Col(pos=5)
    posx = tables.Float32Col(pos=6)
    posy = tables.Float32Col(pos=7)
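The digi format stores the event time as two integer columns (seconds and microseconds), while the recon format carries a single floating-point timestamp. One plausible way to go from the former to the latter is sketched below. The conversion rule and the function name are assumptions made for illustration and are not taken from the package.

```python
def digi_timestamp(seconds: int, microseconds: int) -> float:
    """Combine the integer seconds and microseconds into a float timestamp.

    Assumption: the microseconds field counts the fraction of the current
    second, so that the timestamp is seconds + microseconds / 1e6.
    """
    return seconds + microseconds / 1_000_000


print(digi_timestamp(10, 500000))  # 10.5
```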
Module documentation#
Definition of the file format.
- hexsample.fileio._fill_mc_row(row: Row, event: MonteCarloEvent) None #
Helper function to fill an output table row, given a MonteCarloEvent object.
Note
This would naturally belong to the MonteCarloDescription class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
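The pattern of a module-level fill helper can be sketched as follows. Everything here is a stand-in: `FakeRow` mimics a table row whose fields are set by key and then appended, `MonteCarloEventSketch` is a hypothetical event type whose attribute names are assumed to match the columns of MonteCarloDescription, and the call to `append()` at the end of the helper is an assumption about the real function.

```python
from dataclasses import dataclass


@dataclass
class MonteCarloEventSketch:
    """Hypothetical event type mirroring the MonteCarloDescription columns."""
    timestamp: float
    energy: float
    absx: float
    absy: float
    absz: float
    num_pairs: int


class FakeRow(dict):
    """Stand-in for a table row: fields are set by key, then appended."""

    def __init__(self, table):
        super().__init__()
        self._table = table

    def append(self):
        # Store a copy of the current field values as a new table entry.
        self._table.append(dict(self))


def fill_mc_row(row, event):
    """Module-level helper: copy the event fields into the row and append it."""
    for name in ("timestamp", "energy", "absx", "absy", "absz", "num_pairs"):
        row[name] = getattr(event, name)
    row.append()


table = []
row = FakeRow(table)
fill_mc_row(row, MonteCarloEventSketch(0.5, 8000.0, 0.01, -0.02, 0.03, 2200))
print(table[0]["num_pairs"])
```

Keeping the helper at module level means the description class contains only column definitions, which is the workaround the note above describes.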
- hexsample.fileio._fill_digi_row_base(row: Row, event: DigiEventBase) None #
Helper function to fill an output table row, given a DigiEventBase object.
Note that this base function does not call the row.append() hook, which is delegated to the implementations for the specific readout modes.
Note
This would naturally belong to the DigiDescriptionBase class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- hexsample.fileio._fill_digi_row_sparse(row: Row, event: DigiEventBase) None #
Overloaded method. It uses the _fill_digi_row_base() function for filling the trigger_id and time coordinates of the event.
Note
This would naturally belong to the DigiDescriptionSparse class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- hexsample.fileio._fill_digi_row_rectangular(row: Row, event: DigiEventBase) None #
Overloaded method. It uses the _fill_digi_row_base() function for filling the trigger_id and time coordinates of the event.
Note
This would naturally belong to the DigiDescriptionRectangular class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- hexsample.fileio._fill_digi_row_circular(row: Row, event: DigiEventBase) None #
Overloaded method. It uses the _fill_digi_row_base() function for filling the trigger_id and time coordinates of the event.
Note
This would naturally belong to the DigiDescriptionCircular class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- hexsample.fileio._fill_digi_row(row: Row, event: DigiEventBase) None #
Helper function to fill an output table row, given a DigiEvent object.
Note
This would naturally belong to the DigiDescription class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- hexsample.fileio._fill_recon_row(row: Row, event: ReconEvent) None #
Helper function to fill an output table row, given a ReconEvent object.
Note
This would naturally belong to the ReconDescription class as a @staticmethod, but doing so apparently breaks something in the tables internals, and you get an exception because a staticmethod cannot be pickled.
- class hexsample.fileio.FileType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)#
Enum class for the different file types. ** IS IT POSSIBLE TO DEFINE SUBCLASSES FOR DIGI? **
- DIGI = 'Digi'#
- RECON = 'Recon'#
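Since the enum members carry string values, a string read back from a file header can be turned into the corresponding member with a lookup by value. The sketch below redefines a local copy of the enum so that it runs on its own. The `file_type_from_header()` helper is hypothetical and only illustrates the lookup.

```python
from enum import Enum


class FileType(Enum):
    """Local copy of the enum, with the two values listed above."""
    DIGI = "Digi"
    RECON = "Recon"


def file_type_from_header(value: str) -> FileType:
    """Map a header string to the enum member with that value.

    Raises ValueError if the string does not match any member.
    """
    return FileType(value)


print(file_type_from_header("Digi") is FileType.DIGI)
```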
- class hexsample.fileio.OutputFileBase(file_path: str)#
Base class for output files.
The base class has the responsibility of opening the output file and creating a header node to store all the necessary metadata. Subclasses can use the update_header() hook to write arbitrary user attributes in the header node.
Note this is a purely virtual class, and subclasses should reimplement the add_row() and flush() methods.
- Parameters:
file_path (str) – The path to the file on disk.
- _DATE_FORMAT = '%a, %d %b %Y %H:%M:%S %z'#
- _FILE_TYPE = None#
- update_header(**kwargs) None #
Update the user attributes in the header group.
- static _set_user_attribute(group: Group, name: str, value: Any) None #
Set a user attribute for a given group.
- static update_user_attributes(group: Group, **kwargs) None #
Update the user attributes for a given group.
The basic rules, here, are that all the keys of the keyword arguments must be strings, and the values can be arbitrary data types. Following up on the discussion at https://www.pytables.org/usersguide/tutorials.html we write the keyword arguments one at a time (as opposed to the entire dictionary all at once) and make an effort to convert the Python types to native numpy arrays when that is not performed automatically (e.g., for lists and tuples). This avoids the need for serializing the Python data and should ensure that the output file can be read with any (be it Python-aware or not) application.
- add_row(*args) None #
Virtual function to add a row to the output file.
This needs to be reimplemented in derived classes.
- flush() None #
Virtual function to flush the file.
This needs to be reimplemented in derived classes.
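The contract between the base class and its subclasses can be sketched in plain Python. The classes below are hypothetical stand-ins that keep everything in memory. In particular, raising `NotImplementedError` in the base methods is an assumption about how the virtual functions behave, not something stated in this page.

```python
class OutputFileSketch:
    """Stand-in base class: owns the header, leaves row handling to subclasses."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.header = {}

    def update_header(self, **kwargs):
        """Store arbitrary user attributes in the header."""
        self.header.update(kwargs)

    def add_row(self, *args):
        raise NotImplementedError

    def flush(self):
        raise NotImplementedError


class ListOutputFile(OutputFileSketch):
    """Concrete subclass: buffers rows and moves them to storage on flush."""

    def __init__(self, file_path):
        super().__init__(file_path)
        self._buffer = []
        self.rows = []

    def add_row(self, *args):
        self._buffer.append(args)

    def flush(self):
        self.rows.extend(self._buffer)
        self._buffer.clear()


output_file = ListOutputFile("example.h5")
output_file.update_header(num_events=2)
output_file.add_row("digi_0", "mc_0")
output_file.add_row("digi_1", "mc_1")
output_file.flush()
print(len(output_file.rows))
```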
- class hexsample.fileio.DigiOutputFileSparse(file_path: str)#
Description of a sparse digitized output file.
This can represent either a digitized file written by the DAQ, or the output of a simulation, in which case it contains an additional group and table for the Monte Carlo ground truth.
- Parameters:
file_path (str) – The path to the file on disk.
- _FILE_TYPE = 'Digi'#
- DIGI_TABLE_SPECS = ('digi_table', <class 'tables.description.DigiDescriptionSparse'>, 'Digi data')#
- COLUMNS_ARRAY_SPECS = ('columns', Int32Atom(shape=(), dflt=0))#
- ROWS_ARRAY_SPECS = ('rows', Int32Atom(shape=(), dflt=0))#
- PHA_ARRAY_SPECS = ('pha', Int32Atom(shape=(), dflt=0))#
- MC_TABLE_SPECS = ('mc_table', <class 'tables.description.MonteCarloDescription'>, 'Monte Carlo data')#
- add_row(digi_event: DigiEventSparse, mc_event: MonteCarloEvent) None #
Add one row to the file.
- Parameters:
digi_event (DigiEventSparse) – The digitized event contribution.
mc_event (MonteCarloEvent) – The Monte Carlo event contribution.
- flush() None #
Flush the basic file components.
- class hexsample.fileio.DigiOutputFileRectangular(file_path: str)#
Description of a rectangular digitized output file.
This can represent either a digitized file written by the DAQ, or the output of a simulation, in which case it contains an additional group and table for the Monte Carlo ground truth.
- Parameters:
file_path (str) – The path to the file on disk.
- _FILE_TYPE = 'Digi'#
- DIGI_TABLE_SPECS = ('digi_table', <class 'tables.description.DigiDescriptionRectangular'>, 'Digi data')#
- PHA_ARRAY_SPECS = ('pha', Int32Atom(shape=(), dflt=0))#
- MC_TABLE_SPECS = ('mc_table', <class 'tables.description.MonteCarloDescription'>, 'Monte Carlo data')#
- add_row(digi_event: DigiEventRectangular, mc_event: MonteCarloEvent) None #
Add one row to the file.
- Parameters:
digi_event (DigiEventRectangular) – The digitized event contribution.
mc_event (MonteCarloEvent) – The Monte Carlo event contribution.
- flush() None #
Flush the basic file components.
- class hexsample.fileio.DigiOutputFileCircular(file_path: str)#
Description of a circular digitized output file.
This can represent either a digitized file written by the DAQ, or the output of a simulation, in which case it contains an additional group and table for the Monte Carlo ground truth.
- Parameters:
file_path (str) – The path to the file on disk.
- _FILE_TYPE = 'Digi'#
- DIGI_TABLE_SPECS = ('digi_table', <class 'tables.description.DigiDescriptionCircular'>, 'Digi data')#
- MC_TABLE_SPECS = ('mc_table', <class 'tables.description.MonteCarloDescription'>, 'Monte Carlo data')#
- add_row(digi_event: DigiEventCircular, mc_event: MonteCarloEvent) None #
Add one row to the file.
- Parameters:
digi_event (DigiEventCircular) – The digitized event contribution.
mc_event (MonteCarloEvent) – The Monte Carlo event contribution.
- flush() None #
Flush the basic file components.
- hexsample.fileio.digioutput_class(mode: HexagonalReadoutMode)#
Return the proper class to be used as DigiOutputFile, depending on the readout mode of the chip.
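A dispatch of this kind is commonly written as a dictionary lookup. The sketch below uses hypothetical stand-ins throughout: `ReadoutModeSketch` plays the role of HexagonalReadoutMode (its member names are assumed), the three empty classes play the role of the output file classes, and raising `ValueError` for an unknown mode is a design choice of the sketch.

```python
from enum import Enum


class ReadoutModeSketch(Enum):
    """Hypothetical stand-in for HexagonalReadoutMode."""
    SPARSE = "SPARSE"
    RECTANGULAR = "RECTANGULAR"
    CIRCULAR = "CIRCULAR"


class SparseFileSketch:
    """Stand-in for DigiOutputFileSparse."""


class RectangularFileSketch:
    """Stand-in for DigiOutputFileRectangular."""


class CircularFileSketch:
    """Stand-in for DigiOutputFileCircular."""


_OUTPUT_CLASSES = {
    ReadoutModeSketch.SPARSE: SparseFileSketch,
    ReadoutModeSketch.RECTANGULAR: RectangularFileSketch,
    ReadoutModeSketch.CIRCULAR: CircularFileSketch,
}


def digioutput_class_sketch(mode):
    """Return the output file class registered for the given readout mode."""
    try:
        return _OUTPUT_CLASSES[mode]
    except KeyError:
        raise ValueError(f"Unsupported readout mode: {mode!r}") from None


file_class = digioutput_class_sketch(ReadoutModeSketch.CIRCULAR)
print(file_class.__name__)
```

Returning the class rather than an instance lets the caller decide when, and with which file path, the output file is actually created.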
- class hexsample.fileio.ReconOutputFile(file_path: str)#
Description of a reconstructed output file. This should be the same for all types of DigiEvent.
- Parameters:
file_path (str) – The path to the file on disk.
- _FILE_TYPE = 'Recon'#
- RECON_TABLE_SPECS = ('recon_table', <class 'tables.description.ReconDescription'>, 'Recon data')#
- MC_TABLE_SPECS = ('mc_table', <class 'tables.description.MonteCarloDescription'>, 'Monte Carlo data')#
- update_digi_header(**kwargs)#
Update the user arguments in the digi header group.
- add_row(recon_event: ReconEvent, mc_event: MonteCarloEvent) None #
Add one row to the file.
- Parameters:
recon_event (ReconEvent) – The reconstructed event contribution.
mc_event (MonteCarloEvent) – The Monte Carlo event contribution.
- flush() None #
Flush the basic file components.
- class hexsample.fileio.InputFileBase(file_path: str)#
Base class for input files.
- static _user_attributes(group: Group) dict #
Return all the user attributes for a given group in the form of a Python dictionary.
This is used, e.g., to rebuild the header information.
- header_value(key: str, default: Any = None) Any #
Return the header value corresponding to a given key.
- class hexsample.fileio.DigiInputFileBase(file_path: str)#
- column(name: str) ndarray #
Return a given column in the digi table.
- mc_column(name: str) ndarray #
Return a given column in the Monte Carlo table.
- mc_event(row_index: int) MonteCarloEvent #
Random access to the MonteCarloEvent part of the event contribution.
- Parameters:
row_index (int) – The index of the target row in the event file.
- class hexsample.fileio.DigiInputFileSparse(file_path: str)#
Description of a sparse digitized input file.
This has a very simple interface: we cache references to the relevant tables when we open the file and we provide methods to reassemble a specific table row into the corresponding DigiEvent or MonteCarloEvent objects, along with an implementation of the iterator protocol to make event loops easier.
- digi_event(row_index: int) DigiEventSparse #
Random access to the DigiEventSparse part of the event contribution.
- Parameters:
row_index (int) – The index of the target row in the event file.
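The combination of random access and the iterator protocol can be sketched with an in-memory stand-in. `EventFileSketch` is hypothetical and holds its events in a list; only the shape of the interface (a `digi_event(row_index)` accessor plus `__iter__` and `__next__`) follows the description above.

```python
class EventFileSketch:
    """Stand-in input file: random access plus the iterator protocol."""

    def __init__(self, events):
        self._events = list(events)
        self._index = -1

    def digi_event(self, row_index):
        """Random access to the event stored at the given row index."""
        return self._events[row_index]

    def __iter__(self):
        # Rewind, so that the file can be looped over more than once.
        self._index = -1
        return self

    def __next__(self):
        self._index += 1
        if self._index >= len(self._events):
            raise StopIteration
        return self.digi_event(self._index)


input_file = EventFileSketch(["event_0", "event_1", "event_2"])
for event in input_file:
    print(event)
print(input_file.digi_event(1))
```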
- class hexsample.fileio.DigiInputFileRectangular(file_path: str)#
Description of a rectangular digitized input file.
This has a very simple interface: we cache references to the relevant tables when we open the file and we provide methods to reassemble a specific table row into the corresponding DigiEvent or MonteCarloEvent objects, along with an implementation of the iterator protocol to make event loops easier.
- digi_event(row_index: int) DigiEventRectangular #
Random access to the DigiEventRectangular part of the event contribution.
- Parameters:
row_index (int) – The index of the target row in the event file.
- class hexsample.fileio.DigiInputFileCircular(file_path: str)#
Description of a circular digitized input file.
This has a very simple interface: we cache references to the relevant tables when we open the file and we provide methods to reassemble a specific table row into the corresponding DigiEvent or MonteCarloEvent objects, along with an implementation of the iterator protocol to make event loops easier.
- digi_event(row_index: int) DigiEventCircular #
Random access to the DigiEventCircular part of the event contribution.
- Parameters:
row_index (int) – The index of the target row in the event file.
- class hexsample.fileio.ReconInputFile(file_path: str)#
Description of a reconstructed input file.
- column(name: str) ndarray #
Return a given column in the recon table.
- mc_column(name: str) ndarray #
Return a given column in the Monte Carlo table.
- hexsample.fileio.peek_file_type(file_path: str) FileType #
Peek into the header of an HDF5 file and determine the file type.
- Parameters:
file_path (str) – The path to the input file.
- hexsample.fileio.peek_readout_type(file_path: str) HexagonalReadoutMode #
Peek into the header of an HDF5 Digi file and determine the readout type.
- Parameters:
file_path (str) – The path to the input file.
- hexsample.fileio.open_input_file(file_path: str) InputFileBase #
Open an input file automatically determining the file type and readout type.
- Parameters:
file_path (str) – The path to the input file.