sparrow-ipc 1.0.1
A class for serializing Apache Arrow record batches to the IPC file format.
#include <stream_file_serializer.hpp>
Public Member Functions | |
| template<writable_stream TStream> | |
| stream_file_serializer (TStream &stream, std::optional< CompressionType > compression=std::nullopt) | |
| Constructs a stream_file_serializer object with a reference to a stream. | |
| template<writable_stream TStream> | |
| stream_file_serializer (TStream &stream, const sparrow::record_batch &schema_batch, std::optional< CompressionType > compression=std::nullopt) | |
| Constructs a stream_file_serializer object with a reference to a stream and a schema. | |
| ~stream_file_serializer () | |
| Destructor for the stream_file_serializer. | |
| void | write (const sparrow::record_batch &rb) |
| Writes a single record batch to the file. | |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| void | write (const R &record_batches) |
| Writes a collection of record batches to the file. | |
| stream_file_serializer & | operator<< (const sparrow::record_batch &rb) |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| stream_file_serializer & | operator<< (const R &record_batches) |
| stream_file_serializer & | operator<< (stream_file_serializer &(*manip)(stream_file_serializer &)) |
| void | end () |
| Finalizes the file serialization by writing footer and trailing magic bytes. | |
Public Attributes | |
| bool | m_header_written {false} |
| bool | m_schema_received {false} |
| std::optional< sparrow::record_batch > | m_first_record_batch |
| std::vector< sparrow::data_type > | m_dtypes |
| any_output_stream | m_stream |
| bool | m_ended {false} |
| std::optional< CompressionType > | m_compression |
| dictionary_tracker | m_dict_tracker |
| std::vector< record_batch_block > | m_dictionary_blocks |
| std::vector< record_batch_block > | m_record_batch_blocks |
The stream_file_serializer class provides functionality to serialize single or multiple record batches into the Arrow IPC file format suitable for storage. It ensures schema consistency across multiple record batches and optimizes memory allocation by pre-calculating required buffer sizes.
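The stream_file_serializer follows the Arrow IPC file format specification, in which a file opens with the magic bytes "ARROW1" (padded to an 8-byte boundary), continues with the schema, dictionary, and record batch messages of the streaming format, and closes with a footer, the footer size, and a trailing "ARROW1" magic.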
The class validates that all record batches have consistent schemas and throws std::invalid_argument if inconsistencies are detected.
Definition at line 70 of file stream_file_serializer.hpp.
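The schema validation described above can be pictured with a small stand-alone sketch. It does not use sparrow or sparrow-ipc: `toy_batch` and `schema_guard` are hypothetical names introduced only for illustration, and a schema is reduced to a list of type names. The assumption is that the first batch fixes the schema and every later batch is compared against it.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a record batch: only the column types matter here.
struct toy_batch
{
    std::vector<std::string> column_types;
};

// Hypothetical checker: the first batch establishes the schema,
// every later batch must match it or std::invalid_argument is thrown.
class schema_guard
{
public:
    void check(const toy_batch& batch)
    {
        if (!m_expected)
        {
            m_expected = batch.column_types;  // first batch fixes the schema
            return;
        }
        if (*m_expected != batch.column_types)
        {
            throw std::invalid_argument("record batch schema does not match the established schema");
        }
    }

    bool has_schema() const
    {
        return m_expected.has_value();
    }

private:
    std::optional<std::vector<std::string>> m_expected;
};

// Returns true when a batch with a different schema is rejected.
inline bool rejects_mismatch()
{
    toy_batch first;
    first.column_types = {"int32", "utf8"};
    toy_batch second;
    second.column_types = {"int32", "float64"};

    schema_guard guard;
    guard.check(first);
    try
    {
        guard.check(second);
    }
    catch (const std::invalid_argument&)
    {
        return true;
    }
    return false;
}
```

In this sketch the guard plays the role that the documented m_schema_received and m_dtypes members suggest; how the real class compares schemas is not stated on this page.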
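| template<writable_stream TStream> sparrow_ipc::stream_file_serializer::stream_file_serializer (TStream &stream, std::optional< CompressionType > compression=std::nullopt) | inline |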
Constructs a stream_file_serializer object with a reference to a stream.
| TStream | The type of the stream to be used for serialization. |
| stream | Reference to the stream object that will be used for serialization operations. The serializer stores a pointer to this stream for later use, so the stream must outlive the serializer. |
| compression | Optional compression type to apply to record batch bodies. |
Definition at line 83 of file stream_file_serializer.hpp.
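| template<writable_stream TStream> sparrow_ipc::stream_file_serializer::stream_file_serializer (TStream &stream, const sparrow::record_batch &schema_batch, std::optional< CompressionType > compression=std::nullopt) | inline |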
Constructs a stream_file_serializer object with a reference to a stream and a schema.
This constructor establishes the schema for the file immediately, which is useful when no record batches will be written or when the schema is known up front.
| TStream | The type of the stream to be used for serialization. |
| stream | Reference to the stream object that will be used for serialization operations. |
| schema_batch | A record batch containing the schema for the file. The data in this batch is NOT written to the file; only its schema is used. |
| compression | Optional compression type to apply to record batch bodies. |
Definition at line 102 of file stream_file_serializer.hpp.
| sparrow_ipc::stream_file_serializer::~stream_file_serializer | ( | ) |
Destructor for the stream_file_serializer.
Ensures proper cleanup by calling end() if the serializer has not been explicitly ended. This guarantees that the complete file format (including footer and trailing magic bytes) is written before the object is destroyed.
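The finalize-on-destruction behavior can be sketched with a toy class. This is not the sparrow-ipc implementation: `toy_file_writer` and the `<footer>` marker are hypothetical, and the only assumption taken from the text is that the destructor calls end() when the writer has not been ended explicitly.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical writer that appends a trailer exactly once,
// either through an explicit end() or from the destructor.
class toy_file_writer
{
public:
    explicit toy_file_writer(std::string& out)
        : m_out(&out)
    {
    }

    // Non-copyable: two copies would both try to finalize the same output.
    toy_file_writer(const toy_file_writer&) = delete;
    toy_file_writer& operator=(const toy_file_writer&) = delete;

    ~toy_file_writer()
    {
        end();  // no-op if end() was already called
    }

    void write(const std::string& payload)
    {
        if (m_ended)
        {
            throw std::runtime_error("writer has been ended");
        }
        m_out->append(payload);
    }

    void end()
    {
        if (m_ended)
        {
            return;  // already finalized, nothing more to write
        }
        m_out->append("<footer>");
        m_ended = true;
    }

private:
    std::string* m_out;  // non-owning: the output must outlive the writer
    bool m_ended{false};
};

// Writes one payload and relies on the destructor to finalize the output.
inline std::string write_without_explicit_end()
{
    std::string out;
    {
        toy_file_writer writer(out);
        writer.write("batch");
    }  // destructor runs end() here
    return out;
}
```

Because the toy end() checks a flag before doing any work, an explicit end() followed by destruction still produces a single trailer.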
| void sparrow_ipc::stream_file_serializer::end | ( | ) |
Finalizes the file serialization by writing footer and trailing magic bytes.
Calling it more than once is safe: the serializer tracks whether the file has already been ended, so repeated calls do not write the footer again.
| std::runtime_error | if no record batches have been written |
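For orientation, the tail of an Arrow IPC file as laid out in the Arrow specification (footer, then a 32-bit footer size, then the "ARROW1" magic) can be sketched as follows. The helper `append_file_trailer` is a hypothetical name, the footer bytes are treated as opaque, and little-endian encoding of the size is assumed; this is not a description of how end() is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Magic bytes that open and close an Arrow IPC file.
inline constexpr std::string_view arrow_magic = "ARROW1";

// Hypothetical helper: appends <footer><footer size: int32><magic> to `out`.
// The footer is an opaque byte sequence in this sketch.
inline void append_file_trailer(std::vector<std::uint8_t>& out, const std::vector<std::uint8_t>& footer)
{
    out.insert(out.end(), footer.begin(), footer.end());

    // Footer size as a 32-bit value, written least significant byte first
    // (little-endian is an assumption of this sketch).
    const auto size = static_cast<std::uint32_t>(footer.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        out.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFFu));
    }

    out.insert(out.end(), arrow_magic.begin(), arrow_magic.end());
}
```

A reader can locate the footer by seeking to the end of the file, checking the magic, and reading the size that precedes it.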
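| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< (const R &record_batches) | inline |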
Definition at line 292 of file stream_file_serializer.hpp.
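| stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< (const sparrow::record_batch &rb) | inline |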
Definition at line 266 of file stream_file_serializer.hpp.
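| stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< (stream_file_serializer &(*manip)(stream_file_serializer &)) | inline |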
Definition at line 312 of file stream_file_serializer.hpp.
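The three operator<< overloads allow chained writes, and the function-pointer overload accepts a manipulator in the same way std::ostream accepts std::endl. The sketch below shows that pattern on a toy class; `toy_serializer` and the manipulator name `finish` are hypothetical, since this page does not name the manipulators that sparrow-ipc provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical serializer that records what was written, to show chaining.
class toy_serializer
{
public:
    // Data overload: append and return *this so calls can be chained.
    toy_serializer& operator<<(const std::string& batch)
    {
        m_log += batch;
        return *this;
    }

    // Manipulator overload: accepts a function that takes and returns a
    // serializer reference, and simply applies it to this object.
    toy_serializer& operator<<(toy_serializer& (*manip)(toy_serializer&))
    {
        return manip(*this);
    }

    void end()
    {
        if (!m_ended)
        {
            m_log += "|end";
            m_ended = true;
        }
    }

    const std::string& log() const
    {
        return m_log;
    }

private:
    std::string m_log;
    bool m_ended{false};
};

// Hypothetical manipulator: finalizes the serializer when streamed into it.
inline toy_serializer& finish(toy_serializer& s)
{
    s.end();
    return s;
}

// Streams two payloads and the manipulator in a single expression.
inline std::string chained_demo()
{
    toy_serializer s;
    s << std::string("a") << std::string("b") << finish;
    return s.log();
}
```

Passing the bare function name works because it decays to a pointer that matches the manipulator overload's parameter type.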
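| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> void sparrow_ipc::stream_file_serializer::write (const R &record_batches) | inline |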
Writes a collection of record batches to the file.
This method efficiently adds multiple record batches to the serialization stream by first calculating the total required size and reserving memory space to minimize reallocations during the append operations.
| R | The type of the record batch collection (an input range whose value type is sparrow::record_batch) |
| record_batches | A collection of record batches to append to the file |
| std::runtime_error | if the serializer has been ended |
| std::invalid_argument | if any record batch schema doesn't match |
The method performs the following operations: it calculates the total size required, reserves that space, and appends each record batch in turn.
Definition at line 161 of file stream_file_serializer.hpp.
| void sparrow_ipc::stream_file_serializer::write | ( | const sparrow::record_batch & | rb | ) |
Writes a single record batch to the file.
| rb | The record batch to write to the file |
| std::runtime_error | if the serializer has been ended |
| std::invalid_argument | if the record batch schema doesn't match the established schema |
| std::optional<CompressionType> sparrow_ipc::stream_file_serializer::m_compression |
Definition at line 341 of file stream_file_serializer.hpp.
| dictionary_tracker sparrow_ipc::stream_file_serializer::m_dict_tracker |
Definition at line 342 of file stream_file_serializer.hpp.
| std::vector<record_batch_block> sparrow_ipc::stream_file_serializer::m_dictionary_blocks |
Definition at line 343 of file stream_file_serializer.hpp.
| std::vector<sparrow::data_type> sparrow_ipc::stream_file_serializer::m_dtypes |
Definition at line 338 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_ended {false} |
Definition at line 340 of file stream_file_serializer.hpp.
| std::optional<sparrow::record_batch> sparrow_ipc::stream_file_serializer::m_first_record_batch |
Definition at line 337 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_header_written {false} |
Definition at line 335 of file stream_file_serializer.hpp.
| std::vector<record_batch_block> sparrow_ipc::stream_file_serializer::m_record_batch_blocks |
Definition at line 344 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_schema_received {false} |
Definition at line 336 of file stream_file_serializer.hpp.
| any_output_stream sparrow_ipc::stream_file_serializer::m_stream |
Definition at line 339 of file stream_file_serializer.hpp.