Enum datafusion_row::layout::RowType
source · [−]pub enum RowType {
Compact,
WordAligned,
}
Expand description
Type of a RowLayout
Variants
Compact
Stores each field with minimum bytes for space efficiency.
Its typical use case represents a sorting payload that accesses all row fields as a unit.
Each tuple consists of up to three parts: “null bit set
” ,
“values
” and “var length data
”
The null bit set is used for null tracking and is aligned to 1-byte. It stores one bit per field.
In the region of the values, we store the fields in the order they are defined in the schema.
- For fixed-length, sequential access fields, we store them directly. E.g., 4 bytes for int and 1 byte for bool.
- For fixed-length, update often fields, we store one 8-byte word per field.
- For fields of non-primitive or variable-length types, we append their actual content to the end of the var length region and store their offset relative to row base and their length, packed into an 8-byte word.
┌────────────────┬──────────────────────────┬───────────────────────┐ ┌───────────────────────┬────────────┐
│Validity Bitmask│ Fixed Width Field │ Variable Width Field │ ... │ vardata area │ padding │
│ (byte aligned) │ (native type width) │(vardata offset + len) │ │ (variable length) │ bytes │
└────────────────┴──────────────────────────┴───────────────────────┘ └───────────────────────┴────────────┘
For example, given the schema (Int8, Utf8, Float32, Utf8)
Encoding the tuple (1, “FooBar”, NULL, “baz”)
Requires 32 bytes (31 bytes payload and 1 byte padding to make each tuple 8-bytes aligned):
┌──────────┬──────────┬──────────────────────┬──────────────┬──────────────────────┬───────────────────────┬──────────┐
│0b00001011│ 0x01 │0x00000016 0x00000006│ 0x00000000 │0x0000001C 0x00000003│ FooBarbaz │ 0x00 │
└──────────┴──────────┴──────────────────────┴──────────────┴──────────────────────┴───────────────────────┴──────────┘
0 1 2 10 14 22 31 32
WordAligned
This type of layout stores one 8-byte word per field for CPU-friendly, It is mainly used to represent the rows with frequently updated content, for example, grouping state for hash aggregation.