JoinHashMapType in datafusion::physical_plan::joins::utils - Rust

Trait JoinHashMapType

pub trait JoinHashMapType: Send + Sync {
    // Required methods
    fn extend_zero(&mut self, len: usize);
    fn update_from_iter<'a>(
        &mut self,
        iter: Box<dyn Iterator<Item = (usize, &'a u64)> + Send + 'a>,
        deleted_offset: usize,
    );
    fn get_matched_indices<'a>(
        &self,
        iter: Box<dyn Iterator<Item = (usize, &'a u64)> + 'a>,
        deleted_offset: Option<usize>,
    ) -> (Vec<u32>, Vec<u64>);
    fn get_matched_indices_with_limit_offset(
        &self,
        hash_values: &[u64],
        limit: usize,
        offset: (usize, Option<u64>),
    ) -> (Vec<u32>, Vec<u64>, Option<(usize, Option<u64>)>);
    fn is_empty(&self) -> bool;
}

Maps a u64 hash value, computed from the build side's join key (“on”) values, to the list of build-side row indices whose key has that hash.

By allocating the HashMap with capacity for at least as many entries as there are build-side rows, we make sure that it never has to be resized. Resizing would require re-hashing every entry, which needs access to each key (here, the hash value).
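The pre-sizing idea can be sketched with Rust's standard HashMap, used here only as a stand-in (an assumption; DataFusion's own table type may differ). The hypothetical build_presized function reserves one slot per build-side row, stores each row index plus one under its hash value, and reports whether the capacity stayed the same, i.e. whether the table was ever grown.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of DataFusion): maps each hash value to
/// (most recent row index + 1), pre-sized for every build-side row.
/// Returns the map and whether it was never resized.
fn build_presized(hashes: &[u64]) -> (HashMap<u64, u64>, bool) {
    // Reserve room for at least one entry per build-side row up front.
    let mut map: HashMap<u64, u64> = HashMap::with_capacity(hashes.len());
    let initial_capacity = map.capacity();
    for (row, &hash) in hashes.iter().enumerate() {
        // Store row + 1 so that 0 can later mean "no entry".
        map.insert(hash, row as u64 + 1);
    }
    // An unchanged capacity means the table was never grown (or re-hashed).
    let never_resized = map.capacity() == initial_capacity;
    (map, never_resized)
}
```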

E.g. 1 -> [3, 6, 8] indicates that rows 3, 6 and 8 have column values whose hash value is 1. Because the key is a hash value, the probe stage must check for hash collisions: a probe row may have the same hash value as a build row even though their actual values don't match. Those cases are checked in the equal_rows_arr method.
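A minimal sketch of why matching hashes are only candidates. Everything here is hypothetical: weak_hash deliberately collides (it hashes a string by its length), a Vec per hash value stands in for the chained list, and the final equality check plays the role the text assigns to equal_rows_arr without reproducing that method.

```rust
use std::collections::HashMap;

/// Deliberately weak, hypothetical hash so that different keys collide.
fn weak_hash(key: &str) -> u64 {
    key.len() as u64
}

/// Returns (probe_row, build_row) pairs whose keys are actually equal.
fn probe(build_keys: &[&str], probe_keys: &[&str]) -> Vec<(usize, usize)> {
    // hash -> build rows with that hash (a plain Vec stands in for the chain)
    let mut map: HashMap<u64, Vec<usize>> = HashMap::with_capacity(build_keys.len());
    for (row, key) in build_keys.iter().enumerate() {
        map.entry(weak_hash(key)).or_default().push(row);
    }
    let mut matches = Vec::new();
    for (probe_row, key) in probe_keys.iter().enumerate() {
        if let Some(candidates) = map.get(&weak_hash(key)) {
            for &build_row in candidates {
                // Equal hashes are only candidates: compare the real values
                // to discard hash collisions.
                if build_keys[build_row] == *key {
                    matches.push((probe_row, build_row));
                }
            }
        }
    }
    matches
}
```

Here "ab" and "cd" share a hash, so probing for "cd" visits both build rows but keeps only the one whose value really matches.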

The indices (values) are stored in a separate chained list, backed by a Vec<u32> or Vec<u64>.

The head of each chain (the most recently inserted row index, plus one) is stored in the hashmap, and the entry of the next array at a row's index holds the following index in the same chain, again plus one.

The chain can be followed until the value “0” is reached, which marks the end of the list. See also chapter 5.3 of “Balancing vectorized query execution with bandwidth-optimized storage”.
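Following a chain can be sketched as below, assuming the layout described above: head is the value read from the hashmap and next is the chained list as a plain Vec<u64>. follow_chain is a hypothetical helper, not a DataFusion function.

```rust
/// Hypothetical helper: follows a chain starting from `head`, a value read
/// from the hashmap (row index + 1). A value of 0 terminates the chain.
/// Returns the row indices in the order they are visited.
fn follow_chain(head: u64, next: &[u64]) -> Vec<usize> {
    let mut rows = Vec::new();
    let mut value = head;
    while value != 0 {
        let row = (value - 1) as usize; // undo the +1 offset
        rows.push(row);
        value = next[row]; // the following entry is stored at this row's position
    }
    rows
}
```

With the final state of the example below (hash value 10 mapped to 5, next = [0, 0, 0, 2, 4]), the call follow_chain(5, &next) visits rows 4, 3 and 1.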

Example

See the example below:

Insert (10,1)            <-- insert hash value 10 with row index 1
map:
----------
| 10 | 2 |
----------
next:
---------------------
| 0 | 0 | 0 | 0 | 0 |
---------------------
Insert (20,2)
map:
----------
| 10 | 2 |
| 20 | 3 |
----------
next:
---------------------
| 0 | 0 | 0 | 0 | 0 |
---------------------
Insert (10,3)           <-- collision! row index 3 has a hash value of 10 as well
map:
----------
| 10 | 4 |
| 20 | 3 |
----------
next:
---------------------
| 0 | 0 | 0 | 2 | 0 |  <--- hash value 10 maps to 4,2 (i.e. row indices 3,1)
---------------------
Insert (10,4)          <-- another collision! row index 4 ALSO has a hash value of 10
map:
----------
| 10 | 5 |
| 20 | 3 |
----------
next:
---------------------
| 0 | 0 | 0 | 2 | 4 | <--- hash value 10 maps to 5,4,2 (i.e. row indices 4,3,1)
---------------------
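The four inserts from the example can be replayed with the sketch below. ChainedIndex is a hypothetical type with std's HashMap as the map and a Vec<u64> as the next array; it reproduces the diagrams but is not DataFusion's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical chained index (not DataFusion's type).
struct ChainedIndex {
    map: HashMap<u64, u64>, // hash value -> (most recent row index + 1)
    next: Vec<u64>,         // next[row] -> (following row index + 1), 0 = end
}

impl ChainedIndex {
    /// Pre-sizes both the map and the next array for `num_rows` rows.
    fn with_rows(num_rows: usize) -> Self {
        ChainedIndex {
            map: HashMap::with_capacity(num_rows),
            next: vec![0; num_rows],
        }
    }

    /// Inserts `row` under `hash`, making it the new head of that chain.
    fn insert(&mut self, hash: u64, row: usize) {
        let head = self.map.entry(hash).or_insert(0);
        // The old head (0 if the hash is new) becomes this row's successor.
        let prev = std::mem::replace(head, row as u64 + 1);
        self.next[row] = prev;
    }

    /// Row indices stored under `hash`, most recently inserted first.
    fn rows_for(&self, hash: u64) -> Vec<usize> {
        let mut rows = Vec::new();
        let mut value = self.map.get(&hash).copied().unwrap_or(0);
        while value != 0 {
            let row = (value - 1) as usize;
            rows.push(row);
            value = self.next[row];
        }
        rows
    }
}
```

After inserting (10,1), (20,2), (10,3) and (10,4) into ChainedIndex::with_rows(5), the map holds 10 -> 5 and 20 -> 3 and next equals [0, 0, 0, 2, 4], matching the last diagram. Each insert is constant time, because the old head simply moves into next[row].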

A JoinHashMapType can store its indices as either u32 or u64, depending on how many build-side rows need to be indexed.

At runtime, we choose between JoinHashMapU32 and JoinHashMapU64, both of which implement JoinHashMapType.
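The runtime choice between the two widths can be sketched with a generic chain behind a trait object. RowIndex, Chain and make_index are hypothetical names standing in for JoinHashMapType, JoinHashMapU32/JoinHashMapU64 and DataFusion's selection logic; the u32::MAX cut-off is an assumption derived from storing row index plus one.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;
use std::fmt::Debug;

/// Hypothetical trait playing the role JoinHashMapType plays for both widths.
trait RowIndex {
    fn insert(&mut self, hash: u64, row: usize);
    fn rows_for(&self, hash: u64) -> Vec<usize>;
    /// Bytes used per stored index.
    fn index_width(&self) -> usize;
}

/// Chained index generic over the stored integer type T (u32 or u64).
struct Chain<T> {
    map: HashMap<u64, T>, // hash -> (head row index + 1)
    next: Vec<T>,         // next[row] -> (following row index + 1), 0 = end
}

impl<T: Copy + Default> Chain<T> {
    fn new(num_rows: usize) -> Self {
        Chain {
            map: HashMap::with_capacity(num_rows),
            next: vec![T::default(); num_rows],
        }
    }
}

impl<T> RowIndex for Chain<T>
where
    T: Copy + Default + TryFrom<usize> + Into<u64>,
    <T as TryFrom<usize>>::Error: Debug,
{
    fn insert(&mut self, hash: u64, row: usize) {
        // make_index only picks T when row + 1 is guaranteed to fit.
        let value = T::try_from(row + 1).expect("index type too narrow");
        let head = self.map.entry(hash).or_insert_with(T::default);
        let prev = std::mem::replace(head, value);
        self.next[row] = prev;
    }

    fn rows_for(&self, hash: u64) -> Vec<usize> {
        let mut rows = Vec::new();
        let mut value: u64 = match self.map.get(&hash) {
            Some(&v) => v.into(),
            None => 0,
        };
        while value != 0 {
            let row = (value - 1) as usize;
            rows.push(row);
            value = self.next[row].into();
        }
        rows
    }

    fn index_width(&self) -> usize {
        std::mem::size_of::<T>()
    }
}

/// Every stored value is at most num_rows (row index + 1), so u32 suffices
/// whenever num_rows fits in a u32; otherwise fall back to u64.
fn make_index(num_rows: usize) -> Box<dyn RowIndex> {
    if num_rows <= u32::MAX as usize {
        Box::new(Chain::<u32>::new(num_rows))
    } else {
        Box::new(Chain::<u64>::new(num_rows))
    }
}
```

The benefit of the narrower variant is memory: each stored index takes 4 bytes instead of 8.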

Required Methods

fn extend_zero(&mut self, len: usize)
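The source gives no description for extend_zero. Assuming, from its name and from the chained-list layout described above, that it appends len zero (end-of-chain) slots to the next array so that rows added later have a slot in which to record their successor, a sketch on a hypothetical GrowableChain type looks like this:

```rust
/// Hypothetical chain store whose next array can grow (not a DataFusion type).
struct GrowableChain {
    next: Vec<u64>, // next[row] -> (following row index + 1), 0 = end of chain
}

impl GrowableChain {
    /// Assumed behaviour: appends `len` zero slots; since 0 means
    /// "end of chain", the new rows start out unlinked.
    fn extend_zero(&mut self, len: usize) {
        let new_len = self.next.len() + len;
        self.next.resize(new_len, 0);
    }
}
```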

fn update_from_iter<'a>(&mut self, iter: Box<dyn Iterator<Item = (usize, &'a u64)> + Send + 'a>, deleted_offset: usize)
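update_from_iter is also undocumented here. The sketch below assumes that each (usize, &u64) item is an absolute row index paired with its hash, and that deleted_offset counts rows already pruned from the front of the next array, so a row lands at next[row - deleted_offset]. PrunableChain is hypothetical; only the method signature is taken from the trait.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a join hash map that supports pruning.
struct PrunableChain {
    map: HashMap<u64, u64>, // hash -> (absolute head row index + 1)
    next: Vec<u64>,         // next[row - deleted_offset] -> previous head, 0 = end
}

impl PrunableChain {
    /// Inserts (row, hash) pairs. `row` is an absolute row index; rows before
    /// `deleted_offset` are assumed to be gone from the front of `next`.
    fn update_from_iter<'a>(
        &mut self,
        iter: Box<dyn Iterator<Item = (usize, &'a u64)> + Send + 'a>,
        deleted_offset: usize,
    ) {
        for (row, &hash) in iter {
            let head = self.map.entry(hash).or_insert(0);
            // The new row becomes the head; the old head becomes its successor.
            let prev = std::mem::replace(head, row as u64 + 1);
            self.next[row - deleted_offset] = prev;
        }
    }
}
```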

fn get_matched_indices<'a>(&self, iter: Box<dyn Iterator<Item = (usize, &'a u64)> + 'a>, deleted_offset: Option<usize>) -> (Vec<u32>, Vec<u64>)
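A sketch of probing, under stated assumptions: the returned Vec<u32> holds probe-side row indices and the Vec<u64> holds the matching build-side indices, pairwise aligned; when deleted_offset is Some, stored values are absolute row indices, and reaching one that falls before the offset means the rest of the chain was pruned. matched_indices is a hypothetical free function that takes the map and the next array explicitly.

```rust
use std::collections::HashMap;

/// Hypothetical probe: for each (probe_row, hash) emits every build row in
/// that hash's chain. Returns (probe indices, build indices), pairwise aligned.
fn matched_indices<'a>(
    map: &HashMap<u64, u64>,
    next: &[u64],
    iter: Box<dyn Iterator<Item = (usize, &'a u64)> + 'a>,
    deleted_offset: Option<usize>,
) -> (Vec<u32>, Vec<u64>) {
    let mut probe_indices = Vec::new();
    let mut build_indices = Vec::new();
    for (probe_row, hash) in iter {
        let mut value = match map.get(hash) {
            Some(&v) => v,
            None => continue, // no build row has this hash
        };
        while value != 0 {
            let absolute = value - 1;
            // With pruning, an index before the offset ends the chain.
            let row = match deleted_offset {
                Some(offset) if absolute < offset as u64 => break,
                Some(offset) => absolute - offset as u64,
                None => absolute,
            };
            probe_indices.push(probe_row as u32);
            build_indices.push(row);
            value = next[row as usize];
        }
    }
    (probe_indices, build_indices)
}
```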

fn get_matched_indices_with_limit_offset(&self, hash_values: &[u64], limit: usize, offset: (usize, Option<u64>)) -> (Vec<u32>, Vec<u64>, Option<(usize, Option<u64>)>)
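get_matched_indices_with_limit_offset is undocumented as well. Reading offset as (probe row to resume at, chain value to resume from within that row), which is an assumption, a hypothetical matched_with_limit can emit at most limit pairs per call and return the offset to resume from, or None once every match has been produced.

```rust
use std::collections::HashMap;

/// Assumed resume point: (probe row, chain value to continue from in that row).
type Offset = (usize, Option<u64>);

/// Hypothetical batched probe over a map/next chain layout.
/// `limit` must be greater than zero, otherwise no progress is made.
fn matched_with_limit(
    map: &HashMap<u64, u64>,
    next: &[u64],
    hash_values: &[u64],
    limit: usize,
    offset: Offset,
) -> (Vec<u32>, Vec<u64>, Option<Offset>) {
    let mut probe_indices = Vec::new();
    let mut build_indices = Vec::new();
    let (start_row, resume_value) = offset;
    for probe_row in start_row..hash_values.len() {
        // Resume mid-chain for the first row; otherwise start at the chain head.
        let mut value = match (probe_row == start_row, resume_value) {
            (true, Some(v)) => v,
            _ => map.get(&hash_values[probe_row]).copied().unwrap_or(0),
        };
        while value != 0 {
            if probe_indices.len() == limit {
                // Output is full: report where the next call should pick up.
                return (probe_indices, build_indices, Some((probe_row, Some(value))));
            }
            let row = value - 1;
            probe_indices.push(probe_row as u32);
            build_indices.push(row);
            value = next[row as usize];
        }
    }
    (probe_indices, build_indices, None)
}
```

A caller would loop, passing each returned offset back in, until it receives None.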

fn is_empty(&self) -> bool

Returns true if the join hash map contains no entries.
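A minimal sketch, on a hypothetical Chain type, of what “no entries” means under the layout described above: emptiness is decided by the hashmap, so a next array that was pre-sized for every build-side row but never written to does not make the map non-empty.

```rust
use std::collections::HashMap;

/// Hypothetical chained index whose next array is pre-sized for all rows.
struct Chain {
    map: HashMap<u64, u64>, // hash -> (head row index + 1)
    next: Vec<u64>,         // next[row] -> (following row index + 1), 0 = end
}

impl Chain {
    fn with_rows(num_rows: usize) -> Self {
        Chain {
            map: HashMap::with_capacity(num_rows),
            next: vec![0; num_rows],
        }
    }

    /// True when no hash value has been inserted, regardless of next's length.
    fn is_empty(&self) -> bool {
        self.map.is_empty()
    }
}
```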

Implementors