Struct datafusion::physical_plan::repartition::RepartitionExec

pub struct RepartitionExec { /* private fields */ }

Maps N input partitions to M output partitions based on a Partitioning scheme.
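As a rough illustration of what "based on a Partitioning scheme" can mean, the sketch below chooses an output partition for each item under two common schemes. The `Scheme` enum and `target_partition` function are hypothetical names invented for this example; they are not DataFusion's `Partitioning` type or its implementation, and the sketch assumes the partition count is greater than zero.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a partitioning scheme.
/// This is NOT DataFusion's `Partitioning` type.
#[derive(Debug, Clone, Copy)]
pub enum Scheme {
    /// Cycle through the given number of output partitions in turn.
    RoundRobin(usize),
    /// Route by the hash of a key so equal keys share an output partition.
    Hash(usize),
}

/// Picks the output partition for the `seq`-th item carrying `key`.
/// Assumes the partition count inside `scheme` is greater than zero.
pub fn target_partition<K: Hash>(scheme: Scheme, seq: usize, key: &K) -> usize {
    match scheme {
        // Round robin ignores the key and only looks at arrival position.
        Scheme::RoundRobin(m) => seq % m,
        // Hashing ignores arrival position and only looks at the key.
        Scheme::Hash(m) => {
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            (hasher.finish() % m as u64) as usize
        }
    }
}
```

Round robin balances load without looking at the data, while hashing keeps all items with the same key in the same output partition, which is what a downstream grouped aggregation or join typically needs.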

Background

DataFusion, like most other commercial systems, with the notable exception of DuckDB, uses the “Exchange Operator” based approach to parallelism, which works well in practice given sufficient care in implementation.

DataFusion’s planner picks the target number of partitions, and RepartitionExec then redistributes RecordBatches to that number of output partitions.

For example, given target_partitions=3 (trying to use 3 cores) but scanning an input with 2 partitions, RepartitionExec can be used to get 3 even streams of RecordBatches:

        ▲                  ▲                  ▲
        │                  │                  │
        │                  │                  │
        │                  │                  │
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    GroupBy    │  │    GroupBy    │  │    GroupBy    │
│   (Partial)   │  │   (Partial)   │  │   (Partial)   │
└───────────────┘  └───────────────┘  └───────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
                           │
              ┌─────────────────────────┐
              │     RepartitionExec     │
              │   (hash/round robin)    │
              └─────────────────────────┘
                         ▲   ▲
             ┌───────────┘   └───────────┐
             │                           │
             │                           │
        .─────────.                 .─────────.
     ,─'           '─.           ,─'           '─.
    ;      Input      :         ;      Input      :
    :   Partition 0   ;         :   Partition 1   ;
     ╲               ╱           ╲               ╱
      '─.         ,─'             '─.         ,─'
         `───────'                   `───────'
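
The data flow in the diagram can be modelled with plain threads and channels. The following is a minimal sketch using only the Rust standard library, assuming round robin distribution; `Batch` and `repartition_round_robin` are hypothetical names for this example, and the code does not reflect how RepartitionExec is actually implemented (DataFusion operates on asynchronous streams of RecordBatches).

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Stand-in for a RecordBatch: (input partition, sequence number).
pub type Batch = (usize, usize);

/// Spawns one producer thread per input partition and distributes its
/// batches round robin over `m` output channels. Returns the `m` receivers,
/// one per output partition. Assumes `m` is greater than zero.
pub fn repartition_round_robin(inputs: Vec<Vec<Batch>>, m: usize) -> Vec<Receiver<Batch>> {
    let (senders, receivers): (Vec<Sender<Batch>>, Vec<Receiver<Batch>>) =
        (0..m).map(|_| channel()).unzip();

    for input in inputs {
        // Each producer gets its own handle to every output channel.
        let senders = senders.clone();
        thread::spawn(move || {
            for (i, batch) in input.into_iter().enumerate() {
                // A send error only means the consumer has gone away.
                let _ = senders[i % m].send(batch);
            }
        });
    }

    // Drop the original senders so each receiver ends once all producers finish.
    drop(senders);
    receivers
}
```

With 2 inputs of 6 batches each and `m = 3`, every output receives 4 batches, although the order in which batches from different inputs arrive on a given output is not deterministic.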

Output Ordering

No guarantees are made about the order of the resulting partitions unless preserve_order is set.
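
To see why ordering is lost by default, compare taking values as they arrive with merging sorted inputs. The sketch below is a generic illustration of the two techniques, not a description of DataFusion's internals; `interleave` and `merge_sorted` are hypothetical helpers, and arrival order is simulated by simple alternation between inputs.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Takes values in "arrival order", simulated here by alternating between
/// inputs. Sorted inputs generally do not yield a sorted output.
pub fn interleave(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut out = Vec::new();
    let mut progressed = true;
    while progressed {
        progressed = false;
        for it in iters.iter_mut() {
            if let Some(v) = it.next() {
                out.push(v);
                progressed = true;
            }
        }
    }
    out
}

/// Merges already-sorted inputs into one sorted output (k-way merge),
/// always emitting the smallest value at the head of any input.
pub fn merge_sorted(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    // Min-heap of (value, input index) built on top of the std max-heap.
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        // Refill the heap from the input that just produced a value.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

The merge costs extra comparisons per value, which is one reason an order-preserving mode is usually opt-in rather than the default.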

Footnote

The “Exchange Operator” was first described in the 1989 paper Encapsulation of parallelism in the Volcano query processing system, which uses the term “Exchange” for the concept of repartitioning data across threads.

Implementations

impl RepartitionExec

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

pub fn partitioning(&self) -> &Partitioning

pub fn preserve_order(&self) -> bool

pub fn name(&self) -> &str

impl RepartitionExec

pub fn try_new(input: Arc<dyn ExecutionPlan>, partitioning: Partitioning) -> Result<Self>

pub fn with_preserve_order(self, preserve_order: bool) -> Self

Trait Implementations

impl Debug for RepartitionExec

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl DisplayAs for RepartitionExec

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

impl ExecutionPlan for RepartitionExec

fn as_any(&self) -> &dyn Any

fn schema(&self) -> SchemaRef

fn unbounded_output(&self, children: &[bool]) -> Result<bool>

fn children(&self) -> Vec<Arc<dyn ExecutionPlan>>

fn with_new_children(self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>) -> Result<Arc<dyn ExecutionPlan>>

fn benefits_from_input_partitioning(&self) -> Vec<bool>

fn output_partitioning(&self) -> Partitioning

fn output_ordering(&self) -> Option<&[PhysicalSortExpr]>

fn maintains_input_order(&self) -> Vec<bool>

fn equivalence_properties(&self) -> EquivalenceProperties

fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties

fn execute(&self, partition: usize, context: Arc<TaskContext>) -> Result<SendableRecordBatchStream>

fn metrics(&self) -> Option<MetricsSet>

fn statistics(&self) -> Statistics

fn required_input_distribution(&self) -> Vec<Distribution>

fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>

fn file_scan_config(&self) -> Option<&FileScanConfig>

Auto Trait Implementations

impl !RefUnwindSafe for RepartitionExec

impl Send for RepartitionExec

impl Sync for RepartitionExec

impl Unpin for RepartitionExec

impl !UnwindSafe for RepartitionExec

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> Same<T> for T

type Output = T

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V