Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
RepartitionExec in datafusion::physical_plan::repartition - Rust
[go: Go Back, main page]

pub struct RepartitionExec { /* private fields */ }
Expand description

Maps N input partitions to M output partitions based on a Partitioning scheme.

Background

DataFusion, like most other commercial systems, with the the notable exception of DuckDB, uses the “Exchange Operator” based approach to parallelism which works well in practice given sufficient care in implementation.

DataFusion’s planner picks the target number of partitions and then RepartionExec redistributes RecordBatches to that number of output partitions.

For example, given target_partitions=3 (trying to use 3 cores) but scanning an input with 2 partitions, RepartitionExec can be used to get 3 even streams of RecordBatches

        ▲                  ▲                  ▲
        │                  │                  │
        │                  │                  │
        │                  │                  │
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    GroupBy    │  │    GroupBy    │  │    GroupBy    │
│   (Partial)   │  │   (Partial)   │  │   (Partial)   │
└───────────────┘  └───────────────┘  └───────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
                           │
              ┌─────────────────────────┐
              │     RepartitionExec     │
              │   (hash/round robin)    │
              └─────────────────────────┘
                         ▲   ▲
             ┌───────────┘   └───────────┐
             │                           │
             │                           │
        .─────────.                 .─────────.
     ,─'           '─.           ,─'           '─.
    ;      Input      :         ;      Input      :
    :   Partition 0   ;         :   Partition 1   ;
     ╲               ╱           ╲               ╱
      '─.         ,─'             '─.         ,─'
         `───────'                   `───────'

Output Ordering

No guarantees are made about the order of the resulting partitions unless preserve_order is set.

Footnote

The “Exchange Operator” was first described in the 1989 paper Encapsulation of parallelism in the Volcano query processing system Paper which uses the term “Exchange” for the concept of repartitioning data across threads.

Implementations§

source§

impl RepartitionExec

source

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

Input execution plan

source

pub fn partitioning(&self) -> &Partitioning

Partitioning scheme to use

source

pub fn preserve_order(&self) -> bool

Get preserve_order flag of the RepartitionExecutor true means SortPreservingRepartitionExec, false means RepartitionExec

source

pub fn name(&self) -> &str

Get name of the Executor

source§

impl RepartitionExec

source

pub fn try_new( input: Arc<dyn ExecutionPlan>, partitioning: Partitioning ) -> Result<Self>

Create a new RepartitionExec

source

pub fn with_preserve_order(self, preserve_order: bool) -> Self

Set Order preserving flag

Trait Implementations§

source§

impl Debug for RepartitionExec

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl DisplayAs for RepartitionExec

source§

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

Format according to DisplayFormatType, used when verbose representation looks different from the default one Read more
source§

impl ExecutionPlan for RepartitionExec

source§

fn as_any(&self) -> &dyn Any

Return a reference to Any that can be used for downcasting

source§

fn schema(&self) -> SchemaRef

Get the schema for this execution plan

source§

fn unbounded_output(&self, children: &[bool]) -> Result<bool>

Specifies whether this plan generates an infinite stream of records. If the plan does not support pipelining, but its input(s) are infinite, returns an error to indicate this.

source§

fn children(&self) -> Vec<Arc<dyn ExecutionPlan>>

Get a list of child execution plans that provide the input for this plan. The returned list will be empty for leaf nodes, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
source§

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>> ) -> Result<Arc<dyn ExecutionPlan>>

Returns a new plan where all children were replaced by new plans.
source§

fn benefits_from_input_partitioning(&self) -> Vec<bool>

Specifies whether the operator benefits from increased parallelization at its input for each child. If set to true, this indicates that the operator would benefit from partitioning its corresponding child (and thus from more parallelism). For operators that do very little work the overhead of extra parallelism may outweigh any benefits Read more
source§

fn output_partitioning(&self) -> Partitioning

Specifies the output partitioning scheme of this plan
source§

fn output_ordering(&self) -> Option<&[PhysicalSortExpr]>

If the output of this operator within each partition is sorted, returns Some(keys) with the description of how it was sorted. Read more
source§

fn maintains_input_order(&self) -> Vec<bool>

Returns false if this operator’s implementation may reorder rows within or between partitions. Read more
source§

fn equivalence_properties(&self) -> EquivalenceProperties

Get the EquivalenceProperties within the plan
source§

fn ordering_equivalence_properties(&self) -> OrderingEquivalenceProperties

Get the OrderingEquivalenceProperties within the plan
source§

fn execute( &self, partition: usize, context: Arc<TaskContext> ) -> Result<SendableRecordBatchStream>

creates an iterator
source§

fn metrics(&self) -> Option<MetricsSet>

Return a snapshot of the set of Metrics for this ExecutionPlan. Read more
source§

fn statistics(&self) -> Statistics

Returns the global output statistics for this ExecutionPlan node.
source§

fn required_input_distribution(&self) -> Vec<Distribution>

Specifies the data distribution requirements for all the children for this operator, By default it’s [Distribution::UnspecifiedDistribution] for each child,
source§

fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>

Specifies the ordering requirements for all of the children For each child, it’s the local ordering requirement within each partition rather than the global ordering Read more
source§

fn file_scan_config(&self) -> Option<&FileScanConfig>

Returns the FileScanConfig in case this is a data source scanning execution plan or None otherwise.

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for Twhere T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for Twhere T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for Twhere T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for Twhere U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> Same<T> for T

§

type Output = T

Should always be Self
source§

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for Twhere V: MultiLane<T>,

§

fn vzip(self) -> V