Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
TopK in datafusion::physical_plan - Rust
[go: Go Back, main page]

Struct TopK

Source
pub struct TopK { /* private fields */ }
Expand description

Global TopK

§Background

“Top K” is a common query optimization used for queries such as “find the top 3 customers by revenue”. The (simplified) SQL for such a query might be:

SELECT customer_id, revenue FROM 'sales.csv' ORDER BY revenue DESC limit 3;

The simple plan would be:

> explain SELECT customer_id, revenue FROM sales ORDER BY revenue DESC limit 3;
+--------------+----------------------------------------+
| plan_type    | plan                                   |
+--------------+----------------------------------------+
| logical_plan | Limit: 3                               |
|              |   Sort: revenue DESC NULLS FIRST       |
|              |     Projection: customer_id, revenue   |
|              |       TableScan: sales                 |
+--------------+----------------------------------------+

While this plan produces the correct answer, it will fully sorts the input before discarding everything other than the top 3 elements.

The same answer can be produced by simply keeping track of the top K=3 elements, reducing the total amount of required buffer memory.

§Partial Sort Optimization

This implementation additionally optimizes queries where the input is already partially sorted by a common prefix of the requested ordering. Once the top K heap is full, if subsequent rows are guaranteed to be strictly greater (in sort order) on this prefix than the largest row currently stored, the operator safely terminates early.

§Example

For input sorted by (day DESC), but not by timestamp, a query such as:

SELECT day, timestamp FROM sensor ORDER BY day DESC, timestamp DESC LIMIT 10;

can terminate scanning early once sufficient rows from the latest days have been collected, skipping older data.

§Structure

This operator tracks the top K items using a TopKHeap.

Implementations§

Source§

impl TopK

Source

pub fn try_new( partition_id: usize, schema: Arc<Schema>, common_sort_prefix: LexOrdering, expr: LexOrdering, k: usize, batch_size: usize, runtime: Arc<RuntimeEnv>, metrics: &ExecutionPlanMetricsSet, ) -> Result<TopK, DataFusionError>

Create a new TopK that stores the top k values, as defined by the sort expressions in expr.

Source

pub fn insert_batch( &mut self, batch: RecordBatch, ) -> Result<(), DataFusionError>

Insert batch, remembering if any of its values are among the top k seen so far.

Source

pub fn emit( self, ) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>

Returns the top k results broken into batch_size RecordBatches, consuming the heap

Auto Trait Implementations§

§

impl Freeze for TopK

§

impl !RefUnwindSafe for TopK

§

impl Send for TopK

§

impl Sync for TopK

§

impl Unpin for TopK

§

impl !UnwindSafe for TopK

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> ErasedDestructor for T
where T: 'static,

Source§

impl<T> Ungil for T
where T: Send,