Struct TopK

pub struct TopK { /* private fields */ }

Global TopK

§Background

“Top K” is a common query optimization used for queries such as “find the top 3 customers by revenue”. The (simplified) SQL for such a query might be:

SELECT customer_id, revenue FROM 'sales.csv' ORDER BY revenue DESC limit 3;

The simple plan would be:

> explain SELECT customer_id, revenue FROM sales ORDER BY revenue DESC limit 3;
+--------------+----------------------------------------+
| plan_type    | plan                                   |
+--------------+----------------------------------------+
| logical_plan | Limit: 3                               |
|              |   Sort: revenue DESC NULLS FIRST       |
|              |     Projection: customer_id, revenue   |
|              |       TableScan: sales                 |
+--------------+----------------------------------------+

While this plan produces the correct answer, it fully sorts the input before discarding everything other than the top 3 elements.

The same answer can be produced by simply keeping track of the top K=3 elements, reducing the total amount of required buffer memory.
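The idea can be illustrated with a minimal, standalone sketch. It is not DataFusion's implementation (which operates on `RecordBatch`es and arbitrary sort expressions); it assumes plain `i64` revenue values and uses a hypothetical helper named `top_k_desc` built on the standard library's `BinaryHeap`. At most `k` values are buffered at any time.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: returns the `k` largest values in descending order
/// while buffering at most `k` values.
fn top_k_desc(values: impl IntoIterator<Item = i64>, k: usize) -> Vec<i64> {
    // `Reverse` turns the max-heap into a min-heap, so the root is always
    // the smallest value currently kept (the first candidate for eviction).
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k);

    for v in values {
        if heap.len() < k {
            // Heap not yet full: keep every value.
            heap.push(Reverse(v));
        } else if let Some(mut smallest) = heap.peek_mut() {
            // Heap full: replace the smallest kept value if `v` beats it.
            if v > smallest.0 {
                *smallest = Reverse(v);
            }
        }
    }

    // Only the k survivors are sorted, not the whole input.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```

For example, calling `top_k_desc(vec![5, 1, 9, 3, 7], 3)` keeps only three values in the heap at any point and yields the three largest in descending order.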

§Partial Sort Optimization

This implementation additionally optimizes queries where the input is already partially sorted by a common prefix of the requested ordering. Once the top K heap is full, if subsequent rows are guaranteed to be strictly greater (in sort order) on this prefix than the largest row currently stored, the operator safely terminates early.

§Example

For input sorted by (day DESC), but not by timestamp, a query such as:

SELECT day, timestamp FROM sensor ORDER BY day DESC, timestamp DESC LIMIT 10;

can terminate scanning early once sufficient rows from the latest days have been collected, skipping older data.
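The early-termination rule can be sketched on plain tuples. This is an illustrative assumption, not the operator's actual code: rows are modeled as `(day, timestamp)` pairs of `u32`, the input is assumed to be already sorted by `day` descending, and the function name `top_k_sorted_prefix` is hypothetical. It returns the top `k` rows together with the number of rows examined before stopping.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: top `k` rows by (day DESC, timestamp DESC) from input
/// that is already sorted by `day` descending. Also returns how many rows
/// were examined before the scan stopped.
fn top_k_sorted_prefix(rows: &[(u32, u32)], k: usize) -> (Vec<(u32, u32)>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    // Min-heap on (day, timestamp): the root is the worst row currently kept.
    let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(k);
    let mut examined = 0;

    for &(day, ts) in rows {
        examined += 1;
        if heap.len() < k {
            heap.push(Reverse((day, ts)));
            continue;
        }
        let mut worst = heap.peek_mut().expect("heap holds k >= 1 rows");
        if day < (worst.0).0 {
            // This row is strictly worse on the sorted prefix than the worst
            // kept row. Because the input is sorted by day descending, every
            // later row is worse as well, so the scan can stop.
            break;
        }
        if (day, ts) > worst.0 {
            *worst = Reverse((day, ts));
        }
    }

    let mut out: Vec<(u32, u32)> = heap.into_iter().map(|Reverse(r)| r).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    (out, examined)
}
```

Note that rows sharing the same `day` as the worst kept row must still be examined, since their `timestamp` may be better; only a strictly worse prefix value allows the scan to stop.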

§Structure

This operator tracks the top K items using a TopKHeap.

Implementations§

impl TopK

pub fn try_new(
    partition_id: usize,
    schema: Arc<Schema>,
    common_sort_prefix: LexOrdering,
    expr: LexOrdering,
    k: usize,
    batch_size: usize,
    runtime: Arc<RuntimeEnv>,
    metrics: &ExecutionPlanMetricsSet,
) -> Result<TopK, DataFusionError>

pub fn insert_batch(&mut self, batch: RecordBatch) -> Result<(), DataFusionError>

pub fn emit(self) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>

Auto Trait Implementations§

impl Freeze for TopK

impl !RefUnwindSafe for TopK

impl Send for TopK

impl Sync for TopK

impl Unpin for TopK

impl !UnwindSafe for TopK

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> Same for T

type Output = T

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> Ungil for T
where T: Send,
