Home > mailing lists

Re: Support loser tree for k-way merge - Mailing list pgsql-hackers

From	Sami Imseih
Subject	Re: Support loser tree for k-way merge
Date	December 3 21:14:25
Msg-id	CAA5RZ0u=WDa-v+8Y_yE4WeRS8VtmcmQGuWVjZFCqZcXx-hEP3w@mail.gmail.com Whole thread Raw
In response to	Re: Support loser tree for k-way merge ("cca5507" <cca5507@qq.com>)
List	pgsql-hackers

Tree view

Hi,

Thanks for raising this.

> Now we use 'heap' during the k-way merge, it's O(n log k). The 'loser tree' is also O(n log k), but
> it's usually has fewer comparisons than the 'heap'. When the tuple comparator is complex, the
> 'loser tree' can significantly speed up the k-way merge.

I was playing around with this and I do also see the performance
benefits that you
mention with loser tree.

```
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    col1 INT,
    col2 INT
);


INSERT INTO t (col1, col2)
SELECT
    (i % 10000000),
    (i % 100)
FROM generate_series(1, 10000000) AS s(i);
```

Using something like the above, testing an "external merge" on the
high cardinality
column there is a bit of an improvement with loser tree, but there is
a bit of a loss on
low-cardinality.

In general though, I see a bit of an issue with allowing a GUC to
change strategies. What happens if there are multiple "external merge"
nodes and they can each benefit from different strategies? Maybe that's
not an issue since all the other enable_ GUCs have that problem, but
it is something to consider.

Can we drive the decision for what to do based on optimizer
stats, i.e. n_distinct and row counts? Not sure what the calculation would
be specifically, but something else to consider.

We can still provide the GUC to  override the optimizer decisions,
but at least the optimizer, given up-to-date stats, may get it right most
of the time.

--
Sami Imseih
Amazon Web Services (AWS)

pgsql-hackers by date:

From: Zsolt Parragi
Date: 03 December, 21:06:51
Subject: Proposal: Add a callback data parameter to GetNamedDSMSegment

From: Kirill Reshke
Date: 03 December, 21:17:12
Subject: Re: [PATCH] Add enable_copy_program GUC to control COPY PROGRAM

Re: Support loser tree for k-way merge - Mailing list pgsql-hackers

Previous

Next