Re: Support loser tree for k-way merge - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Support loser tree for k-way merge
Date
Msg-id CAA5RZ0u=WDa-v+8Y_yE4WeRS8VtmcmQGuWVjZFCqZcXx-hEP3w@mail.gmail.com
Whole thread Raw
In response to Re: Support loser tree for k-way merge  ("cca5507" <cca5507@qq.com>)
List pgsql-hackers
Hi,

Thanks for raising this.

> Now we use 'heap' during the k-way merge, it's O(n log k). The 'loser tree' is also O(n log k), but
> it's usually has fewer comparisons than the 'heap'. When the tuple comparator is complex, the
> 'loser tree' can significantly speed up the k-way merge.

I was playing around with this and I do also see the performance
benefits that you
mention with loser tree.

```
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    col1 INT,
    col2 INT
);


INSERT INTO t (col1, col2)
SELECT
    (i % 10000000),
    (i % 100)
FROM generate_series(1, 10000000) AS s(i);
```

Using something like the above, testing an "external merge" on the
high cardinality
column there is a bit of an improvement with loser tree, but there is
a bit of a loss on
low-cardinality.

In general though, I see a bit of an issue with allowing a GUC to
change strategies. What happens if there are multiple "external merge"
nodes and they can each benefit from different strategies? Maybe that's
not an issue since all the other enable_ GUCs have that problem, but
it is something to consider.

Can we drive the decision for what to do based on optimizer
stats, i.e. n_distinct and row counts? Not sure what the calculation would
be specifically, but something else to consider.

We can still provide the GUC to  override the optimizer decisions,
but at least the optimizer, given up-to-date stats, may get it right most
of the time.

--
Sami Imseih
Amazon Web Services (AWS)



pgsql-hackers by date:

Previous
From: Zsolt Parragi
Date:
Subject: Proposal: Add a callback data parameter to GetNamedDSMSegment
Next
From: Kirill Reshke
Date:
Subject: Re: [PATCH] Add enable_copy_program GUC to control COPY PROGRAM