Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | CAAaqYe9pzNZnF328bFZfzR8zJK9Mvft3i6TN0NjxjCkWeVm1iQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Mon, Jun 24, 2019 at 4:16 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Jun 24, 2019 at 01:00:50PM -0400, James Coleman wrote: > >On Mon, Jun 24, 2019 at 12:56 PM Simon Riggs <simon@2ndquadrant.com> wrote: > >> > >> On Mon, 24 Jun 2019 at 16:10, James Coleman <jtc331@gmail.com> wrote: > >>> > >>> On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote: ... > >>> As I see it the two most significant concerning cases right now are: > >>> 1. Very large batches (in particular where the batch is effectively > >>> all of the matching rows such that we're really just doing a standard > >>> sort). > >>> 2. Many very small batches. > >> > >> > >> What is the specific use case for this? This sounds quite general case. > > > >They are both general cases in some sense, but the concerns lie mostly > >with what happens when they're unexpectedly encountered. For example, > >if the expected row count or group size is off by a good bit and we > >effectively have to perform a sort of all (or most) possible rows. > > > >If we can get the performance to a point where that misestimated row > >count or group size doesn't much matter, then ISTM including the patch > >becomes a much more obvious total win. > > > > Yes, that seems like a reasonable approach. Essentially, we're trying to > construct plausible worst case examples, and then minimize the overhead > compared to regular sort. If we get sufficiently close, then it's fine > to rely on somewhat shaky stats - like group size estimates. I have a bit of a mystery in my performance testing. I've been setting up a table like so: create table foo(pk serial primary key, owner_fk integer, created_at timestamp); insert into foo(owner_fk, created_at) select fk_t.i, now() - (time_t.i::text || ' minutes')::interval from generate_series(1, 10000) time_t(i) cross join generate_series(1, 1000) fk_t(i); -- double up on one set to guarantee matching prefixes insert into foo (owner_fk, created_at) select owner_fk, created_at from foo where owner_fk = 23; create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at); analyze foo; and then I have the following query: select * from foo where owner_fk = 23 order by created_at desc, pk desc limit 20000; The idea here is to force a bit of a worst case for small groups: we have 10,000 batches (i.e., equal prefix groups) of 2 tuples each and then query with a limit matching the actual number of rows we know will match the query -- so even though there's a limit we're forcing a total sort (and also guaranteeing both plans have to touch the same number of rows). Note: I know that batches of size is actually the worst case, but I chose batches of two because I've also been testing a change that would skip the sort entirely for single tuple batches. On master (really the commit right before the current revision of the patch), I get: latency average = 14.271 ms tps = 70.075243 (excluding connections establishing) With the patch (and incremental sort enabled): latency average = 11.975 ms tps = 83.512090 (excluding connections establishing) With the patch (but incremental sort disabled): latency average = 11.884 ms tps = 84.149834 (excluding connections establishing) All of those are 60 seconds runs on pgbench with a single thread. So we have a very substantial speedup with patch *even if the new feature isn't enabled*. I've confirmed the plan looks the same on patched with incremental sort disabled and master. The only changes that would seem to really effect execution time would be the changes to tuplesort.c, but looking through them I don't see anything I'd expect to change things so dramatically. Any thoughts on this? James Coleman
pgsql-hackers by date: