Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Msg-id: 20190625160235.6ctryjzvw5je7cet@development
In response to: Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>)
Responses: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List: pgsql-hackers
On Mon, Jun 24, 2019 at 07:34:19PM -0400, James Coleman wrote:
>On Mon, Jun 24, 2019 at 4:16 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Mon, Jun 24, 2019 at 01:00:50PM -0400, James Coleman wrote:
>> >On Mon, Jun 24, 2019 at 12:56 PM Simon Riggs <simon@2ndquadrant.com> wrote:
>> >>
>> >> On Mon, 24 Jun 2019 at 16:10, James Coleman <jtc331@gmail.com> wrote:
>> >>>
>> >>> On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
>...
>> >>> As I see it the two most significant concerning cases right now are:
>> >>> 1. Very large batches (in particular where the batch is effectively
>> >>> all of the matching rows such that we're really just doing a standard
>> >>> sort).
>> >>> 2. Many very small batches.
>> >>
>> >> What is the specific use case for this? It sounds like quite a general case.
>> >
>> >They are both general cases in some sense, but the concerns lie mostly
>> >with what happens when they're unexpectedly encountered. For example,
>> >if the expected row count or group size is off by a good bit and we
>> >effectively have to perform a sort of all (or most) possible rows.
>> >
>> >If we can get the performance to a point where that misestimated row
>> >count or group size doesn't much matter, then ISTM including the patch
>> >becomes a much more obvious total win.
>>
>> Yes, that seems like a reasonable approach. Essentially, we're trying to
>> construct plausible worst-case examples, and then minimize the overhead
>> compared to a regular sort. If we get sufficiently close, then it's fine
>> to rely on somewhat shaky stats - like group size estimates.
>
>I have a bit of a mystery in my performance testing. I've been setting
>up a table like so:
>
>create table foo(pk serial primary key, owner_fk integer, created_at timestamp);
>insert into foo(owner_fk, created_at)
>select fk_t.i, now() - (time_t.i::text || ' minutes')::interval
>from generate_series(1, 10000) time_t(i)
>cross join generate_series(1, 1000) fk_t(i);
>-- double up on one set to guarantee matching prefixes
>insert into foo (owner_fk, created_at) select owner_fk, created_at
>from foo where owner_fk = 23;
>create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);
>analyze foo;
>
>and then I have the following query:
>
>select *
>from foo
>where owner_fk = 23
>order by created_at desc, pk desc
>limit 20000;
>
>The idea here is to force a bit of a worst case for small groups: we
>have 10,000 batches (i.e., equal-prefix groups) of 2 tuples each, and
>then query with a limit matching the actual number of rows we know
>will match the query -- so even though there's a limit, we're forcing a
>total sort (and also guaranteeing both plans have to touch the same
>number of rows). Note: I know that batches of size one are actually the
>worst case, but I chose batches of two because I've also been testing
>a change that would skip the sort entirely for single-tuple batches.
>
>On master (really the commit right before the current revision of the
>patch), I get:
>latency average = 14.271 ms
>tps = 70.075243 (excluding connections establishing)
>
>With the patch (and incremental sort enabled):
>latency average = 11.975 ms
>tps = 83.512090 (excluding connections establishing)
>
>With the patch (but incremental sort disabled):
>latency average = 11.884 ms
>tps = 84.149834 (excluding connections establishing)
>
>All of those are 60-second pgbench runs with a single thread.
>
>So we have a very substantial speedup with the patch *even if the new
>feature isn't enabled*. I've confirmed the plan looks the same on
>patched (with incremental sort disabled) and master. The only changes
>that would seem to really affect execution time are the changes to
>tuplesort.c, but looking through them I don't see anything I'd expect
>to change things so dramatically.
>
>Any thoughts on this?

I can reproduce the same thing, so it's not just you. On my machine, I
see these tps numbers (average of 10 runs, 60 seconds each):

  master:        65.177
  patched (on):  80.368
  patched (off): 80.750

The numbers are very consistent (within 1 tps).
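For anyone wanting to reproduce the on/off comparison, something like
this is enough (just a sketch -- I'm assuming the patch's GUC is
spelled enable_incremental_sort):

  -- disable the new node, then confirm the plan matches master
  set enable_incremental_sort = off;
  explain (analyze, buffers)
  select *
  from foo
  where owner_fk = 23
  order by created_at desc, pk desc
  limit 20000;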
I've done a bit of CPU profiling, and on master I see this:

  13.84%  postgres  postgres      [.] comparetup_heap
   4.83%  postgres  postgres      [.] qsort_tuple
   3.87%  postgres  postgres      [.] pg_ltostr_zeropad
   3.55%  postgres  postgres      [.] pg_ltoa
   3.19%  postgres  postgres      [.] AllocSetAlloc
   2.68%  postgres  libc-2.28.so  [.] __GI___strlen_sse2
   2.38%  postgres  postgres      [.] LWLockRelease
   2.38%  postgres  postgres      [.] AppendSeconds.constprop.9
   2.22%  postgres  libc-2.28.so  [.] __memmove_sse2_unaligned_erms
   2.17%  postgres  postgres      [.] GetPrivateRefCountEntry
   2.03%  postgres  postgres      [.] j2date
   ...

while on the patched versions I see this:

   4.60%  postgres  postgres      [.] pg_ltostr_zeropad
   4.51%  postgres  postgres      [.] pg_ltoa
   3.50%  postgres  postgres      [.] AllocSetAlloc
   3.34%  postgres  libc-2.28.so  [.] __GI___strlen_sse2
   2.99%  postgres  postgres      [.] LWLockRelease
   2.84%  postgres  postgres      [.] AppendSeconds.constprop.9
   2.65%  postgres  postgres      [.] GetPrivateRefCountEntry
   2.64%  postgres  postgres      [.] j2date
   2.60%  postgres  postgres      [.] printtup
   2.56%  postgres  postgres      [.] heap_hot_search_buffer
   ...
   1.35%  postgres  postgres      [.] comparetup_heap
   ...

So either we're calling comparetup_heap less often, or it's cheaper.
But it seems to be very dependent on the data set. If you do this:

  create table foo_2 as select * from foo order by random();
  alter table foo_2 add primary key (pk);
  create index idx_foo_2_on_owner_and_created_at on foo_2 (owner_fk, created_at);

and then run the test against this table, there's no difference. So my
guess is that this particular data set triggers slightly different
behavior in tuplesort, reducing the cost of comparetup_heap. The
speedup is quite significant (~20% on my system); the question is how
widely applicable it can be.
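A quick way to see how differently the two tables are laid out on disk
is the planner's own ordering stats (nothing patch-specific here, just
a sketch using pg_stats):

  -- correlation near +/-1 means the heap is physically ordered by the
  -- column; near 0 means it's effectively random
  analyze foo_2;  -- foo was already analyzed above
  select tablename, attname, correlation
  from pg_stats
  where tablename in ('foo', 'foo_2')
    and attname in ('owner_fk', 'created_at');

I'd expect foo (loaded straight from generate_series) to come out close
to the extremes, and the shuffled foo_2 to come out close to zero.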
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services