Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Tuplesort merge pre-reading
Date
Msg-id CAM3SWZR3pArWugB9PGaYjiLDnHDUf+AyW1RCJwCFD-LnGCjtKg@mail.gmail.com
Whole thread Raw
In response to Re: Tuplesort merge pre-reading  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Tuplesort merge pre-reading  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Offhand, I would think that taken together this is very important. I'd
> certainly want to see cases in the hundreds of megabytes or gigabytes
> of work_mem alongside your 4MB case, even just to be able to talk
> informally about this. As you know, the default work_mem value is very
> conservative.

It looks like your benchmark relies on multiple passes, which can be
misleading. I bet it suffers some amount of problems from palloc()
fragmentation. When very memory constrained, that can get really bad.

Non-final merge passes (merges with more than one run -- real or dummy
-- on any given tape) can have uneven numbers of runs on each tape.
So, tuplesort.c needs to be prepared to doll out memory among tapes
*unevenly* there (same applies to memtuples "slots"). This is why
batch memory support is so hard for those cases (the fact that they're
so rare anyway also puts me off it). As you know, I wrote a patch that
adds batch memory support to cases that require randomAccess (final
output on a materialized tape), for their final merge. These final
merges happen to not be a final on-the-fly merge only due to this
randomAccess requirement from caller. It's possible to support these
cases in the future, with that patch, only because I am safe to assume
that each run/tape is the same size there (well, the assumption is
exactly as safe as it was for the 9.6 final on-the-fly merge, at
least).

My point about non-final merges is that you have to be very careful
that you're comparing apples to apples, memory accounting wise, when
looking into something like this. I'm not saying that you didn't, but
it's worth considering.

FWIW, I did try an increase in the buffer size in LogicalTape at one
time several months back, and so no benefit there (at least, with no
other changes).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Christian Convey
Date:
Subject: Re: [GENERAL] C++ port of Postgres
Next
From: Tomas Vondra
Date:
Subject: Re: Speed up Clog Access by increasing CLOG buffers