Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Tuplesort merge pre-reading
Date	September 6, 2016 22:27:03
Msg-id	CAM3SWZR3pArWugB9PGaYjiLDnHDUf+AyW1RCJwCFD-LnGCjtKg@mail.gmail.com
In response to	Re: Tuplesort merge pre-reading (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Tuplesort merge pre-reading (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Offhand, I would think that taken together this is very important. I'd
> certainly want to see cases in the hundreds of megabytes or gigabytes
> of work_mem alongside your 4MB case, even just to be able to talk
> informally about this. As you know, the default work_mem value is very
> conservative.

It looks like your benchmark relies on multiple passes, which can be
misleading. I bet it suffers to some degree from palloc()
fragmentation. When memory is very constrained, that can get really bad.

Non-final merge passes (merges with more than one run -- real or dummy
-- on any given tape) can have uneven numbers of runs on each tape.
So, tuplesort.c needs to be prepared to dole out memory among tapes
*unevenly* there (same applies to memtuples "slots"). This is why
batch memory support is so hard for those cases (the fact that they're
so rare anyway also puts me off it). As you know, I wrote a patch that
adds batch memory support to cases that require randomAccess (final
output on a materialized tape), for their final merge. These final
merges happen to not be a final on-the-fly merge only due to this
randomAccess requirement from caller. It's possible to support these
cases in the future, with that patch, only because I am safe to assume
that each run/tape is the same size there (well, the assumption is
exactly as safe as it was for the 9.6 final on-the-fly merge, at
least).
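
To illustrate what handing out memory unevenly among tapes could look
like, here is a minimal sketch in C. It is not code from tuplesort.c:
the function name divide_merge_memory, its parameters, and the policy of
weighting each tape by its run count are all assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of tuplesort.c): split a merge memory
 * budget among input tapes in proportion to the number of runs on each
 * tape. Tapes holding no runs receive nothing.
 *
 * Returns the bytes left unassigned because of integer division.
 */
size_t
divide_merge_memory(size_t total_bytes, const int *runs_per_tape,
                    int ntapes, size_t *bytes_per_tape)
{
    size_t total_runs = 0;
    size_t assigned = 0;

    assert(ntapes > 0);

    /* Count the runs across all tapes to get the weighting denominator. */
    for (int i = 0; i < ntapes; i++)
    {
        assert(runs_per_tape[i] >= 0);
        total_runs += (size_t) runs_per_tape[i];
    }

    /* Give each tape a share proportional to its run count. */
    for (int i = 0; i < ntapes; i++)
    {
        if (total_runs == 0)
            bytes_per_tape[i] = 0;
        else
            bytes_per_tape[i] =
                total_bytes / total_runs * (size_t) runs_per_tape[i];
        assigned += bytes_per_tape[i];
    }

    return total_bytes - assigned;
}
```

With a 4000 byte budget and run counts of 3, 1 and 0, this gives the
tapes 3000, 1000 and 0 bytes. An even split would instead hand each
tape the same share regardless of how many runs it holds, which is only
reasonable when every tape holds a similar amount of data.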

My point about non-final merges is that you have to be very careful
that you're comparing apples to apples, memory accounting wise, when
looking into something like this. I'm not saying that you didn't, but
it's worth considering.

FWIW, I did try an increase in the buffer size in LogicalTape at one
time several months back, and saw no benefit there (at least, with no
other changes).

-- 
Peter Geoghegan
