From: Peter Geoghegan
On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't know what any of that means.  You said we need something like
> an LWLock, but I think we don't.  The workers just write the results
> of their own final merges into shm_mqs.  The leader can read from any
> given shm_mq until no more tuples can be read without blocking, just
> like nodeGather.c does, or at least it can do that unless its own
> queue fills up first.  No mutual exclusion mechanism is required for
> any of that, as far as I can see - not an LWLock, and not anything
> similar.

I'm confused about the distinction you're making. Isn't the shm_mq
queue itself a mutual exclusion mechanism? The leader cannot really do
anything without some input to process, because of the dependency that
naturally exists. ISTM that everything else we've discussed is
semantics, and/or about particular mechanisms in Postgres.

At a high level, based on my intuition about performance
characteristics, I have reservations about eager merging in the
leader. That's all I mean. There is a trade-off to be made between
eager and lazy merging. As I pointed out, the author of the Volcano
paper (Goetz Graefe) went on at length about this particular trade-off
for parallel sort almost 30 years ago. So, it's not clear that that
would be an easy win, or even a win.

That's all I meant.

-- 
Peter Geoghegan



Re:

From: Robert Haas
You seem to have erased the subject line from this email somehow.

On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't know what any of that means.  You said we need something like
>> an LWLock, but I think we don't.  The workers just write the results
>> of their own final merges into shm_mqs.  The leader can read from any
>> given shm_mq until no more tuples can be read without blocking, just
>> like nodeGather.c does, or at least it can do that unless its own
>> queue fills up first.  No mutual exclusion mechanism is required for
>> any of that, as far as I can see - not an LWLock, and not anything
>> similar.
>
> I'm confused about the distinction you're making. Isn't the shm_mq
> queue itself a mutual exclusion mechanism? The leader cannot really do
> anything without some input to process, because of the dependency that
> naturally exists. ISTM that everything else we've discussed is
> semantics, and/or about particular mechanisms in Postgres.

No, a shm_mq is not a mutual exclusion mechanism.  It's a queue!

A data dependency and a lock aren't the same thing.  If your point in
saying "we need something like an LWLock" was actually "the leader
will have to wait if it's merged all tuples from a worker and no more
are immediately available" then I think that's a pretty odd way to
make that point.  To me, the first statement appears to be false,
while the second is obviously true.

> At a high level, based on my intuition about performance
> characteristics, I have reservations about eager merging in the
> leader. That's all I mean. There is a trade-off to be made between
> eager and lazy merging. As I pointed out, the author of the Volcano
> paper (Goetz Graefe) went on at length about this particular trade-off
> for parallel sort almost 30 years ago. So, it's not clear that that
> would be an easy win, or even a win.
>
> That's all I meant.

OK, well I'm not taking any position on whether what Heikki is
proposing will turn out to be good from a performance point of view.
My intuitions about sorting performance haven't turned out to be
particularly good in the past.  I'm only saying that if you do decide
to queue the tuples passing from worker to leader in a shm_mq, you
shouldn't read from the shm_mq objects in a round robin, but rather
read multiple tuples at a time from the same worker whenever that is
possible without blocking.  If you read tuples from workers one at a
time, you get a lot of context-switch thrashing, because the worker
wakes up, writes one tuple (or even part of a tuple) and immediately
goes back to sleep.  Then it shortly thereafter does it again.
Whereas if you drain the worker's whole queue each time you read from
that queue, then the worker can wake up, refill it completely, and go
back to sleep again.  So you incur fewer context switches for the same
amount of work.  Or at least, that's how it worked out for Gather, and
what I am saying is that it will probably work out the same way for
sorting, if somebody chooses to try to implement this.
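
To make the access pattern concrete, here's a rough, untested sketch
against the existing shm_mq API. shm_mq_receive() is the real function;
leader_drain_queue() and the process_tuple() callback are invented names
used for illustration only, not anything from an actual patch:

#include "postgres.h"

#include "storage/shm_mq.h"

typedef void (*tuple_callback) (void *data, Size len);

/*
 * Sketch only: read everything currently available from "mqh" without
 * blocking.  Returns false once the worker has detached (no more tuples
 * will ever arrive), true if the queue is merely empty right now and
 * should be revisited after waiting on the leader's latch.
 */
static bool
leader_drain_queue(shm_mq_handle *mqh, tuple_callback process_tuple)
{
    for (;;)
    {
        void       *data;
        Size        len;
        shm_mq_result res;

        /* nowait = true: get SHM_MQ_WOULD_BLOCK back instead of sleeping */
        res = shm_mq_receive(mqh, &len, &data, true);

        if (res == SHM_MQ_SUCCESS)
            process_tuple(data, len);   /* e.g. feed the leader's merge */
        else if (res == SHM_MQ_WOULD_BLOCK)
            return true;    /* empty for now; go try another worker */
        else
            return false;   /* SHM_MQ_DETACHED: this worker is done */
    }
}

The leader would loop over the workers' queues with something like that,
and only wait on its latch once every still-attached queue has reported
that it would block, which is roughly how nodeGather.c behaves today.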

I am also saying that using the shm_mq code here does not require the
introduction of an LWLock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re:

From: Peter Geoghegan
On Tue, Sep 27, 2016 at 3:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> You seem to have erased the subject line from this email somehow.

I think that that was Gmail. Maybe that's new? Generally, I have to go
out of my way to change the subject line, so it seems unlikely that I
fat-fingered it. I wish that they'd stop changing things...

> On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan <pg@heroku.com> wrote:
> No, a shm_mq is not a mutual exclusion mechanism.  It's a queue!
>
> A data dependency and a lock aren't the same thing.  If your point in
> saying "we need something like an LWLock" was actually "the leader
> will have to wait if it's merged all tuples from a worker and no more
> are immediately available" then I think that's a pretty odd way to
> make that point.  To me, the first statement appears to be false,
> while the second is obviously true.

Okay. Sorry for being unclear.

> OK, well I'm not taking any position on whether what Heikki is
> proposing will turn out to be good from a performance point of view.
> My intuitions about sorting performance haven't turned out to be
> particularly good in the past.  I'm only saying that if you do decide
> to queue the tuples passing from worker to leader in a shm_mq, you
> shouldn't read from the shm_mq objects in a round robin, but rather
> read multiple tuples at a time from the same worker whenever that is
> possible without blocking.  If you read tuples from workers one at a
> time, you get a lot of context-switch thrashing, because the worker
> wakes up, writes one tuple (or even part of a tuple) and immediately
> goes back to sleep.  Then it shortly thereafter does it again.
> Whereas if you drain the worker's whole queue each time you read from
> that queue, then the worker can wake up, refill it completely, and go
> back to sleep again.  So you incur fewer context switches for the same
> amount of work.  Or at least, that's how it worked out for Gather, and
> what I am saying is that it will probably work out the same way for
> sorting, if somebody chooses to try to implement this.

Maybe. This would involve overlapping multiple worker merges, run in
parallel, with a single leader merge. I don't think the gain would be
all that great. There would be a trade-off around
disk bandwidth utilization too, so I'm particularly doubtful that the
complexity would pay for itself. But, of course, I too have been wrong
about sorting performance characteristics in the past, so I can't be
sure.

-- 
Peter Geoghegan