Re: BUG #7914: pg_dump aborts occasionally - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #7914: pg_dump aborts occasionally
Msg-id 8804.1399507096@sss.pgh.pa.us
In response to Re: BUG #7914: pg_dump aborts occasionally  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #7914: pg_dump aborts occasionally  (Shin-ichi MORITA <s-morita@beingcorp.co.jp>)
List pgsql-bugs
I wrote:
> "Shin-ichi MORITA" <s-morita@beingcorp.co.jp> writes:
>>> If so, perhaps injecting a sleep() delay into the right place in pg_dump
>>> or libpq would make it reproducible?

>> An alternative would be to run pg_dump at a lower priority.
>> In fact, I can reproduce this issue by setting pg_dump's priority
>> to Low in Windows Task Manager
>> in a "single processor" environment.

> I tried nice'ing pg_dump without any success in making it bloat memory.
> I'm suspicious that there's something Windows-specific in what you're
> seeing, because it's fairly hard to credit that nobody's seen this
> problem in the ten years or so that libpq has been doing it like that.

> Can anyone else manage to reproduce a similar behavior?

I started thinking about this issue again after seeing a similar report at
http://www.postgresql.org/message-id/flat/850BF81CE7324F81AF2A44C8660CF2EE@dell2
and this time I think I see the problem.  Look at the loop in pqReadData()
created by this test:

        /*
         * Hack to deal with the fact that some kernels will only give us back
         * 1 packet per recv() call, even if we asked for more and there is
         * more available.  If it looks like we are reading a long message,
         * loop back to recv() again immediately, until we run out of data or
         * buffer space.  Without this, the block-and-restart behavior of
         * libpq's higher levels leads to O(N^2) performance on long messages.
         *
         * Since we left-justified the data above, conn->inEnd gives the
         * amount of data already read in the current message.  We consider
         * the message "long" once we have acquired 32k ...
         */
        if (conn->inEnd > 32768 &&
            (conn->inBufSize - conn->inEnd) >= 8192)
        {
            someread = 1;
            goto retry3;
        }

Once we have more than 32K in the input buffer, this logic will loop
until we either fill the buffer (to within 8K anyway) or get ahead
of the sender (so that recv() returns zero).  Then, we go and process
all the message(s) that we just read.  If what we have includes some
message(s) and then an incomplete message at the end of the buffer,
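we'll call pqCheckInBufferSpace, which is quite likely to result in
doubling the input buffer size (the space it is asked for is counted
from the start of the buffer, so it includes the already-processed
data to the left of inStart), even though the incomplete message
might be much smaller than the buffer size.  That's fine so far,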
but then when we go back to pqReadData, it's capable of filling that
enlarged buffer again.  Lather, rinse, repeat.

I've been able to reproduce buffer bloat by inserting some delay into
this loop (I put "usleep(10000);" in front of the goto above) and then
sending a steady stream of circa-50KB data messages.

To see any significant buffer bloat in practice, you need an environment
where the sender + network is consistently faster than the receiver.
That's fairly unlikely if the sender and receiver are on different
machines, just because that recv() loop in pqReadData is probably faster
than the network connection.  It seems also unlikely when they're on the
same machine: that recv() loop is certainly faster than any data-sending
loop in the server, so you'd need some scheduler effect that consistently
gives the backend process priority over the pg_dump process, which is
exactly the opposite of what generally happens on Unix machines.  Perhaps
Windows schedules differently?  Or maybe the reason we're seeing this now,
after not seeing it for many years, is that networks are getting faster.

Anyway, I still don't like your original suggestion of simply skipping
the pqCheckInBufferSpace call; that's just losing the knowledge we have of
how big the next message will be.  Rather I think what we are missing here
is that we should deduct the excess space to the left of inStart, and/or
forcibly left-justify the buffer, before deciding that we need to enlarge
the buffer.  I think it'd be safe for pqCheckInBufferSpace to do that
(i.e., move the data within the buffer), though I need to look at the callers
more closely to be sure.

            regards, tom lane
