Thread: log chunking broken with large queries under load

From: Andrew Dunstan
Date: 04/01/2012 06:34 PM
Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load running very large queries produce corrupted
CSV log files, with lines from different sessions apparently multiplexed
together. The log chunking protocol between the errlog routines and the
syslogger is supposed to prevent that, so I did a little work to try to
reproduce it in a controlled way. On my dual quad-core Xeon setup, this
script:
   #!/bin/sh

   par=$1
   seq=$2

   sed 2000q /usr/share/dict/words > words

   psql -q -c 'drop table if exists foo'
   psql -q -c 'create table foo (t text)'

   echo '\set words `cat words`' > wordsin.sql
   echo 'prepare fooplan (text) as insert into foo values ($1);' >> wordsin.sql

   for i in `seq 1 $seq`; do
      echo "execute fooplan(:'words');" >> wordsin.sql
   done

   for i in `seq 1 $par`; do
      psql -q -t -f wordsin.sql &
   done

   wait

called with parameters of 100 and 50 (i.e. 100 simultaneous clients each 
doing 50 very large inserts) is enough to cause CSV log corruption quite 
reliably on PostgreSQL 9.1.
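
For context, the chunking protocol tags each pipe write with a small
header, and the syslogger reassembles chunks by PID. Roughly, per
src/include/postmaster/syslogger.h:

   /* Pipe chunk header, approximately as declared in
    * src/include/postmaster/syslogger.h. */
   typedef struct
   {
       char    nuls[2];    /* always \0\0, identifies a protocol chunk */
       uint16  len;        /* size of this chunk's payload */
       int32   pid;        /* writing process's PID */
       char    is_last;    /* last chunk? 't'/'f' ('T'/'F' for CSV) */
       char    data[1];    /* payload starts here */
   } PipeProtoHeader;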

This is a serious bug. I'm going to investigate, but it's causing major 
pain, so anyone else who has any ideas is welcome to chime in.

cheers

andrew


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
> Some of my PostgreSQL Experts colleagues have been complaining to me
> that servers under load running very large queries produce corrupted
> CSV log files, with lines from different sessions apparently multiplexed
> together. The log chunking protocol between the errlog routines and the
> syslogger is supposed to prevent that, so I did a little work to try to
> reproduce it in a controlled way.


Well, a little further digging jogged my memory a bit. It looks like we
rather badly underestimated the number of messages that can arrive split
across more than one chunk.

We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
to stripe the slots with a two-dimensional array, so we don't have to
search a larger number of slots for any given message. See the attached
patch.

I'm not sure how much we want to scale this up. I set CHUNK_STRIPES to
20 to start with, and I've asked some colleagues who run very heavy log
loads with very large queries to test it out if possible. If anyone else
has a similar load, I'd appreciate similar testing.
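
The patch itself is attached rather than inlined; as a minimal sketch of
the striping idea (CHUNK_SLOTS and CHUNK_STRIPES are the real knobs, the
other names here are illustrative):

   /* Sketch only: slots striped by PID, so any incoming chunk
    * searches just one stripe of CHUNK_SLOTS entries. */
   #define CHUNK_SLOTS   20
   #define CHUNK_STRIPES 20

   typedef struct
   {
       int32          pid;     /* PID of source process, 0 if free */
       StringInfoData data;    /* accumulated message data */
   } save_buffer;

   static save_buffer saved_chunks[CHUNK_STRIPES][CHUNK_SLOTS];

   static save_buffer *
   find_slot(int32 pid)
   {
       save_buffer *stripe = saved_chunks[pid % CHUNK_STRIPES];
       int         i;

       for (i = 0; i < CHUNK_SLOTS; i++)
       {
           if (stripe[i].pid == pid)
               return &stripe[i];
       }
       return NULL;            /* no partial message pending */
   }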

cheers

andrew

Attachment

Re: log chunking broken with large queries under load

From: Tom Lane
Date: 04/02/2012 12:00 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
>> Some of my PostgreSQL Experts colleagues have been complaining to me 
>> that servers under load running very large queries produce corrupted 
>> CSV log files,

> We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead 
> to stripe the slots with a two dimensional array, so we didn't have to 
> search a larger number of slots for any given message. See the attached 
> patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit.  Should we not replace the fixed-size array
with a dynamic data structure?
        regards, tom lane


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/02/2012 12:00 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
>>> Some of my PostgreSQL Experts colleagues have been complaining to me
>>> that servers under load running very large queries produce corrupted
>>> CSV log files,
>> We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
>> to stripe the slots with a two dimensional array, so we didn't have to
>> search a larger number of slots for any given message. See the attached
>> patch.
> This seems like it isn't actually fixing the problem, only pushing out
> the onset of trouble a bit.  Should we not replace the fixed-size array
> with a dynamic data structure?


"A bit" = 10 to 20 times - more if we set CHUNK_STRIPES higher. :-)

But maybe you're right. If we do that and stick with my two-dimensional 
scheme to keep the number of probes per chunk down, we'd need to 
reorganize the array every time we enlarged it. That might be a bit 
messy, but might be OK. Or maybe linearly searching an array of several 
hundred slots for our pid, for every log chunk that comes in, would be 
fast enough.

cheers

andrew




Re: log chunking broken with large queries under load

From: Tom Lane
Date: 04/02/2012 12:44 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/02/2012 12:00 PM, Tom Lane wrote:
>> This seems like it isn't actually fixing the problem, only pushing out
>> the onset of trouble a bit.  Should we not replace the fixed-size array
>> with a dynamic data structure?

> But maybe you're right. If we do that and stick with my two-dimensional 
> scheme to keep the number of probes per chunk down, we'd need to 
> reorganize the array every time we enlarged it. That might be a bit 
> messy, but might be OK. Or maybe linearly searching an array of several 
> hundred slots for our pid, for every log chunk that comes in, would be 
> fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256).  The length of each such list ought to be plenty
short under ordinary circumstances.
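
A minimal sketch, reusing the save_buffer struct from upthread (names
illustrative; this is the suggestion, not a patch):

   /* One pending-chunk list per (pid mod 256) bucket; a chunk
    * only ever probes the one short list for its pid. */
   #define NBUFFER_LISTS 256

   static List *buffer_lists[NBUFFER_LISTS];   /* of save_buffer * */

   static save_buffer *
   find_buffer(int32 pid)
   {
       List       *entries = buffer_lists[pid % NBUFFER_LISTS];
       ListCell   *cell;

       foreach(cell, entries)
       {
           save_buffer *buf = (save_buffer *) lfirst(cell);

           if (buf->pid == pid)
               return buf;
       }
       return NULL;
   }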
        regards, tom lane


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/02/2012 12:44 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/02/2012 12:00 PM, Tom Lane wrote:
>>> This seems like it isn't actually fixing the problem, only pushing out
>>> the onset of trouble a bit.  Should we not replace the fixed-size array
>>> with a dynamic data structure?
>> But maybe you're right. If we do that and stick with my two-dimensional
>> scheme to keep the number of probes per chunk down, we'd need to
>> reorganize the array every time we enlarged it. That might be a bit
>> messy, but might be OK. Or maybe linearly searching an array of several
>> hundred slots for our pid, for every log chunk that comes in, would be
>> fast enough.
> You could do something like having a list of pending chunks for each
> value of (pid mod 256).  The length of each such list ought to be plenty
> short under ordinary circumstances.


Yeah, OK, that should work. How big would we make each list to start 
with? Still 20, or smaller?

cheers

andrew



Re: log chunking broken with large queries under load

From: Tom Lane
Date: 04/02/2012 01:03 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/02/2012 12:44 PM, Tom Lane wrote:
>> You could do something like having a list of pending chunks for each
>> value of (pid mod 256).  The length of each such list ought to be plenty
>> short under ordinary circumstances.

> Yeah, ok, that should work. How big would we make each list to start 
> with? Still 20, or smaller?

When I said "list", I meant a "List *".  No fixed size.
        regards, tom lane


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/02/2012 01:03 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/02/2012 12:44 PM, Tom Lane wrote:
>>> You could do something like having a list of pending chunks for each
>>> value of (pid mod 256).  The length of each such list ought to be plenty
>>> short under ordinary circumstances.
>> Yeah, ok, that should work. How big would we make each list to start
>> with? Still 20, or smaller?
> When I said "list", I meant a "List *".  No fixed size.

OK, like this? Do we consider this a bug fix, to be backpatched?
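
(The patch is attached; roughly, the list-based collection it implements
looks like the sketch below, reconstructed from this discussion rather
than copied verbatim. It assumes the PipeProtoHeader, save_buffer, and
buffer_lists definitions sketched upthread, plus postgres.h,
lib/stringinfo.h, and nodes/pg_list.h.)

   static void
   process_chunk(PipeProtoHeader *p)
   {
       int          idx = p->pid % NBUFFER_LISTS;
       List        *entries = buffer_lists[idx];
       save_buffer *buf = NULL;
       save_buffer *free_slot = NULL;
       ListCell    *cell;

       /* Search this pid's bucket; remember any free entry we pass. */
       foreach(cell, entries)
       {
           save_buffer *b = (save_buffer *) lfirst(cell);

           if (b->pid == p->pid)
           {
               buf = b;
               break;
           }
           if (b->pid == 0 && free_slot == NULL)
               free_slot = b;
       }

       if (buf == NULL)
       {
           /* New message: reuse a free entry, else append one. */
           if (free_slot != NULL)
               buf = free_slot;
           else
           {
               buf = palloc(sizeof(save_buffer));
               initStringInfo(&buf->data);
               buffer_lists[idx] = lappend(entries, buf);
           }
           buf->pid = p->pid;
       }

       appendBinaryStringInfo(&buf->data, p->data, p->len);

       if (p->is_last == 't' || p->is_last == 'T')
       {
           /* Message complete: write it out, mark the entry free. */
           write_syslogger_file(buf->data.data, buf->data.len,
                                (p->is_last == 'T') ?
                                LOG_DESTINATION_CSVLOG :
                                LOG_DESTINATION_STDERR);
           resetStringInfo(&buf->data);
           buf->pid = 0;
       }
   }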

cheers

andrew



Attachment

Re: log chunking broken with large queries under load

From: Tom Lane
Date: 04/04/2012 12:13 PM
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/02/2012 01:03 PM, Tom Lane wrote:
>> When I said "list", I meant a "List *".  No fixed size.

> Ok, like this?

I think this could use a bit of editorialization (I don't think the
"stripe" terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed?  In the worst case a transient load peak could result in
a long list that permanently adds search overhead.  Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

> Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way.  If you don't mind I'll pick this up
and commit whichever way turns out better.
        regards, tom lane


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/04/2012 12:13 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/02/2012 01:03 PM, Tom Lane wrote:
>>> When I said "list", I meant a "List *".  No fixed size.
>> Ok, like this?
> I think this could use a bit of editorialization (I don't think the
> "stripe" terminology is still applicable, in particular), but the
> general idea seems OK.
>
> Does anyone feel that it's a bad idea that list entries are never
> reclaimed?  In the worst case a transient load peak could result in
> a long list that permanently adds search overhead.  Not sure if it's
> worth the extra complexity to delete a list cell that's no longer
> needed, rather than leaving it present and empty.

Me neither. The logic could be something simple when we free a node, 
like "while the list tail is an available node, prune the tail". But as 
you say, it might not be worth it.
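
Something like this, say, assuming free entries keep pid == 0
(illustrative only):

   /* After freeing an entry, prune free entries off the tail of
    * its bucket list so a transient peak doesn't linger forever. */
   static List *
   prune_free_tail(List *entries)
   {
       while (entries != NIL)
       {
           save_buffer *tail = (save_buffer *) llast(entries);

           if (tail->pid != 0)
               break;                  /* tail still in use: stop */
           pfree(tail->data.data);     /* release accumulated buffer */
           entries = list_delete_ptr(entries, tail);
           pfree(tail);
       }
       return entries;
   }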

>
>> Do we consider this a bug fix, to be backpatched?
> Yes, definitely.
>
> I think I'd like to have a go at coding it the other way (with
> release of list entries), just to see if that comes out cleaner
> or uglier than this way.  If you don't mind I'll pick this up
> and commit whichever way turns out better.


Go for it.

cheers

andrew



Re: log chunking broken with large queries under load

From: Tom Lane
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/04/2012 12:13 PM, Tom Lane wrote:
>> Does anyone feel that it's a bad idea that list entries are never
>> reclaimed?  In the worst case a transient load peak could result in
>> a long list that permanently adds search overhead.  Not sure if it's
>> worth the extra complexity to delete a list cell that's no longer
>> needed, rather than leaving it present and empty.

> Me neither. The logic could be something simple when we free a node, 
> like "while the list tail is an available node, prune the tail". But as 
> you say, it might not be worth it.

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry.  If the removal is
conditional then you can't do that and you end up with the complications
of both methods.  Anyway I've not tried to code it yet.
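
For concreteness, that never-coded alternative would presumably look
something like this (entirely hypothetical, just a sketch):

   /* No free-entry state at all: every multi-chunk message pays a
    * palloc at its first chunk and a pfree at its last one. */
   static void
   message_complete(save_buffer *buf)
   {
       int     idx = buf->pid % NBUFFER_LISTS;

       buffer_lists[idx] = list_delete_ptr(buffer_lists[idx], buf);
       pfree(buf->data.data);
       pfree(buf);
   }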
        regards, tom lane


Re: log chunking broken with large queries under load

From: Tom Lane
Date: 04/04/2012 03:09 PM
I wrote:
> The idea I had in mind was to compensate for adding list-removal logic
> by getting rid of the concept of an unused entry.  If the removal is
> conditional then you can't do that and you end up with the complications
> of both methods.  Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not.  So I
committed the patch with just some cosmetic cleanups.
        regards, tom lane


Re: log chunking broken with large queries under load

From: Andrew Dunstan

On 04/04/2012 03:09 PM, Tom Lane wrote:
> I wrote:
>> The idea I had in mind was to compensate for adding list-removal logic
>> by getting rid of the concept of an unused entry.  If the removal is
>> conditional then you can't do that and you end up with the complications
>> of both methods.  Anyway I've not tried to code it yet.
> I concluded this would probably be a loser performance-wise, because it
> would add a couple of palloc/pfree cycles to the processing of each
> multi-chunk message, whether there was any contention or not.  So I
> committed the patch with just some cosmetic cleanups.

OK, thanks for doing this.

cheers

andrew