Re: Sync Rep v17 - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: Sync Rep v17
Msg-id AANLkTinFcM494Vn+Fj2nqctAVo4zZMv6zKn30TMWR18N@mail.gmail.com
In response to Re: Sync Rep v17  (Jaime Casanova <jaime@2ndquadrant.com>)
Responses Re: Sync Rep v17  (Daniel Farina <daniel@heroku.com>)
Re: Sync Rep v17  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
>> DEBUG:  released 0 procs up to 0/3014690
>> DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
>> DEBUG:  released 2 procs up to 0/3027BC8
>> WARNING:  could not locate ourselves on wait queue
>> server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>> The connection to the server was lost. Attempting reset: DEBUG:
>
> you can make this happen more easily, i just run "pgbench -n -c10 -j10
> test" and got that warning and sometimes a segmentation fault and
> sometimes a failed assertion

I have also reproduced this. Notably, things seem fine as long as
pgbench is confined to one backend, but as soon as two backends
exercise the feature (-c 2) I can get segfaults.

In the UI department, I am finding it somewhat difficult to affirm
that I am, in fact, synchronously replicating anything in the HA
scenario (where I never want to block). However, by running the patch
at DEBUG3 and comparing what I think are syncrepped and non-syncrepped
runs, I believe that I am not committing user error (I see syncrep
waiting in one and not in the other). This is hard to confirm in part
because the single-backend performance of syncrep is actually very
good (if DEBUG3 is to be believed; I will write a real torture test
later). I was expecting a more perceptible performance dropoff. But
then again, I imagine the real kicker will happen when we can run
concurrent backends. Also, Amazon EBS doesn't have the fastest disks,
and within an availability zone network latency is awfully low.
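
For anyone else trying to confirm this from the logs rather than by
eye, here is a minimal sketch, not part of the patch. It assumes the
log line shape shown in the DEBUG output quoted earlier in this thread
("write X/Y flush X/Y apply X/Y"); the function names are made up for
illustration.

```python
import re

# Assumed log shape, copied from the DEBUG lines quoted in this thread:
#   DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
LSN = r"([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
LINE_RE = re.compile(r"write\s+" + LSN + r"\s+flush\s+" + LSN + r"\s+apply\s+" + LSN)


def lsn_to_int(lsn):
    """Turn 'hi/lo' (two hex words) into a single integer position.

    Assumption: the position is treated as one 64-bit value. Differences
    are exact while both positions share the same high word; across a
    high-word boundary on older releases treat them as approximate.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_lag(line):
    """Return how far flush and apply trail write, in bytes, or None
    if the line is not a write/flush/apply report."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    write, flush, apply_ = (lsn_to_int(g) for g in m.groups())
    return {"flush_lag": write - flush, "apply_lag": write - apply_}


if __name__ == "__main__":
    sample = "DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690"
    print(standby_lag(sample))  # {'flush_lag': 79160, 'apply_lag': 79160}
```

A flush_lag that drops to zero right before a "released N procs" line
is the pattern I would expect from a syncrepped run; that reading is
my own, not something the patch documents.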

I can't quite explain what I was seeing before w.r.t. memory usage,
and more pressingly, a very slow recovery. I turned off hot standby and
was messing around and, before I knew it, the server was caught up. I
do not know if that was just coincidence (probably) or overhead
imposed by HS. The very high RES number was Linux fooling me, as most
of it was SHR and in SHMEM.

--
fdr

