Re: Sync Rep v17 - Mailing list pgsql-hackers

From	Daniel Farina
Subject	Re: Sync Rep v17
Date	February 25, 2011 01:13:53
Msg-id	AANLkTinFcM494Vn+Fj2nqctAVo4zZMv6zKn30TMWR18N@mail.gmail.com
In response to	Re: Sync Rep v17 (Jaime Casanova <jaime@2ndquadrant.com>)
Responses	Re: Sync Rep v17 (Daniel Farina <daniel@heroku.com>) Re: Sync Rep v17 (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
>> DEBUG:  released 0 procs up to 0/3014690
>> DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
>> DEBUG:  released 2 procs up to 0/3027BC8
>> WARNING:  could not locate ourselves on wait queue
>> server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>> The connection to the server was lost. Attempting reset: DEBUG:
>
> you can make this happen more easily, i just run "pgbench -n -c10 -j10
> test" and got that warning and sometimes a segmentation fault and
> sometimes a failed assertion

I have also reproduced this. Notably, things seem fine as long as
pgbench is confined to one backend, but as soon as the feature is
used by two (-c 2) I can get segfaults.

In the UI department, I am finding it somewhat difficult to affirm
that I am, in fact, synchronously replicating anything in the HA
scenario (where I never want to block). However, by enjoying the patch
at DEBUG3 and running what I think to be syncrepped and non-syncrepped
runs, I believe that I am not committing user error (seeing syncrep
waiting vs. lack thereof).  This is in part hard to confirm because
the single-backend performance (if DEBUG3 is to be believed, I will
write a real torture test later) of syncrep is actually very good; I
was expecting a more perceptible performance dropoff. But then again,
I imagine the real kicker will happen when we can run concurrent
backends. Also, Amazon EBS doesn't have the fastest disks, and within
an availability zone network latency is awfully low.
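
For checking this from the logs rather than by eye, here is a minimal
sketch. It assumes the DEBUG lines look exactly like the ones quoted
above ("released N procs up to X/Y"); the helper names are made up for
illustration, and the message format may well change in later versions
of the patch.

```python
import re

# Pattern assumed from the DEBUG lines quoted earlier in this thread;
# the patch may change these messages, so treat it as illustrative.
RELEASED_RE = re.compile(
    r"released (\d+) procs up to ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def parse_lsn(hi, lo):
    """Turn the two hex halves of an 'X/Y' WAL location into one integer."""
    return (int(hi, 16) << 32) | int(lo, 16)


def summarize_releases(lines):
    """Sum the released-backend counts in a log (hypothetical helper).

    Returns (total_released, highest_lsn_seen); highest_lsn_seen is None
    when no matching line was found.
    """
    total = 0
    highest = None
    for line in lines:
        m = RELEASED_RE.search(line)
        if m is None:
            continue
        total += int(m.group(1))
        lsn = parse_lsn(m.group(2), m.group(3))
        if highest is None or lsn > highest:
            highest = lsn
    return total, highest


# The four DEBUG lines quoted earlier in this thread.
sample = [
    "DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690",
    "DEBUG:  released 0 procs up to 0/3014690",
    "DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690",
    "DEBUG:  released 2 procs up to 0/3027BC8",
]
total, highest = summarize_releases(sample)
print(total, hex(highest))
```

A total of zero over a whole run would suggest that no backend ever
waited, i.e. that the run was probably not synchronously replicated.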

I can't quite explain what I was seeing before w.r.t. memory usage,
and more pressingly, a very slow recovery. I turned off hot standby and
was messing around and, before I knew it, the server was caught up. I
do not know if that was just coincidence (probably) or overhead
imposed by HS. The very high RES number was Linux fooling me, as most
of it was SHR and in SHMEM.
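
On the RES point: on Linux, /proc/<pid>/statm reports resident and
shared page counts separately, so the non-shared part can be computed
directly. A minimal sketch follows; the helper name is made up, and the
sample line is invented for illustration, not measured on this server.

```python
def private_resident_bytes(statm_line, page_size):
    """Resident memory minus shared pages, in bytes (hypothetical helper).

    statm_line is the content of /proc/<pid>/statm, whose first three
    fields are total program size, resident pages, and shared pages.
    """
    fields = statm_line.split()
    resident_pages = int(fields[1])
    shared_pages = int(fields[2])
    return (resident_pages - shared_pages) * page_size


# Invented sample values, not a measurement from the server discussed here.
sample_statm = "50000 30000 29000 100 0 2000 0"
print(private_resident_bytes(sample_statm, 4096))
```

For a live backend, read /proc/<pid>/statm and pass
os.sysconf("SC_PAGE_SIZE") as the page size.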

--
fdr
