Thread: stopping processes, preventing connections
Hi.
I'm continuing on with the problems I have in our reports/data warehouse system. Basically, the system brings in tables from our various production systems (Sybase, PostgreSQL, MS SQL Server, on different servers) every night. Some tables are brought in whole, and some are brought in based on a date field, with only the relevant interval imported.
For tables that are brought in whole, I first truncate the local table and then copy in the up-to-date data. For the ones that are brought in partially, I first delete the relevant interval and then copy in the same way.
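Schematically, each nightly job looks something like this (a simplified sketch with made-up table names and paths; the real jobs pull the data over from the production systems rather than from local files):

-- Table that is brought in whole. TRUNCATE needs an ACCESS EXCLUSIVE lock,
-- so any long-running SELECT on the table blocks this step.
BEGIN;
TRUNCATE TABLE warehouse_customers;
COPY warehouse_customers FROM '/path/to/customers_dump.tsv';
COMMIT;

-- Table that is brought in by date: only the relevant interval is replaced.
-- (The interval shown here is illustrative.)
BEGIN;
DELETE FROM warehouse_messages WHERE time_stamp >= current_date - interval '1 day';
COPY warehouse_messages FROM '/path/to/messages_dump.tsv';
COMMIT;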
The trouble is that sometimes a stray select is initiated and then abandoned (without cancellation) by the Crystal Reports system. When such a query happens to last into the night, it locks some of the tables that are supposed to be truncated, and the whole process hangs until the query quits or dies - which, as we have seen in the past, can take several hours.
What I want to do is write a script that kills any queries or connections from the Crystal system and then prevents new queries from being run until I finish loading all the tables, at which point I want to allow queries again.
First, the easy part - regarding allowing/disallowing queries. Is it possible to GRANT or REVOKE access to tables based on the originating IP?
Second, and the more complicated one - what do I do about rogue queries that are running when my process starts? Today we had a query that had been running since yesterday. I called pg_cancel_backend() on it several times and waited for almost two hours - to no avail. Eventually I had to ask our sysadmin to shut down PostgreSQL, which took some five minutes but eventually worked. Is there a way to do the same thing to a single process without shutting down the whole server, and without causing any harm to the database or memory corruption? Something I can call from within SQL? I run the nightly script from a Linux user that is not "postgres", so I'd prefer a way that doesn't require using "kill".
Thank you,
Herouth Maoz
On Wed, Mar 3, 2010 at 8:31 AM, Herouth Maoz <herouth@unicell.co.il> wrote:
> First, the easy part - regarding allowing/disallowing queries. Is it possible to GRANT or REVOKE access to tables based on the originating IP?

I'd suggest separating out access to your tables by roles, and then restricting those roles to certain IP ranges in pg_hba.conf.
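For example, something along these lines (just a sketch - the role name, table, database and network values are placeholders):

-- Put all of Crystal's access behind one role:
CREATE ROLE crystal_reader LOGIN PASSWORD 'secret';
GRANT SELECT ON some_report_table TO crystal_reader;    -- repeat per table

-- Before the nightly load, lock Crystal out; re-grant when you're done:
REVOKE SELECT ON some_report_table FROM crystal_reader;
-- ... load the tables ...
GRANT SELECT ON some_report_table TO crystal_reader;

-- Then in pg_hba.conf, tie that role to the Crystal server's subnet,
-- e.g. (placeholder addresses):
--   host    reports_db    crystal_reader    192.168.10.0/24    md5

Note that REVOKE only stops new queries by that role; it doesn't interrupt anything already running.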
> Second, and the more complicated one - what do I do about rogue queries that are running when my process starts? Today we had a query that had been running since yesterday. I called pg_cancel_backend() on it several times and waited for almost two hours - to no avail. Eventually I had to ask our sysadmin to shut down PostgreSQL, which took some five minutes but eventually worked. Is there a way to do the same thing to a single process without shutting down the whole server, and without causing any harm to the database or memory corruption? Something I can call from within SQL? I run the nightly script from a Linux user that is not "postgres", so I'd prefer a way that doesn't require using "kill".

On 8.4, you can use pg_terminate_backend(), which sends a SIGTERM instead of pg_cancel_backend's SIGINT. If you're not on 8.4, you can just do this manually with a "kill -SIGTERM backend_pid". If that doesn't work either, you might have to resort to a "kill -SIGKILL backend_pid". Killing a single backend should be much better for you than restarting Postgres entirely. These operations shouldn't result in database corruption.
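For instance, to hit every backend belonging to the reporting role in one go, something like this sketch should work (the role name is a placeholder, and on 8.3 the pid column in pg_stat_activity is called procpid):

SELECT procpid, usename, current_query,
       pg_cancel_backend(procpid) AS signalled
FROM pg_stat_activity
WHERE usename = 'crystal_reader'           -- placeholder role name
  AND procpid <> pg_backend_pid();         -- don't signal your own session

-- On 8.4 you can swap in pg_terminate_backend(procpid) to send SIGTERM
-- instead of SIGINT.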
You have to be database superuser to use pg_cancel_backend() or pg_terminate_backend(), or have a shell login as the database user to use "kill". No way around that for now.

Though next time you see a query which doesn't respond to pg_cancel_backend(), try gathering information about the query and what the backend is doing; either you're doing something unusual (e.g. an app is restarting the query automatically after getting canceled) or perhaps you've stumbled on a bug in Postgres.
Josh
On Mar 3, 2010, at 18:01, Josh Kupershmidt wrote:

> On 8.4, you can use pg_terminate_backend(), which sends a SIGTERM instead of pg_cancel_backend's SIGINT. If you're not on 8.4, you can just do this manually with a "kill -SIGTERM backend_pid". If that doesn't work either, you might have to resort to a "kill -SIGKILL backend_pid". Killing a single backend should be much better for you than restarting Postgres entirely. These operations shouldn't result in database corruption.

Unfortunately, we only have 8.3 - is SIGTERM safe on 8.3?

> You have to be database superuser to use pg_cancel_backend() or pg_terminate_backend(), or have a shell login as the database user to use "kill". No way around that for now.

I guess I'll have to sudo or use local ssh.

> Though next time you see a query which doesn't respond to pg_cancel_backend(), try gathering information about the query and what the backend is doing; either you're doing something unusual (e.g. an app is restarting the query automatically after getting canceled) or perhaps you've stumbled on a bug in Postgres.

I'd appreciate it if you tell me what to look for. It was running a join on several tables, but nothing too complicated. It may be that the query is not optimized (one of the tables is not indexed properly), but it still should respond to cancel - shouldn't it?

Thank you very much,
Herouth
On Wed, Mar 3, 2010, Herouth Maoz <herouth@unicell.co.il> wrote:
> Unfortunately, we only have 8.3 - is SIGTERM safe on 8.3?
>
> I guess I'll have to sudo or use local ssh.
Yeah, I think any of the kill modes on the backends should be "safe" in terms of your data. If you're interested in killing without having to SSH in, you could play with something like:
http://wiki.postgresql.org/wiki/Backend_killer_function
and adjust it (permissions, etc.) to your needs.
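The rough idea is a SECURITY DEFINER function created by a superuser, with EXECUTE granted only to the role that runs your nightly load - for example (a sketch with placeholder function and role names; the wiki version is more careful about what it will and won't signal):

CREATE OR REPLACE FUNCTION cancel_crystal_backends() RETURNS integer AS $$
DECLARE
    n integer := 0;
    r record;
BEGIN
    FOR r IN
        SELECT procpid
        FROM pg_stat_activity
        WHERE usename = 'crystal_reader'        -- placeholder role name
          AND procpid <> pg_backend_pid()
    LOOP
        PERFORM pg_cancel_backend(r.procpid);   -- pg_terminate_backend() on 8.4+
        n := n + 1;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- Create it as a superuser, then lock it down:
REVOKE ALL ON FUNCTION cancel_crystal_backends() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION cancel_crystal_backends() TO nightly_loader;  -- placeholder role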
> I'd appreciate it if you tell me what to look for. It was running a join on several tables, but nothing too complicated. It may be that the query is not optimized (one of the tables is not indexed properly), but it still should respond to cancel - shouldn't it?
I came across an old thread discussing a problem which sounds similar to yours (process not dying with a pg_cancel_backend()):
http://archives.postgresql.org/pgsql-general/2007-10/msg01696.php
there's some good info throughout that thread, including the recommendation from Tom Lane to try to reproduce, assuming your query isn't stuck inside a plpython or similar function.
If you can reproduce your hung query which doesn't respond to a pg_cancel_backend(), try following instructions here (assuming your server is a Linux/BSD machine) to gather more information about what's going on inside the backend:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
strace, top, and/or vmstat information might also be useful.
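It's also worth checking whether the backend is actually waiting on a lock or is genuinely stuck; a quick sketch against the system views (12345 stands in for the stuck backend's pid, and on 8.3 the pid column in pg_stat_activity is called procpid):

-- What is the backend doing, and is it flagged as waiting on a lock?
SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
WHERE procpid = 12345;

-- Which locks does it hold or wait for?
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE pid = 12345;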
Hope this helps,
Josh
On Mar 3, 2010, at 18:01, Josh Kupershmidt wrote:

> Though next time you see a query which doesn't respond to pg_cancel_backend(), try gathering information about the query and what the backend is doing; either you're doing something unusual (e.g. an app is restarting the query automatically after getting canceled) or perhaps you've stumbled on a bug in Postgres.
Hi. A long time has passed since you made that suggestion, but today we stumbled again on a query that wouldn't be canceled. Not only does it not respond to pg_cancel_backend(), it also doesn't respond to kill -SIGTERM.
The query is:
select date_trunc('day',rb.time_stamp),count(*),rb.category,channels.channel_id,channels.name as channel,platforms.platform_id,
platforms.name,rb.operator,item,delivered,msisdn
from public.rb__view as rb,channels,platforms
where
rb.channel_id=channels.channel_id
and rb.platform_id=platforms.platform_id
and rb.time_stamp>='2010-03-14'::date and rb.time_stamp<'2010-03-14'::date + interval '1 day'
and platforms.platform_id=262
and channels.channel_id=1
group by date_trunc('day',rb.time_stamp),rb.category,channels.channel_id,channel,operator,item,delivered,msisdn,platforms.platform_id,platforms.name
This is nothing too fancy - just an aggregate with group by. And the application on the other side is Crystal Reports, connecting using ODBC. I don't believe the application does anything like restart after cancel, because most of our queries can easily be cancelled and I don't think Crystal has different behaviors for different queries.
rb__view is a union all between two tables (rb and rb__archive) which have the same schema - one holds data from the past 7 weeks and the other holds older data. The channels and platforms tables are basically lookup tables. The fields item, delivered and msisdn all belong to rb__view.
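Roughly, the view is defined like this (a simplified sketch - the real tables have more columns than the ones used in this query):

CREATE VIEW public.rb__view AS
    SELECT time_stamp, category, channel_id, platform_id,
           operator, item, delivered, msisdn
    FROM public.rb              -- data from the past 7 weeks
    UNION ALL
    SELECT time_stamp, category, channel_id, platform_id,
           operator, item, delivered, msisdn
    FROM public.rb__archive;    -- older data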
There is nothing in the PostgreSQL log.
If it helps any, this is the EXPLAIN output for the above query. Note that at this time, the query has been running for over an hour and a half.
HashAggregate  (cost=221312.77..221318.08 rows=354 width=94)
  ->  Nested Loop  (cost=8078.83..221215.50 rows=3537 width=94)
        ->  Seq Scan on channels  (cost=0.00..3.81 rows=1 width=16)
              Filter: (channel_id = 1::numeric)
        ->  Nested Loop  (cost=8078.83..221167.48 rows=3537 width=85)
              ->  Index Scan using platforms_pkey on platforms  (cost=0.00..6.27 rows=1 width=19)
                    Index Cond: (platform_id = 262::numeric)
              ->  Append  (cost=8078.83..221125.84 rows=3537 width=73)
                    ->  Bitmap Heap Scan on rb  (cost=8078.83..221115.42 rows=3536 width=72)
                          Recheck Cond: ((public.rb.time_stamp >= '2010-03-14'::date) AND (public.rb.time_stamp < '2010-03-15 00:00:00'::timestamp without time zone))
                          Filter: ((public.rb.channel_id = 1::numeric) AND (public.rb.platform_id = 262::numeric))
                          ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..8077.94 rows=104502 width=0)
                                Index Cond: ((public.rb.time_stamp >= '2010-03-14'::date) AND (public.rb.time_stamp < '2010-03-15 00:00:00'::timestamp without time zone))
                    ->  Index Scan using rba_timestamp_ind on rb__archive  (cost=0.00..10.42 rows=1 width=73)
                          Index Cond: ((rb__archive.time_stamp >= '2010-03-14'::date) AND (rb__archive.time_stamp < '2010-03-15 00:00:00'::timestamp without time zone))
                          Filter: ((rb__archive.channel_id = 1::numeric) AND (rb__archive.platform_id = 262::numeric))
I don't know what other information I can glean from this. Any thoughts?
Thank you,
Herouth
On 17/03/2010 6:32 PM, Herouth Maoz wrote:
> Hi. A long time has passed since you made that suggestion, but today we stumbled again on a query that wouldn't be canceled. Not only does it not respond to pg_cancel_backend(), it also doesn't respond to kill -SIGTERM.

Interesting. If you attach gdb to the backend and run "backtrace", what's the output?

If you strace the backend, what do you see?

--
Craig Ringer
On Mar 17, 2010, at 13:34, Craig Ringer wrote:

> Interesting. If you attach gdb to the backend and run "backtrace", what's the output?

(gdb) backtrace
#0 0x8dfcb410 in ?? ()
#1 0xbff10a28 in ?? ()
#2 0x083b1bf4 in ?? ()
#3 0xbff10a00 in ?? ()
#4 0x8db98361 in send () from /lib/tls/i686/cmov/libc.so.6
#5 0x08195d54 in secure_write ()
#6 0x0819dc7e in pq_setkeepalivesidle ()
#7 0x0819ddd5 in pq_flush ()
#8 0x0819de3d in pq_putmessage ()
#9 0x0819fa63 in pq_endmessage ()
#10 0x08086dcb in printtup_create_DR ()
#11 0x08178dc4 in ExecutorRun ()
#12 0x08222326 in PostgresMain ()
#13 0x082232c0 in PortalRun ()
#14 0x0821e27d in pg_parse_query ()
#15 0x08220056 in PostgresMain ()
#16 0x081ef77f in ClosePostmasterPorts ()
#17 0x081f0731 in PostmasterMain ()
#18 0x081a0484 in main ()

> If you strace the backend, what do you see?

All I get is this:

send(9, "00:00\0\0\0\0011\377\377\377\377\0\0\0\0011\0\0\0\0041"..., 1541, 0

I waited about 20 minutes after receiving that, but nothing further was output.

Thank you,
Herouth
On 17/03/2010 8:43 PM, Herouth Maoz wrote:
> #4 0x8db98361 in send () from /lib/tls/i686/cmov/libc.so.6
> #5 0x08195d54 in secure_write ()

OK, so it seems to be stuck sending data down a socket. The fact that strace isn't reporting any new system calls suggests the backend is just blocked on that send() call and isn't doing any work.

Is there any chance the client has disconnected/disappeared?

--
Craig Ringer
On Mar 17, 2010, at 14:56, Craig Ringer wrote:

> OK, so it seems to be stuck sending data down a socket. The fact that strace isn't reporting any new system calls suggests the backend is just blocked on that send() call and isn't doing any work.
>
> Is there any chance the client has disconnected/disappeared?

Yes, certainly. In fact, I mentioned in the past that the product we use for our reports (an application built on top of Crystal Reports), when told to cancel a report or when a report times out, doesn't tell Crystal to cancel its queries properly - it simply kills Crystal's processes on the Windows machine side, which leaves us with orphan backends. It's stupid, but it's not under our control. But most of the time the backends respond to cancel requests.

Aren't socket writes supposed to have timeouts of some sort? Stupid policies notwithstanding, processes on the client side can disappear for any number of reasons - bugs, power failures, whatever - and this is not something that is supposed to cause a backend to hang, I would assume.

Is there anything I can do about it?

Herouth
Craig Ringer <craig@postnewspapers.com.au> writes:
> OK, so it seems to be stuck sending data down a socket.

Since this type of problem is fresh in mind: I wonder if the connection is SSL-encrypted and the session just crossed the 512MB-total-data-sent threshold. If so, it would have tried to renegotiate session keys, and some SSL vendors have recently broken that functionality rather badly. Silent hangups seem to be par for the course.

If that's it, your choices are to find an SSL library that has an actual fix for CVE-2009-3555 rather than this brain damage, or to update to this week's PG releases so you can set ssl_renegotiation_limit to zero.

If you're not using SSL, just ignore me ...

regards, tom lane
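For reference, the check and the workaround look roughly like this (a sketch: ssl_renegotiation_limit only exists from the March 2010 minor releases onward, so 8.3.1 itself does not know this setting):

-- Is SSL enabled on the server at all?
SHOW ssl;

-- After updating to a minor release that has the GUC, renegotiation can be
-- disabled server-wide in postgresql.conf:
--   ssl_renegotiation_limit = 0
-- or checked and set per session:
SHOW ssl_renegotiation_limit;
SET ssl_renegotiation_limit = 0;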
Herouth Maoz wrote:
> Aren't socket writes supposed to have timeouts of some sort? Stupid policies notwithstanding, processes on the client side can disappear for any number of reasons - bugs, power failures, whatever - and this is not something that is supposed to cause a backend to hang, I would assume.

Note that you're not in the PostgreSQL code at the point where this is stuck - you're deep in the libc socket code. Making sure that sockets always behave well at the OS level is not always possible, due to TCP/IP's emphasis on robust delivery. See section 2.8, "Why does it take so long to detect that the peer died?", at http://www.faqs.org/faqs/unix-faq/socket/ for some background here, and note that the point you're stuck in is inside the keepalive handling, where the database is trying to do the right thing.

As a general commentary on this area: most of the unkillable backends I've seen - usually noticed when the server won't shut down - have resulted from bad socket behavior. It's really a tricky area to get right, and it's hard to guarantee that the database backends will be robust in the face of every possible weird OS behavior.

However, if you can repeatably get the server into this bad state at will, it may be worth spending some more time digging into it, in the hope that there is something valuable to learn about your situation that can improve the keepalive handling on the server side. Did you mention your PostgreSQL server version and platform? I didn't see the exact code path you're stuck in during a quick look at the code involved (using a snapshot of recent development), which makes me wonder if this isn't already a resolved problem in a newer version.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
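On the keepalive side, the server-side TCP keepalive knobs are ordinary GUCs, so tightening them looks roughly like this (values are illustrative; 0 means "use the operating system's default", and they only apply to TCP connections):

SHOW tcp_keepalives_idle;
SHOW tcp_keepalives_interval;
SHOW tcp_keepalives_count;

-- Illustrative values, settable in postgresql.conf or per session:
SET tcp_keepalives_idle = 60;      -- seconds of idleness before the first probe
SET tcp_keepalives_interval = 10;  -- seconds between probes
SET tcp_keepalives_count = 5;      -- lost probes before the connection is considered dead

Whether these help a backend already blocked in send() depends on the operating system; they mainly help the server notice dead clients on otherwise idle connections.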
quoth Greg Smith:
> Did you mention your PostgreSQL server version and platform? I didn't see the exact code path you're stuck in during a quick look at the code involved (using a snapshot of recent development), which makes me wonder if this isn't already a resolved problem in a newer version.

The server version is 8.3.1. Migration to a higher version might be difficult as far as policies go, if there isn't a supported debian package for it, but if you can point out a version where this has been fixed I might be able to persuade my boss and sysadmin.

Thank you for referring me to that entry in the FAQ.

By the way, the situation repeated itself today as well.

Thanks,
Herouth
On Sat, Mar 20, 2010 at 11:44 AM, Herouth Maoz <herouth@unicell.co.il> wrote:
> The server version is 8.3.1. Migration to a higher version might be difficult as far as policies go, if there isn't a supported debian package for it, but if you can point out a version where this has been fixed I might be able to persuade my boss and sysadmin.

Most of the time it is more dangerous to NOT update PostgreSQL to the latest minor point version than it is to do the update. The occasions when a dangerous minor point release comes out are rare, and the next minor point version that fixes it shows up the next day while the broken one is pulled. I think that's happened once or twice during the time I've been using PostgreSQL. So, if a release is 48 hours old and no alarm bells have gone off that it's being pulled and replaced, a pg update is the right thing to do. Back up beforehand, etc.

The danger of a change making your application stop working is very low, while the danger of leaving some unpatched bit of nastiness in the backend is a much greater problem - i.e. data loss / corruption, things like that. And something as mature as 8.3 is now shouldn't be running in production missing two years of patches.

Start with the release notes for 8.3.2, move forward, and see if anything there looks like a problem for your app. Behaviour-changing changes rarely get into production releases; they get saved for the next major version. If they do, they are well noted in the release notes.
Quoting Scott Marlowe:
> Most of the time it is more dangerous to NOT update PostgreSQL to the latest minor point version than it is to do the update.

The problem is not so much danger in upgrading, but the fact that doing so without using the system's usual security/bugfix update path means non-standard work for the sysadmin: he has to upgrade every package on the system using a different upgrade method, be notified about it from a different source, and check each one under different conditions, which makes his work impossible. So the policy so far has been "Use the packages available through debian". So I'll need to check if there is an upgrade available through that path - and the question is whether it's worthwhile (i.e. whether the bug in question has indeed been fixed).

Herouth
On Sat, Mar 20, 2010 at 3:57 PM, Herouth Maoz <herouth@unicell.co.il> wrote:
> The problem is not so much danger in upgrading, but the fact that doing so without using the system's usual security/bugfix update path means non-standard work for the sysadmin: he has to upgrade every package on the system using a different upgrade method, be notified about it from a different source, and check each one under different conditions, which makes his work impossible. So the policy so far has been "Use the packages available through debian". So I'll need to check if there is an upgrade available through that path - and the question is whether it's worthwhile (i.e. whether the bug in question has indeed been fixed).

I'm certain debian keeps the pgsql packages up to date within a few days, or at most weeks, of their release.
On 21/03/2010 7:12 AM, Scott Marlowe wrote:
> I'm certain debian keeps the pgsql packages up to date within a few days, or at most weeks, of their release.

In sid (unstable), sure. But the stable releases don't usually see major version upgrades (like 8.3 to 8.4) unless they're done via unofficial channels like backports.org.

--
Craig Ringer
On Sun, Mar 21, 2010 at 5:33 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
> In sid (unstable), sure. But the stable releases don't usually see major version upgrades (like 8.3 to 8.4) unless they're done via unofficial channels like backports.org.

It was a few posts back, but our discussion point was minor point upgrades and the fact that the OP was running 8.3.1 and wasn't sure there were updates to 8.3.9 (or latest) out there for debian. I'm quite sure debian has 8.3.9 out by now.
Scott Marlowe <scott.marlowe@gmail.com> writes:
> It was a few posts back, but our discussion point was minor point upgrades and the fact that the OP was running 8.3.1 and wasn't sure there were updates to 8.3.9 (or latest) out there for debian. I'm quite sure debian has 8.3.9 out by now.

Yes:

  http://packages.debian.org/lenny/postgresql-8.3
  http://packages.debian.org/etch-backports/postgresql-8.3

You won't find it in testing/unstable though, because the next stable will contain 8.4 only, as far as I understand.

Regards,
--
dim