Thread: can't get connection during pg_dump

can't get connection during pg_dump

From
Steve Linabery
Date:
Hi,

I'm running 8.1.x on FC2 with smp.

The docs claim that pg_dump should allow normal operation of the db with the exception of things like 'VACUUM FULL'.

Last night during backup, I noticed that I could not get a connection, either remotely via jdbc or locally via psql.
Wellperhaps that is inaccurate: 'ps -ef' showed several processes with a status message of "waiting to startup" (or
somethingsimilar; sorry, I was late-night coding and didn't think to write it down). 

My installation has three databases in use. The command used to invoke the dump of one of these databases was:
/usr/local/pgsql/bin/pg_dump -Fc -b mydatabasename > /back/mydatabasename.dump

I looked through the docs to see if there was some runtime configuration I was missing, but saw nothing.

Anyone know what I might be doing wrong?

Thanks,
--
Steve Linabery, sysadmin/developer
B94B C3C7 8A27 FF09 3C9D  E992 5A20 2492 D5F5 EE51


This electronic message transmission contains information from the sender's organization that may be proprietary,
confidentialand/or privileged. The information is intended only for the use of the individual(s) or entity named above.
Ifyou are not the intended recipient, be aware that any disclosure, copying or distribution or use of the contents of
thisinformation is prohibited. If you have received this electronic transmission in error, please notify the sender
immediatelyby replying to the address listed in the "From:" 


Re: can't get connection during pg_dump

From
Tom Lane
Date:
Steve Linabery <slinabery@worldcycling.com> writes:
> Last night during backup, I noticed that I could not get a connection, either remotely via jdbc or locally via psql.
Wellperhaps that is inaccurate: 'ps -ef' showed several processes with a status message of "waiting to startup" (or
somethingsimilar; sorry, I was late-night coding and didn't think to write it down). 

Hm, I don't recall any such status message in the code.  Can you double
check what it said exactly?

What do you mean by "could not get a connection" ... did it fail (if so,
with what client-side error message) or just hang up waiting?

            regards, tom lane

Re: can't get connection during pg_dump

From
Steve Linabery
Date:
On Thu, Mar 30, 2006 at 11:14:37AM -0500, Tom Lane wrote:
> Steve Linabery <slinabery@worldcycling.com> writes:
> > Last night during backup, I noticed that I could not get a connection, either remotely via jdbc or locally via
psql.Well perhaps that is inaccurate: 'ps -ef' showed several processes with a status message of "waiting to startup"
(orsomething similar; sorry, I was late-night coding and didn't think to write it down). 
>
> Hm, I don't recall any such status message in the code.  Can you double
> check what it said exactly?

I just replicated the problem, see below for explanation, but here is ps output:

postgres  3201     1  0 Mar13 ?        01:58:14 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres  3270  3201  0 Mar13 ?        00:04:11 postgres: writer process
postgres  3271  3201  0 Mar13 ?        00:11:28 postgres: stats buffer process
postgres  3272  3271  0 Mar13 ?        00:07:11 postgres: stats collector process
postgres  3360  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53524) idle
  
postgres  3363  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53536) idle
  
postgres  3364  3201  0 Mar29 ?        00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53537) idle
  
root     16467 16447  0 09:59 pts/0    00:00:00 su postgres
postgres 16468 16467  0 09:59 pts/0    00:00:00 bash
postgres 25817  3201 55 10:20 ?        00:02:54 postgres: postgres db_being_dumped [local] COPY
postgres 27956  3201 46 10:24 ?        00:00:36 postgres: postgres db_being_dumped [local] COPY
postgres 28124  3201  0 10:25 ?        00:00:00 postgres: dbusername template1 obfus.ip.address.36(49528) DROP DATABASE
waiting
postgres 28180  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.37(48997) startup
waiting
postgres 28183  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.36(49532) startup
waiting


>
> What do you mean by "could not get a connection" ... did it fail (if so,
> with what client-side error message) or just hang up waiting?
>

All connection attempts just hung. No error message on the client side.

Here's what I did to replicate the problem:
0) first tried running pg_dump and there were no problems with connecting (from various web applications, mail server,
etc)
1) repeated what I did last night: am developing a java webapp with OJB, which uses torque & xdoclet to generate SQL
forthe various classes in the project. Part of the torque xdoclet task DROPS the database and recreates it (only while
you'rein development mode...still have to figure out how to turn that off!). 
2) postgresql hung once I ran the ant task and torque was trying to drop the database, as you can see from the ps
outputabove. 

So, what I'm gathering is that dropping a database counts as one of the operations that will hang, along with 'VACUUM
FULL',if pg_dump is running. Correct? I ran across a reference to 'VACUUM FULL' in the docs w.r.t. pg_dump in the
contextof troubleshooting this problem last night, but can't find it now. 

Thanks!
--
Steve Linabery, sysadmin/developer
B94B C3C7 8A27 FF09 3C9D  E992 5A20 2492 D5F5 EE51


This electronic message transmission contains information from the sender's organization that may be proprietary,
confidentialand/or privileged. The information is intended only for the use of the individual(s) or entity named above.
Ifyou are not the intended recipient, be aware that any disclosure, copying or distribution or use of the contents of
thisinformation is prohibited. If you have received this electronic transmission in error, please notify the sender
immediatelyby replying to the address listed in the "From:" 


Re: can't get connection during pg_dump

From
Tom Lane
Date:
Steve Linabery <slinabery@worldcycling.com> writes:
> I just replicated the problem, see below for explanation, but here is ps output:

> postgres 25817  3201 55 10:20 ?        00:02:54 postgres: postgres db_being_dumped [local] COPY
> postgres 27956  3201 46 10:24 ?        00:00:36 postgres: postgres db_being_dumped [local] COPY
> postgres 28124  3201  0 10:25 ?        00:00:00 postgres: dbusername template1 obfus.ip.address.36(49528) DROP
DATABASEwaiting 
> postgres 28180  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.37(48997)
startupwaiting 
> postgres 28183  3201  0 10:25 ?        00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.36(49532)
startupwaiting 

Hm.  DROP DATABASE takes ExclusiveLock on pg_database, which blocks new
incoming connections, and both of those things are intentional (part of
the point of the lock is to block new connections to the victim
database, so it's hard to see how we could avoid this).

What doesn't seem so essential though is for pg_dump to be holding a
conflicting lock on pg_database.  In fact, a plain AccessShareLock
wouldn't block DROP DATABASE, so I'm not entirely sure that pg_dump
is at fault.  Could we see the contents of pg_locks for this situation?

            regards, tom lane