On Thu, Mar 30, 2006 at 11:14:37AM -0500, Tom Lane wrote:
> Steve Linabery <slinabery@worldcycling.com> writes:
> > Last night during backup, I noticed that I could not get a connection, either remotely via jdbc or locally via
psql.Well perhaps that is inaccurate: 'ps -ef' showed several processes with a status message of "waiting to startup"
(orsomething similar; sorry, I was late-night coding and didn't think to write it down).
>
> Hm, I don't recall any such status message in the code. Can you double
> check what it said exactly?
I just replicated the problem, see below for explanation, but here is ps output:
postgres 3201 1 0 Mar13 ? 01:58:14 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
postgres 3270 3201 0 Mar13 ? 00:04:11 postgres: writer process
postgres 3271 3201 0 Mar13 ? 00:11:28 postgres: stats buffer process
postgres 3272 3271 0 Mar13 ? 00:07:11 postgres: stats collector process
postgres 3360 3201 0 Mar29 ? 00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53524) idle
postgres 3363 3201 0 Mar29 ? 00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53536) idle
postgres 3364 3201 0 Mar29 ? 00:00:00 postgres: dbusername other_db_name obfus.ip.address.36(53537) idle
root 16467 16447 0 09:59 pts/0 00:00:00 su postgres
postgres 16468 16467 0 09:59 pts/0 00:00:00 bash
postgres 25817 3201 55 10:20 ? 00:02:54 postgres: postgres db_being_dumped [local] COPY
postgres 27956 3201 46 10:24 ? 00:00:36 postgres: postgres db_being_dumped [local] COPY
postgres 28124 3201 0 10:25 ? 00:00:00 postgres: dbusername template1 obfus.ip.address.36(49528) DROP DATABASE
waiting
postgres 28180 3201 0 10:25 ? 00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.37(48997) startup
waiting
postgres 28183 3201 0 10:25 ? 00:00:00 postgres: dbusername db_being_dumped obfus.ip.address.36(49532) startup
waiting
>
> What do you mean by "could not get a connection" ... did it fail (if so,
> with what client-side error message) or just hang up waiting?
>
All connection attempts just hung. No error message on the client side.
Here's what I did to replicate the problem:
0) first tried running pg_dump and there were no problems with connecting (from various web applications, mail server,
etc)
1) repeated what I did last night: am developing a java webapp with OJB, which uses torque & xdoclet to generate SQL
forthe various classes in the project. Part of the torque xdoclet task DROPS the database and recreates it (only while
you'rein development mode...still have to figure out how to turn that off!).
2) postgresql hung once I ran the ant task and torque was trying to drop the database, as you can see from the ps
outputabove.
So, what I'm gathering is that dropping a database counts as one of the operations that will hang, along with 'VACUUM
FULL',if pg_dump is running. Correct? I ran across a reference to 'VACUUM FULL' in the docs w.r.t. pg_dump in the
contextof troubleshooting this problem last night, but can't find it now.
Thanks!
--
Steve Linabery, sysadmin/developer
B94B C3C7 8A27 FF09 3C9D E992 5A20 2492 D5F5 EE51
This electronic message transmission contains information from the sender's organization that may be proprietary,
confidentialand/or privileged. The information is intended only for the use of the individual(s) or entity named above.
Ifyou are not the intended recipient, be aware that any disclosure, copying or distribution or use of the contents of
thisinformation is prohibited. If you have received this electronic transmission in error, please notify the sender
immediatelyby replying to the address listed in the "From:"