Re: Autovacuum seems to block database: WARNING worker took too long to start - Mailing list pgsql-admin

From Alvaro Herrera
Subject Re: Autovacuum seems to block database: WARNING worker took too long to start
Date
Msg-id 1290091390-sup-7218@alvh.no-ip.org
In response to Autovacuum seems to block database: WARNING worker took too long to start  (Pablo Delgado Díaz-Pache <delgadop@gmail.com>)
List pgsql-admin
Excerpts from Pablo Delgado Díaz-Pache's message of Thu Nov 18 08:57:16 -0300 2010:

> 2) We did a strace to the postmaster pid. However we had 2 postmasters not
> dead
>
> # ps -fea |grep -i postmaster
> postgres  3889     1  0 Nov16 ?        00:01:24 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
> postgres  7601  3889  0 12:37 ?        00:00:00 /usr/bin/postmaster -p 5432
> -D /var/lib/pgsql/data
>
> As soon as we did a "strace" to the 3889 pid everything started to work
> again.

Sorry for my previous response -- evidently I failed to scroll down
enough to notice this part.

It seems to me that this process was stuck in an unnatural way.

> Not sure it was a coincidence but that was the way it was.
>
> *# strace -p 3889*
> *Process 3889 attached - interrupt to quit*
> *select(6, [3 4 5], NULL, NULL, {56, 930000}) = ? ERESTARTNOHAND (To be
> restarted)*
> *--- SIGUSR1 (User defined signal 1) @ 0 (0) ---*
> *rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN
> RT_1], NULL, 8) = 0*

This looks like normal postmaster activity: receiving SIGUSR1, then SIGCHLD,
and acting on each accordingly.

Rather than a coincidence, I would think that the act of tracing it made
it come back to life.  A kernel bug maybe?  Have you upgraded your
kernel or libc lately?

> I also straced the other postmaster pid
>
> *# strace -p 7601*
> *Process 7601 attached - interrupt to quit*
> *recvfrom(8, "P\0\0\0\221\0select id_key from transla"..., 8192, 0, NULL,
> NULL) = 181*

This one seems like a regular postmaster child that hadn't gotten around
to changing its ps status yet.  (Note it had PPID 3889 which is
consistent with this idea.)
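
If it helps to check the parent/child relationship mechanically, here is a
minimal sketch. It assumes the process list was captured with something like
"ps -eo pid,ppid,args" (the helper name and the sample text are only
illustrative, built from the PIDs quoted above), and it reports every
postmaster whose parent is itself a postmaster:

```python
def find_child_postmasters(ps_output):
    """Map child PID -> parent PID for postmasters forked by another postmaster.

    Assumes each data line looks like "PID PPID COMMAND...", as printed by
    `ps -eo pid,ppid,args`; the header line is skipped.
    """
    procs = {}
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        # Skip the header and any line that does not start with two numbers.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, ppid, args = int(fields[0]), int(fields[1]), fields[2]
        if "postmaster" in args:
            procs[pid] = ppid
    # A child is any postmaster whose parent PID is also a postmaster.
    return {pid: ppid for pid, ppid in procs.items() if ppid in procs}


# Sample text reconstructed from the PIDs in the quoted ps listing.
sample = """\
  PID  PPID COMMAND
 3889     1 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
 7601  3889 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
"""

print(find_child_postmasters(sample))
```

With the two processes from the report, only 7601 is listed, with 3889 as its
parent, which is what one would expect from a freshly forked backend rather
than a second independent postmaster.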

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
