Thread: pg_stat_get_backen_last_activity() ???

pg_stat_get_backen_last_activity() ???

From
Sean Chittenden
Date:
Is there any notion of pg_stat_get_backend_last_activity()?  I've got
a rogue process someplace and I'm having a devil of a time tracking
down which process is not committing its transaction and letting the
rest of the world continue to access the tables.  Based off of the
last query performed, I'd have some idea as to where to hunt and could
figure this out, but as best as I can tell, the only option available
is to setup a query log and cruize through there with grep/less.  Am I
missing more programatic and obvious?  -sc

PS, the archives didn't turn up anything for 'idle in transaction' ???

--
Sean Chittenden

Re: pg_stat_get_backen_last_activity() ???

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> Is there any notion of pg_stat_get_backend_last_activity()?  I've got
> a rogue process someplace and I'm having a devil of a time tracking
> down which process is not committing its transaction and letting the
> rest of the world continue to access the tables.

Perhaps look in pg_locks to see who's holding the oldest open
transaction number, or who's got the most locks that other people
are waiting for?  That should get you at least the PID of the offender.
Then use ps and/or pg_stat_activity to figure out what the offender
is doing.

            regards, tom lane

Re: pg_stat_get_backen_last_activity() ???

From
Sean Chittenden
Date:
> > Is there any notion of pg_stat_get_backend_last_activity()?  I've
> > got a rogue process someplace and I'm having a devil of a time
> > tracking down which process is not committing its transaction and
> > letting the rest of the world continue to access the tables.
>
> Perhaps look in pg_locks to see who's holding the oldest open
> transaction number, or who's got the most locks that other people
> are waiting for?  That should get you at least the PID of the
> offender.  Then use ps and/or pg_stat_activity to figure out what
> the offender is doing.

I can already identify that.  Using the PID of the backend + sockstat
on both the DB server and on the remote www server I can get the PID
of the webserver process causing the problem, but I have no clue which
customer, which page, which query, etc.  The www process is doing
something along the lines of:

BEGIN;
SELECT....
[waits forever]
END;

I could easily grep and figure out what the query is if I knew what
the prior query was because that'd give me some indication as to which
page it was that was sitting there pouting inside of a transaction.
Attaching GDB to random httpd processes isn't my idea of fun.  :-/ The
other problem is that with the frequency being around once every week
or so, it's one of those largely annoying things that isn't too
critical in its frequency, but just critical enough to spend some
cycles on here and there.  -sc

--
Sean Chittenden

Re: pg_stat_get_backen_last_activity() ???

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> I can already identify that.  Using the PID of the backend + sockstat
> on both the DB server and on the remote www server I can get the PID
> of the webserver process causing the problem, but I have no clue which
> customer, which page, which query, etc.  The www process is doing
> something along the lines of:

> BEGIN;
> SELECT....
> [waits forever]
> END;

Oh, you want to know the actual current query.  Turn on
stats_command_string or whatever it's called.  See the statistics
collector documentation.

If this is after-the-fact (ie, you want to know about a process that's
hung right now), attach to it with a debugger and print out
debug_query_string.

            regards, tom lane

Re: pg_stat_get_backen_last_activity() ???

From
Sean Chittenden
Date:
> > I can already identify that.  Using the PID of the backend +
> > sockstat on both the DB server and on the remote www server I can
> > get the PID of the webserver process causing the problem, but I
> > have no clue which customer, which page, which query, etc.  The
> > www process is doing something along the lines of:
>
> > BEGIN;
> > SELECT....
> > [waits forever]
> > END;
>
> Oh, you want to know the actual current query.  Turn on
> stats_command_string or whatever it's called.  See the statistics
> collector documentation.
>
> If this is after-the-fact (ie, you want to know about a process
> that's hung right now), attach to it with a debugger and print out
> debug_query_string.

Alright, turned on stats_command_string and am seeing nothing in any
of the stats tables and debug_query_string is null.

(gdb) p debug_query_string
$1 = 0
(gdb) bt
#0  0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
#1  0x283c092b in recv () from /usr/lib/libc.so.4
#2  0x80f11c6 in secure_read ()
#3  0x80f4557 in StreamConnection ()
#4  0x80f4935 in pq_getbyte ()
#5  0x81377b0 in PostgresMain ()
#6  0x811a63c in PostmasterMain ()
#7  0x8119e5b in PostmasterMain ()
#8  0x811934b in PostmasterMain ()
#9  0x8118e94 in PostmasterMain ()
#10 0x80f53d7 in main ()
#11 0x8069b3e in _start ()

Any ideas on how I can figure out what the mystery query/app/lib is
that's causing my backend to hang with a zillion connections that are
all idle?

-sc

--
Sean Chittenden


Re: pg_stat_get_backen_last_activity() ???

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> (gdb) bt
> #0  0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
> #1  0x283c092b in recv () from /usr/lib/libc.so.4
> #2  0x80f11c6 in secure_read ()
> #3  0x80f4557 in StreamConnection ()
> #4  0x80f4935 in pq_getbyte ()
> #5  0x81377b0 in PostgresMain ()
> #6  0x811a63c in PostmasterMain ()
> #7  0x8119e5b in PostmasterMain ()
> #8  0x811934b in PostmasterMain ()
> #9  0x8118e94 in PostmasterMain ()
> #10 0x80f53d7 in main ()
> #11 0x8069b3e in _start ()

> Any ideas on how I can figure out what the mystery query/app/lib is
> that's causing my backend to hang with a zillion connections that are
> all idle?

Well, that particular backend is not hung: the stack trace proves it's
waiting for a command from its frontend.  Perhaps you need to look
through your other Postgres processes.

            regards, tom lane


Re: pg_stat_get_backen_last_activity() ???

From
Sean Chittenden
Date:
> > (gdb) bt
> > #0  0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
> > #1  0x283c092b in recv () from /usr/lib/libc.so.4
> > #2  0x80f11c6 in secure_read ()
> > #3  0x80f4557 in StreamConnection ()
> > #4  0x80f4935 in pq_getbyte ()
> > #5  0x81377b0 in PostgresMain ()
> > #6  0x811a63c in PostmasterMain ()
> > #7  0x8119e5b in PostmasterMain ()
> > #8  0x811934b in PostmasterMain ()
> > #9  0x8118e94 in PostmasterMain ()
> > #10 0x80f53d7 in main ()
> > #11 0x8069b3e in _start ()
>
> > Any ideas on how I can figure out what the mystery query/app/lib
> > is that's causing my backend to hang with a zillion connections
> > that are all idle?
>
> Well, that particular backend is not hung: the stack trace proves it's
> waiting for a command from its frontend.  Perhaps you need to look
> through your other Postgres processes.

Sorry, hang is the wrong word.  Block is more correct.  I know it's
waiting for commands from the front end, but I'm trying to figure out
what front end/query/lib/page is causing the problem.  It's only once
in few thousand queries that it happens and I'm going crazy trying to
figure out what it is that's causing this.  I've set
stats_command_string to true and attached gdb to the backend, but
that's null.  Is there another var in the backend that contains a
pointer to the last query performed?  I haven't found it if there
is...  -sc

--
Sean Chittenden


Re: pg_stat_get_backen_last_activity() ???

From
Sean Chittenden
Date:
> > > Any ideas on how I can figure out what the mystery query/app/lib
> > > is that's causing my backend to hang with a zillion connections
> > > that are all idle?
> >
> > Well, that particular backend is not hung: the stack trace proves it's
> > waiting for a command from its frontend.  Perhaps you need to look
> > through your other Postgres processes.
>
> Sorry, hang is the wrong word.  Block is more correct.  I know it's
> waiting for commands from the front end, but I'm trying to figure out
> what front end/query/lib/page is causing the problem.  It's only once
> in few thousand queries that it happens and I'm going crazy trying to
> figure out what it is that's causing this.  I've set
> stats_command_string to true and attached gdb to the backend, but
> that's null.  Is there another var in the backend that contains a
> pointer to the last query performed?  I haven't found it if there
> is...  -sc

Hrm, guess this'll work:

log_pid = true
log_statement = true
log_duration = true
log_timestamp = true

Lower tech, but seems to work well effective.  I changed pg_ctl to zip
my error logs though, otherwise it grows too quickly.  :-/

-        "$po_path" ${1+"$@"} ${PGDATAOPTS+$PGDATAOPTS} </dev/null >>$logfile 2>& 1 &
+        "$po_path" ${1+"$@"} ${PGDATAOPTS+$PGDATAOPTS} </dev/null | $gzip -${ziplevel} -c >>$logfile 2>& 1 &

-sc

--
Sean Chittenden