Re: and waiting - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: and waiting
Date
Msg-id 65937bea0802011426p4812b7f0k8e07be9530f5b12a@mail.gmail.com
Whole thread Raw
In response to Re: and waiting  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: and waiting  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
The situation seems pretty bad!!<br /><br />Here are the steps to reproduce in 'PostgreSQL 8.3beta2 on
x86_64-unknown-linux-gnu,compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)':<br /><br />session 1: begin;<br />session 1:
updatetest set a = 112 where a = 112;<br /> session 2: update test set a = 113 where a = 112; --waits<br />session 1:
select* from pg_stat_activity; -- executed this a few times before executing 'select version()' and then following:<br
/>session1: <stat query1> -- see end of mail for this query<br /><br /><span style="font-family: courier
new,monospace;"> procpid|             current_query              | waiting |     duration     |        
backend_start</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">---------+----------------------------------------+---------+------------------+-------------------------------</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">   12577 | update test
seta = 113 where a = 112; | t       | -00:01:35.782881 | 2008-02-01 13:36:15.31027-08</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">   11975 | select * from pg_stat_activity
;      | f       | -00:01:52.554697 | 2008-02-01 13:30:40.396392-08</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">(2 rows)</span><br style="font-family: courier
new,monospace;"/><br style="font-family: courier new,monospace;" />session 1: select * from pg_locks<br /><span
style="font-family:courier new,monospace;"><br />    locktype    | database | relation | page | tuple | virtualxid |
transactionid| classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier
new,monospace;">---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+---------------<br
/>-----+-------+------------------+---------<br/>  transactionid |          |          |      |       |           
|          390 |         |       |          | 2/14     | 12577 | ShareLock        | f</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;"> transactionid |          |          |     
|      |            |           390 |         |       |          | 1/9     | 11975 | ExclusiveLock    | t</span><br
style="font-family:courier new,monospace;" /> <snip irrelevant><br /><br />Then,<br />session 2: ^C<br /><span
style="font-family:courier new,monospace;"></span><span style="font-family: courier new,monospace;">Cancel request
sent</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier new,monospace;">ERROR: 
cancelingstatement due to user request<br /><br /></span>session1: <stat query1><br /><span style="font-family:
couriernew,monospace;"><br /><span style="font-family: courier new,monospace;"> procpid |            
current_query             | waiting |     duration     |         backend_start</span></span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier
new,monospace;">---------+----------------------------------------+---------+------------------+-------------------------------</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">   12577 | update test
seta = 113 where a = 112; | t       | -00:01:35.782881 | 2008-02-01 13:36:15.31027-08</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">   11975 | select * from pg_stat_activity
;      | f       | -00:01:52.554697 | 2008-02-01 13:30:40.396392-08</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">(2 rows)</span><br style="font-family: courier
new,monospace;"/><br />session 1: select * from pg_locks ;<br /><br style="font-family: courier new,monospace;" /><span
style="font-family:courier new,monospace;"><no traces of pid 12577></span><br /><br />session 1: select
pg_backend_pid();<br/><br /><span style="font-family: courier new,monospace;"> pg_backend_pid</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier
new,monospace;">----------------</span><brstyle="font-family: courier new,monospace;" /><span style="font-family:
couriernew,monospace;">          11975</span><br style="font-family: courier new,monospace;" /><br /> The last
mentionedoutput of <stat query> shows session 1 executing 'select * from p_s_a', whereas the <stat query>
_is_being executed in session 1!!! This result is consistently returned for a while, and later...<br /><br />session 2:
selectpg_backend_pid();<br /><span style="font-family: courier new,monospace;"><br />  pg_backend_pid</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">
----------------</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">          12577</span><br /><br /> session 1: <stat query1><br /><br /><span style="font-family:
couriernew,monospace;"> procpid |     current_query     | waiting |    duration     |         backend_start</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier
new,monospace;">---------+-----------------------+---------+-----------------+-------------------------------</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">   11975 | <IDLE>
intransaction | f       | 00:06:08.671029 | 2008-02-01 13:30:40.396392-08</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">(1 row)</span><br style="font-family: courier
new,monospace;"/><br />After a while again:<br /><br />session 1: <stat query2> -- notice 2 not 1; 'select *'
comesback to haunt!!!<br style="font-family: courier new,monospace;" /><br /><span style="font-family: courier
new,monospace;"> procpid|             current_query              | waiting |     duration     |        
backend_start<br
/>---------+----------------------------------------+---------+------------------+-------------------------------<br/>
  12577 | update test set a = 113 where a = 112; | t       | -00:01:35.782881 | 2008-02-01 13:36:15.31027-08<br />  
11975| select * from pg_stat_activity ;       | f       | -00:01:52.554697 | 2008-02-01 13:30:40.396392-08<br /> (2
rows)<br/></span><span style="font-family: courier new,monospace;"><br /><span style="font-family:
arial,sans-serif;">session1: <stat query1> -- 1 back in action</span><br style="font-family: courier
new,monospace;"/></span><span style="font-family: courier new,monospace;"><br /> procpid |            
current_query             | waiting |     duration     |         backend_start</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier
new,monospace;">---------+----------------------------------------+---------+------------------+-------------------------------</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">   12577 | update test
seta = 113 where a = 112; | t       | -00:01:35.782881 | 2008-02-01 13:36:15.31027-08</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">   11975 | select * from pg_stat_activity
;      | f       | -00:01:52.554697 | 2008-02-01 13:30:40.396392-08</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">(2 rows)<br /></span><br />The <stat query1>
is:<br/>select<br />procpid, current_query::varchar(50), waiting, now() - query_start as duration, backend_start<br
/>frompg_stat_activity<br />where current_query <> '<IDLE>'<br /> and current_query not like '%DONT COUNT
ME1%'<br />order by duration desc<br />limit 10;<br /><br />The <stat query2> is:<br />select<br />    procpid,
current_query::varchar(50),waiting, now() - query_start as duration, backend_start<br /> from pg_stat_activity<br
/>wherecurrent_query not like '%DONT COUNT ME1 %'<br />order by duration desc<br />limit 10;<br /><br />    Found more
bugsthan I was looking for, to reproduce!!!<br /><br />    The reporter also made an observation (on 8.2.4) that there
weredeadlocks detected at around the same time. Looked at WaitOnLock(), and clearly there's a problem, but is it at the
same/onlyplace we are suspecting it to be?<br /><br />Best regards,<br /><br />PS: Ran the <stat query>ies 1 and
2again, just before hitting 'send', and the result is the same:<br /><span style="font-family: courier
new,monospace;"> procpid|             current_query              | waiting |     duration     |        
backend_start</span><brstyle="font-family: courier new,monospace;" /><span style="font-family: courier
new,monospace;">---------+----------------------------------------+---------+------------------+-------------------------------</span><br
style="font-family:courier new,monospace;" /><span style="font-family: courier new,monospace;">   12577 | update test
seta = 113 where a = 112; | t       | -00:01:35.782881 | 2008-02-01 13:36:15.31027-08</span><br style="font-family:
couriernew,monospace;" /><span style="font-family: courier new,monospace;">   11975 | select * from pg_stat_activity
;      | f       | -00:01:52.554697 | 2008-02-01 13:30:40.396392-08</span><br style="font-family: courier
new,monospace;"/><span style="font-family: courier new,monospace;">(2 rows)</span><br style="font-family: courier
new,monospace;"/><br /><br />Clearly, there's something wrong <br /><br /><div class="gmail_quote">On Feb 1, 2008 8:16
AM,Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /><blockquote
class="gmail_quote"style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;"><divclass="Ih2E3d">"Gurjeet Singh" <<a href="mailto:singh.gurjeet@gmail.com">singh.gurjeet@gmail.com</a>>
writes:<br/>>     I saw a strange behaviour on one of the production boxes. The<br />> pg_stat_activity shows a
processas <IDLE> and yet 'waiting' !!! On top of<br /> > it (understandably, since its IDLE), there are no
entriesfor this pid in<br />> pg_locks!<br /><br /></div>Hmm, I can reproduce something like this by aborting a wait
forlock.<br />It seems the problem is that WaitOnLock() is ignoring its own good<br /> advice, assuming that it can do
cleanupwork after waiting.<br /><br />                        regards, tom lane<br /></blockquote></div><br /><br
clear="all"/><br />-- <br />gurjeet[.singh]@EnterpriseDB.com<br />singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo
}.com<br/><br />EnterpriseDB      <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br /><br />17°
29'34.37"N,   78° 30' 59.76"E - Hyderabad<br />18° 32' 57.25"N,   73° 56' 25.42"E - Pune<br /> 37° 47' 19.72"N, 122°
24'1.69" W - San Francisco *<br /><br /><a href="http://gurjeet.frihost.net">http://gurjeet.frihost.net</a><br /><br
/>Mailsent from my BlackLaptop device  

pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: BUG #3921: CREATE TABLE / INCLUDING INDEXES fails with permission denied
Next
From: Alvaro Herrera
Date:
Subject: configurability of OOM killer