Thread: Re: recently and selectively slow, but very simple, update query....
Re: recently and selectively slow, but very simple, update query....
From
Stelios Mavromichalis
Date:
hello list,
since i got no reply, i am afraid i'll go down the dump/restore path, hoping this will solve my problem.

Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
On Mon, May 5, 2014 at 5:11 PM, Stelios Mavromichalis <mstelios@cytech.gr> wrote:
hello,
i would have dump+restore cycled the database without bothering you, hoping that would solve my problem, but then i thought i wouldn't learn anything from it, nor would you have the chance to trace a potential problem/bug and thus help the community.
after reading this guide: http://wiki.postgresql.org/wiki/SlowQueryQuestions
i decided to seek your help.
my problem is that the same query/function sometimes runs fast/normally (as expected) and, for about 5 days now, some/most of the time it runs really slow to _very_ slow.
the query is in essence a very simple update to the balance of _a certain user_ (no other user has this issue). yes, this user has the most frequent updates to his balance.
i've tried restarting, manual vacuuming (with and without analyze/full), and reindexing the database, with no improvement. it's also not a hardware problem: diagnostics run fine and there are no kernel messages or anything weird/unexpected. the load of the machine is also low (around 0.2).
so, without further ado:
Full Table and Index Schema
the function that has the problem (easysms_jb_pay(int, int) returns int):
Source code
-----------------------------------------------------------
DECLARE
user_id ALIAS FOR $1;
amount ALIAS FOR $2;
myuser record;
mg record;
newbalance float;
BEGIN
SELECT INTO myuser es.login, es.balance as esbalance
from
easysms_users es
where
es.usid = user_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'Cannot find user';
return -2;
END IF;
IF myuser.login = 'jbuser' THEN
return -3;
END IF;
IF myuser.esbalance < amount THEN
return -1;
END IF;
UPDATE easysms_users SET balance = balance - amount
WHERE usid = user_id;
return 1;
END;
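For completeness, here is a sketch of how that body would be wrapped into a full function definition, assuming the language is PL/pgSQL (the ALIAS FOR $n syntax suggests it is); the CREATE statement, dollar quoting and LANGUAGE clause are reconstructed, only the body comes from the report:

CREATE OR REPLACE FUNCTION easysms_jb_pay(integer, integer) RETURNS integer AS $$
DECLARE
    user_id ALIAS FOR $1;
    amount  ALIAS FOR $2;
    myuser  record;
    mg      record;       -- declared but never used in the body above
    newbalance float;     -- declared but never used in the body above
BEGIN
    -- fetch the user's login and current balance
    SELECT INTO myuser es.login, es.balance AS esbalance
        FROM easysms_users es
        WHERE es.usid = user_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Cannot find user';
        RETURN -2;        -- unreachable: RAISE EXCEPTION aborts the function first
    END IF;
    IF myuser.login = 'jbuser' THEN
        RETURN -3;
    END IF;
    IF myuser.esbalance < amount THEN
        RETURN -1;
    END IF;
    -- debit the balance; usid is untouched, so no index entry changes
    UPDATE easysms_users SET balance = balance - amount
        WHERE usid = user_id;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;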
the related table:
Table "public.easysms_users"
Column | Type | Modifiers
------------------------+-----------------------------+-------------------------------------------------------------
login | character varying(20) |
passwd | character varying(32) | not null
mobile | character varying(16) | not null
name | character varying(20) |
lastname | character varying(20) |
balance | bigint | not null default 0
email | character varying(40) |
status | character varying(1) | default 'p'::character varying
lang | character varying(2) |
trusted | boolean | default false
opt_originator | character varying(16) |
opt_fullname | character varying(50) |
opt_afm | character varying(30) |
opt_invoice_details | text |
opt_postal_address | text |
opt_want_invoice | smallint | default 0
bulklimit | integer | default 100
lastlogin | timestamp without time zone |
daily_report | boolean | default false
pro | boolean | default true
country_code | integer |
mobnumber | character varying(10) |
cctld | character varying(2) |
mpid | integer |
ifee | boolean |
gsm_code | character varying(8) |
account_reminder_email | boolean | default false
usid | integer | default (-2)
namedays | boolean | default true
opt_concat | boolean | default false
opt_smtype | character(1) | default 't'::bpchar
opt_url | text |
opt_permit_concat | boolean | default true
opt_email | boolean | default false
suser | boolean | default false
susid | integer |
perm | character varying(20) |
opt_statsperiod | character varying(3) |
opt_balance | boolean |
opt_lblimit | integer |
opt_override | boolean | default false
opt_letstamp | timestamp with time zone | default (now() - '1 day'::interval)
opt_lbststamp | timestamp with time zone | default now()
opt_pushdlr_enabled | boolean | default false
opt_pushdlr_ltstamp | timestamp with time zone | default now()
opt_pushdlr_rperiod | integer | default 300
opt_pushdlr_dperiod | integer | default 2
opt_pushdlrs | boolean | default false
regdate | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
opt_occupation | character varying(50) |
opt_invoice_address | text |
opt_city | character varying(50) |
opt_invoice_city | character varying(50) |
opt_pcode | character varying(30) |
opt_invoice_pcode | character varying(30) |
opt_doy | character varying(50) |
opt_phone | character varying(50) |
opt_invoice_country | character varying(50) |
opt_country | character varying(50) |
billid | integer |
opt_smpp_enabled | boolean | default false
Indexes:
"idx_easysms_users_usid" UNIQUE, btree (usid)
"easysms_users_cctld_idx" btree (cctld)
"easysms_users_email_idx" btree (email)
"easysms_users_mobile_idx" btree (mobile)
"easysms_users_mpid_idx" btree (mpid)
"easysms_users_status_idx" btree (status)
Table Metadata
does not contain large objects
has a fair amount of nulls
does receive a large number of updates, no deletes
is not growing rapidly; it grows very slowly
for indexes, see the schema above
does not use triggers
History
what i've mentioned at the start of this email. i can't think of any event that could be linked to this behavior.
Hardware Components (dedicated to databases; also runs a low-traffic MySQL; openSUSE 12.3 x86-64)
Hard disks: 2x 2000 GB SATA 3.5" 7,200 rpm (in RAID 1)
RAM: 32 GB
RAID controller: HP Smart Array P410 (battery-backed, write-back enabled)
Barebone: Hewlett Packard DL320e G8
CPU: Intel Xeon E3-1230 v2
Maintenance Setup
autovacuum with default settings. manual vacuum only in cases like this, not regularly. see db config
WAL Configuration
nothing special here; everything runs on the same disk/partition. see db config
GUC Settings
name | current_setting | source
------------------------------+-------------------+----------------------
application_name | psql | client
checkpoint_completion_target | 0.9 | configuration file
checkpoint_segments | 64 | configuration file
client_encoding | UTF8 | client
client_min_messages | log | configuration file
DateStyle | ISO, DMY | configuration file
deadlock_timeout | 10s | configuration file
debug_print_rewritten | off | configuration file
default_statistics_target | 100 | configuration file
default_text_search_config | pg_catalog.simple | configuration file
effective_cache_size | 8GB | configuration file
fsync | off | configuration file
lc_messages | el_GR.UTF-8 | configuration file
lc_monetary | el_GR.UTF-8 | configuration file
lc_numeric | el_GR.UTF-8 | configuration file
lc_time | el_GR.UTF-8 | configuration file
listen_addresses | * | configuration file
log_connections | off | configuration file
log_destination | syslog | configuration file
log_disconnections | off | configuration file
log_error_verbosity | verbose | configuration file
log_hostname | on | configuration file
log_line_prefix | %d %u | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 1s | configuration file
log_min_error_statement | debug5 | configuration file
log_min_messages | info | configuration file
log_statement | none | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 704MB | configuration file
max_connections | 400 | configuration file
max_prepared_transactions | 1000 | configuration file
max_stack_depth | 2MB | environment variable
random_page_cost | 1.5 | configuration file
shared_buffers | 2816MB | configuration file
TimeZone | Europe/Athens | configuration file
wal_buffers | 16MB | configuration file
work_mem | 28MB | configuration file
(38 rows)
Postgres version
# select version();
version
---------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.7 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388], 64-bit
(1 row)
a normal-speed run of the query: <http://explain.depesz.com/s/XeQm>
the slow version of it: <http://explain.depesz.com/s/AjwK>

thank you so very much in advance for your time and effort in helping.

best regards,
/mstelios
Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
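Plans like the two linked above can be captured by hand with EXPLAIN (ANALYZE, BUFFERS) on the statements the function executes. This is only a sketch: the usid value 10808 is taken from the log excerpt further down the thread, and the UPDATE is rolled back so the balance is left untouched.

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
    SELECT es.login, es.balance
        FROM easysms_users es
        WHERE es.usid = 10808;
EXPLAIN (ANALYZE, BUFFERS)
    UPDATE easysms_users SET balance = balance - 2
        WHERE usid = 10808;
ROLLBACK;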
Stelios Mavromichalis wrote:
>> the load of the machine is also low (like 0.2).

Which means little if the update is waiting for a lock to be released by one other process; that is the more likely situation (or some other concurrency contention), especially as you said that this particular user generates significant transaction/query volume (implied by the fact that the user has the most balance updates).

During slow-update executions you want to look at:

pg_stat_activity
pg_locks

to see what other concurrent activity is taking place.

It is doubtful that dump/restore would have any effect given that the symptoms are sporadic and we are only talking about a select statement that returns a single row, and an update that does not hit any indexed column and therefore benefits from the "HOT" optimization.

HTH

David J.
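A minimal sketch of those two checks, run from a separate psql session while one of the slow calls is executing (column names assume PostgreSQL 9.2):

-- what every non-idle backend is doing right now, and whether it is waiting
SELECT pid, usename, state, waiting, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY query_start;

-- any lock requests that have not been granted yet
SELECT locktype, relation::regclass AS relation, mode, pid, granted
    FROM pg_locks
    WHERE NOT granted;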
Re: Re: recently and selectively slow, but very simple, update query....
From
Stelios Mavromichalis
Date:
hello David,
thank you for the reply. also, the fact that another usid with a great many updates runs normally leads me to think that it might be corruption or something, hence the dump/restore hope :)
as a step prior to dump/restore i am thinking of deleting and re-inserting that particular row. do you think that might shed some light?
best regards,
/mstelios
Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
Stelios Mavromichalis wrote:
> as a prior step to dump/restore i am thinking of deleting and re-inserting
> that particular row. that might shed some light you think?

I still dislike the randomness of the unresponsiveness...

Every time you perform an update you "delete and insert" that row - that is how an update works in MVCC - so doing so explicitly is unlikely to provide any benefit. Since that row is continually being inserted, and no other rows are having this issue, I seriously doubt that a dump/restore is going to have any effect either. Note that the index scan took twice as long in the bad case - but still reasonable - and you didn't notice any benefit from a REINDEX. This is what I would expect.

The only other difference, if concurrency has been ruled out, is the 4 vs 18 buffers that had to be read. I cannot imagine that, since all 22 were in cache, simply reading that much more data would account for the difference (we're talking a 10,000-fold increase, not 2-to-4-fold). The reason for this particular difference, IIUC, is how many candidate tuples are present whose visibility has to be accounted for (assuming 1 buffer per tuple, you needed to check 4 vs 18 tuples for visibility in the two queries).

Is there any log file information you can share? Especially if you can set log_min_duration_statement so that whenever one of these gets adversely delayed it appears in the log along with whatever other system messages are being sent. Checkpoints are a typical culprit, though that should be affecting a great deal more than what you indicate you are seeing.

I'm pretty certain you are seeing this here largely because of the frequency of activity on this particular user, not because the data itself is corrupted. It could be some kind of symptom of internal concurrency that you just haven't observed yet, but it could also be I/O or other system contention that you also haven't properly instrumented. Unfortunately that is beyond my current help-providing skill-set.

A dump/restore likely would not make anything worse, though I'd be surprised if it were to improve matters. It also doesn't seem like hardware - unless the RAM is bad. Software bugs are unlikely if this had been working well before 5 days ago. So, you need to observe the system during both periods (good and bad) and find something that is different - probably not within PostgreSQL if indeed you've minimized concurrency. And also see if any other queries, executed during both of these times, exhibit a performance decrease. Logging all statements would help matters greatly if you can afford it in your production environment - it would make looking for internal concurrency much easier.

David J.
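In configuration terms (the current values are in the GUC listing above), that suggestion boils down to something like the following postgresql.conf change, to be reverted once enough slow executions have been captured; treat it as a sketch rather than a prescription:

# log every statement together with its duration, instead of only those over 1s
log_min_duration_statement = 0
# already on: reports any lock wait that exceeds deadlock_timeout (currently 10s)
log_lock_waits = on
# apply with "pg_ctl reload" or: SELECT pg_reload_conf();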
Re: Re: recently and selectively slow, but very simple, update query....
From
Stelios Mavromichalis
Date:
hello again,
some more data on the subject: you were right, delete/re-insert didn't solve the problem (i haven't yet tried dump/restore; i might tonight, when traffic is low).
a short snippet from the logs, where i log all queries that take longer than 1 sec:
2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select easysms_jb_pay($1,
$2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
should you deem it helpful, yes, i can enable full query logging and send you the listing.
a typical vmstat:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 363476 325852 294064 30520736 0 0 26 347 0 0 1 0 99 0 0
1 0 363476 328400 294068 30520732 0 0 0 28 4930 8014 6 1 94 0 0
2 0 363476 331756 294068 30520732 0 0 0 4384 4950 7980 6 1 93 0 0
0 0 363476 334016 294068 30520756 0 0 0 4384 4961 7981 7 1 92 0 0
0 0 363476 334700 294068 30520756 0 0 0 4424 4012 6467 4 1 95 0 0
1 0 363476 330852 294068 30520788 0 0 0 0 2559 3861 5 1 95 0 0
1 0 363476 331316 294072 30520788 0 0 0 4408 5013 8127 6 1 94 0 0
1 0 363476 330788 294072 30520788 0 0 0 4384 5535 9055 6 1 93 0 0
0 0 363476 331496 294072 30520804 0 0 0 4384 5031 8092 7 1 92 0 0
2 0 363476 331268 294072 30520804 0 0 0 4428 5052 8246 6 1 94 0 0
1 0 363476 330848 294080 30520812 0 0 0 32 4892 7950 5 1 94 0 0
1 0 363476 330480 294080 30520812 0 0 0 4388 4935 8036 6 1 94 0 0
2 0 363476 332616 294084 30521092 0 0 0 4408 5064 8194 6 1 93 0 0
0 0 363476 333596 294084 30521008 0 0 0 4384 5205 8463 8 1 91 0 0
1 0 363476 333324 294084 30521008 0 0 0 40 5014 8151 6 1 94 0 0
0 0 363476 332740 294084 30521016 0 0 0 4384 5047 8163 6 1 93 0 0
1 0 363476 336052 294084 30520888 0 0 0 4384 4849 7780 6 1 94 0 0
1 0 363476 334732 294088 30520892 0 0 8 4400 5520 9012 6 1 93 0 0
0 0 363476 334064 294096 30520884 0 0 0 220 3903 6193 6 1 94 0 0
0 0 363476 333124 294096 30520916 0 0 0 2232 4088 6462 6 1 93 0 0
the process that is constantly writing the majority of data is:
"postgres: stats collector process"
and its write rate varies from 2.5 MB/sec up to 5.7 MB/sec.
all the other postgres (and non-postgres) processes write very little data and rarely.
the checkpointer process writes at about 78.6 KB/sec (just for a few seconds now as i write this email; no other process sustains a constant rate of I/O)
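Since checkpoints were mentioned earlier as a typical culprit, one additional cross-check (not from the thread, just a sketch) is to sample pg_stat_bgwriter before and after a slow period and compare the deltas with the timestamps of the slow executions:

-- cumulative checkpoint / background-writer counters; run twice and diff
SELECT now() AS sampled_at,
       checkpoints_timed,      -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,        -- checkpoints forced by filling checkpoint_segments
       buffers_checkpoint,     -- buffers written by checkpoints
       buffers_backend,        -- buffers written directly by backends
       buffers_backend_fsync   -- fsyncs that backends had to perform themselves
    FROM pg_stat_bgwriter;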
also, _while having the problem_, the results of the following queries (taken from http://wiki.postgresql.org/wiki/Lock_Monitoring) are:
SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;
(0 rows)
SELECT bl.pid AS blocked_pid, a.usename AS blocked_user, ka.query AS blocking_statement, now() - ka.query_start AS blocking_duration, kl.pid AS blocking_pid, ka.usename AS blocking_user, a.query AS blocked_statement, now() - a.query_start AS blocked_duration FROM pg_catalog.pg_locks bl JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid WHERE NOT bl.granted;
(0 rows)
SELECT waiting.locktype AS waiting_locktype, waiting.relation::regclass AS waiting_table, waiting_stm.query AS waiting_query, waiting.mode AS waiting_mode, waiting.pid AS waiting_pid, other.locktype AS other_locktype, other.relation::regclass AS other_table, other_stm.query AS other_query, other.mode AS other_mode, other.pid AS other_pid, other.granted AS other_granted FROM pg_catalog.pg_locks AS waiting JOIN pg_catalog.pg_stat_activity AS waiting_stm ON ( waiting_stm.pid = waiting.pid ) JOIN pg_catalog.pg_locks AS other ON ( ( waiting."database" = other."database" AND waiting.relation = other.relation ) OR waiting.transactionid = other.transactionid ) JOIN pg_catalog.pg_stat_activity AS other_stm ON ( other_stm.pid = other.pid ) WHERE NOT waiting.granted AND waiting.pid <> other.pid;
(0 rows)
WITH RECURSIVE c(requested, current) AS ( VALUES ('AccessShareLock'::text, 'AccessExclusiveLock'::text), ('RowShareLock'::text, 'ExclusiveLock'::text), ('RowShareLock'::text, 'AccessExclusiveLock'::text), ('RowExclusiveLock'::text, 'ShareLock'::text), ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('RowExclusiveLock'::text, 'ExclusiveLock'::text), ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ShareLock'::text, 'RowExclusiveLock'::text), ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareLock'::text, 'ShareRowExclusiveLock'::text), ('ShareLock'::text, 'ExclusiveLock'::text), ('ShareLock'::text, 'AccessExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareLock'::text), ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text), ('ExclusiveLock'::text, 'RowShareLock'::text), ('ExclusiveLock'::text, 'RowExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareLock'::text), ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ExclusiveLock'::text, 'ExclusiveLock'::text), ('ExclusiveLock'::text, 'AccessExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessShareLock'::text), ('AccessExclusiveLock'::text, 'RowShareLock'::text), ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareLock'::text), ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text) ), l AS ( SELECT (locktype,DATABASE,relation::regclass::text,page,tuple,virtualxid,transactionid,classid,objid,objsubid) AS target, virtualtransaction, pid, mode, granted FROM pg_catalog.pg_locks ), t AS ( SELECT blocker.target AS blocker_target, blocker.pid AS blocker_pid, blocker.mode AS blocker_mode, blocked.target AS target, blocked.pid AS pid, blocked.mode AS mode FROM l blocker JOIN l blocked ON ( NOT blocked.granted AND blocker.granted AND blocked.pid != blocker.pid AND blocked.target IS NOT DISTINCT FROM blocker.target) JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode) ), r AS ( SELECT blocker_target, blocker_pid, blocker_mode, '1'::int AS depth, target, pid, mode, blocker_pid::text || ',' || pid::text AS seq FROM t UNION ALL SELECT blocker.blocker_target, blocker.blocker_pid, blocker.blocker_mode, blocker.depth + 1, blocked.target, blocked.pid, blocked.mode, blocker.seq || ',' || blocked.pid::text FROM r blocker JOIN t blocked ON (blocked.blocker_pid = blocker.pid) WHERE blocker.depth < 1000 ) SELECT * FROM r ORDER BY seq;
(0 rows)
no findings here either :(
best,
/mstelios
Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios
Re: Re: recently and selectively slow, but very simple, update query....
From
Stelios Mavromichalis
Date:
hello everyone,
i feel like i owe you all an apology.

Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios