Thread: Unpredictable shark slowdown after migrating to 8.4

Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
Hello community,


Second time after migration 8.3.7 --> 8.4.1 I was caught by this
problem. Migration was 8 days ago.
(note, I never seen such situation on 8.3)

Environment:

PostgreSQL 8.4.1 + patch
http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
CentOS release 5.4 (Final)
SunFire X4600 M2; 3U; 8xOpteron 8380 2.5GHz; 96GB; 12x146GB 15K RPM
DB size ~90G TPS~1000

Symptoms:

In short period of time total query execution time increases incredibly.

Time   Sum duration (ms)
17:17 12811
17:18 8870
17:19 33744
17:20 9991
17:21 13392
17:22 163928
17:23 1151709
17:24 12112797  <-- here it cuts due to connection threshold
17:25 12305390
17:26 12970853
17:27 12957648

LA changes rather insignificantly from ~6 to ~8.
CPU user time increases from ~350 to ~600 ), system from ~50 to ~250.
Neither additional significant  disc activity nor iowait.
Another thing I noticed is autoanalyze finish on tables that few
minutes later were used in the first and mostly canceled by timeout
queries.


First time I assigned the blame to multiply locks on one of the
mentioned above tables. There was heavy delete. I started delete rows
manually by small batches and found a record that hung my delete for a
long time (many times greater then stmt timeout) even when I tried to
delete it by PK. Situation was saved by disabling some functional that
uses this table until next day when I got rid of the aggressive
deletes.

Second time I didn't find any reason that explains the situation.

Was this situation mentioned before and is there a solution or
workaround? (I didn't find any) If not please give me a glue where to
dig or what information should I provide?

Thank you.


-- 
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/


Re: Unpredictable shark slowdown after migrating to 8.4

From
Robert Haas
Date:
On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> Was this situation mentioned before and is there a solution or
> workaround? (I didn't find any) If not please give me a glue where to
> dig or what information should I provide?

I think you should use log_min_duration_statement or auto_explain to
figure out which queries are giving you grief.  I don't believe that
8.4 is in general slower than 8.3, so there must be something about
how you are using it that is making it slower for you.  But without
more information it's hard to speculate what it might be.  It's also
not entirely clear that your 8.4 setup is really the same as your 8.3
setup.  You might have different configuration, differences in your
tables or table data, differences in workload, etc.  Without
controlling for all those factors it's hard to draw any conclusions.

Also, I don't believe this is an appropriate topic for pgsql-hackers.
If you have EXPLAIN ANALYZE results for the affected queries, try
pgsql-performance.

...Robert


Re: Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
On Thu, Nov 12, 2009 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
>> Was this situation mentioned before and is there a solution or
>> workaround? (I didn't find any) If not please give me a glue where to
>> dig or what information should I provide?
>
> I think you should use log_min_duration_statement or auto_explain to
> figure out which queries are giving you grief. I don't believe that
> 8.4 is in general slower than 8.3, so there must be something about
> how you are using it that is making it slower for you.  But without
> more information it's hard to speculate what it might be.  It's also
> not entirely clear that your 8.4 setup is really the same as your 8.3
> setup.  You might have different configuration, differences in your
> tables or table data, differences in workload, etc.  Without
> controlling for all those factors it's hard to draw any conclusions.

Well I turned on log_min_duration_statement, set up auto_explain,
pg_stat_statements, pgrowlocks, pg_buffercache, wrote some monitoring
queries and started to wait when the situation repeats.

Today it happens! Situation was absolutely the same as I described in
my previous letter. One more thing I noticed about CPU user-time this
time is that after connections count gets close to pgbouncer threshold
it decreased from ~800 to ~10 very fast.

Log monitoring shows me that query plans were the same as usual (thanx
auto_explain).

I reset pg_stat_statements and few minutes later did select from it. I
noticed that slowest queries was on tables with high number of updates
(but isn't it usual?).

I tried to get locks with this queries

SELECT   t.tablename,   (SELECT count(*) FROM pgrowlocks(schemaname || '.' || t.tablename)) AS locks
FROM pg_tables t
WHERE schemaname = 'public'
ORDER BY 2 DESC
LIMIT 10;

SELECT * FROM pgrowlocks('public.person_online');

but nothing was returned.

Here is portions of vmstat and iostat results http://pastie.org/701326

This time situation was saved by PG restart to. Obviously all I
provided tells almost nothing and I'm very confused with it. So please
tell me what else could I do to get more info?

> Also, I don't believe this is an appropriate topic for pgsql-hackers.
> If you have EXPLAIN ANALYZE results for the affected queries, try
> pgsql-performance.

I do have but this results are good and the same as when nothing has
happened when everything is allright.

--
Regards,
Sergey Konoplev


Re: Unpredictable shark slowdown after migrating to 8.4

From
Alvaro Herrera
Date:
Sergey Konoplev escribió:

> I tried to get locks with this queries

Did you try pg_locks?



-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unpredictable shark slowdown after migrating to 8.4

From
Robert Haas
Date:
On Mon, Nov 16, 2009 at 1:53 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On Thu, Nov 12, 2009 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Nov 11, 2009 at 12:50 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
>>> Was this situation mentioned before and is there a solution or
>>> workaround? (I didn't find any) If not please give me a glue where to
>>> dig or what information should I provide?
>>
>> I think you should use log_min_duration_statement or auto_explain to
>> figure out which queries are giving you grief. I don't believe that
>> 8.4 is in general slower than 8.3, so there must be something about
>> how you are using it that is making it slower for you.  But without
>> more information it's hard to speculate what it might be.  It's also
>> not entirely clear that your 8.4 setup is really the same as your 8.3
>> setup.  You might have different configuration, differences in your
>> tables or table data, differences in workload, etc.  Without
>> controlling for all those factors it's hard to draw any conclusions.
>
> Well I turned on log_min_duration_statement, set up auto_explain,
> pg_stat_statements, pgrowlocks, pg_buffercache, wrote some monitoring
> queries and started to wait when the situation repeats.
>
> Today it happens! Situation was absolutely the same as I described in
> my previous letter. One more thing I noticed about CPU user-time this
> time is that after connections count gets close to pgbouncer threshold
> it decreased from ~800 to ~10 very fast.
>
> Log monitoring shows me that query plans were the same as usual (thanx
> auto_explain).
>
> I reset pg_stat_statements and few minutes later did select from it. I
> noticed that slowest queries was on tables with high number of updates
> (but isn't it usual?).
>
> I tried to get locks with this queries
>
> SELECT
>    t.tablename,
>    (SELECT count(*) FROM pgrowlocks(schemaname || '.' || t.tablename)) AS locks
> FROM pg_tables t
> WHERE schemaname = 'public'
> ORDER BY 2 DESC
> LIMIT 10;
>
> SELECT * FROM pgrowlocks('public.person_online');
>
> but nothing was returned.
>
> Here is portions of vmstat and iostat results http://pastie.org/701326
>
> This time situation was saved by PG restart to. Obviously all I
> provided tells almost nothing and I'm very confused with it. So please
> tell me what else could I do to get more info?
>
>> Also, I don't believe this is an appropriate topic for pgsql-hackers.
>> If you have EXPLAIN ANALYZE results for the affected queries, try
>> pgsql-performance.
>
> I do have but this results are good and the same as when nothing has
> happened when everything is allright.

Can you show us the non-commented settings from your postgresql.conf?

Can you show us what the vmstat output looks like when everything is
running normally?  It looks like the blocks out are pretty high, but I
don't know how that compares to normal for you.

...Robert


Re: Unpredictable shark slowdown after migrating to 8.4

From
Andres Freund
Date:
On Wednesday 11 November 2009 18:50:46 Sergey Konoplev wrote:
> Hello community,
> 
> 
> Second time after migration 8.3.7 --> 8.4.1 I was caught by this
> problem. Migration was 8 days ago.
> (note, I never seen such situation on 8.3)
Is 8.4 configured similarly to 8.3?

Andres


Re: Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
On Mon, Nov 16, 2009 at 9:56 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Sergey Konoplev escribió:
>
>> I tried to get locks with this queries
>
> Did you try pg_locks?
>

I tried monitor locks with pgrowlocks. Isn't it better way? If it
isn't what points should I pay attention with pg_lock?

I've just write the query

SELECT pid, count(1) AS locks, current_query
FROM pg_locks AS l LEFT JOIN pg_stat_activity AS a ON pid = procpid
WHERE l.mode IN ('RowExclusiveLock', 'ShareUpdateExclusiveLock',
'ExclusiveLock')
GROUP BY 1,3 ORDER BY 2 DESC LIMIT 10;

would it be what we need?

--
Regards,
Sergey Konoplev


Re: Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
>
> Can you show us the non-commented settings from your postgresql.conf?

Working postgresql.conf http://pastie.org/702748

>
> Can you show us what the vmstat output looks like when everything is
> running normally?  It looks like the blocks out are pretty high, but I
> don't know how that compares to normal for you.
>

Here it is http://pastie.org/702742



--
Regards,
Sergey Konoplev


Re: Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
On Mon, Nov 16, 2009 at 10:17 PM, Andres Freund <andres@anarazel.de> wrote:
> On Wednesday 11 November 2009 18:50:46 Sergey Konoplev wrote:
>> Hello community,
>>
>>
>> Second time after migration 8.3.7 --> 8.4.1 I was caught by this
>> problem. Migration was 8 days ago.
>> (note, I never seen such situation on 8.3)
> Is 8.4 configured similarly to 8.3?
>

It is.

8.3 conf - http://pastie.org/702752
8.4 conf - http://pastie.org/702748


-- 
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/


Re: Unpredictable shark slowdown after migrating to 8.4

From
Alvaro Herrera
Date:
Sergey Konoplev escribió:
> On Mon, Nov 16, 2009 at 9:56 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Sergey Konoplev escribió:
> >
> >> I tried to get locks with this queries
> >
> > Did you try pg_locks?
> >
> 
> I tried monitor locks with pgrowlocks. Isn't it better way? If it
> isn't what points should I pay attention with pg_lock?

pgrowlocks shows row locks only.  pg_locks shows all regular locks, i.e.
locks on tables, pages, transactions, etc.  You should be concerned with
pgrowlocks only if you see a transaction waiting for locktype=transaction.

> I've just write the query
> 
> SELECT pid, count(1) AS locks, current_query
> FROM pg_locks AS l LEFT JOIN pg_stat_activity AS a ON pid = procpid
> WHERE l.mode IN ('RowExclusiveLock', 'ShareUpdateExclusiveLock',
> 'ExclusiveLock')
> GROUP BY 1,3 ORDER BY 2 DESC LIMIT 10;

Why only those modes?  I'd search for locks with granted=false, then see
all the other locks held by the process that's holding the conflicting
lock with granted=true (i.e. the one you're waiting on).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Unpredictable shark slowdown after migrating to 8.4

From
Sergey Konoplev
Date:
Thank you for the hints.

> Why only those modes?  I'd search for locks with granted=false, then see
> all the other locks held by the process that's holding the conflicting
> lock with granted=true (i.e. the one you're waiting on).


Something like this?

SELECT   granted,   pid,   virtualxid,   transactionid,   virtualtransaction,   count(1) AS locks,   current_query
FROM   pg_locks AS l   LEFT JOIN pg_stat_activity AS a ON       pid = procpid
GROUP BY 1, 2, 3, 4, 5, 7
ORDER BY 1, 6 DESC;


And two more queries to do extended analysis of its results after restarting PG:

SELECT   pg_stat_activity.datname,   pg_class.relname,   pg_locks.transactionid,   pg_locks.mode,   pg_locks.granted,
pg_stat_activity.usename,  pg_stat_activity.current_query,   pg_stat_activity.query_start,
age(now(),pg_stat_activity.query_start)AS "age",   pg_stat_activity.procpid 
FROM   pg_stat_activity,   pg_locks   LEFT OUTER JOIN pg_class ON       pg_locks.relation = pg_class.oid
WHERE   pg_locks.pid = pg_stat_activity.procpid
ORDER BY   query_start;


SELECT * FROM pg_locks;


Are there another things I should do when the problem rise up again?

--
Regards,
Sergey Konoplev