Thread: 8.1 iss

8.1 iss

From
"PostgreSQL"
Date:
SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING
count(*) > 1;

This is a pretty good example of the place where 8.1 seems to be quite
broken.  I understand that this query will want to do a full table scan
(even through v_barcode is indexed).  And the table is largish, at 34
million rows.  In the 8.0 world, this took around 4 minutes.  With 8.1beta3,
this has run for 30 minutes (as I began to write this) and is still going
strong.

And it behaves differently than I'd expect.  Top shows the postmaster
process running the query as using up 99.9 percent of one CPU, while the i/o
wait time never gets above 3%.  vmstat shows the "block out" (bo) number
quite high, 15 to 20 thousand, which also surprises me.  "block in" is from
0 to about 2500.  iostat shows 15,000 to 20,000 blocks written every 5
seconds, while it shows 0 blocks read.  There is no other significant
process running on the box.  (Apache is running but is not being used here a
3:00a.m. on Sunday).  This is a dual Opteron box with 16 Gb memory and a
3ware SATA raid runing 64bit SUSE.  Something seems badly wrong.

As I post this, the query is approaching an hour of run time.  I've listed
an explain of the query and my non-default conf parameters below.  Please
advise on anything I should change or try, or on any information I can
provide that could help diagnose this.


GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
  Filter: (count(*) > 1)
  ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
        Sort Key: v_barcode
        ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

shared_buffers = 50000
work_mem = 16384
maintenance_work_mem = 16384
max_fsm_pages = 100000
max_fsm_relations = 5000
wal_buffers = 32
checkpoint_segments = 32
effective_cache_size = 50000
default_statistics_target = 50



Re: 8.1 iss

From
Tom Lane
Date:
"PostgreSQL" <martin@portant.com> writes:
> This is a pretty good example of the place where 8.1 seems to be quite
> broken.

That's a bit of a large claim on the basis of one data point.
Did you remember to re-ANALYZE after loading the table into the
new database?

            regards, tom lane

Re: 8.1 iss

From
Greg Stark
Date:
"PostgreSQL" <martin@portant.com> writes:

...
> As I post this, the query is approaching an hour of run time.  I've listed
> an explain of the query and my non-default conf parameters below.  Please
> advise on anything I should change or try, or on any information I can
> provide that could help diagnose this.
>
>
> GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
>   Filter: (count(*) > 1)
>   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
>         Sort Key: v_barcode
>         ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)
>
> shared_buffers = 50000
> work_mem = 16384
...

It sounds to me like it's doing a large on-disk sort. Increasing work_mem
should improve the efficiency. If you increase it enough it might even be able
to do it in memory, but probably not.

The shared_buffers is excessive but if you're using the default 8kB block
sizes then it 400MB of shared pages on a 16GB machine ought not cause
problems. It might still be worth trying lowering this to 10,000 or so.

Is this a custom build from postgresql.org sources? RPM build? Or is it a BSD
ports or Gentoo build with unusual options?

Perhaps posting actual vmstat and iostat output might help if someone catches
something you didn't see?

--
greg

Re: 8.1 iss

From
"Luke Lonergan"
Date:
Greg,

Increasing memory actually slows down the current sort performance.

We're working on a fix for this now in bizgres.

Luke
--------------------------
Sent from my BlackBerry Wireless Device


-----Original Message-----
From: pgsql-performance-owner@postgresql.org <pgsql-performance-owner@postgresql.org>
To: PostgreSQL <martin@portant.com>
CC: pgsql-performance@postgresql.org <pgsql-performance@postgresql.org>
Sent: Sun Nov 06 14:24:00 2005
Subject: Re: [PERFORM] 8.1 iss


"PostgreSQL" <martin@portant.com> writes:

...
> As I post this, the query is approaching an hour of run time.  I've listed 
> an explain of the query and my non-default conf parameters below.  Please 
> advise on anything I should change or try, or on any information I can 
> provide that could help diagnose this.
> 
> 
> GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
>   Filter: (count(*) > 1)
>   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
>         Sort Key: v_barcode
>         ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)
> 
> shared_buffers = 50000
> work_mem = 16384
...

It sounds to me like it's doing a large on-disk sort. Increasing work_mem
should improve the efficiency. If you increase it enough it might even be able
to do it in memory, but probably not.

The shared_buffers is excessive but if you're using the default 8kB block
sizes then it 400MB of shared pages on a 16GB machine ought not cause
problems. It might still be worth trying lowering this to 10,000 or so.

Is this a custom build from postgresql.org sources? RPM build? Or is it a BSD
ports or Gentoo build with unusual options?

Perhaps posting actual vmstat and iostat output might help if someone catches
something you didn't see?

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Re: 8.1 iss

From
Dennis Bjorklund
Date:
On Sun, 6 Nov 2005, PostgreSQL wrote:

> SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING
> count(*) > 1;
>
> This is a dual Opteron box with 16 Gb memory and a 3ware SATA raid
> runing 64bit SUSE.  Something seems badly wrong.
>
> GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
>   Filter: (count(*) > 1)
>   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
>         Sort Key: v_barcode
>         ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

What do the plan look like in 8.0? Since it's so much faster I assume you
get a different plan.

> shared_buffers = 50000
> work_mem = 16384
> maintenance_work_mem = 16384
> max_fsm_pages = 100000
> max_fsm_relations = 5000
> wal_buffers = 32
> checkpoint_segments = 32
> effective_cache_size = 50000
> default_statistics_target = 50

The effective_cache_size is way too low, only 390M and you have a machine
with 16G. Try bumping it to 1000000 (which means almost 8G, how nice it
would be to be able to write 8G instead...). It could be set even higher
but it's hard for me to know what else your memory is used for.

I don't know if this setting will affect this very query, but it should
have a positive effect on a lot of queries.

work_mem also seems low, but it's hard to suggest a good value on it
without knowing more about how your database is used.

--
/Dennis Björklund


Re: 8.1 iss

From
"PostgreSQL"
Date:
My most humble apologies to the pg development team (pg_lets?).

I took Greg Stark's advice and set:

shared_buffers = 10000  # was 50000
work_mem = 1048576    # 1Gb - was 16384

Also, I noticed that the EXPLAIN ANALYZE consistently thought reads would
take longer than they actually did, so I decreased random_page_cost down to
1 (the server has a SATA Raid at level 10).

Queries that previously seemed to stall out are still a little slow but
nothing like before.  And I'm seeing a more normal balance of CPU and disk
i/o while a query is running instead of the high-cpu-low-disk-read situation
I was seeing before.  Concurrency is way up.

I tried a couple of interim sizes for work_mem and so far, the larger the
better (the server has 16Gb).  I'll test a little larger size this evening
and see what it does.  Yes, I've read the warning that this is per process.

Kudos to you Greg, thanks Luke for your comment (though it seems to disagree
with my experience).  Also to Dennis, there were not drastic changes in the
plan between 8.0 and 8.1, it was just the actual execution times.

Martin

"PostgreSQL" <martin@portant.com> wrote in message
news:dkko49$1v06$1@news.hub.org...
> SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING
> count(*) > 1;
>
> This is a pretty good example of the place where 8.1 seems to be quite
> broken.
...



Re: 8.1 iss

From
Mario Weilguni
Date:
Am Montag, 7. November 2005 18:22 schrieb PostgreSQL:
> My most humble apologies to the pg development team (pg_lets?).
>
> I took Greg Stark's advice and set:
>
> shared_buffers = 10000  # was 50000
> work_mem = 1048576    # 1Gb - was 16384
>
> Also, I noticed that the EXPLAIN ANALYZE consistently thought reads would
> take longer than they actually did, so I decreased random_page_cost down to
> 1 (the server has a SATA Raid at level 10).

Don't do that, use 1.5 or 2, setting it to 1 will only work well if you have
small databases fitting completly in memory.