Thread: Query performance over a large proportion of data

From:
"Steve McLellan"
Date:

Hi,

I'd be grateful for any advice we can get... we recently switched from MySQL
to PostgreSQL on the basis of some trials we carried out with datasets of
varying sizes and varying rates of contention. For the most part we've been
pleased with performance, but one particular application runs queries that
pull back a lot of results across what is for us a large dataset. We've
noticed enormous speed improvements over MySQL with complex queries, but some
of these simpler queries are causing us some problems. We were hoping that the
machine would be able to cache the majority of the database in memory and be
able to run these kinds of queries very quickly. The box doesn't seem to be
doing much I/O during these queries, and top usually reports the processor
usage slowly increasing to about 75% but no higher than that (and then
dropping once it's finished). We adjusted settings in postgresql.conf as
recommended in various places on the web. In particular, experimentation led
us to turn off enable_seqscan, because it consistently led to faster query
times, but we're not sure why that's the case or if it's a good idea
generally.

This example has been anonymised slightly, although I've checked it for typos.
Our 'fact_table' has 6 million rows, each of which joins to one of 1.7 million
rows in record_dimension, and one of 15,000 rows in 'date_dimension'. We have
other tables that also join to 'fact_table', but for this example these three
tables suffice. The total size (as reported on the file system, presumably
including indexes) is 7.5GB. The query below pulls 12 months' worth of data
(accounting for 20% of the rows in 'fact_table') with restrictions that
account for 15% of the rows in 'record_dimension'. It's a read-only database
(we dump it fresh nightly).

The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2
logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres 8.3.5 built
from source. It's got 400GB storage in RAID-5 (on 5 disks). It has 8GB of
physical RAM. I'm able to use about 6GB of that for my own purposes; the
server doesn't do much else but replicate a very low-usage mysql database.
While it's running postgres only seems to use about 1.2GB of RAM.

Postgres configuration is below the query and EXPLAIN.

Any help would be much appreciated.

=============

SELECT "record_dimension"."Id 1" AS "Id 1", "record_dimension"."Id 2" AS
"Id 2", "Id 3" AS "Id 3"
FROM "fact_table"
  INNER JOIN "record_dimension" ON "fact_table"."record_key" =
"record_dimension"."record_key"
  INNER JOIN "date_dimension" ON "fact_table"."date_key" =
"date_dimension"."date_key"
WHERE "record_dimension"."Region" = 'Big Region'
  AND "date_dimension"."Month" BETWEEN '110' AND '121'
  AND "record_dimension"."A Common Property"
  AND "record_dimension"."Country" = 'USA';

ENABLE_SEQSCAN ON
 Nested Loop  (cost=466.34..192962.24 rows=15329 width=12) (actual
time=13653.238..31332.113 rows=131466 loops=1)
   ->  Hash Join  (cost=466.34..115767.54 rows=141718 width=8) (actual
time=13649.952..19548.019 rows=1098344 loops=1)
         Hash Cond: (fact_table.date_key = date_dimension.date_key)
         ->  Seq Scan on fact_table (cost=0.00..91589.38 rows=5945238
width=12) (actual time=0.014..8761.184 rows=5945238 loops=1)
         ->  Hash  (cost=461.99..461.99 rows=348 width=4) (actual
time=4.651..4.651 rows=378 loops=1)
               ->  Seq Scan on date_dimension  (cost=0.00..461.99 rows=348
width=4) (actual time=0.044..4.007 rows=378 loops=1)
                     Filter: (("Month" >= 110::smallint) AND ("Month" <=
121::smallint))
   ->  Index Scan using record_dimension_pkey on record_dimension
(cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0
loops=1098344)
         Index Cond: (record_dimension.record_key = fact_table.record_key)
         Filter: (record_dimension."A Common Property" AND
((record_dimension."Region")::text = 'Big Region'::text) AND
((record_dimension."Country")::text = 'USA'::text))
 Total runtime: 31522.166 ms

(131466 rows)
(Actual query time: 8606.963 ms)

I/O during the query:
+------------------+---------------------------------------+----------------------------------------+
|                  |            SEQUENTIAL I/O             |              INDEXED I/O               |
|                  | scans | tuples  | heap_blks | cached  | scans   | tuples  | idx_blks | cached  |
+------------------+-------+---------+-----------+---------+---------+---------+----------+---------+
| date_dimension   |     1 |   14599 |         0 |     243 |       0 |       0 |        0 |       0 |
| fact_table       |     1 | 5945238 |         0 |   32137 |       0 |       0 |        0 |       0 |
| record_dimension |     0 |       0 |         0 | 1098344 | 1098344 | 1098344 |        0 | 3300506 |
+------------------+-------+---------+-----------+---------+---------+---------+----------+---------+

ENABLE_SEQSCAN OFF
 Nested Loop  (cost=0.00..355177.96 rows=15329 width=12) (actual
time=14763.749..32483.625 rows=131466 loops=1)
   ->  Merge Join  (cost=0.00..277983.26 rows=141718 width=8) (actual
time=14760.467..20623.975 rows=1098344 loops=1)
         Merge Cond: (date_dimension.date_key = fact_table.date_key)
         ->  Index Scan using date_dimension_pkey on date_dimension
(cost=0.00..706.23 rows=348 width=4) (actual time=0.074..1.635
rows=13 loops=1)
               Filter: (("Month" >= 110::smallint) AND ("Month" <=
121::smallint))
         ->  Index Scan using date_key on fact_table (cost=0.00..261696.89
rows=5945238 width=12) (actual time=0.016..9903.593 rows=5945238
loops=1)
   ->  Index Scan using record_dimension_pkey on record_dimension
(cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0
loops=1098344)
         Index Cond: (record_dimension.record_key = fact_table.record_key)
         Filter: (record_dimension."A Common Property" AND
((record_dimension."Region")::text = 'Big Region'::text) AND
((record_dimension."Country")::text = 'USA'::text))
 Total runtime: 32672.995 ms
(10 rows)

(131466 rows)
(Actual query time: 9049.854 ms)

postgresql.conf
=============
shared_buffers=1200MB
work_mem = 100MB
maintenance_work_mem = 200MB
max_fsm_pages = 179200

fsync = off
synchronous_commit = off
full_page_writes = off

enable_seqscan = off

effective_cache_size = 2000MB

default_statistics_target = 100

lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'



From:
Scott Marlowe
Date:

On Tue, Mar 10, 2009 at 3:12 PM, Steve McLellan <> wrote:
> [...] We were hoping that the machine would be able to cache the
> majority of the database in memory and be able to run these kinds of
> queries very quickly. [...] In particular, experimentation led us to
> turn off enable_seqscan, because it consistently led to faster query
> times, but we're not sure why that's the case or if it's a good idea
> generally.

No, it's not.  The query planner in postgresql is quite good, and
unless you're sure it's making a pathologically bad decision, turning
things like seqscan off is kind of like a bullet in the brain to cure a
headache.

> [...] It has 8GB of physical RAM. I'm able to use about 6GB of that
> for my own purposes; the server doesn't do much else but replicate a
> very low-usage mysql database. While it's running postgres only seems
> to use about 1.2GB of RAM.

What do you mean you're able to use about 6GB for your own purposes?
Note that postgresql relies on the OS to do the majority of its caching,
so if you're doing something that chews up 6GB of RAM on the same
machine you are affecting pgsql performance on it.

I'll let someone else look through the explain analyze and all, but as
regards your sequential scan being turned off, you're far better off
adjusting seq_page_cost and random_page_cost in postgresql.conf to push
the planner towards random access. Also, increasing your
effective_cache_size will favor index scans over sequential scans.
Then, use enable_seqscan = off / on to test whether you have the best
query plan or not.  Don't just leave enable_seqscan = off.
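That per-session test might look something like the sketch below (using
the anonymised table names from the original post; only the current
session is affected, postgresql.conf stays untouched):

```sql
-- Try the query both ways in one psql session:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT "record_dimension"."Id 1"
FROM "fact_table"
  INNER JOIN "record_dimension"
          ON "fact_table"."record_key" = "record_dimension"."record_key"
WHERE "record_dimension"."Country" = 'USA';

-- Restore the planner's freedom, re-run, and compare plans and timings:
RESET enable_seqscan;
```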

From:
"Kevin Grittner"
Date:

>>> "Steve McLellan" <> wrote:
> The server itself is a dual-core 3.7GHz Xeon Dell (each core
> reporting 2 logical CPUs) running an amd64 build of FreeBSD 6.2, and
> postgres 8.3.5 built from source. It's got 400GB storage in RAID-5
> (on 5 disks). It has 8GB of physical RAM. I'm able to use about 6GB
> of that for my own purposes; the server doesn't do much else but
> replicates a very low-usage mysql database.

> shared_buffers=1200MB

You might want to play with this -- that's not a bad starting point,
but your best performance with your load could be on either side of
that value.

> work_mem = 100MB

Probably kinda high, especially if you expect a lot of connections.
This much memory can be concurrently used, possibly more than once, by
each active connection.

> fsync = off

Don't use this setting unless you can afford to lose your entire
database cluster.  We use it for initial (repeatable) loads, but not
much else.

> enable_seqscan = off

Not a good idea; some queries will optimize better with seqscans.
You can probably get the behavior you want using other adjustments.

> effective_cache_size = 2000MB

From what you said above, I'd bump this up to 5GB or more.

You probably need to reduce random_page_cost.  If your caching is
complete enough, you might want to set it equal to seq_page_cost
(never set it lower than seq_page_cost!) and possibly reduce both of
these to 0.1.

Some people have had good luck with boosting cpu_tuple_cost and
cpu_index_tuple_cost.  (I've seen 0.5 for both recommended.)  I've
never had to do that, but if the earlier suggestions don't get good
plans, you might try that.
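Pulling those suggestions together, a starting point for this
mostly-cached, read-only box might look like the fragment below (the
numbers are illustrative guesses to tune from, not tested values):

```
effective_cache_size = 5GB    # ~6GB of the 8GB is available to postgres
random_page_cost = 1.0        # cached data: random reads cost like sequential
seq_page_cost = 1.0           # keep random_page_cost >= seq_page_cost
enable_seqscan = on           # give the planner its full set of options back
```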

I hope that helps.

-Kevin

From:
Scott Marlowe
Date:

On Tue, Mar 10, 2009 at 3:12 PM, Steve McLellan <> wrote:
>
>  Nested Loop  (cost=466.34..192962.24 rows=15329 width=12) (actual
> time=13653.238..31332.113 rows=131466 loops=1)


Both your query plans end with this nested loop join which is taking
up about half your time in your query.  Notice the estimation of the
result set is off by a factor of about 10 here, which means a nested
loop might not be such a good choice for this.  Try increasing the
default stats target and re-analyzing to see if that helps.  1000 is
the max; give that a shot right off to see if it helps.  If it does,
drop it until the numbers start to go off again, and stop there.

For a quicker test, you can set enable_nestloop = off in the psql
command line and then run the query by hand and see if that helps.
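As a sketch, both tests could run in a single psql session along these
lines (in 8.3, ANALYZE uses the session's default_statistics_target for
columns without a per-column override):

```sql
-- Test 1: better statistics for the join estimate
SET default_statistics_target = 1000;   -- the 8.3 maximum
ANALYZE "fact_table";
ANALYZE "record_dimension";

-- Test 2: is the nested loop itself the problem?
SET enable_nestloop = off;
-- ... re-run the EXPLAIN ANALYZE here and compare ...
RESET enable_nestloop;
```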

From:
Tom Lane
Date:

"Kevin Grittner" <> writes:
> You probably need to reduce random_page_cost.  If your caching is
> complete enough, you might want to set it equal to seq_page_cost
> (never set it lower than seq_page_cost!) and possibly reduce both of
> these to 0.1.

> Some people have had good luck with boosting cpu_tuple_cost and
> cpu_index_tuple_cost.  (I've seen 0.5 for both recommended.)  I've
> never had to do that, but if the earlier suggestions don't get good
> plans, you might try that.

It might be worth pointing out here that all that matters are the
relative values of the various xxx_cost parameters.  If your DB is
mostly or entirely cached, you probably want to lower the estimated cost
of I/O relative to CPU work.  You can do that *either* by dropping the
seq_/random_page_costs, *or* by raising the cpu_xxx_costs (there are
more than two of those BTW).  Doing both, as Kevin's comments might be
read to suggest, is not useful ... and in particular I bet that having
seq_page_cost actually less than cpu_tuple_cost would lead to some
pretty wacko decision-making by the planner.
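In other words, either of these (illustrative) fragments shifts the same
relative balance towards cheap I/O, and you would pick one or the other,
not both:

```
# Option A: lower the page costs (defaults 1.0 / 4.0)
seq_page_cost = 0.1
random_page_cost = 0.1

# Option B: raise the CPU costs instead (defaults 0.01 / 0.005 / 0.0025)
cpu_tuple_cost = 0.1
cpu_index_tuple_cost = 0.05
cpu_operator_cost = 0.025
```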

See
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html
for some more info about what you're twiddling here.

            regards, tom lane

From:
Tom Lane
Date:

"Steve McLellan" <> writes:
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'

BTW, aside from the points already made: the above indicates that you
initialized your database in en_US.utf8 locale.  This is not necessarily
a good decision from a performance standpoint --- you might be much
better off with C locale, and might even prefer it if you favor
ASCII-order sorting over "dictionary" sorting.  utf8 encoding might
create some penalties you don't need too.  This all depends on a lot
of factors you didn't mention; maybe you actually need utf8 data,
or maybe your application doesn't do many string comparisons and so
isn't sensitive to the speed of strcoll() anyway.  But I've seen it
be a gotcha for people moving from MySQL, which AFAIK doesn't worry
about honoring locale-specific sort order.

            regards, tom lane

From:
Steve McLellan
Date:

 

Tom Lane <> wrote on 03/10/2009 08:16 PM AST:

> BTW, aside from the points already made: the above indicates that you
> initialized your database in en_US.utf8 locale.  This is not necessarily
> a good decision from a performance standpoint --- you might be much
> better off with C locale, and might even prefer it if you favor
> ASCII-order sorting over "dictionary" sorting. [...]

Thanks for the reply. We did intentionally initialize it in UTF-8
locale. We could get away with using C locale in this case, although we
try to standardise on UTF-8 in general since we do in other instances
require it. The only string comparisons we do are equalities, although
it can be the case that we're comparing a large number of rows. We may
give it a try and see what kind of performance hit UTF-8 is giving us.
Currently we're trying to get some big easy wins through parameter
settings; I imagine we will want to start shaving some more time off
queries in the near future.

Thanks, Steve
From:
Steve McLellan
Date:


Scott Marlowe <> wrote on 03/10/2009 05:19 PM:

> Both your query plans end with this nested loop join which is taking
> up about half your time in your query. [...] Try increasing the
> default stats target and re-analyzing to see if that helps. [...]
>
> For a quicker test, you can set enable_nestloop = off in the psql
> command line and then run the query by hand and see if that helps.

Thanks - the nested loop is indeed causing problems - reducing
seq_page_cost had the same effect, removing the nested loop for this
query. We'd noticed the poor row count estimation. Increasing the
statistics target doesn't seem to have much effect, but we'll have more
of a go with it.


From:
Steve McLellan
Date:



"Kevin Grittner" <> wrote on 03/10/2009 05:06 PM EST:

> > enable_seqscan = off
>
> Not a good idea; some queries will optimize better with seqscans.
> You can probably get the behavior you want using other adjustments.

The bullet to cure the headache, as Scott said.

> You probably need to reduce random_page_cost.  If your caching is
> complete enough, you might want to set it equal to seq_page_cost
> (never set it lower than seq_page_cost!) and possibly reduce both of
> these to 0.1.

Reducing seq_page_cost relative to random_page_cost seems to make an
enormous difference for this query. Removing the nested loop seems to
be what makes a difference. We'll continue to play with these and check
there are no adverse effects on other queries.

Thanks again, Steve
From:
Scott Marlowe
Date:

On Tue, Mar 10, 2009 at 9:15 PM, Steve McLellan <> wrote:
> Thanks - the nested loop is indeed causing problems - reducing
> seq_page_cost had the same effect of removing the nested loop for this
> query. We'd noticed the poor row count estimation. Increasing the statistics
> doesn't seem to have much effect, but we'll have more of a go with it.

More than likely it's the sequential page cost versus the cpu_xxx cost
settings that's really making the difference.  I.e. if you raised the
cpu_xxx settings you'd get the same result.  But I'm not sure, just a
guess.

From:
decibel
Date:

On Mar 10, 2009, at 4:12 PM, Steve McLellan wrote:
> The server itself is a dual-core 3.7GHz Xeon Dell (each core
> reporting 2
> logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres
> 8.3.5 built
> from source.


Uh, you're running an amd64 build on top of an Intel CPU? I didn't
think FBSD would allow that, but if it does it wouldn't surprise me
if kernel/OS performance stunk. If Postgres then used the same
settings it would make matters even worse (IIRC there is some code
that's different in an AMD vs Intel build).
--
Decibel!, aka Jim C. Nasby, Database Architect  
Give your computer some brain candy! www.distributed.net Team #1828



From:
Matteo Beccati
Date:

decibel wrote:
> On Mar 10, 2009, at 4:12 PM, Steve McLellan wrote:
>> The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2
>> logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres
>> 8.3.5 built
>> from source.
>
> Uh, you're running an amd64 build on top of an Intel CPU? I didn't think
> FBSD would allow that, but if it does it wouldn't surprise me if
> kernel/OS performance stunk. If Postgres then used the same settings it
> would make matters even worse (IIRC there is some code that's different
> in an AMD vs Intel build).

Uh? amd64 is just the name of the FreeBSD port for AMD/Intel 64-bit CPUs.

See: http://www.freebsd.org/platforms/amd64.html
and: http://en.wikipedia.org/wiki/X86-64


Cheers

--
Matteo Beccati

OpenX - http://www.openx.org