Thread: planner with index scan cost way off actual cost, advice on tweaking cost constants?

planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: Guillaume Cottenceau
Hi,

I have a problem with the postgres planner, which assigns to an
index scan a cost much higher than the actual cost (worst case
considered, i.e. without any prior disk cache), and am posting
here for advice on tweaking cost constants. Because of this
problem, the planner typically chooses a seq scan when an index
scan would be more efficient, and I would like to correct this if
possible.

Reading the documentation and the postgresql list archives, I have
run ANALYZE right before my tests and increased the statistics
target to 50 for the table in question; my problem is that the
index scan cost reported by EXPLAIN seems to be around 12.7 times
higher than it should be, a figure I assume is too large to be
explained away by random_page_cost and effective_cache_size tweaks
alone.


Structure of the table:

\d sent_messages
                                     Table "public.sent_messages"
  Column  |           Type           |                           Modifiers
----------+--------------------------+----------------------------------------------------------------
 uid      | integer                  | not null default nextval('public.sent_messages_uid_seq'::text)
 sender   | character varying(25)    |
 receiver | character varying(25)    |
 action   | character varying(25)    |
 cost     | integer                  |
 date     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 status   | character varying(128)   |
 theme    | character varying(25)    |
 operator | character varying(15)    |
Indexes:
    "sent_messages_pkey" primary key, btree (uid)
    "idx_sent_msgs_date_theme_status" btree (date, theme, status)


What I did:

- SET default_statistics_target = 50
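
  For reference, the per-column equivalent (which I did not use here) would be
  something like:

  ALTER TABLE sent_messages ALTER COLUMN date SET STATISTICS 50;
  ANALYZE sent_messages;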

- VACUUM FULL ANALYZE VERBOSE sent_messages - output copied below so that
  you can have a look at the rows and pages taken up by the relations

INFO:  vacuuming "public.sent_messages"
INFO:  "sent_messages": found 0 removable, 3692284 nonremovable row versions in 55207 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 103 to 177 bytes long.
There were 150468 unused item pointers.
Total free space (including removable row versions) is 2507320 bytes.
0 pages are or will become empty, including 0 at the end of the table.
2469 pages containing 262732 free bytes are potential move destinations.
CPU 0.57s/0.20u sec elapsed 11.27 sec.
INFO:  index "sent_messages_pkey" now contains 3692284 row versions in 57473 pages
DETAIL:  0 index row versions were removed.
318 index pages have been deleted, 318 are currently reusable.
CPU 2.80s/1.27u sec elapsed 112.69 sec.
INFO:  index "idx_sent_msgs_date_theme_status" now contains 3692284 row versions in 88057 pages
DETAIL:  0 index row versions were removed.
979 index pages have been deleted, 979 are currently reusable.
CPU 4.22s/1.51u sec elapsed 246.88 sec.
INFO:  "sent_messages": moved 0 row versions, truncated 55207 to 55207 pages
DETAIL:  CPU 1.87s/3.18u sec elapsed 42.71 sec.
INFO:  vacuuming "pg_toast.pg_toast_77852470"
INFO:  "pg_toast_77852470": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_77852470_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  analyzing "public.sent_messages"
INFO:  "sent_messages": 55207 pages, 15000 rows sampled, 3666236 estimated total rows

- select rows of the table with a range condition on "date", and find
  a range for which seq scan and index scan runtimes are very close
  (I use Linux; I cat a 2G file to /dev/null between requests to flush
  the disk cache, on a machine with 1G of real RAM and 1G of swap, so
  this is the worst case tested for the index scan); notice that the
  cost used by the planner is 12.67 times higher for the index scan,
  when it should be around 1 so that the planner can make sensible
  choices:

EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on sent_messages  (cost=0.00..110591.26 rows=392066 width=78) (actual time=7513.205..13095.147 rows=393074 loops=1)
   Filter: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
 Total runtime: 14272.522 ms


SET enable_seqscan = false

EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..1402124.26 rows=392066 width=78) (actual time=142.638..12677.378 rows=393074 loops=1)
   Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
 Total runtime: 13846.504 ms


  Please notice that an index on the "date" column alone would be
  much more efficient for this query (and I have confirmed this by
  creating and trying one), but I would not necessarily need that
  index if the existing index were used. Of course real queries use
  smaller date ranges.

- I then tried to tweak random_page_cost and effective_cache_size
  following advice from the documentation:

SET random_page_cost = 2;
SET effective_cache_size = 10000;
EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..595894.94 rows=392066 width=78)
   Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))


  We can see that the estimated index scan cost goes down, but only by
  a factor of approximately 2.3, which is far from enough to "fix" it.
  I am reluctant to change random_page_cost and effective_cache_size
  much further, as I suspect it might have other (bad) consequences if
  the values are too far from reality (even if Linux is known to cache
  aggressively), the application being multithreaded (the documentation
  warns about concurrent queries using different indexes). But I could
  certainly benefit from others' experience on this matter.


I apologize for this long email but I wanted to be sure I gave
enough information on the data and things I have tried to fix the
problem myself. If anyone can see what I am doing wrong, I would
be very interested in pointers.

Thanks in advance!

Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all
postgresql.conf default values except timezone = 'UTC', on an
ext3 partition with data=ordered, and run Linux 2.6.12.

--
Guillaume Cottenceau

Guillaume,

On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau wrote:
> Reading the documentation and postgresql list archives, I have
> run ANALYZE right before my tests, I have increased the
> statistics target to 50 for the considered table; my problem is
> that the index scan cost reported by EXPLAIN seems to be around
> 12.7 times higher that it should, a figure I suppose incompatible
> (too large) for just random_page_cost and effective_cache_size
> tweaks.

It's not surprising you get a high cost for an index scan which is
planned to return, and does return, so many rows. I really don't think
the planner is doing anything wrong here.
AFAIK, increasing the statistics target won't do anything to reduce
the cost, as the planner's estimate of the number of returned rows is
already very accurate and probably can't get any better.

> Of course real queries use smaller date ranges.

What about providing us the plans for your real queries, run in a
real case? It's a bad idea to compare an index scan and a seq scan
when your data has to be loaded into RAM.
Before doing so, create an index on the date column to have the most
effective index possible.
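Something like this (just a sketch, name it as you like):

CREATE INDEX idx_sent_msgs_date ON sent_messages (date);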

> - I then tried to tweak random_page_cost and effective_cache_size
>   following advices from documentation:
>
> SET random_page_cost = 2;

random_page_cost is the way to go for this sort of thing, but I don't
think it's a good idea to set it too low globally, and I still think
the problem is that your test case is not accurate.

--
Guillaume

Re: planner with index scan cost way off actual cost,

From: Mark Kirkwood
Guillaume Cottenceau wrote:

>
> SET random_page_cost = 2;
> SET effective_cache_size = 10000;
> EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
>                                                                  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..595894.94 rows=392066 width=78)
>    Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
>
>
>   We can see that estimated index scan cost goes down but by a
>   factor of approx. 2.3 which is far from enough to "fix" it. I
>   am reluctant in changing way more the random_page_cost and
>   effective_cache_size values as I'm suspecting it might have
>   other (bad) consequences if it is too far away from reality
>   (even if Linux is known to aggressively cache), the application
>   being multithreaded (there is a warning about concurrent
>   queries using different indexes in documentation). But I
>   certainly could benefit from others' experience on this matter.
>
>
> I apologize for this long email but I wanted to be sure I gave
> enough information on the data and things I have tried to fix the
> problem myself. If anyone can see what I am doing wrong, I would
> be very interested in pointers.
>
> Thanks in advance!
>


> Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all
> postgresql.conf default values except timezone = 'UTC', on an
> ext3 partition with data=ordered, and run Linux 2.6.12.
>

I didn't see any mention of how much memory is on your server, but
provided you have say 1G, and are using the box solely as a database
server, I would increase both shared_buffers and effective_cache_size.

shared_buffers = 12000
effective_cache_size = 25000

This would mean you are reserving 100M for Postgres to cache relation
pages, and informing the planner that it can expect ~200M available from
the disk buffer cache. To give a better recommendation, we need to know
more about your server and workload (e.g. server memory configuration and
usage plus how close you get to 500 connections).
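
Just a sketch of how you would apply these on 7.4: effective_cache_size can be
tried per-session, while shared_buffers can only be changed at server start
(postgresql.conf, or the -B switch you are already using):

SET effective_cache_size = 25000;   -- 25000 * 8 kB pages, roughly the ~200M above
-- shared_buffers = 12000 (roughly 100M) goes in postgresql.conf and needs a restart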

Cheers

Mark




Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: Guillaume Cottenceau
Guillaume,

Thanks for your answer.

> On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau
> wrote:
> > Reading the documentation and postgresql list archives, I have
> > run ANALYZE right before my tests, I have increased the
> > statistics target to 50 for the considered table; my problem is
> > that the index scan cost reported by EXPLAIN seems to be around
> > 12.7 times higher that it should, a figure I suppose incompatible
> > (too large) for just random_page_cost and effective_cache_size
> > tweaks.
>
> It's not surprising you have a high cost for an index scan which is
> planned to return and returns so much rows. I really don't think the
> planner does something wrong on this one.

My point is that the planner's cost estimate is way above the
actual cost of the query, so the planner doesn't use the best
plan. Even if the index returns that many rows, the actual cost of
the query is such that an index scan (worst case, all disk cache
flushed) is still better than a seq scan, yet the planner chooses
the seq scan.

> AFAIK, increasing the statistics target won't do anything to reduce
> the cost as the planner estimation for the number of returned rows is
> already really accurate and probably can't be better.

Ok, thanks.

> > Of course real queries use smaller date ranges.
>
> What about providing us the respective plans for your real queries?
> And in a real case. It's a bad idea to compare index scan and seqscan

The original query is more complicated and sometimes involves
restricting the result set with another constraint. I am not sure
it is very interesting to show it; I know that the best performance
would be achieved with an index on the date column for the query
shown, and an index on the date column plus the other column when
querying on both.

> when your data have to be loaded in RAM.

What do you mean? That I should not flush disk cache before
timing? I did so to find the worst case.. I am not sure it is the
best solution.. maybe half worst case would be? but this depends
a lot on whether the index pages would stay in disk cache or not
before next query.. which cannot be told for sure unless a full
serious timing of the real application is done (and my
application can be used in quite different scenarios, which means
such a test is not entirely possible/meaningful).

> Before doing so create an index on the date column to have the most
> effective index possible.

Yes, as I said, I know that doing this would improve the queries a
lot. My point was to understand why the cost of the index scan is
so "inaccurate" compared to the actual cost. Adding an index on the
date column enlarges the data by 100-150M, so I'd rather avoid it
if possible.

> > - I then tried to tweak random_page_cost and effective_cache_size
> >   following advices from documentation:
> >
> > SET random_page_cost = 2;
>
> random_page_cost is the way to go for this sort of thing but I don't
> think it's a good idea to have it too low globally and I'm still

Thanks, I suspected so.

> thinking the problem is that your test case is not accurate.

Ok.

--
Guillaume Cottenceau

Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: Guillaume Cottenceau
Hi Mark,

Thanks for your reply.

> Guillaume Cottenceau wrote:

[...]

> > Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all
> > postgresql.conf default values except timezone = 'UTC', on an
> > ext3 partition with data=ordered, and run Linux 2.6.12.
>
> I didn't see any mention of how much memory is on your server, but
> provided you have say 1G, and are using the box solely for a database
> server, I would increase both shared_buffers and effective_cache size.

This test machine has 1G of (real) memory; the servers often have 2G
or 4G. The thing is that the application runs on the same machine,
and as it is a java application it takes up some memory too (we can
say half of it should go to java and half to postgres, I guess).
Determining the best memory "plan" is not so easy, though your
information is priceless and will help a lot!

> shared_buffer = 12000
> effective_cache_size = 25000
>
> This would mean you are reserving 100M for Postgres to cache relation
> pages, and informing the planner that it can expect ~200M available
> from the disk buffer cache. To give a better recommendation, we need

Ok, thanks. I wanted to investigate this area, but as the
application is multithreaded and uses a lot of postgres clients,
I wanted to make sure the shared_buffers value is global for
postgres, not per (TCP) connection to postgres, before increasing
it, for fear of taking the whole server down.

On a server with 235 connections and -N 512 -B 1024, reading
http://virtualthreads.blogspot.com/2006/02/understanding-memory-usage-on-linux.html
I came up with the following figures:

for i in `pidof postmaster`; do pmap -d $i | grep -i writeable ; done | perl -MMDK::Common -ne 'do { push @a, $1; $tot += $1 } if /writeable.private: (\d+)K/; END { print "total postgres private memory: ${tot}K\nmin: " . min(@a) . "K\nmax: " . max(@a) . "K\n"; }'
total postgres private memory: 432080K
min: 936K
max: 4216K

As the server has 2G of memory, I was reluctant to increase the
amount of shared memory, since overall postgres memory use already
seems quite high - though 100M more would not kill the server,
obviously. Btw, can you comment on the figures above?

> to know more about your server and workload (e.g server memory
> configuration and usage plus how close you get to 500 connections).

Depending on the server, it can have 200 and up to around 400
connections open. As for workload, I am not sure what metrics are
suitable. Typically postgres can be seen among the top processes,
but most queries are quick and the load average reported by the
linux kernel is nearly always below 0.3, and often 0.1. These are
single or dual xeon 2.8 GHz machines with hardware raid (megaraid
or percraid driver) with reasonable performance.

--
Guillaume Cottenceau

On Mon, Mar 20, 2006 at 09:35:14AM +0100, Guillaume Cottenceau wrote:
> > shared_buffer = 12000
> > effective_cache_size = 25000
> >
> > This would mean you are reserving 100M for Postgres to cache relation
> > pages, and informing the planner that it can expect ~200M available
> > from the disk buffer cache. To give a better recommendation, we need
>
> Ok, thanks. I wanted to investigate this field, but as the
> application is multithreaded and uses a lot of postgres clients,
> I wanted to make sure the shared_buffers values is globally for
> postgres, not just per (TCP) connection to postgres, before
> increasing the value, fearing to take the whole server down.

shared_buffers is for the entire 'cluster', not per-connection or
per-database.

Also, effective_cache_size of 25000 on a 1G machine seems pretty
conservative to me. I'd set it to at least 512MB, if not closer to
800MB.
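
effective_cache_size is expressed in 8 kB pages on 7.4, so for example:

SET effective_cache_size = 65536;    -- 512MB / 8kB
-- or, closer to 800MB:
SET effective_cache_size = 102400;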
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

On Mon, Mar 20, 2006 at 09:14:32AM +0100, Guillaume Cottenceau wrote:
> Guillaume,
>
> Thanks for your answer.
>
> > On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau
> > wrote:
> > > Reading the documentation and postgresql list archives, I have
> > > run ANALYZE right before my tests, I have increased the
> > > statistics target to 50 for the considered table; my problem is
> > > that the index scan cost reported by EXPLAIN seems to be around
> > > 12.7 times higher that it should, a figure I suppose incompatible
> > > (too large) for just random_page_cost and effective_cache_size
> > > tweaks.
> >
> > It's not surprising you have a high cost for an index scan which is
> > planned to return and returns so much rows. I really don't think the
> > planner does something wrong on this one.
>
> My point is that the planner's cost estimate is way above the
> actual cost of the query, so the planner doesn't use the best
> plan. Even if the index returns so much rows, actual cost of the
> query is so that index scan (worst case, all disk cache flushed)
> is still better than seq scan but the planner uses seq scan.

Yes. The cost estimator for an index scan supposedly does a linear
interpolation between a minimum cost and a maximum cost depending on the
correlation of the first field in the index. The problem is that while
the comment states it's a linear interpolation, the actual formula
squares the correlation before interpolating. This means that unless the
correlation is very high, you're going to get an unrealistically high
cost for an index scan. I have data that supports this at
http://stats.distributed.net/~decibel/, but I've never been able to get
around to testing a patch to see if it improves things.
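
If you want to see what the planner thinks the correlation of your date
column is, it's visible in pg_stats (just a quick check; values near 1 or -1
mean the heap is close to index order):

SELECT attname, correlation FROM pg_stats
 WHERE tablename = 'sent_messages' AND attname = 'date';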

<snip>
> > thinking the problem is that your test case is not accurate.
>
> Ok.

Actually, I suspect your test case was probably fine, but take a look at
the data I've got and see what you think. If you want to spend some time
on this it should be possible to come up with a test case that uses
either pgbench or dbt2/3 to generate data, so that others can easily
reproduce (I can't really make the data I used for my testing
available).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: Guillaume Cottenceau
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes:

[...]

> > My point is that the planner's cost estimate is way above the
> > actual cost of the query, so the planner doesn't use the best
> > plan. Even if the index returns so much rows, actual cost of the
> > query is so that index scan (worst case, all disk cache flushed)
> > is still better than seq scan but the planner uses seq scan.
>
> Yes. The cost estimator for an index scan supposedly does a linear
> interpolation between a minimum cost and a maximum cost depending on the
> correlation of the first field in the index. The problem is that while
> the comment states it's a linear interpolation, the actual formula
> squares the correlation before interpolating. This means that unless the
> correlation is very high, you're going to get an unrealistically high
> cost for an index scan. I have data that supports this at
> http://stats.distributed.net/~decibel/, but I've never been able to get
> around to testing a patch to see if it improves things.

Interesting.

It would be nice to investigate the reasoning behind the choice
you describe for the interpolation formula. I have absolutely no
knowledge of pg internals, so this is rather new/fresh for me, and
I have no idea how smart that choice is (but based on my general
feeling about pg, I suspect it is actually smart and I am simply
not smart enough to see why ;p).

--
Guillaume Cottenceau

Re: planner with index scan cost way off actual cost,

From: Mark Kirkwood
Jim C. Nasby wrote:
> On Mon, Mar 20, 2006 at 09:35:14AM +0100, Guillaume Cottenceau wrote:
>
>>>shared_buffer = 12000
>>>effective_cache_size = 25000
>>>
>>>This would mean you are reserving 100M for Postgres to cache relation
>>>pages, and informing the planner that it can expect ~200M available
>>>from the disk buffer cache. To give a better recommendation, we need
>>
>>Ok, thanks. I wanted to investigate this field, but as the
>>application is multithreaded and uses a lot of postgres clients,
>>I wanted to make sure the shared_buffers values is globally for
>>postgres, not just per (TCP) connection to postgres, before
>>increasing the value, fearing to take the whole server down.
>
>
> shared_buffer is for the entire 'cluster', not per-connection or
> per-database.
>
> Also, effective_cache_size of 25000 on a 1G machine seems pretty
> conservative to me. I'd set it to at least 512MB, if not closer to
> 800MB.

I was going to recommend higher - but not knowing what else was running,
I kept it quite conservative :-)... and given he's running java, the
JVM could easily eat 512M all by itself!

Cheers

Mark

Re: planner with index scan cost way off actual cost,

From: "Jim C. Nasby"
On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> I was going to recommend higher - but not knowing what else was running,
> kept it to quite conservative :-)... and given he's running java, the
> JVM could easily eat 512M all by itself!

Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
I think people often place too much emphasis on having a separate
application server, but in the case of java you often have no choice.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> [...]
>
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't use the best
> > > plan. Even if the index returns so much rows, actual cost of the
> > > query is so that index scan (worst case, all disk cache flushed)
> > > is still better than seq scan but the planner uses seq scan.
> >
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
>
> Interesting.
>
> It would be nice to investigate the arguments behind the choice
> you describe for the formula used to perform the interpolation. I
> have absolutely no knowledge on pg internals so this is rather
> new/fresh for me, I have no idea how smart that choice is (but
> based on my general feeling about pg, I'm suspecting this is
> actually smart but I am not smart enough to see why ;p).

If you feel like running some tests, you need to change

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

in src/backend/optimizer/path/costsize.c to something like

    run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);
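
As a rough illustration of the difference (made-up correlation, just to show
the effect): with a correlation of 0.4 the current code interpolates with
0.4^2 = 0.16, so the estimate lands 84% of the way toward max_IO_cost (the
fully random case), whereas abs(correlation) would put it only 60% of the
way there.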

That might not produce a perfect cost estimate, but I'll wager that it
will be substantially better than what's in there now. FYI, see also
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: planner with index scan cost way off actual cost,

From: Guillaume Cottenceau
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes:

> On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > I was going to recommend higher - but not knowing what else was running,
> > kept it to quite conservative :-)... and given he's running java, the
> > JVM could easily eat 512M all by itself!
>
> Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> I think people often place too much emphasis on having a seperate
> application server, but in the case of java you often have no choice.

Fortunately the servers use 2G or 4G of memory; only my test
machine had 1G, as I believe I mentioned in a message; so I'm
definitely going to follow Mark's advice and enlarge the shared
buffers a lot. Btw, what about sort_mem? I have seen it referenced
only a little in the documentation.

Also, I'd still be interested in comments on the result of pmap
showing around 450M of "private memory" used by pg, if anyone can
share insight about it. Though most people seem freebsd-oriented,
and this might be very much linux-centric.

--
Guillaume Cottenceau

Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: Guillaume Cottenceau
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes:

> If you feel like running some tests, you need to change
>
>     run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
>
> in src/backend/optimizer/path/costsize.c to something like
>
>     run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);

Shortly after the beginning of a discussion about the planner
assigning too high a cost to an index scan, I am advised to change
the source code.. I'm already frightened about the near future :)

> That might not produce a perfect cost estimate, but I'll wager that it
> will be substantially better than what's in there now. FYI, see also
> http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php

Sad that Tom didn't share his thoughts about your cost algorithm
question in this message.

--
Guillaume Cottenceau

On Tue, Mar 21, 2006 at 02:30:22PM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > If you feel like running some tests, you need to change
> >
> >     run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
> >
> > in src/backend/optimizer/path/costsize.c to something like
> >
> >     run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);
>
> Short after the beginning of a discussion about planner
> associating too high cost for index scan, I'm suggested to change
> source-code.. I'm already frightened about the near future :)

Well, this is mostly because I've just never gotten around to following
up on this.

> > That might not produce a perfect cost estimate, but I'll wager that it
> > will be substantially better than what's in there now. FYI, see also
> > http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
>
> Sad that Tom didn't share his thoughts about your cost algorithm
> question in this message.

See above. :)

If someone comes up with a before and after comparison showing that the
change makes the estimator more accurate I'm sure the code will be
changed in short order. The nice thing about this case is that basically
any PostgreSQL user can do the heavy lifting, instead of relying on the
primary contributors for a change.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: planner with index scan cost way off actual cost,

From: "Jim C. Nasby"
On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > > I was going to recommend higher - but not knowing what else was running,
> > > kept it to quite conservative :-)... and given he's running java, the
> > > JVM could easily eat 512M all by itself!
> >
> > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> > I think people often place too much emphasis on having a seperate
> > application server, but in the case of java you often have no choice.
>
> Fortunately the servers use 2G or 4G of memory, only my test
> machine had 1G, as I believe I precised in a message; so I'm
> definitely going to use Mark's advices to enlarge a lot the
> shared buffers. Btw, what about sort_mem? I have seen it only
> little referenced in the documentation.

The biggest issue with setting work_mem (you're not doing current
development on 7.4, are you?) is ensuring that you don't push the
server into swapping. Remember that work_mem controls how much memory
can be used for EACH sort or hash (maybe other) operation. Each query
can consume multiples of work_mem (since it can do multiple sorts, for
example), and of course each backend could be running a query at the
same time. Because of all this it's pretty difficult to make work_mem
recommendations without knowing a lot more about your environment.
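
As a back-of-the-envelope sketch (made-up numbers, just to show the scale of
the problem):

SET sort_mem = 8192;   -- 7.4's name for work_mem; value in kB, so 8MB per sort

With ~400 backends each potentially doing a couple of sorts, the worst case is
400 * 2 * 8MB = 6400MB, far beyond 2G of RAM, so keep the global value modest
and raise it per-session only for queries that really need big sorts.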
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: planner with index scan cost way off actual cost,

From: Simon Riggs
On Fri, 2006-03-17 at 11:09 +0100, Guillaume Cottenceau wrote:

> INFO:  index "idx_sent_msgs_date_theme_status" now contains 3692284 row versions in 88057 pages

> SET effective_cache_size = 10000;

SET effective_cache_size > 88057, round up to 100000

to ensure the index cost calculation knows the whole index will be
cached, which it clearly could be with 4GB RAM.

If the cost is still wrong, it is because the index order doesn't
correlate physically with the key columns. Use CLUSTER.
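
For example (7.4 syntax, using your existing index; note that CLUSTER rewrites
the table and takes an exclusive lock while it runs):

CLUSTER idx_sent_msgs_date_theme_status ON sent_messages;
ANALYZE sent_messages;   -- refresh the correlation statistics afterwards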

Best Regards, Simon Riggs


Re: planner with index scan cost way off actual cost,

From: Guillaume Cottenceau
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes:

> On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote:
> > "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
> >
> > > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > > > I was going to recommend higher - but not knowing what else was running,
> > > > kept it to quite conservative :-)... and given he's running java, the
> > > > JVM could easily eat 512M all by itself!
> > >
> > > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> > > I think people often place too much emphasis on having a seperate
> > > application server, but in the case of java you often have no choice.
> >
> > Fortunately the servers use 2G or 4G of memory, only my test
> > machine had 1G, as I believe I precised in a message; so I'm
> > definitely going to use Mark's advices to enlarge a lot the
> > shared buffers. Btw, what about sort_mem? I have seen it only
> > little referenced in the documentation.
>
> The biggest issue with setting work_mem (you're not doing current
> development on 7.4 are you?) is ensuring that you don't push the server

Yes, we use 7.4.5 actually, because "it just works", so production
wants to first deal with all the things that don't work before
upgrading. I have recently discovered the background writer in 8.x,
which could be an additional reason to push for an upgrade though.

> into swapping. Remember that work_mem controls how much memory can be
> used for EACH sort or hash (maybe others) operation. Each query can
> consume multiples of work_mem (since it can do multiple sorts, for
> example), and of course each backend could be running a query at the
> same time. Because of all this it's pretty difficult to make work_mem
> recomendations without knowing a lot more about your environment.

Ok, I see. Thanks for the info!

--
Guillaume Cottenceau

Re: planner with index scan cost way off actual cost,

From: Scott Marlowe
On Wed, 2006-03-22 at 02:04, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote:
> > > "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
> > >
> > > > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > > > > I was going to recommend higher - but not knowing what else was running,
> > > > > kept it to quite conservative :-)... and given he's running java, the
> > > > > JVM could easily eat 512M all by itself!
> > > >
> > > > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> > > > I think people often place too much emphasis on having a seperate
> > > > application server, but in the case of java you often have no choice.
> > >
> > > Fortunately the servers use 2G or 4G of memory, only my test
> > > machine had 1G, as I believe I precised in a message; so I'm
> > > definitely going to use Mark's advices to enlarge a lot the
> > > shared buffers. Btw, what about sort_mem? I have seen it only
> > > little referenced in the documentation.
> >
> > The biggest issue with setting work_mem (you're not doing current
> > development on 7.4 are you?) is ensuring that you don't push the server
>
> Yes, we use 7.4.5 actually, because "it just works", so production
> wants to first deal with all the things that don't work before
> upgrading. I have recently discovered about the background writer
> of 8.x which could be a supplementary reason to push for an
> ugprade though.

Imagine you get a call from the manufacturer of your car.  There's a
problem with the fuel pump, and, in a small percentage of accidents,
your car could catch fire and kill everyone inside.

Do you go in for the recall, or ignore it because you just want your car
to "just work?"

In the case of the third number in postgresql releases, that's what
you're talking about.  The updates that have come after the 7.4.5
version, just talking about the 7.4 series here, have included a few
crash and data loss fixes.  Rare, but possible.

Don't worry about upgrading to 8.x until later, fine, but you should
really be upgrading to the latest patch level of 7.4.

I fight this same fight at work, by the way.  It's hard convincing
people that the updates are security / crash / data loss only...

Re: planner with index scan cost way off actual cost,

From: Guillaume Cottenceau
Hi Scott,

Scott Marlowe <smarlowe 'at' g2switchworks.com> writes:

> On Wed, 2006-03-22 at 02:04, Guillaume Cottenceau wrote:

[...]

> > Yes, we use 7.4.5 actually, because "it just works", so production
> > wants to first deal with all the things that don't work before
> > upgrading. I have recently discovered about the background writer
> > of 8.x which could be a supplementary reason to push for an
> > ugprade though.
>
> Imagine you get a call from the manufacturer of your car.  There's a
> problem with the fuel pump, and, in a small percentage of accidents,
> your car could catch fire and kill everyone inside.
>
> Do you go in for the recall, or ignore it because you just want your car
> to "just work?"

Ah, this holy computer/OS/whatever-to-cars comparison.. How many
million electrons would the world save if computer people abandoned
it? :)

> In the case of the third number in postgresql releases, that's what
> you're talking about.  the updates that have come after the 7.4.5
> version, just talking 7.4 series here, have included a few crash and
> data loss fixes.  Rare, but possible.

I guess we didn't know that. I myself have (a bit) more of an
excuse because I'm on the development side :) But I've passed the
information on to the operations team, thank you.

--
Guillaume Cottenceau