Thread: planner with index scan cost way off actual cost, advice on tweaking cost constants?
planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
Guillaume Cottenceau
Date:
Hi,

I have a problem with the postgres planner, which assigns a cost to an index scan that is much higher than the actual cost (worst case considered, i.e. without any previous disk cache), and am posting here for advice on tweaking the cost constants. Because of this problem, the planner typically chooses a seq scan when an index scan would be more efficient, and I would like to correct this if possible.

Following the documentation and the postgresql list archives, I have run ANALYZE right before my tests and I have increased the statistics target to 50 for the table in question; my problem is that the index scan cost reported by EXPLAIN seems to be around 12.7 times higher than it should be, a figure I suppose is too large to be fixed by random_page_cost and effective_cache_size tweaks alone.

Structure of the table:

\d sent_messages
                          Table "public.sent_messages"
  Column  |           Type           |                           Modifiers
----------+--------------------------+----------------------------------------------------------------
 uid      | integer                  | not null default nextval('public.sent_messages_uid_seq'::text)
 sender   | character varying(25)    |
 receiver | character varying(25)    |
 action   | character varying(25)    |
 cost     | integer                  |
 date     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 status   | character varying(128)   |
 theme    | character varying(25)    |
 operator | character varying(15)    |
Indexes:
    "sent_messages_pkey" primary key, btree (uid)
    "idx_sent_msgs_date_theme_status" btree (date, theme, status)

What I did:

- SET default_statistics_target = 50

- VACUUM FULL ANALYZE VERBOSE sent_messages (output copied so that you can have a look at the rows and pages taken up by the relations):

INFO:  vacuuming "public.sent_messages"
INFO:  "sent_messages": found 0 removable, 3692284 nonremovable row versions in 55207 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 103 to 177 bytes long.
There were 150468 unused item pointers.
Total free space (including removable row versions) is 2507320 bytes.
0 pages are or will become empty, including 0 at the end of the table.
2469 pages containing 262732 free bytes are potential move destinations.
CPU 0.57s/0.20u sec elapsed 11.27 sec.
INFO:  index "sent_messages_pkey" now contains 3692284 row versions in 57473 pages
DETAIL:  0 index row versions were removed.
318 index pages have been deleted, 318 are currently reusable.
CPU 2.80s/1.27u sec elapsed 112.69 sec.
INFO:  index "idx_sent_msgs_date_theme_status" now contains 3692284 row versions in 88057 pages
DETAIL:  0 index row versions were removed.
979 index pages have been deleted, 979 are currently reusable.
CPU 4.22s/1.51u sec elapsed 246.88 sec.
INFO:  "sent_messages": moved 0 row versions, truncated 55207 to 55207 pages
DETAIL:  CPU 1.87s/3.18u sec elapsed 42.71 sec.
INFO:  vacuuming "pg_toast.pg_toast_77852470"
INFO:  "pg_toast_77852470": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_77852470_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  analyzing "public.sent_messages"
INFO:  "sent_messages": 55207 pages, 15000 rows sampled, 3666236 estimated total rows

- select rows of the table with a range condition on "date", and find a range for which seq scan and index scan runtimes seem to be very close (I use Linux; I cat a 2G file to /dev/null between each request to flush the disk cache, on a machine with 1G of real RAM and 1G of swap, so this is the worst case tested for the index scan). Notice that the cost used by the planner is 12.67 times higher for the index scan, when it should be around 1 so that the planner can make sensible choices:

EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on sent_messages  (cost=0.00..110591.26 rows=392066 width=78) (actual time=7513.205..13095.147 rows=393074 loops=1)
   Filter: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
 Total runtime: 14272.522 ms

SET enable_seqscan = false;

EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..1402124.26 rows=392066 width=78) (actual time=142.638..12677.378 rows=393074 loops=1)
   Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
 Total runtime: 13846.504 ms

Please note that an index on the "date" column only would be much more efficient for the query shown (and I have confirmed this by creating and trying it), but I wouldn't necessarily need this index if the existing index were used. Of course real queries use smaller date ranges.

- I then tried to tweak random_page_cost and effective_cache_size following advice from the documentation:

SET random_page_cost = 2;
SET effective_cache_size = 10000;

EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..595894.94 rows=392066 width=78)
   Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))

We can see that the estimated index scan cost goes down, but only by a factor of approx. 2.3, which is far from enough to "fix" it. I am reluctant to push random_page_cost and effective_cache_size much further, as I suspect it might have other (bad) consequences if they are too far from reality (even if Linux is known to cache aggressively), the application being multithreaded (there is a warning in the documentation about concurrent queries using different indexes). But I could certainly benefit from others' experience on this matter.

I apologize for this long email, but I wanted to be sure I gave enough information on the data and the things I have tried to fix the problem myself. If anyone can see what I am doing wrong, I would be very interested in pointers.

Thanks in advance!
Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all postgresql.conf default values except timezone = 'UTC', on an ext3 partition with data=ordered, and run Linux 2.6.12. -- Guillaume Cottenceau
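Since the thread notes that a single-column index on "date" was confirmed to help, here is a minimal sketch of that test, using the table and column names above (the index name is hypothetical, and the transaction wrapper is just a convenience so the experimental index can be discarded; on 7.4 a plain CREATE INDEX blocks writes for its duration):

BEGIN;
CREATE INDEX idx_sent_msgs_date ON sent_messages (date);  -- hypothetical test index
ANALYZE sent_messages;                                     -- refresh stats so the planner sees it
EXPLAIN ANALYZE
  SELECT * FROM sent_messages
  WHERE date > '2005-09-01' AND date < '2005-09-19';
ROLLBACK;                                                  -- discard the experimental index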
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
"Guillaume Smet"
Date:
Guillaume,

On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau wrote:
> Reading the documentation and postgresql list archives, I have
> run ANALYZE right before my tests, I have increased the
> statistics target to 50 for the considered table; my problem is
> that the index scan cost reported by EXPLAIN seems to be around
> 12.7 times higher than it should be, a figure I suppose is too
> large to be fixed by random_page_cost and effective_cache_size
> tweaks alone.

It's not surprising that you have a high cost for an index scan which is planned to return, and does return, so many rows. I really don't think the planner is doing anything wrong on this one.

AFAIK, increasing the statistics target won't do anything to reduce the cost, as the planner estimation for the number of returned rows is already really accurate and probably can't be better.

> Of course real queries use smaller date ranges.

What about providing us with the respective plans for your real queries? And in a real case. It's a bad idea to compare an index scan and a seq scan when your data have to be loaded into RAM. Before doing so, create an index on the date column to have the most effective index possible.

> - I then tried to tweak random_page_cost and effective_cache_size
>   following advice from the documentation:
>
> SET random_page_cost = 2;

random_page_cost is the way to go for this sort of thing, but I don't think it's a good idea to have it too low globally, and I'm still thinking the problem is that your test case is not accurate.

-- Guillaume
Guillaume Cottenceau wrote:
> SET random_page_cost = 2;
> SET effective_cache_size = 10000;
>
> EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using idx_sent_msgs_date_theme_status on sent_messages  (cost=0.00..595894.94 rows=392066 width=78)
>    Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
>
> We can see that the estimated index scan cost goes down, but only by a
> factor of approx. 2.3, which is far from enough to "fix" it. I am
> reluctant to push random_page_cost and effective_cache_size much
> further, as I suspect it might have other (bad) consequences if they
> are too far from reality (even if Linux is known to cache
> aggressively), the application being multithreaded (there is a warning
> in the documentation about concurrent queries using different indexes).
> But I could certainly benefit from others' experience on this matter.
>
> I apologize for this long email, but I wanted to be sure I gave
> enough information on the data and the things I have tried to fix the
> problem myself. If anyone can see what I am doing wrong, I would
> be very interested in pointers.
>
> Thanks in advance!
>
> Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all
> postgresql.conf default values except timezone = 'UTC', on an
> ext3 partition with data=ordered, and run Linux 2.6.12.

I didn't see any mention of how much memory is on your server, but provided you have say 1G, and are using the box solely as a database server, I would increase both shared_buffers and effective_cache_size:

shared_buffers = 12000
effective_cache_size = 25000

This would mean you are reserving 100M for Postgres to cache relation pages, and informing the planner that it can expect ~200M available from the disk buffer cache. To give a better recommendation, we need to know more about your server and workload (e.g. server memory configuration and usage, plus how close you get to 500 connections).

Cheers

Mark
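For reference, a quick back-of-the-envelope check of what those page counts mean in bytes, assuming the default 8 kB block size (illustrative arithmetic only):

SELECT 12000 * 8192 / (1024 * 1024.0) AS shared_buffers_mb,       -- 12000 buffers * 8 kB ~= 94 MB
       25000 * 8192 / (1024 * 1024.0) AS effective_cache_size_mb; -- 25000 pages  * 8 kB ~= 195 MB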
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
Guillaume Cottenceau
Date:
Guillaume,

Thanks for your answer.

> On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau wrote:
> > Reading the documentation and postgresql list archives, I have
> > run ANALYZE right before my tests, I have increased the
> > statistics target to 50 for the considered table; my problem is
> > that the index scan cost reported by EXPLAIN seems to be around
> > 12.7 times higher than it should be, a figure I suppose is too
> > large to be fixed by random_page_cost and effective_cache_size
> > tweaks alone.
>
> It's not surprising that you have a high cost for an index scan which
> is planned to return, and does return, so many rows. I really don't
> think the planner is doing anything wrong on this one.

My point is that the planner's cost estimate is way above the actual cost of the query, so the planner doesn't use the best plan. Even if the index returns so many rows, the actual cost of the query is such that an index scan (worst case, all disk cache flushed) is still better than a seq scan, yet the planner uses the seq scan.

> AFAIK, increasing the statistics target won't do anything to reduce
> the cost, as the planner estimation for the number of returned rows is
> already really accurate and probably can't be better.

Ok, thanks.

> > Of course real queries use smaller date ranges.
>
> What about providing us with the respective plans for your real queries?
> And in a real case. It's a bad idea to compare an index scan and a seq scan

The original query is more complicated and sometimes involves restricting the result set with another constraint. I am not sure it is very interesting to show it; I know that the best performance would be achieved with an index on the date column for the query shown, and an index on the date column plus the other column when querying on both.

> when your data have to be loaded into RAM.

What do you mean? That I should not flush the disk cache before timing? I did so to find the worst case. I am not sure it is the best approach; maybe a half-worst case would be better? But that depends a lot on whether the index pages would stay in the disk cache before the next query, which cannot be known for sure unless a full, serious timing of the real application is done (and my application can be used in quite different scenarios, which means such a test is not entirely possible/meaningful).

> Before doing so, create an index on the date column to have the most
> effective index possible.

Yes, as I said, I know that doing this would improve the queries a lot. My point was to understand why the cost of the index scan is so "inaccurate" compared to the actual cost. Adding an index on the date column enlarges the data by 100-150M, so I'd rather avoid it if possible.

> > - I then tried to tweak random_page_cost and effective_cache_size
> >   following advice from the documentation:
> >
> > SET random_page_cost = 2;
>
> random_page_cost is the way to go for this sort of thing, but I don't
> think it's a good idea to have it too low globally, and I'm still

Thanks, I suspected so.

> thinking the problem is that your test case is not accurate.

Ok.

-- Guillaume Cottenceau
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
Guillaume Cottenceau
Date:
Hi Mark,

Thanks for your reply.

> Guillaume Cottenceau wrote:

[...]

> > Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all
> > postgresql.conf default values except timezone = 'UTC', on an
> > ext3 partition with data=ordered, and run Linux 2.6.12.
>
> I didn't see any mention of how much memory is on your server, but
> provided you have say 1G, and are using the box solely as a database
> server, I would increase both shared_buffers and effective_cache_size.

This test machine has 1G of (real) memory; the servers often have 2G or 4G. The thing is that the application runs on the same machine, and as it is a Java application it takes up some memory too (say half of it should go to Java and half to Postgres, I guess). Determining the best memory "plan" is not so easy, though your information is priceless and will help a lot!

> shared_buffers = 12000
> effective_cache_size = 25000
>
> This would mean you are reserving 100M for Postgres to cache relation
> pages, and informing the planner that it can expect ~200M available
> from the disk buffer cache. To give a better recommendation, we need

Ok, thanks. I wanted to investigate this area, but as the application is multithreaded and uses a lot of postgres clients, I wanted to make sure the shared_buffers value is global for postgres, not per (TCP) connection to postgres, before increasing it, for fear of taking the whole server down.

On a server with 235 connections and -N 512 -B 1024, after reading http://virtualthreads.blogspot.com/2006/02/understanding-memory-usage-on-linux.html I came up with the following figures:

for i in `pidof postmaster`; do pmap -d $i | grep -i writeable ; done \
  | perl -MMDK::Common -ne 'do { push @a, $1; $tot += $1 } if /writeable.private: (\d+)K/;
      END { print "total postgres private memory: ${tot}K\nmin: " . min(@a) . "K\nmax: " . max(@a) . "K\n"; }'

total postgres private memory: 432080K
min: 936K
max: 4216K

As the server has 2G of memory, I was reluctant to increase the amount of shared memory since overall postgres memory use already seems quite high - though 100M more would not kill the server, obviously. Btw, can you comment on the figures above?

> to know more about your server and workload (e.g. server memory
> configuration and usage, plus how close you get to 500 connections).

Depending on the server, it can have 200 up to around 400 connections open. As for workload, I am not sure what metrics are suitable. Typically postgres can be seen in the top processes, but most queries are quick, and the load average reported by the Linux kernel is nearly always below 0.3, often 0.1. These are single or dual Xeon 2.8 GHz machines with hardware RAID (megaraid or percraid driver) with reasonable performance.

-- Guillaume Cottenceau
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
"Jim C. Nasby"
Date:
On Mon, Mar 20, 2006 at 09:35:14AM +0100, Guillaume Cottenceau wrote:
> > shared_buffers = 12000
> > effective_cache_size = 25000
> >
> > This would mean you are reserving 100M for Postgres to cache relation
> > pages, and informing the planner that it can expect ~200M available
> > from the disk buffer cache. To give a better recommendation, we need
>
> Ok, thanks. I wanted to investigate this area, but as the
> application is multithreaded and uses a lot of postgres clients,
> I wanted to make sure the shared_buffers value is global for
> postgres, not per (TCP) connection to postgres, before
> increasing it, for fear of taking the whole server down.

shared_buffers is for the entire 'cluster', not per-connection or per-database.

Also, an effective_cache_size of 25000 on a 1G machine seems pretty conservative to me. I'd set it to at least 512MB, if not closer to 800MB.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
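On 7.4 effective_cache_size is expressed in 8 kB pages rather than in memory units, so the suggestion above translates roughly as follows (a sketch; the actual value should reflect what the OS really reports as cache on the box):

-- 512 MB / 8 kB = 65536 pages; 800 MB / 8 kB = 102400 pages
SET effective_cache_size = 65536;   -- per-session test; put the same value in postgresql.conf once validated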
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
"Jim C. Nasby"
Date:
On Mon, Mar 20, 2006 at 09:14:32AM +0100, Guillaume Cottenceau wrote:
> Guillaume,
>
> Thanks for your answer.
>
> > On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau wrote:
> > > Reading the documentation and postgresql list archives, I have
> > > run ANALYZE right before my tests, I have increased the
> > > statistics target to 50 for the considered table; my problem is
> > > that the index scan cost reported by EXPLAIN seems to be around
> > > 12.7 times higher than it should be, a figure I suppose is too
> > > large to be fixed by random_page_cost and effective_cache_size
> > > tweaks alone.
> >
> > It's not surprising that you have a high cost for an index scan which
> > is planned to return, and does return, so many rows. I really don't
> > think the planner is doing anything wrong on this one.
>
> My point is that the planner's cost estimate is way above the
> actual cost of the query, so the planner doesn't use the best
> plan. Even if the index returns so many rows, the actual cost of the
> query is such that an index scan (worst case, all disk cache flushed)
> is still better than a seq scan, yet the planner uses the seq scan.

Yes. The cost estimator for an index scan supposedly does a linear interpolation between a minimum cost and a maximum cost depending on the correlation of the first field in the index. The problem is that while the comment states it's a linear interpolation, the actual formula squares the correlation before interpolating. This means that unless the correlation is very high, you're going to get an unrealistically high cost for an index scan. I have data that supports this at http://stats.distributed.net/~decibel/, but I've never been able to get around to testing a patch to see if it improves things.

<snip>

> > thinking the problem is that your test case is not accurate.
>
> Ok.

Actually, I suspect your test case was probably fine, but take a look at the data I've got and see what you think. If you want to spend some time on this, it should be possible to come up with a test case that uses either pgbench or dbt2/3 to generate data, so that others can easily reproduce it (I can't really make the data I used for my testing available).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
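The correlation referred to here is visible in the statistics the planner uses; a simple way to check it for the leading index column (table and column names taken from earlier in the thread):

SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'sent_messages'
  AND attname   = 'date';
-- a correlation close to +1 or -1 means the index order matches the physical heap order,
-- which is when the current formula prices an index scan most realistically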
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
Guillaume Cottenceau
Date:
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: [...] > > My point is that the planner's cost estimate is way above the > > actual cost of the query, so the planner doesn't use the best > > plan. Even if the index returns so much rows, actual cost of the > > query is so that index scan (worst case, all disk cache flushed) > > is still better than seq scan but the planner uses seq scan. > > Yes. The cost estimator for an index scan supposedly does a linear > interpolation between a minimum cost and a maximum cost depending on the > correlation of the first field in the index. The problem is that while > the comment states it's a linear interpolation, the actual formula > squares the correlation before interpolating. This means that unless the > correlation is very high, you're going to get an unrealistically high > cost for an index scan. I have data that supports this at > http://stats.distributed.net/~decibel/, but I've never been able to get > around to testing a patch to see if it improves things. Interesting. It would be nice to investigate the arguments behind the choice you describe for the formula used to perform the interpolation. I have absolutely no knowledge on pg internals so this is rather new/fresh for me, I have no idea how smart that choice is (but based on my general feeling about pg, I'm suspecting this is actually smart but I am not smart enough to see why ;p). -- Guillaume Cottenceau
Jim C. Nasby wrote:
> On Mon, Mar 20, 2006 at 09:35:14AM +0100, Guillaume Cottenceau wrote:
>
>>>shared_buffers = 12000
>>>effective_cache_size = 25000
>>>
>>>This would mean you are reserving 100M for Postgres to cache relation
>>>pages, and informing the planner that it can expect ~200M available
>>>from the disk buffer cache. To give a better recommendation, we need
>>
>>Ok, thanks. I wanted to investigate this area, but as the
>>application is multithreaded and uses a lot of postgres clients,
>>I wanted to make sure the shared_buffers value is global for
>>postgres, not per (TCP) connection to postgres, before
>>increasing it, for fear of taking the whole server down.
>
> shared_buffers is for the entire 'cluster', not per-connection or
> per-database.
>
> Also, an effective_cache_size of 25000 on a 1G machine seems pretty
> conservative to me. I'd set it to at least 512MB, if not closer to
> 800MB.

I was going to recommend higher - but not knowing what else was running, kept it quite conservative :-)... and given he's running java, the JVM could easily eat 512M all by itself!

Cheers

Mark
On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> I was going to recommend higher - but not knowing what else was running,
> kept it quite conservative :-)... and given he's running java, the
> JVM could easily eat 512M all by itself!

Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig. I think people often place too much emphasis on having a separate application server, but in the case of java you often have no choice.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> [...]
>
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't use the best
> > > plan. Even if the index returns so many rows, the actual cost of the
> > > query is such that an index scan (worst case, all disk cache flushed)
> > > is still better than a seq scan, yet the planner uses the seq scan.
> >
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
>
> Interesting.
>
> It would be nice to investigate the arguments behind the choice
> you describe for the formula used to perform the interpolation. I
> have absolutely no knowledge of pg internals, so this is rather
> new/fresh for me; I have no idea how smart that choice is (but
> based on my general feeling about pg, I suspect it is actually
> smart and I am just not smart enough to see why ;p).

If you feel like running some tests, you need to change

run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

in src/backend/optimizer/path/costsize.c to something like

run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);

That might not produce a perfect cost estimate, but I'll wager that it will be substantially better than what's in there now. FYI, see also http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
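To see why the squaring matters, here is a purely illustrative calculation with made-up min/max I/O costs (1,000 and 100,000) and a middling correlation of 0.5; the squared form lands much closer to the pessimistic maximum:

SELECT 100000 + (0.5 * 0.5) * (1000 - 100000) AS cost_with_csquared,  -- 75250
       100000 +  0.5        * (1000 - 100000) AS cost_with_abs_corr;  -- 50500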
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote: > > I was going to recommend higher - but not knowing what else was running, > > kept it to quite conservative :-)... and given he's running java, the > > JVM could easily eat 512M all by itself! > > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig. > I think people often place too much emphasis on having a seperate > application server, but in the case of java you often have no choice. Fortunately the servers use 2G or 4G of memory, only my test machine had 1G, as I believe I precised in a message; so I'm definitely going to use Mark's advices to enlarge a lot the shared buffers. Btw, what about sort_mem? I have seen it only little referenced in the documentation. Also, I'd still be interested in comments on the result of pmap showing around 450M of "private memory" used by pg, if anyone can share insight about it. Though most people seem freebsd-oriented, and this might be very much linux-centric. -- Guillaume Cottenceau
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
Guillaume Cottenceau
Date:
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: > If you feel like running some tests, you need to change > > run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost); > > in src/backend/optimizer/path/costsize.c to something like > > run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost); Short after the beginning of a discussion about planner associating too high cost for index scan, I'm suggested to change source-code.. I'm already frightened about the near future :) > That might not produce a perfect cost estimate, but I'll wager that it > will be substantially better than what's in there now. FYI, see also > http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php Sad that Tom didn't share his thoughts about your cost algorithm question in this message. -- Guillaume Cottenceau
Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 02:30:22PM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > If you feel like running some tests, you need to change
> >
> > run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
> >
> > in src/backend/optimizer/path/costsize.c to something like
> >
> > run_cost += max_IO_cost + abs(indexCorrelation) * (min_IO_cost - max_IO_cost);
>
> Shortly after the beginning of a discussion about the planner
> assigning too high a cost to index scans, I'm being advised to change
> source code.. I'm already frightened about the near future :)

Well, this is mostly because I've just never gotten around to following up on this.

> > That might not produce a perfect cost estimate, but I'll wager that it
> > will be substantially better than what's in there now. FYI, see also
> > http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
>
> Sad that Tom didn't share his thoughts about your cost algorithm
> question in this message.

See above. :) If someone comes up with a before and after comparison showing that the change makes the estimator more accurate, I'm sure the code will be changed in short order. The nice thing about this case is that basically any PostgreSQL user can do the heavy lifting, instead of relying on the primary contributors for a change.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > > I was going to recommend higher - but not knowing what else was running,
> > > kept it quite conservative :-)... and given he's running java, the
> > > JVM could easily eat 512M all by itself!
> >
> > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> > I think people often place too much emphasis on having a separate
> > application server, but in the case of java you often have no choice.
>
> Fortunately the servers have 2G or 4G of memory; only my test
> machine had 1G, as I believe I mentioned earlier. So I'm
> definitely going to use Mark's advice to enlarge the shared
> buffers considerably. Btw, what about sort_mem? I have seen it
> referenced only a little in the documentation.

The biggest issue with setting work_mem (you're not doing current development on 7.4 are you?) is ensuring that you don't push the server into swapping. Remember that work_mem controls how much memory can be used for EACH sort or hash (maybe others) operation. Each query can consume multiples of work_mem (since it can do multiple sorts, for example), and of course each backend could be running a query at the same time. Because of all this it's pretty difficult to make work_mem recommendations without knowing a lot more about your environment.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
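On 7.4 the parameter is still called sort_mem (it became work_mem in 8.0) and is set in kilobytes; a sketch of raising it for a single heavy session rather than globally (the value here is purely illustrative):

SET sort_mem = 32768;   -- ~32 MB per sort/hash node, for this session only
-- ... run the reporting query that needs the large sort ...
RESET sort_mem;         -- back to the postgresql.conf value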
On Fri, 2006-03-17 at 11:09 +0100, Guillaume Cottenceau wrote:
> INFO:  index "idx_sent_msgs_date_theme_status" now contains 3692284 row versions in 88057 pages

> SET effective_cache_size = 10000;

Set effective_cache_size > 88057; round up to 100000 to ensure the index cost calculation knows the whole index could be cached, which it clearly could be with 4GB of RAM.

If the cost is still wrong, it is because the index order doesn't correlate physically with the key columns. Use CLUSTER.

Best Regards,

Simon Riggs
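A sketch of those two suggestions combined, using the names from the thread (note that CLUSTER takes an exclusive lock and rewrites the table, so it needs a maintenance window; the pre-8.3 syntax is shown):

SET effective_cache_size = 100000;                          -- ~780 MB at 8 kB pages
CLUSTER idx_sent_msgs_date_theme_status ON sent_messages;   -- physically reorder by (date, theme, status)
ANALYZE sent_messages;                                       -- refresh stats, including correlation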
"Jim C. Nasby" <jnasby 'at' pervasive.com> writes: > On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote: > > "Jim C. Nasby" <jnasby 'at' pervasive.com> writes: > > > > > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote: > > > > I was going to recommend higher - but not knowing what else was running, > > > > kept it to quite conservative :-)... and given he's running java, the > > > > JVM could easily eat 512M all by itself! > > > > > > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig. > > > I think people often place too much emphasis on having a seperate > > > application server, but in the case of java you often have no choice. > > > > Fortunately the servers use 2G or 4G of memory, only my test > > machine had 1G, as I believe I precised in a message; so I'm > > definitely going to use Mark's advices to enlarge a lot the > > shared buffers. Btw, what about sort_mem? I have seen it only > > little referenced in the documentation. > > The biggest issue with setting work_mem (you're not doing current > development on 7.4 are you?) is ensuring that you don't push the server Yes, we use 7.4.5 actually, because "it just works", so production wants to first deal with all the things that don't work before upgrading. I have recently discovered about the background writer of 8.x which could be a supplementary reason to push for an ugprade though. > into swapping. Remember that work_mem controls how much memory can be > used for EACH sort or hash (maybe others) operation. Each query can > consume multiples of work_mem (since it can do multiple sorts, for > example), and of course each backend could be running a query at the > same time. Because of all this it's pretty difficult to make work_mem > recomendations without knowing a lot more about your environment. Ok, I see. Thanks for the info! -- Guillaume Cottenceau
On Wed, 2006-03-22 at 02:04, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> > On Tue, Mar 21, 2006 at 02:03:19PM +0100, Guillaume Cottenceau wrote:
> > > "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
> > >
> > > > On Tue, Mar 21, 2006 at 10:40:45PM +1200, Mark Kirkwood wrote:
> > > > > I was going to recommend higher - but not knowing what else was running,
> > > > > kept it quite conservative :-)... and given he's running java, the
> > > > > JVM could easily eat 512M all by itself!
> > > >
> > > > Oh, didn't pick up on java being in the mix. Yeah, it can be a real pig.
> > > > I think people often place too much emphasis on having a separate
> > > > application server, but in the case of java you often have no choice.
> > >
> > > Fortunately the servers have 2G or 4G of memory; only my test
> > > machine had 1G, as I believe I mentioned earlier. So I'm
> > > definitely going to use Mark's advice to enlarge the shared
> > > buffers considerably. Btw, what about sort_mem? I have seen it
> > > referenced only a little in the documentation.
> >
> > The biggest issue with setting work_mem (you're not doing current
> > development on 7.4 are you?) is ensuring that you don't push the server
>
> Yes, we use 7.4.5 actually, because "it just works", so production
> wants to first deal with all the things that don't work before
> upgrading. I have recently discovered the background writer in 8.x,
> which could be an additional reason to push for an upgrade though.

Imagine you get a call from the manufacturer of your car. There's a problem with the fuel pump, and, in a small percentage of accidents, your car could catch fire and kill everyone inside.

Do you go in for the recall, or ignore it because you just want your car to "just work?"

In the case of the third number in postgresql releases, that's what you're talking about. The updates that have come after the 7.4.5 version, just talking about the 7.4 series here, have included a few crash and data loss fixes. Rare, but possible.

Don't worry about upgrading to 8.x until later, fine, but you should really be upgrading to the latest patch level of 7.4.

I fight this same fight at work, by the way. It's hard convincing people that the updates are security / crash / data loss fixes only...
Hi Scott,

Scott Marlowe <smarlowe 'at' g2switchworks.com> writes:
> On Wed, 2006-03-22 at 02:04, Guillaume Cottenceau wrote:

[...]

> > Yes, we use 7.4.5 actually, because "it just works", so production
> > wants to first deal with all the things that don't work before
> > upgrading. I have recently discovered the background writer in 8.x,
> > which could be an additional reason to push for an upgrade though.
>
> Imagine you get a call from the manufacturer of your car. There's a
> problem with the fuel pump, and, in a small percentage of accidents,
> your car could catch fire and kill everyone inside.
>
> Do you go in for the recall, or ignore it because you just want your
> car to "just work?"

Ah, this holy computer/OS/whatever-to-cars comparison.. How many million electrons would the world save if computer people abandoned it? :)

> In the case of the third number in postgresql releases, that's what
> you're talking about. The updates that have come after the 7.4.5
> version, just talking about the 7.4 series here, have included a few
> crash and data loss fixes. Rare, but possible.

I guess we didn't know that. I for my part have (a bit more of) an excuse because I'm on the development side :) But I have passed the information on to the operations team, thank you.

-- Guillaume Cottenceau