Thread: cpu_tuple_cost
Hi list,

I have a query plan that is bad with the standard cpu_tuple_cost and good
if I raise cpu_tuple_cost. Is it good practice to raise it if I want to
force Postgres to use indexes more often? Or is it better to disable
sequential scans?

CIMSOFT=# ANALYSE mitpln;
ANALYZE

CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln
             WHERE date_to_yearmonth_dec(mpl_date)='20050';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on mitpln  (cost=0.00..1411.85 rows=2050 width=69) (actual time=562.000..1203.000 rows=1269 loops=1)
   Filter: ((date_to_yearmonth_dec((mpl_date)::timestamp without time zone))::text = '20050'::text)
 Total runtime: 1203.000 ms
(3 rows)

CIMSOFT=# SET cpu_tuple_cost = 0.07;
SET

CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln
             WHERE date_to_yearmonth_dec(mpl_date)='20050';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using mitpln_yearmonth_dec on mitpln  (cost=0.00..2962.86 rows=2050 width=69) (actual time=0.000..0.000 rows=1269 loops=1)
   Index Cond: ((date_to_yearmonth_dec((mpl_date)::timestamp without time zone))::text = '20050'::text)
 Total runtime: 16.000 ms
(3 rows)

CIMSOFT=# \d mitpln
                            Table "public.mitpln"
    Column    |         Type          |                 Modifiers
--------------+-----------------------+-----------------------------------------
 mpl_id       | integer               | not null default nextval('public.mitpln_mpl_id_seq'::text)
 mpl_date     | date                  |
 mpl_minr     | integer               | not null
 mpl_tpl_name | character varying(20) |
 mpl_feiertag | character varying(50) |
 mpl_min      | real                  |
 mpl_saldo    | real                  |
 mpl_buch     | boolean               | not null default false
 mpl_absaldo  | real                  |
 mpl_vhz      | real                  |
 dbrid        | character varying     | default nextval('db_id_seq'::text)
Indexes:
    "mitpln_pkey" PRIMARY KEY, btree (mpl_id)
    "mitpln_idindex" UNIQUE, btree (dbrid)
    "xtt5126" UNIQUE, btree (mpl_date, mpl_minr)
    "mitpln_yearmonth_dec" btree (date_to_yearmonth_dec(mpl_date::timestamp without time zone))

CIMSOFT=# SELECT count(*) FROM mitpln;
 count
-------
 26128
(1 row)
Daniel Schuchardt <daniel_schuchardt@web.de> writes:
> I have a query plan that is bad with the standard cpu_tuple_cost and good
> if I raise cpu_tuple_cost. Is it good practice to raise it if I want to
> force Postgres to use indexes more often?

Reducing random_page_cost is usually the best way to get the planner to
favor indexscans more.

			regards, tom lane
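(For a quick experiment, random_page_cost can be changed for the current
session only -- a minimal sketch against the mitpln table from the original
post; the value 2 here is purely illustrative, not a recommendation:)

    SET random_page_cost = 2;     -- default is 4; session-local, nothing persisted
    EXPLAIN ANALYSE
        SELECT * FROM mitpln
        WHERE date_to_yearmonth_dec(mpl_date) = '20050';
    RESET random_page_cost;       -- restore the server default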
> Reducing random_page_cost is usually the best way to get the
> planner to favor indexscans more.

On that note, can I raise the idea again of dropping the default
value for random_page_cost in postgresql.conf? I think 4 is too
conservative in this day and age. Certainly the person who will
be negatively impacted by a default drop of 4 to 3 will be the
exception and not the rule.

--
Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200503140702
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
> > Reducing random_page_cost is usually the best way to get the
> > planner to favor indexscans more.
>
> On that note, can I raise the idea again of dropping the default
> value for random_page_cost in postgresql.conf? I think 4 is too
> conservative in this day and age. Certainly the person who will
> be negatively impacted by a default drop of 4 to 3 will be the
> exception and not the rule.

Agreed.  I think we should reduce it at least to 3.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
"Greg Sabino Mullane" <greg@turnstep.com> writes: > On that note, can I raise the idea again of dropping the default > value for random_page_cost in postgresql.conf? I think 4 is too > conservative in this day and age. Certainly the person who will > be negatively impacted by a default drop of 4 to 3 will be the > exception and not the rule. The ones who'd be negatively impacted are the ones we haven't been hearing from ;-). To assume that they aren't out there is a logical fallacy. I still think that 4 is about right for large databases (where "large" is in comparison to available RAM). Also, to the extent that we think these numbers mean anything at all, we should try to keep them matching the physical parameters we think they represent. I think that the "reduce random_page_cost" mantra is not an indication that that parameter is wrong, but that the cost models it feeds into need more work. One thing we *know* is wrong is the costing of nestloop inner indexscans: there needs to be a correction for caching of index blocks across repeated scans. I've looked at this a few times but not come up with anything that seemed convincing. Another thing I've wondered about more than once is if we shouldn't discount fetching of higher-level btree pages on the grounds that they're probably in RAM already, even if the indexscan isn't inside a loop. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Agreed.  I think we should reduce it at least to 3.

Note that changing it from 4 to 3 or even 2 is unlikely to really change
much. Many of the plans people complain about turn out to have critical
points closer to 1.2 or 1.1. The only reason things work out better with
such low values is because people have data sets that fit more or less
entirely in RAM. So values close to 1, or even equal to 1, actually
represent the reality.

The "this day and age" argument isn't very convincing. Hard drive capacity
growth has far outstripped hard drive seek time and bandwidth improvements.
Random access has more penalty than ever.

--
greg
Greg,

> On that note, can I raise the idea again of dropping the default
> value for random_page_cost in postgresql.conf? I think 4 is too
> conservative in this day and age. Certainly the person who will
> be negatively impacted by a default drop of 4 to 3 will be the
> exception and not the rule.

I don't agree. The defaults are there for people who aren't going to read
enough of the documentation to set them. As such, conservative defaults
are appropriate. If we were going to change anything automatically, it
would be to set effective_cache_size to 1/3 of RAM at initdb time.
However, I don't know any method of determining RAM size that works on
all the platforms we support.

Tom,

> Also, to the extent that we think these numbers mean anything at all,
> we should try to keep them matching the physical parameters we think
> they represent.

Personally, what I would love to see is the system determining and caching
some of these parameters automatically. For example, in a database which
has been running in production for a couple of days, it should be possible
to determine the ratio of average random-seek tuple cost to average
seq-scan tuple cost.

Other parameters should really work the same way. effective_cache_size,
for example, is a blunt instrument to replace what the database should
ideally do through automated interactive fine-tuning. Particularly since
we have 2 separate caches (or 3, if you count t1 and t2 from 2Q). What the
planner really needs to know is: is this table or index already in the t1
or t2 cache (can't we determine this?)? How likely is it to be in the
filesystem cache? The latter question is not just one of size (table <
memory), but one of frequency of access.

Of course, this stuff is really, really hard, which is why we rely on the
GUCs ...

--
Josh Berkus
Aglio Database Solutions
San Francisco
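(To make the effective_cache_size suggestion concrete -- a sketch only,
assuming a hypothetical machine with 1.5 GB of RAM; in this era of
PostgreSQL the parameter is expressed in 8 kB pages and is purely
advisory, so no memory is actually allocated:)

    # postgresql.conf -- 1/3 of 1.5 GB = 512 MB = 65536 pages of 8 kB
    effective_cache_size = 65536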
Josh Berkus wrote:
> I don't agree. The defaults are there for people who aren't going to
> read enough of the documentation to set them. As such, conservative
> defaults are appropriate.

Sure, but I would argue that 4 is *too* conservative. We've certainly
changed other values over the years. As I see it, those most affected by
this change are those who are least likely to have the know-how to change
the default, and they are also the majority of our users. I've often had
to reduce this, working on many databases, on many versions of PostgreSQL.
Granted, I don't work on any huge, complex, hundreds-of-gigs databases,
but that supports my point - if you are really better off with a /higher/
(than 3) random_page_cost, you already should be tweaking a lot of stuff
yourself anyway.

Tom Lane has a good point about tweaking other default parameters as well,
and that's a worthy goal, but I don't think extended searching for a
"sweet spot" should prevent us from making a small yet important (IMO!)
change in the default of this one variable.

N.B. My own personal starting default is 2, but I thought 3 was a nice
middle ground more likely to reach consensus here. :)

--
Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200503141727
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Mar 15, 2005, at 6:35 AM, Greg Sabino Mullane wrote:

> Granted, I don't work on any huge, complex, hundreds-of-gigs databases,
> but that supports my point - if you are really better off with a
> /higher/ (than 3) random_page_cost, you already should be tweaking a
> lot of stuff yourself anyway.

I think this is a good point. The people that tend to benefit from the
lower cost are precisely the people least likely to know to change it.
It's the "install & go" crowd with smaller databases and only a few
users/low concurrency that expect it to "just work". The bigger
installations are more likely to have dedicated DB admins that understand
tuning.

Wasn't there an idea on the table once to ship with several different
configuration files with different defaults for small, medium, large,
etc. installs? Wouldn't it make sense to ask the user during initdb to
pick from one of the default config files? Or even to ask a few simple
questions like "How much memory do you expect to be available to
PostgreSQL?" and "How many concurrent users do you expect to have?". It's
one thing to know how much memory is in a machine; it's quite another
thing to know how much of it the user wants dedicated to PostgreSQL. A
couple of questions like that can go a long way toward coming up with
better ballpark figures.

--
Jeff Hoffmann
jeff@propertykey.com
"Greg Sabino Mullane" <greg@turnstep.com> writes: > N.B. My own personal starting default is 2, but I thought 3 was a nice > middle ground more likely to reach consensus here. :) Your argument seems to be "this produces nice results for me", not "I have done experiments to measure the actual value of the parameter and it is X". I *have* done experiments of that sort, which is where the default of 4 came from. I remain of the opinion that reducing random_page_cost is a band-aid that compensates (but only partially) for problems elsewhere. We can see that it's not a real fix from the not-infrequent report that people have to reduce random_page_cost below 1.0 to get results anywhere near local reality. That doesn't say that the parameter value is wrong, it says that the model it's feeding into is wrong. regards, tom lane
Tom Lane wrote:
> Your argument seems to be "this produces nice results for me", not
> "I have done experiments to measure the actual value of the parameter
> and it is X". I *have* done experiments of that sort, which is where
> the default of 4 came from. I remain of the opinion that reducing
> random_page_cost is a band-aid that compensates (but only partially)
> for problems elsewhere. We can see that it's not a real fix from
> the not-infrequent report that people have to reduce random_page_cost
> below 1.0 to get results anywhere near local reality. That doesn't say
> that the parameter value is wrong, it says that the model it's feeding
> into is wrong.

I would like to second that. A while back I performed a number of
experiments on differing hardware and came to the conclusion that the
*real* random_page_cost was often higher than 4 (more like 10-15 for
multi-disk RAID systems). However, I have frequently adjusted Pg's
random_page_cost to be less than 4 - if it helped queries perform better.

So yes, it looks like the model is the issue - not the value of the
parameter!

regards

Mark
Gregory Stark wrote:
> The "this day and age" argument isn't very convincing. Hard drive
> capacity growth has far outstripped hard drive seek time and bandwidth
> improvements. Random access has more penalty than ever.

In point of fact, there haven't been noticeable seek time improvements
for years. Transfer rates, on the other hand, have gone through the roof.

Which is why I would question the published tuning advice that recommends
lowering it to 2 for arrays. Arrays increase the effective transfer rate
more than they reduce random access times. Dropping from 4 to 2 would
reflect going from a typical single 7200rpm ATA drive to a 15000rpm SCSI
drive, but striping will move it back up again - probably even higher
than 4 with a big array (at a guess, perhaps the relationship might be
approximated as a square root after allowing for the array type?).

With default settings, I've seen the planner pick the wrong index unless
random_page_cost was set to 2. But in testing on an ATA drive, I achieved
slightly better plan costings by increasing cpu_tuple_cost (relative to
cpu_index_tuple_cost - by default it's only a factor of 10) and actually
*raising* random_page_cost to 5! So why pick on one parameter? A sketch
of that experiment follows below.

It's all going to vary according to the query and the data. I agree with
Tom 100%. Pulling levers on a wonky model is no solution.
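(A sketch of the tuning just described. Only random_page_cost = 5 comes
from the message above; the cpu_tuple_cost value is hypothetical, since no
exact figure was given. The 8.0-era defaults are cpu_tuple_cost = 0.01,
cpu_index_tuple_cost = 0.001, random_page_cost = 4. The query is borrowed
from the start of this thread:)

    SET cpu_tuple_cost = 0.05;     -- hypothetical: widen the gap to cpu_index_tuple_cost
    SET random_page_cost = 5;      -- raised, not lowered
    EXPLAIN ANALYSE SELECT * FROM mitpln
        WHERE date_to_yearmonth_dec(mpl_date) = '20050';
    RESET ALL;                     -- restore all session settings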
> Your argument seems to be "this produces nice results for me", not
> "I have done experiments to measure the actual value of the parameter
> and it is X". I *have* done experiments of that sort, which is where
> the default of 4 came from. I remain of the opinion that reducing
> random_page_cost is a band-aid that compensates (but only partially)
> for problems elsewhere. We can see that it's not a real fix from
> the not-infrequent report that people have to reduce random_page_cost
> below 1.0 to get results anywhere near local reality. That doesn't say
> that the parameter value is wrong, it says that the model it's feeding
> into is wrong.

Good points; allow me to rephrase my question then:

When I install a new version of PostgreSQL and start testing my
applications, one of the most common problems is that many of my queries
are not hitting an index. I typically drop random_page_cost to 2 or lower
and this speeds things up very significantly. How can I determine a
better way to speed up my queries, and why would that be advantageous
over simply dropping random_page_cost?

How can I use my particular situation to help develop a better model and
perhaps make the defaults work better for my queries and other people
with databases like mine (fairly simple schema, not too large (~2 GB
total), SCSI, medium- to high-complexity queries, good amount of RAM
available)?

--
Greg Sabino Mullane  greg@turnstep.com
PGP Key: 0x14964AC8 200503150600
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
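(One way to gather evidence rather than just turning the knob -- a sketch;
enable_seqscan is a session-local debugging switch, not a production
setting, and the query is again the one from the start of this thread:)

    EXPLAIN ANALYSE SELECT * FROM mitpln
        WHERE date_to_yearmonth_dec(mpl_date) = '20050';  -- planner's choice
    SET enable_seqscan = off;      -- make seq scans look prohibitively expensive
    EXPLAIN ANALYSE SELECT * FROM mitpln
        WHERE date_to_yearmonth_dec(mpl_date) = '20050';  -- forced index plan
    RESET enable_seqscan;

If the ratio of the two *estimated* costs is far from the ratio of the two
*actual* times, that mismatch is exactly the kind of data point that helps
improve the cost model rather than just the local settings.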
David Brown <time@bigpond.net.au> writes:
> Gregory Stark wrote:
> > The "this day and age" argument isn't very convincing. Hard drive
> > capacity growth has far outstripped hard drive seek time and
> > bandwidth improvements. Random access has more penalty than ever.
>
> In point of fact, there haven't been noticeable seek time improvements
> for years. Transfer rates, on the other hand, have gone through the
> roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and
throughput.

Typical 7200RPM drives have average seek times in the area of 10ms.
Typical sustained transfer rates are in the range of 40MB/s. Postgres
reads 8kB blocks at a time.

So 800kB/s for random access reads, and 40MB/s for sequential reads.
That's a factor of about 50. I don't think anyone wants random_page_cost
to be set to 50 though.

For a high-end 15k drive I see average seek times as low as 3ms, and
sustained transfer rates as high as 100MB/s. So about 2.7MB/s for random
access reads, or about a random_page_cost of 37. Still pretty extreme.

So what's going on with the empirically derived value of 4? Perhaps this
is because even though Postgres is reading an entire table sequentially,
it's unlikely to be the only I/O consumer? The sequential reads would be
interleaved occasionally by some other I/O, forcing a seek before
continuing.

In which case the true random_page_cost seems like it would be extremely
sensitive to the amount of readahead the OS does. To reach a
random_page_cost of 4, given the numbers above for a 7200RPM drive,
requires that just under 25% of the I/O of a sequential table scan be
random seeks [*]. That translates to 32kB of sequential reading per seek,
which actually does sound like a typical value for OS readahead. I wonder
if those same empirical tests would show even higher values of
random_page_cost if the readahead were turned up to 64kB or 128kB.

[*] A bit of an algebraic diversion: at 10ms per seek, 1s/10ms = 100
random buffers/s. With random_page_cost = 4 the mixed scan must average
4 * 100 = 400 buffers/s, and at 40MB/s a sequential 8kB buffer takes
0.2ms. Solve:

    rnd + seq = 400 buffers
    10*rnd + 0.2*seq = 1000 ms

which gives rnd ~ 94 and seq ~ 306, i.e. just under 25% random.

--
greg
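(The raw ratio above is easy to recompute; a throwaway sketch using the
same drive numbers, runnable in any psql session:)

    -- 40 MB/s sequential at 8 kB per block vs. one 10 ms seek per random block
    SELECT 40000 / 8                     AS seq_buffers_per_sec,  -- 5000
           1000 / 10                     AS rnd_buffers_per_sec,  --  100
           (40000 / 8) / (1000.0 / 10)   AS raw_cost_ratio;       --   50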
> > >The "this day and age" argument isn't very convincing. Hard drive > > >capacity growth has far outstripped hard drive seek time > and bandwidth improvements. > > >Random access has more penalty than ever. > > > > In point of fact, there haven't been noticeable seek time > improvements > > for years. Transfer rates, on the other hand, have gone > through the roof. > > Er, yeah. I stated it wrong. The real ratio here is between > seek time and throughput. > > Typical 7200RPM drives have average seek times are in the > area of 10ms. > Typical sustained transfer rates are in the range of 40Mb/s. > Postgres reads 8kB blocks at a time. > > So 800kB/s for random access reads. And 40Mb/s for sequential > reads. That's a factor of 49. I don't think anyone wants > random_page_cost to be set to 50 though. > > For a high end 15k drive I see average seek times get as low > as 3ms. And sustained transfer rates get as high as 100Mb/s. > So about 2.7Mb/s for random access reads or about a > random_page_cost of 37. Still pretty extreme. > > So what's going on with the empirically derived value of 4? > Perhaps this is because even though Postgres is reading an > entire table sequentially it's unlikely to be the only I/O > consumer? The sequential reads would be interleaved > occasionally by some other I/O forcing a seek to continue. What about the cache memory on the disk? Even IDE disks have some 8Mb cache today, which makes a lot of difference for fairly short scans. Even if it's just read cache. That'll bring the speed of random access down to a 1=1 relationship with sequential access, assuming all fits in the cache. //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: > What about the cache memory on the disk? Even IDE disks have some 8Mb > cache today, which makes a lot of difference for fairly short scans. > Even if it's just read cache. That'll bring the speed of random access > down to a 1=1 relationship with sequential access, assuming all fits in > the cache. 8MB cache is really insignificant compared to the hundreds or thousands of megabytes the OS would be using to cache. You could just add the 8MB to your effective_cache_size (except it's not really 100% effective since it would contain some of the same blocks as the OS cache). -- greg
Greg,

> So 800kB/s for random access reads, and 40MB/s for sequential reads.
> That's a factor of about 50. I don't think anyone wants
> random_page_cost to be set to 50 though.
>
> For a high-end 15k drive I see average seek times as low as 3ms, and
> sustained transfer rates as high as 100MB/s. So about 2.7MB/s for
> random access reads, or about a random_page_cost of 37. Still pretty
> extreme.

Actually, what you're demonstrating here is that there's really no point
in having a random_page_cost GUC, since the seek/scan ratio is going to
be high regardless. Although I can point out that you left out the fact
that the disk needs to do a seek to find the beginning of the seq scan
area, and even then some file fragmentation is possible. Finally, I've
never seen PostgreSQL manage more than 70% of the maximum read rate, and
in most cases more like 30%.

> So what's going on with the empirically derived value of 4?

It's not empirically derived; it's a value we plug into an
internal-to-PostgreSQL formula. And 4 is a fairly conservative value that
works for a lot of systems.

Realistically, the values we should be deriving from are:

-- median file cache size for PostgreSQL files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware "facts" into formulas that allow us to
calculate the actual cost of a query execution plan is somebody's PhD
paper.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> Although I can point out that you left out the fact that the disk needs
> to do a seek to find the beginning of the seq scan area, and even then
> some file fragmentation is possible. Finally, I've never seen
> PostgreSQL manage more than 70% of the maximum read rate, and in most
> cases more like 30%.

Hm. I just did a quick test. It wasn't really long enough to get a good
estimate, but it seemed to reach about 30MB/s on this drive, which is only
capable of 40-50MB/s depending on the location on the platters.

That's true, though: some of my calculated 25% random seeks could be
caused by fragmentation. But it seems like that would be a small part.

> > So what's going on with the empirically derived value of 4?
>
> It's not empirically derived; it's a value we plug into an
> internal-to-PostgreSQL formula.

I thought Tom said he got the value by doing empirical tests.

--
greg
Josh Berkus <josh@agliodbs.com> writes:
> > So what's going on with the empirically derived value of 4?
>
> It's not empirically derived;

Yes it is. I ran experiments back in the late 90s to derive it. Check the
archives.

Disks have gotten noticeably bigger since then, but I don't think the
ratio of seek time to rotation rate has changed much.

			regards, tom lane
On Mon, 14 Mar 2005 21:23:29 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think that the "reduce random_page_cost" mantra is not an indication
> that that parameter is wrong, but that the cost models it feeds into
> need more work.

One of these areas is the cost interpolation depending on correlation.
This has been discussed on -hackers in October 2002 and August 2003
("Correlation in cost_index()"). My Postgres installations contain the
patch presented during that discussion (and another index correlation
patch), and I use *higher* values for random_page_cost (up to 10).

Servus
Manfred
Tom,

> Yes it is. I ran experiments back in the late 90s to derive it.
> Check the archives.

Hmmmm ... which list?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> > Yes it is. I ran experiments back in the late 90s to derive it.
> > Check the archives.
>
> Hmmmm ... which list?

-hackers, no doubt. -performance didn't exist then.

			regards, tom lane
On Thu, Mar 17, 2005 at 09:54:29AM -0800, Josh Berkus wrote:
> > Yes it is. I ran experiments back in the late 90s to derive it.
> > Check the archives.
>
> Hmmmm ... which list?

These look like relevant threads:

http://archives.postgresql.org/pgsql-hackers/2000-01/msg00910.php
http://archives.postgresql.org/pgsql-hackers/2000-02/msg00215.php

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Tom Lane wrote:
> Reducing random_page_cost is usually the best way to get the planner to
> favor indexscans more.

OK, I experimented a bit with random_page_cost, and I had to set it to 1
to get PG to use the index on mitpln:

CIMSOFT=# ANALYSE mitpln;
ANALYZE

CIMSOFT=# SET random_page_cost=2;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln
             WHERE date_to_yearmonth_dec(mpl_date)='20050';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on mitpln  (cost=0.00..1173.78 rows=1431 width=69) (actual time=219.000..1125.000 rows=1266 loops=1)
   Filter: ((date_to_yearmonth_dec((mpl_date)::timestamp without time zone))::text = '20050'::text)
 Total runtime: 1125.000 ms
(3 rows)

CIMSOFT=# SET random_page_cost=1;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln
             WHERE date_to_yearmonth_dec(mpl_date)='20050';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using mitpln_yearmonth_dec on mitpln  (cost=0.00..699.01 rows=1431 width=69) (actual time=0.000..16.000 rows=1266 loops=1)
   Index Cond: ((date_to_yearmonth_dec((mpl_date)::timestamp without time zone))::text = '20050'::text)
 Total runtime: 16.000 ms
(3 rows)

CIMSOFT=# \d mitpln
                            Table "public.mitpln"
    Column    |         Type          |                 Modifiers
--------------+-----------------------+-----------------------------------------
 mpl_id       | integer               | not null default nextval('public.mitpln_mpl_id_seq'::text)
 mpl_date     | date                  |
 mpl_minr     | integer               | not null
 mpl_tpl_name | character varying(20) |
 mpl_feiertag | character varying(50) |
 mpl_min      | real                  |
 mpl_saldo    | real                  |
 mpl_buch     | boolean               | not null default false
 mpl_absaldo  | real                  |
 mpl_vhz      | real                  |
 dbrid        | character varying     | default nextval('db_id_seq'::text)
Indexes:
    "mitpln_pkey" PRIMARY KEY, btree (mpl_id)
    "mitpln_idindex" UNIQUE, btree (dbrid)
    "xtt5126" UNIQUE, btree (mpl_date, mpl_minr)
    "mitpln_yearmonth_dec" btree (date_to_yearmonth_dec(mpl_date::timestamp without time zone))
    "mpl_minr" btree (mpl_minr)
    "mpl_minr_nobuch" btree (mpl_minr) WHERE NOT mpl_buch

CIMSOFT=# SELECT count(*) FROM mitpln;
 count
-------
 26330
(1 row)

CREATE OR REPLACE FUNCTION date_to_yearmonth_dec(TIMESTAMP)
RETURNS VARCHAR AS '
BEGIN
  -- Concatenates the year with the zero-based month, unpadded:
  -- e.g. 2005-01-15 -> ''20050''.  Declared IMMUTABLE so it can back
  -- the functional index mitpln_yearmonth_dec.
  RETURN extract(year FROM $1) || extract(month FROM $1) - 1;
END' LANGUAGE plpgsql IMMUTABLE;

Daniel

PS: that's a 2.4 GHz P4 server with 1 GB RAM and SCSI RAID
(Windows 2000, PG 8.0.1).
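(If the session-level value proves out, it can be made persistent -- a
sketch; the value 1 is just what worked for this RAM-resident dataset, as
discussed upthread, not a general recommendation:)

    -- Per-database, takes effect for new connections:
    ALTER DATABASE "CIMSOFT" SET random_page_cost = 1;

    # ...or globally in postgresql.conf (the default is 4):
    # random_page_cost = 1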