Thread: Performance of MIN() and MAX()
PGSQL 6.5.0, FreeBSD 3.2, Intel Pentium II 366 MHz, 128 MB RAM.

The table below was filled with about 1,000,000 records. Then a bunch of
indexes was created and a VACUUM executed. I was under the impression that
when max(<primary key>) is called, it should just take the value from the
index; it should not do any kind of scan. But, in fact, it scans the table.
"select max(id) from ItemBars" takes well over 10 seconds to complete. It's
almost instantaneous on MSSQL or on Interbase.

Something is clearly wrong. MAX() on the primary key should not take so much
time. Am I doing something wrong? Is it a known bug? If it's a bug then it's
a show stopper for us. How can I help fix it?

CREATE TABLE ItemBars (
    ID       SERIAL   PRIMARY KEY,
    ItemID   INT      NOT NULL,
    Interv   INT      NOT NULL,
    StaTS    DATETIME NOT NULL,
    EndTS    DATETIME NOT NULL,
    IsActive BOOL     NOT NULL,
    Opn      FLOAT(7) NOT NULL,
    High     FLOAT(7) NOT NULL,
    Low      FLOAT(7) NOT NULL,
    Cls      FLOAT(7) NOT NULL,
    Vol      INT      NOT NULL
);

Gene Sokolov.
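A quick way to confirm what the planner is doing is EXPLAIN. Something like
the following (a sketch against the ItemBars table above, not output from the
original session) would be expected to show a sequential scan over the whole
table on 6.5:

    -- Ask the planner how it will execute the aggregate; on 6.5 the
    -- resulting plan reads the whole table rather than the index on id.
    EXPLAIN SELECT max(id) FROM ItemBars;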
"Gene Sokolov" <hook@aktrad.ru> writes: > I was under impression that when max(<primary key>) is called, it should > just take the value from the index. I believe it should not do any kind of > scan. But, in fact, it scans the table. You are mistaken. Postgres has no idea that min() and max() have any semantics that have anything to do with indexes. I would like to see that optimization myself, but it's not a particularly easy thing to add given the system structure and the emphasis on datatype extensibility. > it's a show stopper for us. You might be able to hack around the issue with queries like SELECT x FROM table ORDER BY x LIMIT 1; SELECT x FROM table ORDER BY x DESC LIMIT 1; to get the min and max respectively. The current 6.6 code will implement these with indexscans, although I think 6.5 would not unless given an additional cue, like a WHERE clause involving x... regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
> > I was under the impression that when max(<primary key>) is called, it
> > should just take the value from the index. I believe it should not do
> > any kind of scan. But, in fact, it scans the table.
>
> You are mistaken.  Postgres has no idea that min() and max() have any
> semantics that have anything to do with indexes.  I would like to see
> that optimization myself, but it's not a particularly easy thing to add
> given the system structure and the emphasis on datatype extensibility.
>
> > it's a show stopper for us.
>
> You might be able to hack around the issue with queries like
>
>     SELECT x FROM table ORDER BY x LIMIT 1;
>     SELECT x FROM table ORDER BY x DESC LIMIT 1;

It is a real show stopper. No luck at all; the indexes are ignored:

*************************************************************
[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]

bars=> create index bars_id on itemsbars(id);
CREATE
bars=> explain select id from itemsbars order by id limit 1;
NOTICE:  QUERY PLAN:

Sort  (cost=44404.41 rows=969073 width=4)
  ->  Seq Scan on itemsbars  (cost=44404.41 rows=969073 width=4)

EXPLAIN
bars=> \d itemsbars

Table    = itemsbars
+--------------------+----------------------------------+-------+
|       Field        |               Type               | Length|
+--------------------+----------------------------------+-------+
| id                 | int4 not null default nextval('" |     4 |
| itemid             | int4 not null                    |     4 |
| interv             | int4 not null                    |     4 |
| stats              | datetime not null                |     8 |
| endts              | datetime not null                |     8 |
| isactive           | bool not null                    |     1 |
| opn                | float8 not null                  |     8 |
| high               | float8 not null                  |     8 |
| low                | float8 not null                  |     8 |
| cls                | float8 not null                  |     8 |
| vol                | int4 not null                    |     4 |
+--------------------+----------------------------------+-------+
Indices:  bars_complex2
          bars_endts
          bars_id
          bars_interv
          bars_itemid
          bars_stats
          itemsbars_pkey
"Gene Sokolov" <hook@aktrad.ru> writes: > From: Tom Lane <tgl@sss.pgh.pa.us> >> You might be able to hack around the issue with queries like >> SELECT x FROM table ORDER BY x LIMIT 1; >> SELECT x FROM table ORDER BY x DESC LIMIT 1; > It is a real show stopper. No luck completely, the indexes are ignored: > bars=> explain select id from itemsbars order by id limit 1; > NOTICE: QUERY PLAN: > Sort (cost=44404.41 rows=969073 width=4) > -> Seq Scan on itemsbars (cost=44404.41 rows=969073 width=4) Yes, you missed my comment that 6.5.* needs some help or it won't consider an index scan at all. This gives the right sort of plan: regression=> explain select id from itemsbars where id > 0 order by id limit 1; NOTICE: QUERY PLAN: Index Scan using itemsbars_id_key on itemsbars (cost=21.67 rows=334 width=4) The WHERE clause can be chosen so that it won't actually exclude anything, but there has to be a WHERE clause that looks useful with an index or an indexscan plan won't even get generated. (Also, the DESC case doesn't work in 6.5.*, unless you apply the backwards- index-scan patch that Hiroshi posted a few weeks ago.) This is fixed in current sources, BTW: regression=> explain select id from itemsbars order by id limit 1; NOTICE: QUERY PLAN: Index Scan using bars_id on itemsbars (cost=62.00 rows=1000 width=4) regression=> explain select id from itemsbars order by id desc ; NOTICE: QUERY PLAN: Index Scan Backward using bars_id on itemsbars (cost=62.00 rows=1000 width=4) although we still need to do some rejiggering of the cost estimation rules; current sources are probably *too* eager to use an indexscan. regards, tom lane
Tom Lane wrote in message <16225.937319132@sss.pgh.pa.us>...
> although we still need to do some rejiggering of the cost estimation
> rules; current sources are probably *too* eager to use an indexscan.

I did some testing today on a 1.6 million row table of random integers
in the range 0..32767. Using EXPLAIN I could get a "select max(f1) ..."
down to a cost of about 30000 by adding a WHERE clause of "f1 > 0".
After running the queries I got the following results (dual P133,
128 megs RAM, IDE):

    select max(f1) from t1                    [68 seconds]    [explain cost 60644.00]
    select max(f1) from t1 where f1 > 0       [148 seconds]   [explain cost 30416.67]

Knowing my data has at least one value above 30000, I can apply a
better heuristic than "f1 > 0":

    select max(f1) from t1 where f1 > 30000   [12.43 seconds] [explain cost 30416.67]

Until more aggregate-function optimizations are implemented, programmers
might have to use the old melon to speed things up.

Damond
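For comparison, the same "guess a cutoff" heuristic combines naturally with
the ORDER BY ... LIMIT workaround from earlier in the thread (a sketch reusing
Damond's t1/f1 names; timings would have to be measured separately, and on 6.5
the DESC form still needs the backwards-index-scan patch):

    -- Pick a cutoff known to lie below the true maximum, then let the
    -- index deliver the largest qualifying value directly.
    SELECT f1 FROM t1 WHERE f1 > 30000 ORDER BY f1 DESC LIMIT 1;
    -- Caveat: if the cutoff guess is too high, no rows qualify and the
    -- query returns nothing, so the guess has to be conservative.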