Thread: Why sequential scan when there's a supporting index?
Hi, As you can see, I "VACUUM VERBOSE ANALYZE" my table, describe the index on the table, then try to find the max() value of the indexed field. However, EXPLAIN still shows that the query wants to sequentially scan the table. Why? TIA, Ron test2=# vacuum verbose analyze t_lane_tx; NOTICE: --Relation t_lane_tx-- NOTICE: Pages 1785858: Changed 0, Empty 0; Tup 33931294: Vac 0, Keep 0, UnUsed 0. Total CPU 77.03s/7.24u sec elapsed 494.00 sec. NOTICE: Analyzing t_lane_tx VACUUM test2=# \d i_lane_tx_tmp Index "i_lane_tx_tmp" Column | Type ---------+------ tx_date | date btree test2=# explain select max(tx_date) from t_lane_tx; NOTICE: QUERY PLAN: Aggregate (cost=2209999.20..2209999.20 rows=1 width=4) -> Seq Scan on t_lane_tx (cost=0.00..2125170.96 rows=33931296 width=4) -- +---------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://ronandheather.dhs.org:81 | | | | "I have created a government of whirled peas..." | | Maharishi Mahesh Yogi, 12-May-2002, | ! CNN, Larry King Live | +---------------------------------------------------------+
Try:
SELECT tx_date FROM t_lane_tx ORDER BY tx_date DESC LIMIT 1;
hth,
- Stuart
> -----Original Message-----
> From: Ron Johnson [mailto:ron.l.johnson@cox.net]
>
> Hi,
>
> As you can see, I "VACUUM VERBOSE ANALYZE" my table, describe
> the index on the table, then try to find the max() value of
> the indexed field. However, EXPLAIN still shows that the
> query wants to sequentially scan the table.
>
> Why?
>
> TIA,
> Ron
>
> test2=# vacuum verbose analyze t_lane_tx;
> NOTICE: --Relation t_lane_tx--
> NOTICE: Pages 1785858: Changed 0, Empty 0; Tup 33931294: Vac
> 0, Keep 0,
> UnUsed 0.
> Total CPU 77.03s/7.24u sec elapsed 494.00 sec.
> NOTICE: Analyzing t_lane_tx
> VACUUM
> test2=# \d i_lane_tx_tmp
> Index "i_lane_tx_tmp"
> Column | Type
> ---------+------
> tx_date | date
> btree
>
> test2=# explain select max(tx_date) from t_lane_tx;
> NOTICE: QUERY PLAN:
>
> Aggregate (cost=2209999.20..2209999.20 rows=1 width=4)
> -> Seq Scan on t_lane_tx (cost=0.00..2125170.96 rows=33931296
> width=4)
>
On Fri, 2002-05-24 at 08:25, Henshall, Stuart - WCP wrote: > Try: > SELECT tx_date FROM t_lane_tx ORDER BY tx_date DESC LIMIT 1; > hth, > - Stuart Thanks, Stuart and John. Is this a bug, or a "feature"? I ask, because in other RDBMS', this is absolutely supported by indexes: SELECT MAX(tx_date) FROM t_lane_tx; > > -----Original Message----- > > From: Ron Johnson [mailto:ron.l.johnson@cox.net] > > > > Hi, > > > > As you can see, I "VACUUM VERBOSE ANALYZE" my table, describe > > the index on the table, then try to find the max() value of > > the indexed field. However, EXPLAIN still shows that the > > query wants to sequentially scan the table. > > > > Why? > > > > TIA, > > Ron > > > > test2=# vacuum verbose analyze t_lane_tx; > > NOTICE: --Relation t_lane_tx-- > > NOTICE: Pages 1785858: Changed 0, Empty 0; Tup 33931294: Vac > > 0, Keep 0, > > UnUsed 0. > > Total CPU 77.03s/7.24u sec elapsed 494.00 sec. > > NOTICE: Analyzing t_lane_tx > > VACUUM > > test2=# \d i_lane_tx_tmp > > Index "i_lane_tx_tmp" > > Column | Type > > ---------+------ > > tx_date | date > > btree > > > > test2=# explain select max(tx_date) from t_lane_tx; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=2209999.20..2209999.20 rows=1 width=4) > > -> Seq Scan on t_lane_tx (cost=0.00..2125170.96 rows=33931296 > > width=4) > > -- +---------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://ronandheather.dhs.org:81 | | | | "I have created a government of whirled peas..." | | Maharishi Mahesh Yogi, 12-May-2002, | ! CNN, Larry King Live | +---------------------------------------------------------+
Ron Johnson <ron.l.johnson@cox.net> writes: > test2=# explain select max(tx_date) from t_lane_tx; > NOTICE: QUERY PLAN: > Aggregate (cost=2209999.20..2209999.20 rows=1 width=4) > -> Seq Scan on t_lane_tx (cost=0.00..2125170.96 rows=33931296 > width=4) I'm beginning to think this should be in the FAQ ... Try it like this: regression=# explain select unique1 from tenk1 order by unique1 desc limit 1; Limit (cost=0.00..0.11 rows=1 width=4) -> Index Scan Backward using tenk1_unique1 on tenk1 (cost=0.00..1071.99 rows=10000 width=4) Although the LIMIT clause isn't standard, this approach is attractive compared to max() because you can fetch any or all values in the row containing the maximal element, which is a very useful thing. Also, the approach scales to situations where you want to sort by multiple columns. Improving the handling of max() has been on the TODO list for awhile, but most of the hacker community considers it low priority because of the availability of the above workaround. Also, Postgres has a very generalized black-box approach to aggregate functions, so no one's been able to think of a reasonably clean way to teach the planner that some aggregates are connected to index sort ordering. regards, tom lane
On Fri, 2002-05-24 at 08:54, Tom Lane wrote: > Ron Johnson <ron.l.johnson@cox.net> writes: [snip] > Improving the handling of max() has been on the TODO list for awhile, > but most of the hacker community considers it low priority because of > the availability of the above workaround. Also, Postgres has a very > generalized black-box approach to aggregate functions, so no one's been > able to think of a reasonably clean way to teach the planner that some > aggregates are connected to index sort ordering. How do I vote on increasing the priority of "fixing max()"? SELECT MAX(FOO) FROM BAR; SELECT tx_date, COUNT(*) FROM t_lane_tx GROUP BY tx_date; These are awfully common statements on proprietary RDBMSs, and it confuses the _heck_ out of someone migrating to Postgres. Btw, "SELECT tx_date, COUNT(*) FROM t_lane_tx GROUP BY tx_date;" also does a Seq Scan on t_lane_tx. What's the best work-around for this query? -- +---------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://ronandheather.dhs.org:81 | | | | "I have created a government of whirled peas..." | | Maharishi Mahesh Yogi, 12-May-2002, | ! CNN, Larry King Live | +---------------------------------------------------------+
On Sat, 2002-05-25 at 02:25, Ron Johnson wrote: > > Btw, "SELECT tx_date, COUNT(*) FROM t_lane_tx GROUP BY tx_date;" > also does a Seq Scan on t_lane_tx. What's the best work-around > for this query? There is no work around for this one. In some circumstances the indexes in a PostgreSQL database will contain 'dirty' information, and so to get the correct answer in these cases PostgreSQL has to go to the real table. For my personal view I'm OK with the current behaviour. It has tradeoffs, and this is one of the negatives, but although I find myself doing this interactively quite often I only very rarely find myself doing it inside an application. Regards, Andrew. -- -------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Are you enrolled at http://schoolreunions.co.nz/ yet?
On Sat, 2002-05-25 at 03:52, Andrew McMillan wrote: > On Sat, 2002-05-25 at 02:25, Ron Johnson wrote: > > > > Btw, "SELECT tx_date, COUNT(*) FROM t_lane_tx GROUP BY tx_date;" > > also does a Seq Scan on t_lane_tx. What's the best work-around > > for this query? > > There is no work around for this one. In some circumstances the indexes > in a PostgreSQL database will contain 'dirty' information, and so to get > the correct answer in these cases PostgreSQL has to go to the real > table. "Dirty information"? Is this a consequence of READ COMMITTED transactions? > For my personal view I'm OK with the current behaviour. It has > tradeoffs, and this is one of the negatives, but although I find myself > doing this interactively quite often I only very rarely find myself > doing it inside an application. IMO, when a "proprietary" DBA (like me) hears that that statement does table scans, s/he will be stunned, and wonder what other "gotchas" are lurking out there awaiting someone who wants to query enterprise-sized tables. The main reason that I am researching Postgres (a _real_ database) is to see whether we can move historical data off the proprietary system, and on to something less expensive that people can run ad-hoc queries against... -- +---------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://ronandheather.dhs.org:81 | | | | "I have created a government of whirled peas..." | | Maharishi Mahesh Yogi, 12-May-2002, | ! CNN, Larry King Live | +---------------------------------------------------------+
On 25 May 2002, Ron Johnson wrote: > On Sat, 2002-05-25 at 03:52, Andrew McMillan wrote: > > On Sat, 2002-05-25 at 02:25, Ron Johnson wrote: > > For my personal view I'm OK with the current behaviour. It has > > tradeoffs, and this is one of the negatives, but although I find myself > > doing this interactively quite often I only very rarely find myself > > doing it inside an application. > > IMO, when a "proprietary" DBA (like me) hears that that statement > does table scans, s/he will be stunned, and wonder what other > "gotchas" are lurking out there awaiting someone who wants to > query enterprise-sized tables. I was wondering the same thing. I read the html manual, O'Reilly's Practical PostgreSQL, Sam's PostgreSQL Developer's Handbook and have browsed http://techdocs.postgresql.org. I also lurk on this list and ocasionally ask and answer questions. Where in all that information am I supposed to pick up that 'SELECT max(foo) FROM quux' should be written as 'SELECT foo FROM quux ORDER BY foo LIMIT 1'. That's really unintuitive and if this info isn't in the docs I've read so far, I'm not sure where it is. Am I supposed to read the source for comments as well? So mostly I just want to know where these suggested work arounds are documented. Is there a particular set of keywords that work for searching the hacker's list? I'm at a loss as to where to get this (obviously) valuable info. Pointers would be appreciated, Joshua b. Jore ; http://www.greentechnologist.org ; 10012 11010 11022 10202 1012 2122 11020 10202 10202 11002 1020 1012 11102 11102 11102 1201 11001 11002 10211 11020 10202 10202 11002 11021 1201 11010 11020 10211
Sorry for being whiney. I'd still like to know where the gotchas are if someone knows that. Joshua b. Jore ; http://www.greentechnologist.org ; 10012 11010 11022 10202 1012 2122 11020 10202 10202 11002 1020 1012 11102 11102 11102 1201 11001 11002 10211 11020 10202 10202 11002 11021 1201 11010 11020 10211 ---------- Forwarded message ---------- Date: Sat, 25 May 2002 11:57:39 -0500 (CDT) From: Joshua b. Jore <josh@greentechnologist.org> Cc: PgSQL Novice ML <pgsql-novice@postgresql.org> Subject: Re: [NOVICE] Why sequential scan when there's a supporting index? On 25 May 2002, Ron Johnson wrote: > On Sat, 2002-05-25 at 03:52, Andrew McMillan wrote: > > On Sat, 2002-05-25 at 02:25, Ron Johnson wrote: > > For my personal view I'm OK with the current behaviour. It has > > tradeoffs, and this is one of the negatives, but although I find myself > > doing this interactively quite often I only very rarely find myself > > doing it inside an application. > > IMO, when a "proprietary" DBA (like me) hears that that statement > does table scans, s/he will be stunned, and wonder what other > "gotchas" are lurking out there awaiting someone who wants to > query enterprise-sized tables. I was wondering the same thing. I read the html manual, O'Reilly's Practical PostgreSQL, Sam's PostgreSQL Developer's Handbook and have browsed http://techdocs.postgresql.org. I also lurk on this list and ocasionally ask and answer questions. Where in all that information am I supposed to pick up that 'SELECT max(foo) FROM quux' should be written as 'SELECT foo FROM quux ORDER BY foo LIMIT 1'. That's really unintuitive and if this info isn't in the docs I've read so far, I'm not sure where it is. Am I supposed to read the source for comments as well? So mostly I just want to know where these suggested work arounds are documented. Is there a particular set of keywords that work for searching the hacker's list? I'm at a loss as to where to get this (obviously) valuable info. Pointers would be appreciated,