Thread: another index question
I have another query that I expected to use an index but is doing a sequential scan: SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (SELECT mesgnum FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1); Here's the explain: NOTICE: QUERY PLAN: Seq Scan on mesg_headers (cost=0.00..46866049756.39 rows=374843 width=36) SubPlan -> Materialize (cost=125028.26..125028.26 rows=1 width=4) -> Limit (cost=125028.26..125028.26 rows=1 width=4) -> Sort (cost=125028.26..125028.26 rows=374843 width=4) -> Seq Scan on mesg_headers (cost=0.00..81505.43 rows=374843 width=4) EXPLAIN There's an index on the mesgnum, and both queries independently use the index: email=# EXPLAIN SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (5,20); NOTICE: QUERY PLAN: Index Scan using mesg_headers_pkey, mesg_headers_pkey on mesg_headers (cost=0.00..9.98 rows=1 width=36) EXPLAIN email=# EXPLAIN SELECT mesgnum FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1; NOTICE: QUERY PLAN: Limit (cost=0.00..1.47 rows=1 width=4) -> Index Scan Backward using mesg_headers_pkey on mesg_headers (cost=0.00..550028.43 rows=374843 width=4) EXPLAIN So, if both of the queries use the index, why don't they use the index when combined? Thanks. sean -- Sean Harding sharding@dogcow.org |"It's not a habit, it's cool http://www.dogcow.org/sean/ | I feel alive." | --k's Choice
On Sun Apr 01 at 04:13:42 PM, Tom Lane wrote: > Sean Harding <sharding@dogcow.org> writes: > > So, if both of the queries use the index, why don't they use the index when > > combined? > > This looks like a bug. What version are you running, exactly? Sorry. Should have mentioned that. It's 7.1RC1 on linux 2.4.2. sean -- Sean Harding sharding@dogcow.org |"Oh, the bottle has been to me, My http://www.dogcow.org/sean | closest friend, my worst enemy." | --Natalie Merchant
Sean Harding <sharding@dogcow.org> writes: > So, if both of the queries use the index, why don't they use the index when > combined? This looks like a bug. What version are you running, exactly? regards, tom lane
Sean Harding <sharding@dogcow.org> writes: >> This looks like a bug. What version are you running, exactly? > Sorry. Should have mentioned that. It's 7.1RC1 on linux 2.4.2. Hm. Curious. I can't reproduce the problem: regression=# create table mesg_headers (mesgnum int, frm text, regression(# rcpt text, subject text); CREATE regression=# create index mesg_msgnum on mesg_headers(mesgnum); CREATE regression=# explain regression-# SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (SELECT mesgnum regression(# FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1); NOTICE: QUERY PLAN: Seq Scan on mesg_headers (cost=0.00..81.50 rows=1000 width=36) SubPlan -> Materialize (cost=0.06..0.06 rows=1 width=4) -> Limit (cost=0.00..0.06 rows=1 width=4) -> Index Scan Backward using mesg_msgnum on mesg_headers (cost =0.00..59.00 rows=1000 width=4) EXPLAIN regression=# Could you send me the full table schema ("pg_dump -s -t mesg_headers dbname" would do nicely) as well as the statistics from select attname,attdispersion,s.* from pg_statistic s, pg_attribute a, pg_class c where starelid = c.oid and attrelid = c.oid and staattnum = attnum and relname = 'mesg_headers'; regards, tom lane
OK, indeed this is a bug --- the planner has a bad interaction for LIMIT clauses used in "IN" subselects. Will fix. You could avoid the bug, and get a better plan at the top level anyway, by writing SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum = (SELECT mesgnum FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1); Since you know the subselect is going to produce exactly one tuple, there's no need to use IN. regards, tom lane
Tom Lane wrote: > OK, indeed this is a bug --- the planner has a bad interaction for LIMIT > clauses used in "IN" subselects. Will fix. > > You could avoid the bug, and get a better plan at the top level anyway, > by writing > > SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum = (SELECT mesgnum > FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1); > > Since you know the subselect is going to produce exactly one tuple, > there's no need to use IN. I don't know if this is a generic database issue, an SQL one, of pgsql's; but wasn't relational invented so that we shouldn't need to fine tune access paths? More to the point, shouldn't the statistics on tables plus the syntax be enough for the planner to get this (not so complex) query a good plan, even if not optimal? And if needed shouldn't we get away with hints or something like that instead of rewriting? Sorry for the basic questions, I'm trying to learn. No criticism of anyone's work intended, just a general question. -- _ / \ Leandro Guimarães Faria Corsetti Dutra +55 (11) 3040 8913 \ / Amdocs at Bell Canada +1 (514) 786 87 47 X Support Center, São Paulo, Brazil mailto:adbaamd@bell.ca / \ http://terravista.pt./Enseada/1989/ mailto:leandrod@amdocs.com
On Sun Apr 01 at 05:22:34 PM, Tom Lane wrote: > Since you know the subselect is going to produce exactly one tuple, > there's no need to use IN. Thanks Tom. That definitely helps a lot. sean -- Sean Harding sharding@dogcow.org | "Never before and never since, I promise, http://www.dogcow.org/sean/ | will the whole world be warm as this." | -- 10,000 Maniacs
How do I compare a char (bpchar?) to a varchar? I tried using "=", and I got: waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b where b.id = a.icao; ERROR: Unable to identify an operator '=' for types 'varchar' and 'bpchar' You will have to retype this query using an explicit cast Why isn't this working? Can somebody explain how to rephrase this? It doesn't work even if I try to cast the varchar to bpchar: waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b where b.id::char = a.icao; ERROR: Unable to identify an operator '=' for types 'varchar' and 'bpchar' You will have to retype this query using an explicit cast Or the bpchar to varchar: waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b where b.id = a.icao::varchar; ERROR: Unable to identify an operator '=' for types 'varchar' and 'bpchar' You will have to retype this query using an explicit cast Oh, and if I try and cast them both to varchar, the thing freaks out: waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b where b.id::varchar = a.icao::varchar; pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. We have lost the connection to the backend, so further processing is impossible. Terminating. -- Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody "I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones." -- Albert Einstein
"ADBAAMD" <adba.amdocs@bell.ca> writes: > More to the point, shouldn't the statistics on tables plus the syntax be > enough for the planner to get this (not so complex) query a good plan, > even if not optimal? We don't generate very good plans for IN at the moment. This is on the TODO list. regards, tom lane
Paul Tomblin <ptomblin@xcski.com> writes: > Oh, and if I try and cast them both to varchar, the thing freaks out: > waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b > where b.id::varchar = a.icao::varchar; > pqReadData() -- backend closed the channel unexpectedly. > This probably means the backend terminated abnormally > before or while processing the request. What in the world...!? What version are you running? What is the full schema for both tables? ("pg_dump -s -t tablename dbname" is the best way to get the complete schema info.) regards, tom lane
Paul Tomblin <ptomblin@xcski.com> writes: > How do I compare a char (bpchar?) to a varchar? I tried using "=", and I > got: > waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b > where b.id = a.icao; > ERROR: Unable to identify an operator '=' for types 'varchar' and 'bpchar' > You will have to retype this query using an explicit cast > Why isn't this working? char and varchar are not directly comparable because they have different ideas about whether trailing blanks are significant. Postgres makes you cast one or the other so that it knows which set of comparison rules to apply. Typically you probably want to cast the varchar side to char so that trailing blanks are ignored. > It doesn't work even if I try to cast the varchar to bpchar: I believe you are also running into sundry bugs in 6.5's handling of these casts. Should be cleaned up in 7.0.*. regards, tom lane