Thread: constant scalar subselect no longer equivalent to constant?
constant scalar subselect no longer equivalent to constant?
From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
I upgraded from 7.2.1 to 7.3.2 over the past weekend. One of my favorite queries went from an indexed scan to a sequential scan. Ugh. Here's the details... 200,000+ records, indexed on "stamp" (a timestamp). My query was: select count(*), otherthing from requests where stamp > (select now() - '1 hour' :: interval) group by 2 order by 1 desc limit 10; In 7.2.1, the subselect was treated as a constant, and so the btree index was used to reduce the scan significantly. I would get the result within a few seconds. In 7.3.2, I get a sequential scan! Ugh! It takes about 15 seconds. However, if I replace the subselect with a constant timestamp ('2003-03-10'), I get an index scan, so it isn't that the analyze hasn't recognized the number of records. Did I break something during the upgrade? Is there anything I can do to get the index scan back? Is this a known change between 7.2.1 and 7.3.2? -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Taking a complete shot in the dark here, but I did read that in the upgrade from 7.2.x -- 7.3 the type timestamp was changed to default from timestamp with time zone to timestamp without time zone to be more adhesive to SQL standards.... and now() is of type timestamp with time zone, and if you only declared your field of type timestamp it will be timestamp without time zone, therefore the database may be converting the value each time therefore causing a speed reduction? The fast that constant timestamp works seems to further prove my theory since then you have timestamp without time zone. I'd be interested to know if this is the solution :) Josh. On March 14, 2003 02:47 am, Randal L. Schwartz wrote: > I upgraded from 7.2.1 to 7.3.2 over the past weekend. One of my > favorite queries went from an indexed scan to a sequential scan. Ugh. > > Here's the details... 200,000+ records, indexed on "stamp" (a timestamp). > My query was: > > select count(*), otherthing from requests > where stamp > (select now() - '1 hour' :: interval) > group by 2 > order by 1 desc > limit 10; > > In 7.2.1, the subselect was treated as a constant, and so the btree > index was used to reduce the scan significantly. I would get the > result within a few seconds. > > In 7.3.2, I get a sequential scan! Ugh! It takes about 15 seconds. > > However, if I replace the subselect with a constant timestamp > ('2003-03-10'), I get an index scan, so it isn't that the analyze > hasn't recognized the number of records. > > Did I break something during the upgrade? > > Is there anything I can do to get the index scan back? > > Is this a known change between 7.2.1 and 7.3.2?
merlyn@stonehenge.com (Randal L. Schwartz) writes: > I upgraded from 7.2.1 to 7.3.2 over the past weekend. One of my > favorite queries went from an indexed scan to a sequential scan. Ugh. > Here's the details... 200,000+ records, indexed on "stamp" (a timestamp). > My query was: > select count(*), otherthing from requests > where stamp > (select now() - '1 hour' :: interval) > group by 2 > order by 1 desc > limit 10; FWIW, you don't need a subselect here anymore; it should work as well (or as poorly) to do "where stamp > (now() - '1 hour' :: interval)". > In 7.3.2, I get a sequential scan! Ugh! It takes about 15 seconds. Can you force an indexscan by doing "set enable_seqscan to off"? If not, the problem is likely a datatype mismatch as Josh suggested. If you can, then the problem is that the planner doesn't think this is selective enough to justify an indexscan. (I'm kinda surprised that 7.2 wouldn't have thought the same.) The difficulty is that since now() isn't a constant, the planner doesn't know what value the stamp column will get compared to, and so it has to fall back on a default selectivity estimate that will not be favorable to an indexscan. If that's your problem, the answer is to add a dummy condition to turn the query into a range scan. This should work: where stamp > (now() - '1 hour'::interval) and stamp < (now() + '1 hour'::interval); The planner still doesn't know the selectivity of the now() conditions, but its default estimate for a range query is lots tighter than for a one-sided inequality. It should be willing to indexscan this way. regards, tom lane
Re: constant scalar subselect no longer equivalent to constant?
From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> FWIW, you don't need a subselect here anymore; it should work as well Tom> (or as poorly) to do "where stamp > (now() - '1 hour' :: interval)". Ahh. In 7.2, that was seen as a "calculate a lot" expression, so I ended up with a seq scan. The subselect kludge was enough for it to compute it once, and figure out that it could use the index. >> In 7.3.2, I get a sequential scan! Ugh! It takes about 15 seconds. Tom> Can you force an indexscan by doing "set enable_seqscan to off"? Yes. That does it, but of course, I'd rather not change global status thingies that way. :) Tom> If not, the problem is likely a datatype mismatch as Josh suggested. Tom> If you can, then the problem is that the planner doesn't think this is Tom> selective enough to justify an indexscan. (I'm kinda surprised that Tom> 7.2 wouldn't have thought the same.) The difficulty is that since Tom> now() isn't a constant, the planner doesn't know what value the stamp Tom> column will get compared to, and so it has to fall back on a default Tom> selectivity estimate that will not be favorable to an indexscan. So this changed between 7.2 and 7.3? What's odd is that even writing a function didn't help: add function ago(interval) returns timestamp with time zone stable language 'sql' as 'select now() - $1'; I thought the addition of the "stable" keyword would make the return value be the same as a constant. It's not enough, apparently. Is that a bug, that a stable function and a constant are planned differently? Tom> If that's your problem, the answer is to add a dummy condition to turn Tom> the query into a range scan. This should work: Tom> where stamp > (now() - '1 hour'::interval) Tom> and stamp < (now() + '1 hour'::interval); Tom> The planner still doesn't know the selectivity of the now() conditions, Tom> but its default estimate for a range query is lots tighter than for Tom> a one-sided inequality. It should be willing to indexscan this way. Ahh! It does! explain select stamp from requests where stamp between now() - '1 hour'::interval and now(); does in fact give me an index scan by default. Thanks for the workaround. This planner stuff always seems like black magic. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Re: constant scalar subselect no longer equivalent to constant?
From
merlyn@stonehenge.com (Randal L. Schwartz)
Date:
>>>>> "Joshua" == Joshua Moore-Oliva <josh@chatgris.com> writes: Joshua> The fast that constant timestamp works seems to further prove Joshua> my theory since then you have timestamp without time zone. Joshua> I'd be interested to know if this is the solution :) I misspoke if that was implied. It is indeed "timestamp with timestone". -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
merlyn@stonehenge.com (Randal L. Schwartz) writes: > So this changed between 7.2 and 7.3? Well, no, it didn't change. AFAIR the only relevant difference is that 7.3 classifies now() as stable whereas 7.2 didn't have that concept and so had to classify it as noncachable ("volatile" in 7.3 terminology). What your sub-select was doing for you in 7.2 was hiding the volatile function --- the planner's decision about whether an expression is safe to compare to an index doesn't look inside subselects. (Which is arguably a bug, but so far I haven't dared change it because I knew people were using exactly this hack...) In 7.3 that's not necessary because now() isn't considered volatile anyway. But in either version, what the planner is seeing is WHERE indexedcolumn > unknownvalue which is going to force it to fall back on a default estimate for the number of rows retrieved. If you were actually scanning the whole table, you'd not have gotten an indexscan in either version. But the presence of the LIMIT causes the planner to prefer a fast-start plan, rather than the seqscan-and-sort you'd have gotten without any LIMIT. I'm guessing that the situation with your table is right on the edge of where the planner will flip between indexscan and seqscan for this query. How much difference is there in the estimated cost when you force indexscan or seqscan? It could be that the change in plan is just due to some small tweak in the cost models between 7.2 and 7.3. > What's odd is that even writing a function didn't help: > add function ago(interval) returns timestamp with time zone > stable > language 'sql' > as 'select now() - $1'; > I thought the addition of the "stable" keyword would make the return > value be the same as a constant. No, you'd have to claim it to be immutable to make the planner treat it as a constant. Stable means that the function value will hold still long enough to be safe to use as an indexscan qualifier (ie, that actually evaluating it at each row of a seqscan would give the same semantic result as evaluating it only once for use in an indexscan). It does not make it okay to pre-evaluate such a function during planning. now() is pretty much the prototypical example of why we invented the "stable" category: it doesn't change within a query, but that doesn't make it safe for the planner to pre-evaluate it. Plans can live across transactions, so the value of now() at runtime isn't necessarily what the planner would have gotten. regards, tom lane