Thread: Odd Sort/Limit/Max Problem
Folks,

Consider this performance quandary brought to me by Elein, which I can
replicate in 7.2.3 and in 7.4 devel:

case_clients is a medium-large table with about 110,000 rows. The field
date_resolved is a timestamp field which is indexed and allows nulls (in
fact, is null for 40% of entries).

First, as expected, a regular aggregate is slow:

jwnet=> explain analyze select max(date_resolved) from case_clients;
NOTICE:  QUERY PLAN:

Aggregate  (cost=3076.10..3076.10 rows=1 width=4) (actual time=484.24..484.24 rows=1 loops=1)
  ->  Seq Scan on case_clients  (cost=0.00..2804.48 rows=108648 width=4) (actual time=0.08..379.81 rows=108648 loops=1)
Total runtime: 484.44 msec

So we use the standard PostgreSQL workaround:

jwnet=> explain analyze select date_resolved from case_clients
        order by date_resolved desc limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.50 rows=1 width=4) (actual time=0.22..0.23 rows=1 loops=1)
  ->  Index Scan Backward using idx_caseclients_resolved on case_clients  (cost=0.00..163420.59 rows=108648 width=4) (actual time=0.21..0.22 rows=2 loops=1)
Total runtime: 0.33 msec

... which is fast, but returns NULL, since nulls sort to the bottom! So we
add IS NOT NULL:

jwnet=> explain analyze select date_resolved from case_clients
        where date_resolved is not null
        order by date_resolved desc limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..4.06 rows=1 width=4) (actual time=219.63..219.64 rows=1 loops=1)
  ->  Index Scan Backward using idx_caseclients_resolved on case_clients  (cost=0.00..163420.59 rows=40272 width=4) (actual time=219.62..219.62 rows=2 loops=1)
Total runtime: 219.76 msec

Aieee! Almost as slow as the aggregate!

Now, none of those times is huge on this test database, but on a larger
database (> 1 million rows) the performance problem is much worse. For
some reason, the backward index scan seems to have to traverse all of the
NULLs before selecting a value. I find this peculiar, as I was under the
impression that NULLs were not indexed.

What's going on here?
--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
On Fri, 13 Dec 2002, Josh Berkus wrote:

> First, as expected, a regular aggregate is slow:
> So we use the workaround standard for PostgreSQL:
>
> ... which is fast, but returns NULL, since nulls sort to the bottom! So we
> add IS NOT NULL:
>
> jwnet=> explain analyze select date_resolved from case_clients where
> date_resolved is not null order by date_resolved desc limit 1;
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=0.00..4.06 rows=1 width=4) (actual time=219.63..219.64 rows=1
> loops=1)
>   ->  Index Scan Backward using idx_caseclients_resolved on case_clients
> (cost=0.00..163420.59 rows=40272 width=4) (actual time=219.62..219.62 rows=2
> loops=1)
> Total runtime: 219.76 msec
>
> Aieee! Almost as slow as the aggregate!

I'd suggest trying a partial index on date_resolved where date_resolved is
not null. In my simple tests on about 200,000 rows of ints where 50% are
null, that sort of index cut the runtime on my machine from 407.66 msec to
0.15 msec.
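[A sketch of the suggested partial index, using the table and column names
from Josh's example; the index name is made up, and the query must repeat
the predicate so the planner can match it against the partial index:]

```sql
-- Partial index that simply excludes the NULL entries
-- (index name is hypothetical).
create index idx_caseclients_resolved_notnull
    on case_clients (date_resolved)
    where date_resolved is not null;

-- The WHERE clause must imply the index predicate, or the
-- planner will not consider the partial index:
select date_resolved
  from case_clients
 where date_resolved is not null
 order by date_resolved desc
 limit 1;
```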
Josh Berkus <josh@agliodbs.com> writes:
> Now, none of those times is huge on this test database, but on a larger
> database (> 1 million rows) the performance problem is much worse. For
> some reason, the backward index scan seems to have to traverse all of the
> NULLs before selecting a value.

Correct. You lose, if there are a lot of nulls. Unfortunately, the
"IS NOT NULL" clause isn't considered an indexable operator, and so the
indexscan has no idea that it shouldn't return the null rows. If it could
just traverse past them in the index, this example wouldn't be so bad, but
it goes out and fetches the heap rows before discarding 'em :-(

> I find this peculiar, as I was under the
> impression that NULLs were not indexed.

Not correct. btrees index NULLs, as they must do in order to have correct
behavior for multicolumn indexes.

I think it would work to instead do something like

	select date_resolved from case_clients
	where date_resolved < 'infinity'
	order by date_resolved desc limit 1;

since then the indexscan will get a qualifier condition that will allow it
to discard the nulls. In fact, I think this will even prevent having to
traverse past the nulls in the index --- the original form starts the
indexscan at the index end, but this should do a btree descent search to
exactly the place you want. Note that the where-clause has to match the
scan direction (> or >= for ASC, < or <= for DESC) so that it looks like a
"start here" condition to btree.

			regards, tom lane
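[Tom's direction-matching rule, sketched for both extremes. These queries
assume the same case_clients table; 'infinity' and '-infinity' are valid
timestamp endpoints in PostgreSQL, so every non-NULL value satisfies the
qualifier:]

```sql
-- Fast MAX(): a DESC scan needs a < or <= qualifier, which lets
-- btree descend directly to the largest non-NULL entry.
select date_resolved from case_clients
 where date_resolved < 'infinity'
 order by date_resolved desc limit 1;

-- Fast MIN(): an ASC scan needs a > or >= qualifier to match the
-- scan direction and act as a "start here" condition.
select date_resolved from case_clients
 where date_resolved > '-infinity'
 order by date_resolved asc limit 1;
```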
On Sat, 14.12.2002 at 01:24, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > Now, none of those times is huge on this test database, but on a larger
> > database (> 1 million rows) the performance problem is much worse. For
> > some reason, the backward index scan seems to have to traverse all of
> > the NULLs before selecting a value.
>
> Correct. You lose, if there are a lot of nulls. Unfortunately, the
> "IS NOT NULL" clause isn't considered an indexable operator and so the
> indexscan has no idea that it shouldn't return the null rows. If it
> could just traverse past them in the index, this example wouldn't be so
> bad, but it goes out and fetches the heap rows before discarding 'em :-(
>
> > I find this peculiar, as I was under the
> > impression that NULLs were not indexed.
>
> Not correct. btrees index NULLs, as they must do in order to have
> correct behavior for multicolumn indexes.

I've heard this before, but this is something I've never understood - why
do you have to index _single_ nulls in order to behave correctly for a
multi-column index?

Is it that postgres thinks that a tuple of several nulls is the same as
null?

Is it just that nulls need to have an ordering, and that this fact has
somehow leaked down to actually being stored in the index?

I don't have anything against nulls being indexed - in a table where nulls
have about the same frequency as other values it may actually be useful
(if indexes were used to find IS NULL tuples).

--
Hannu Krosing <hannu@tm.ee>
Hannu Krosing <hannu@tm.ee> writes:
> On Sat, 14.12.2002 at 01:24, Tom Lane wrote:
>> Not correct. btrees index NULLs, as they must do in order to have
>> correct behavior for multicolumn indexes.

> I've heard this before, but this is something I've never understood - why
> do you have to index _single_ nulls in order to behave correctly for a
> multi-column index?

Well, you don't absolutely *have to* index individual nulls, but you do
have to support nulls in index entries. The example that motivates this is

	create table foo (f1 int, f2 int);
	create index fooi on foo (f1, f2);
	... fill table ...
	select * from foo where f1 = 42;

The planner is entitled to implement this as an indexscan using fooi's
first column (and ignoring its lower-order column(s)). Now if fooi does
not index rows in which f2 is null, you lose, because it may omit rows
with f1 = 42 that should have been found by the indexscan. So it *has to*
be able to store index entries like (42, NULL).

For btree's purposes, the easiest implementation is to say that NULL is an
ordinary index entry with a definable sort position (which we chose to
define as "after all non-NULL values"). There's no particular value in
having a special case for all-NULL index entries, so we don't.

GiST is not able to handle all-NULL index entries, so it uses the rule
"index all rows in which the first index column is not NULL". This still
meets the planner's constraint, because we never do an indexscan that uses
only lower-order index columns.

hash and rtree don't support NULL index entries, but they don't support
multicolumn indexes either, so the constraint doesn't apply.

> I don't have anything against nulls being indexed - in a table where
> nulls have about the same frequency as other values it may actually be
> useful (if indexes were used to find IS NULL tuples)

At least for btree, it would be nice to someday allow IS NULL as an
indexable operator. I haven't thought very hard about how to do that;
shoehorning it into the operator class structure looks like it'd be a
horrid mess, so it'd probably require some creative klugery :-(

> Is it just that nulls need to have an ordering and that this fact has
> somehow leaked down to actually being stored in the index?

No, more the other way around: btree assigns an ordering to NULLs because
it must do so in order to know where to put them in the index. This is an
artifact of btree that happens to "leak upward" ...

			regards, tom lane
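[Tom's motivating example, fleshed out as a sketch; the sample rows are
made up for illustration:]

```sql
create table foo (f1 int, f2 int);
create index fooi on foo (f1, f2);

-- Note the row whose second column is NULL.
insert into foo values (42, 7), (42, NULL), (13, 5);

-- The planner may satisfy this with an indexscan on fooi's first
-- column alone. If the index omitted rows where f2 IS NULL, the
-- (42, NULL) row would be silently missing from the result, so the
-- index must be able to store the entry (42, NULL).
select * from foo where f1 = 42;
```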