Re: Question about (probably wrong) index scan cost for conditional indexes - Mailing list pgsql-general
From | Alex Lai |
---|---|
Subject | Re: Question about (probably wrong) index scan cost for conditional indexes |
Date | |
Msg-id | 4F2990C9.7020901@sesda2.com Whole thread Raw |
In response to | Re: Question about (probably wrong) index scan cost for conditional indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
Tom Lane wrote: > Maxim Boguk <maxim.boguk@gmail.com> writes: > >> I know there is issue with statistics over intarrays (it was there >> very long time and sometime it's complicating things a lot). >> > > >> However, the 100x cost difference between: >> SELECT * from test order by id limit 100; (over "primary key (id)" btree index) >> Limit (cost=0.00..3.43 rows=100 width=37) >> vs >> SELECT * from test where sections && '{2}' order by value limit 100; >> (over "test_value_in2section_key on test(value) where sections && >> '{2}'" btree index) >> Limit (cost=0.00..539.29 rows=100 width=37) >> seems wrong for me. >> > > [ shrug... ] It really is the fault of the bad rowcount estimate. > The cost estimates for the complete indexscans are reasonable. However, > in one case you've got LIMIT thinking that it will fetch the first 100 > out of 1000000 index entries, so it divides the indexscan cost estimate > by 10000, and gets something reasonably accurate. In the other case, > LIMIT thinks it's going to fetch the first 100 out of 1000 index > entries, so it divides the indexscan cost estimate by 10, and comes out > with something not very accurate. If the rowcount estimate for && had > been correct, those numbers would be much more alike. > > >> Both queries performs the absolutely same task: fetch 100 entries from >> the table based on the ideally suitable index (no post >> processing/filtering were done at all... just return 100 sorted tuples >> based on single index scan). >> > > Well, you know that and I know that, but the exposed cost and rowcount > estimates for the IndexScan plan node imply something else entirely: > that the cost-per-tuple-fetched is a lot higher in the one case than > the other. The LIMIT estimator has no reason, or method, to second > guess that. > > >> And even if I drop the intarray index completely, than I still have a >> wrong plan (bitmap scan + sort), because planner cost for the index >> scan over conditional index 100 more the it should be. >> (e.g. there is still an issue even in absence of the intarray index). >> > > Yeah, because it's not about the index, it's about the selectivity of > the && operator. That estimate is wrong regardless of whether there > are any indexes involved. > > >> Is absence of frequency statistics over intarrays somehow linked to >> the wrong planner cost estimates for conditional index scan? >> > > Well, we lack both the statistics and an operator selectivity function > that would know what to do with them. Just a small matter of > programming ... > > regards, tom lane > > Tom, I had the same situation in one of my query. Use the subquery can speed up almost by 100 times faster. explain analyse select FileId as "FileId", ESDT as "ESDT",1 as "Position" from V_FileMeta_L3 where Archiveset = 61000 and ESDT= 'ESDT123' and Source = 'SOURCE1234' and ( (StartTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) or (EndTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) ) order by starttime limit 1; Limit (cost=0.00..15.20 rows=1 width=22) (actual time=200.048..200.048 rows=1 loops=1) -> Nested Loop (cost=0.00..117596.32 rows=7736 width=22) (actual time=200.046..200.046 rows=1 loops=1) -> Index Scan using ak_filemeta_l3_esdt_starttime_endtime on filemeta_l3 b (cost=0.00..77200.55 rows=7736 width=22) (actual time=199.986..199.989 rows=2 loops=1) Index Cond: ((esdt)::text = 'ROLPT'::text) Filter: (((source)::text = 'OMPS-NPP'::text) AND (((starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012- 05-07 07:41:27'::timestamp without time zone)) OR ((endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07:41:27'::ti mestamp without time zone)))) -> Index Scan using pk_filemeta_archiveset on filemeta_archiveset a (cost=0.00..5.21 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=2) Index Cond: ((a.fileid = b.fileid) AND (a.archiveset = 61000)) Total runtime: 200.102 ms (8 rows) explain analyse select FileId as "FileId", ESDT as "ESDT",1 as "Position" from V_FileMeta_L3 where FileId in (select fileid from V_FileMeta_L3 where Archiveset = 61000 and ESDT= 'ESDT123' and Source = 'SOUCE1234' and ( (StartTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) or (EndTime between '2012-01-28 05:59:57.000000Z'::timestamp - '-135 minutes'::interval and '2012-01-28 07:41:27.000000Z'::timestamp + '100 days'::interval) ) order by starttime) limit 1; Limit (cost=53038.95..53039.41 rows=1 width=14) (actual time=2.502..2.502 rows=1 loops=1) -> Nested Loop (cost=53038.95..6661196.32 rows=14611464 width=14) (actual time=2.480..2.480 rows=1 loops=1) -> Nested Loop (cost=53038.95..53716.80 rows=2549631 width=18) (actual time=2.464..2.464 rows=1 loops=1) -> HashAggregate (cost=53038.95..53040.95 rows=200 width=4) (actual time=2.445..2.445 rows=1 loops=1) -> Sort (cost=52923.05..52942.37 rows=7727 width=12) (actual time=2.346..2.372 rows=43 loops=1) Sort Key: b.starttime Sort Method: quicksort Memory: 27kB -> Nested Loop (cost=209.87..52424.05 rows=7727 width=12) (actual time=0.358..2.251 rows=43 loops=1) -> Bitmap Heap Scan on filemeta_l3 b (cost=209.87..12077.66 rows=7727 width=12) (actual time=0.262..0.685 rows=108 loops=1) Recheck Cond: ((((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-07 07:41:2 7'::timestamp without time zone)) OR (((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07:41:27'::timestamp without time zone))) Filter: ((source)::text = 'OMPS-NPP'::text) -> BitmapOr (cost=209.87..209.87 rows=9895 width=0) (actual time=0.223..0.223 rows=0 loops=1) -> Bitmap Index Scan on ak_filemeta_l3_esdt_starttime_endtime (cost=0.00..107.72 rows=4961 width=0) (actual time=0.127..0.127 rows=106 loops=1) Index Cond: (((esdt)::text = 'ROLPT'::text) AND (starttime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (starttime <= '2012-05-0 7 07:41:27'::timestamp without time zone)) -> Bitmap Index Scan on ak_filemeta_l3_esdt_endtime (cost=0.00..98.29 rows=4934 width=0) (actual time=0.093..0.093 rows=108 loops=1) Index Cond: (((esdt)::text = 'ROLPT'::text) AND (endtime >= '2012-01-28 08:14:57'::timestamp without time zone) AND (endtime <= '2012-05-07 07 :41:27'::timestamp without time zone)) -> Index Scan using pk_filemeta_archiveset on filemeta_archiveset a (cost=0.00..5.21 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=108) Index Cond: ((a.fileid = b.fileid) AND (a.archiveset = 61000)) -> Index Scan using pk_filemeta_l3 on filemeta_l3 b (cost=0.00..3.37 rows=1 width=14) (actual time=0.015..0.015 rows=1 loops=1) Index Cond: (b.fileid = a.fileid) -> Index Scan using ak_filemeta_archiveset_fileid on filemeta_archiveset a (cost=0.00..2.52 rows=6 width=4) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: (a.fileid = b.fileid) Total runtime: 2.711 ms Hope this help. Best Regards, Alex Lai
pgsql-general by date: