Thread: half the query time in an unnecessary(?) sort?
If I have a freshly CLUSTERed table and queries that want to do a merge
join, it seems to me that quite a bit of time is spent unnecessarily
sorting the already-sorted table. An example of such a query from my log
files is shown below. If I read the EXPLAIN ANALYZE output correctly,
it's saying that roughly half the time (570 - 269 = 300 out of 670 ms)
was spent sorting the already-sorted data.

=====================
fli=# \d entity_facids
    Table "public.entity_facids"
  Column   |   Type    | Modifiers
-----------+-----------+-----------
 entity_id | integer   |
 fac_ids   | integer[] |
Indexes:
    "entity_facids__entity_id" btree (entity_id)

fli=# cluster entity_facids__entity_id on entity_facids;
CLUSTER
fli=# explain analyze select * from userfeatures.point_features join entity_facids using (entity_id) where featureid=118;
                                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=9299.37..9738.34 rows=1078 width=117) (actual time=536.989..667.648 rows=2204 loops=1)
   Merge Cond: ("outer".entity_id = "inner".entity_id)
   ->  Sort  (cost=37.27..38.45 rows=471 width=85) (actual time=14.289..16.303 rows=2204 loops=1)
         Sort Key: point_features.entity_id
         ->  Index Scan using point_features__featureid on point_features  (cost=0.00..16.36 rows=471 width=85) (actual time=0.030..9.360 rows=2204 loops=1)
               Index Cond: (featureid = 118)
   ->  Sort  (cost=9262.10..9475.02 rows=85168 width=36) (actual time=518.471..570.038 rows=59112 loops=1)
         Sort Key: entity_facids.entity_id
         ->  Seq Scan on entity_facids  (cost=0.00..2287.68 rows=85168 width=36) (actual time=0.093..268.679 rows=85168 loops=1)
 Total runtime: 693.161 ms
(10 rows)
====================

I understand that the optimizer cannot in general know that a CLUSTERed
table stays CLUSTERed when inserts or updates happen; but I was
wondering if anyone has any clever ideas on how to avoid this sort step.
Perhaps, in the future, the table could set a bit to remember that it is
freshly clustered, and clear that bit the first time any change to the
table is even attempted? Or, if not, would that become possible with
Hannu Krosing's read-only-table idea
(http://archives.postgresql.org/pgsql-hackers/2005-04/msg00660.php)?
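For comparison, one crude way to see what plan the planner falls back to
when sorting isn't available is to disable sorts for the session with
the enable_sort setting (a diagnostic only, not a fix; it will usually
just push the planner toward a hash or nestloop join):

=====================
fli=# set enable_sort = off;
SET
fli=# explain analyze select * from userfeatures.point_features join entity_facids using (entity_id) where featureid=118;
fli=# set enable_sort = on;
SET
====================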
Ron,

> If I have a freshly CLUSTERed table and queries that want to do a
> merge join, it seems to me that quite a bit of time is spent
> unnecessarily sorting the already-sorted table. An example such
> query I found in my log files is shown below. If I read the
> EXPLAIN ANALYZE output correctly, it's saying that roughly half
> the time (570-269 = 300 out of 670 ms) was spent sorting the
> already sorted data.

It still has to sort because the clustering isn't guaranteed to be
100%. However, such sorts should be very quick, as they have little
work to do.

Looking at your EXPLAIN ANALYZE, though, I think it's not the sort
itself that's taking the time so much as that the full sorted
entity_id column won't fit in work_mem. Try increasing it?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:

[quoted out of order]

> Ron,
>
> Looking at your analyze, though, I think it's not the sort that's
> taking the time as it is that the full sorted entity_id column won't
> fit in work_mem. Try increasing it?

Yup, that indeed fixed this particular query, since neither table was
particularly large.

> It still has to sort because the clustering isn't guaranteed to be 100%.

I guess I was contemplating whether or not there are some conditions
where it could be 100% (perhaps combined with Hannu's read-only-table
speculation).

> However, such sorts should be very quick as they have little work to do.

True, so long as the table can fit in work_mem. For much larger tables,
IMHO it'd be nice to be able to simply do a seq scan on them if there
were some way of knowing that they were already sorted.
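Short of a read-only flag, one rough way to gauge how "sorted" a table
still is would be the planner's own correlation statistic in pg_stats
(populated by ANALYZE; a value near 1.0 means the heap is still very
close to index order on that column):

=====================
fli=# analyze entity_facids;
ANALYZE
fli=# select attname, correlation from pg_stats where tablename = 'entity_facids' and attname = 'entity_id';
====================

That wouldn't let the planner skip the sort, of course, but it does
show how much clustering has decayed since the last CLUSTER.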