Re: Plan for relatively simple query seems to be very inefficient - Mailing list pgsql-performance
From | Mischa |
---|---|
Subject | Re: Plan for relatively simple query seems to be very inefficient |
Date | |
Msg-id | 1112812553.42542c091d8ce@webmail.telus.net Whole thread Raw |
In response to | Plan for relatively simple query seems to be very inefficient (Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl>) |
List | pgsql-performance |
Quoting Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl>: > Hi list, > > I noticed on a forum a query taking a surprisingly large amount of time > in MySQL. Of course I wanted to prove PostgreSQL 8.0.1 could do it much > better. To my surprise PostgreSQL was ten times worse on the same > machine! And I don't understand why. > > I don't really need this query to be fast since I don't use it, but the > range-thing is not really an uncommon query I suppose. So I'm wondering > why it is so slow and this may point to a wrong plan being chosen or > generated. > > Here are table definitions: > > Table "public.postcodes" > Column | Type | Modifiers > -------------+---------------+----------- > postcode_id | smallint | not null > range_from | smallint | > range_till | smallint | > Indexes: > "postcodes_pkey" PRIMARY KEY, btree (postcode_id) > "range" UNIQUE, btree (range_from, range_till) > > Table "public.data_main" > Column | Type | Modifiers > --------+----------+----------- > userid | integer | not null > range | smallint | > Indexes: > "data_main_pkey" PRIMARY KEY, btree (userid) > > And here's the query I ran: > > SELECT COUNT(*) FROM > data_main AS dm, > postcodes AS p > WHERE dm.range BETWEEN p.range_from AND p.range_till I just posted an answer to this (via webcafe webmail; can't recall which pg-list), that might interest you. BTree indexes as they stand (multi-column, ...) answer what most people need for queries. Unfortunately, out-of-the-box, they have no good way of handling range queries. To compensate, you can use a small amount of kinky SQL. This is in the same line as the tricks used to implement hierarchic queries in relational SQL. [1] Create a table "widths"(wid int) of powers of 2, up to what will just cover max(range_till-range_from). Since your "range" column is a smallint, this table can have no more than 15 rows. You can get as fussy as you want about keeping this table to a minimum. [2] Change postcodes: ALTER TABLE postcodes ADD wid INT USING 2 ^ CEIL(LOG(range_from - range_till,2)); ALTER TABLE postcodes ADD start INT USING range_from - (range_from % wid); CREATE INDEX postcodes_wid_start_index ON (wid, start); ANALYZE postcodes; [4] Write your query as: SELECT COUNT(*) FROM data_main AS dm CROSS JOIN widths -- yes, CROSS JOIN. For once, it HELPS performance. JOIN postcodes AS p ON dm.wid = widths.wid AND dm.start = p.range - p.range % widths.wid WHERE dm.range BETWEEN p.range_from AND p.range_till This uses BTREE exact-match to make a tight restriction on which rows to check. YMMV, but this has worked even for multi-M table joins. -- "Dreams come true, not free."
pgsql-performance by date: