LIMIT confuses the planner (again) - Mailing list pgsql-performance
From | Kouber Saparev |
---|---|
Subject | LIMIT confuses the planner (again) |
Date | |
Msg-id | 4AC07714.1080909@saparev.com Whole thread Raw |
Responses |
Re: LIMIT confuses the planner (again)
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-performance |
Hello, I am using PostgreSQL 8.3.7 and I am experiencing an issue similar to the one I've already described some time ago: http://archives.postgresql.org/pgsql-performance/2009-02/msg00261.php Again, adding a LIMIT clause to a query, which is normally executing very fast thanks to an index, makes it perform slow, because the planner no longer uses the "correct" index. I have the following table: CREATE TABLE message ( message_sid SERIAL PRIMARY KEY, from_profile_sid INT NOT NULL REFERENCES profile, to_profile_sid INT NOT NULL REFERENCES profile, sender_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, receiver_has_deleted BOOLEAN NOT NULL DEFAULT FALSE, body TEXT, datetime TIMESTAMP NOT NULL DEFAULT NOW() ); With the following conditional index: CREATE INDEX message_to_profile_idx ON message (to_profile_sid) WHERE NOT receiver_has_deleted; The query to obtain the list of received messages of a profile is simple and executes very fast, because of the index above: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=11857.09..11866.19 rows=3640 width=277) (actual time=0.317..0.319 rows=15 loops=1) Sort Key: message_sid Sort Method: quicksort Memory: 32kB -> Bitmap Heap Scan on message (cost=106.44..11641.78 rows=3640 width=277) (actual time=0.096..0.271 rows=15 loops=1) Recheck Cond: ((to_profile_sid = (-1)) AND (NOT receiver_has_deleted)) -> Bitmap Index Scan on message_to_profile_idx (cost=0.00..105.53 rows=3640 width=0) (actual time=0.056..0.056 rows=21 loops=1) Index Cond: (to_profile_sid = (-1)) Total runtime: 0.383 ms (8 rows) Adding a LIMIT clause to exactly the same query slows its execution more than 20'000 times: db=# EXPLAIN ANALYZE SELECT * FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted ORDER BY message_sid DESC LIMIT 20; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..6513.60 rows=20 width=277) (actual time=0.617..6576.539 rows=15 loops=1) -> Index Scan Backward using message_pkey on message (cost=0.00..1185474.32 rows=3640 width=277) (actual time=0.617..6576.522 rows=15 loops=1) Filter: ((NOT receiver_has_deleted) AND (to_profile_sid = (-1))) Total runtime: 6576.572 ms (4 rows) Just as I was advised in my recent post, I've already increased the statistics of both fields all the way till 1000, analyzed the table and reindexed the index: ALTER TABLE message ALTER COLUMN to_profile_sid SET STATISTICS 1000; ALTER TABLE message ALTER COLUMN receiver_has_deleted SET STATISTICS 1000; ANALYZE message; REINDEX index message_to_profile_idx; This time, however, the steps above didn't affect the planner in any way, it still refuses to use the index "message_to_profile_idx" when a LIMIT is involved (for this particular value of to_profile_sid). Here's some statistical data: db=# SELECT COUNT(*) FROM message; count --------- 1312213 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1; count ------- 5604 (1 row) db=# SELECT COUNT(*) FROM message WHERE to_profile_sid = -1 AND NOT receiver_has_deleted; count ------- 15 (1 row) db=# SELECT COUNT(DISTINCT to_profile_sid) FROM message; count ------- 8596 (1 row) db=# SELECT AVG(length) FROM (SELECT to_profile_sid, COUNT(*) AS length FROM message GROUP BY to_profile_sid) AS freq; avg ---------------------- 152.6540251279664960 (1 row) db=# SELECT n_distinct FROM pg_stats WHERE tablename='message' AND attname='to_profile_sid'; n_distinct ------------ 6277 (1 row) Also, the value of -1 for "to_profile_sid" is second in the list of most_common_vals in pg_stats, but still I don't understand why a simple limit is blinding the planner for the "good" index. Any ideas? Regards, -- Kouber Saparev http://kouber.saparev.com/
pgsql-performance by date: