Thread: Problems with order by, limit, and indices
Hello, DB: PostgreSQL 7.1 (snap-06012001) OS: Linux I have the following table: slygreetings=> \d users Table "users" Attribute | Type | Modifier -------------+-----------+------------------------------------------------ id | integer | not null default nextval('users_id_seq'::text) sendername | text | senderlname | text | sendermail | text | sender_ip | inet | senderdate | timestamp | rcptname | text | rcptmail | text | rcpt_ip | inet | reason | text | rcptdate | timestamp | crypt | text | active | boolean | default 'false' variant_id | integer | info | text | Indices: users_crypt_key, users_id_key, users_rcptdate_vid_key Vacuum ouput: slygreetings=> vacuum verbose analyze users; NOTICE: --Relation users-- NOTICE: Pages 23572: Changed 0, reaped 0, Empty 0, New 0; Tup 660510: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 156, MaxLen 2032; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 3.23s/0.26u sec. NOTICE: Index users_id_key: Pages 1450; Tuples 660510. CPU 0.18s/0.80u sec. NOTICE: Index users_crypt_key: Pages 3162; Tuples 660510. CPU 0.43s/0.82u sec. NOTICE: Index users_rcptdate_vid_key: Pages 2321; Tuples 660510. CPU 0.29s/0.76u sec. NOTICE: --Relation pg_toast_18741-- NOTICE: Pages 8: Changed 0, reaped 0, Empty 0, New 0; Tup 35: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 647, MaxLen 2034; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec. NOTICE: Index pg_toast_18741_idx: Pages 2; Tuples 35. CPU 0.00s/0.00u sec. NOTICE: Analyzing... VACUUM Index: slygreetings=> \d users_rcptdate_vid_key Index "users_rcptdate_vid_key" Attribute | Type ------------+----------- variant_id | integer active | boolean rcptdate | timestamp btree Statistics for variant_id (quite important, will see later) slygreetings=> select count(variant_id),variant_id from users group by variant_id; count | variant_id --------+------------ 11 | 1 545067 | 2 20080 | 3 95351 | 4 1 | 5 (5 rows) I need to get 60 last not active users for specific variant_id. Example with variant_id = 2 slygreetings=> explain select * from users where variant_id=2 AND active='f' order by rcptdate limit 60; NOTICE: QUERY PLAN: Limit (cost=77117.18..77117.18 rows=60 width=145) -> Sort (cost=77117.18..77117.18 rows=162640 width=145) -> Seq Scan on users (cost=0.00..33479.65 rows=162640 width=145) EXPLAIN Executor stats: EXECUTOR STATISTICS ! system usage stats: ! 40.441787 elapsed 19.090000 user 6.340000 system sec ! [280.840000 user 85.980000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/201 [2290/2233] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [0/0] voluntary/involuntary context switches ! postgres usage stats: ! Shared blocks: 23572 read, 0 written, buffer hit rate = 0.00% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written Example with variant_id = 5 slygreetings=> explain select * from users where variant_id=5 AND active='f' order by rcptdate limit 60; NOTICE: QUERY PLAN: Limit (cost=13005.10..13005.10 rows=60 width=145) -> Sort (cost=13005.10..13005.10 rows=3445 width=145) -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) EXPLAIN Executor stats: EXECUTOR STATISTICS ! system usage stats: ! 0.057452 elapsed 0.000000 user 0.000000 system sec ! [280.840000 user 85.980000 sys total] ! 0/0 [0/0] filesystem blocks in/out ! 0/0 [2290/2233] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 0/0 [0/0] voluntary/involuntary context switches ! postgres usage stats: ! Shared blocks: 5 read, 0 written, buffer hit rate = 0.00% ! Local blocks: 0 read, 0 written, buffer hit rate = 0.00% ! Direct blocks: 0 read, 0 written As far as you can see planner somehow ignores the possibility to use users_rcptdate_vid_key for both where quals, and order by. Can please someone help me with this? Is there any possibility to improve the situation (or maybe I misunderstood something). -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
Denis Perchine <dyp@perchine.com> writes: > Example with variant_id = 2 > slygreetings=> explain select * from users where variant_id=2 AND active='f' > order by rcptdate limit 60; > NOTICE: QUERY PLAN: > Limit (cost=77117.18..77117.18 rows=60 width=145) > -> Sort (cost=77117.18..77117.18 rows=162640 width=145) > -> Seq Scan on users (cost=0.00..33479.65 rows=162640 width=145) This plan looks fine to me, considering that variant_id=2 is the vast majority of the table. An indexscan will be slower, except perhaps if you've recently CLUSTERed the table on this index. (If you don't believe me, try it with ENABLE_SEQSCAN set to OFF.) > Example with variant_id = 5 > slygreetings=> explain select * from users where variant_id=5 AND active='f' > order by rcptdate limit 60; > NOTICE: QUERY PLAN: > Limit (cost=13005.10..13005.10 rows=60 width=145) > -> Sort (cost=13005.10..13005.10 rows=3445 width=145) > -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) You could probably get a plan without the sort step if you said ... order by variant_id, rcptdate; regards, tom lane
> > Example with variant_id = 2 > > > > slygreetings=> explain select * from users where variant_id=2 AND > > active='f' order by rcptdate limit 60; > > NOTICE: QUERY PLAN: > > > > Limit (cost=77117.18..77117.18 rows=60 width=145) > > -> Sort (cost=77117.18..77117.18 rows=162640 width=145) > > -> Seq Scan on users (cost=0.00..33479.65 rows=162640 width=145) > > This plan looks fine to me, considering that variant_id=2 is the vast > majority of the table. An indexscan will be slower, except perhaps if > you've recently CLUSTERed the table on this index. (If you don't > believe me, try it with ENABLE_SEQSCAN set to OFF.) I would agree with you if there was no limit specified. As far as I can understand it is possible to traverse users_rcptdate_vid_key Forward, and get 60 tuples, than finish. And that tuples will be already sorted (index includes rcptdate also). > > Example with variant_id = 5 > > > > slygreetings=> explain select * from users where variant_id=5 AND > > active='f' order by rcptdate limit 60; > > NOTICE: QUERY PLAN: > > > > Limit (cost=13005.10..13005.10 rows=60 width=145) > > -> Sort (cost=13005.10..13005.10 rows=3445 width=145) > > -> Index Scan using users_rcptdate_vid_key on users > > (cost=0.00..12658.35 rows=3445 width=145) > > You could probably get a plan without the sort step if you said > ... order by variant_id, rcptdate; No way, it just get all tuples for the qual, sort them, and the limiting. That's horrible... slygreetings=> explain select * from users where variant_id=5 AND active='f' order by rcptdate,variant_id limit 60; NOTICE: QUERY PLAN: Limit (cost=13005.10..13005.10 rows=60 width=145) -> Sort (cost=13005.10..13005.10 rows=3445 width=145) -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) EXPLAIN slygreetings=> explain select * from users where variant_id=5 AND active='f' order by rcptdate,variant_id,active limit 60; NOTICE: QUERY PLAN: Limit (cost=13005.10..13005.10 rows=60 width=145) -> Sort (cost=13005.10..13005.10 rows=3445 width=145) -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) EXPLAIN -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
Hi, another interesting thing... This is current 7.1. slygreetings=> explain select * from users where variant_id=5 AND active='f' order by rcptdate,variant_id,active limit 60; NOTICE: QUERY PLAN: Limit (cost=13005.10..13005.10 rows=60 width=145) -> Sort (cost=13005.10..13005.10 rows=3445 width=145) -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) EXPLAIN slygreetings=> set enable_sort to off; SET VARIABLE slygreetings=> explain select * from users where variant_id=5 AND active='f' order by rcptdate,variant_id,active limit 60; NOTICE: QUERY PLAN: Limit (cost=100013005.10..100013005.10 rows=60 width=145) -> Sort (cost=100013005.10..100013005.10 rows=3445 width=145) -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) EXPLAIN Cost is something really wierd.... Why? -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
Denis Perchine <dyp@perchine.com> writes: >> You could probably get a plan without the sort step if you said >> ... order by variant_id, rcptdate; > No way, it just get all tuples for the qual, sort them, and the limiting. > That's horrible... > slygreetings=> explain select * from users where variant_id=5 AND active='f' > order by rcptdate,variant_id limit 60; > NOTICE: QUERY PLAN: > Limit (cost=13005.10..13005.10 rows=60 width=145) > -> Sort (cost=13005.10..13005.10 rows=3445 width=145) > -> Index Scan using users_rcptdate_vid_key on users (cost=0.00..12658.35 rows=3445 width=145) Now, that's not what I told you to do, is it? It works fine for me: regression=# create table users (variant_id int , active bool, rcptdate date); CREATE regression=# create index usersind on users( variant_id,rcptdate,active); CREATE regression=# explain select * from users where variant_id=5 AND active='f' regression-# order by rcptdate limit 1; NOTICE: QUERY PLAN: Limit (cost=8.22..8.22 rows=1 width=9) -> Sort (cost=8.22..8.22 rows=5 width=9) -> Index Scan using usersind on users (cost=0.00..8.16 rows=5 width=9) EXPLAIN regression=# explain select * from users where variant_id=5 AND active='f' regression-# order by variant_id,rcptdate limit 1; NOTICE: QUERY PLAN: Limit (cost=0.00..1.63 rows=1 width=9) -> Index Scan using usersind on users (cost=0.00..8.16 rows=5 width=9) EXPLAIN The specified sort order has to match the index if you hope to avoid a sort step. regards, tom lane
Denis Perchine <dyp@perchine.com> writes: > Cost is something really wierd.... Why? That's how the enable stuff works for plan types that can't be ignored completely: it just adds a big constant to the estimated cost. If there is no other alternative plan, you get the unwanted plan type anyway. regards, tom lane
> > slygreetings=> explain select * from users where variant_id=5 AND > > active='f' order by rcptdate,variant_id limit 60; > > NOTICE: QUERY PLAN: > > > > Limit (cost=13005.10..13005.10 rows=60 width=145) > > -> Sort (cost=13005.10..13005.10 rows=3445 width=145) > > -> Index Scan using users_rcptdate_vid_key on users > > (cost=0.00..12658.35 rows=3445 width=145) > > Now, that's not what I told you to do, is it? It works fine for me: > > regression=# create table users (variant_id int , active bool, rcptdate > date); CREATE > regression=# create index usersind on users( variant_id,rcptdate,active); > CREATE > regression=# explain select * from users where variant_id=5 AND active='f' > regression-# order by rcptdate limit 1; > NOTICE: QUERY PLAN: > > Limit (cost=8.22..8.22 rows=1 width=9) > -> Sort (cost=8.22..8.22 rows=5 width=9) > -> Index Scan using usersind on users (cost=0.00..8.16 rows=5 > width=9) > > EXPLAIN > regression=# explain select * from users where variant_id=5 AND active='f' > regression-# order by variant_id,rcptdate limit 1; > NOTICE: QUERY PLAN: > > Limit (cost=0.00..1.63 rows=1 width=9) > -> Index Scan using usersind on users (cost=0.00..8.16 rows=5 width=9) > > EXPLAIN > > The specified sort order has to match the index if you hope to avoid > a sort step. Dummy me... My aplogies for the time I stole... -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
I have a bunch of tuning questions, and I would appreciate any answers to any of them, in any order... :) Thanks in advance. 1. Clustering: Are the current docs corecct in saying that once you cluster on an index, subsequent inserts will _not_ respect that cluster? I only ask because a recent posting seemed to imply otherwise. 2. Speed difference between trigger functions in C vs PGPLSQL: Given that I will need to use identical SQL inside the C function as in the PLPGSQL, is there a big speed difference? 3. Query Execution Times: How do I find the number of seconds that a query takes to execute? In MySQL, that info is printed after the results. 4. Inserting w/ Indexes vs. Selecting w/o Indexes: Given an equal number of inserts and selects on a table, (a whole lot of both--about 10/second), is it better to take the penalty on inserts due to indexes or on selects do to lack thereof? 5. Does PG have partial indices yet? I only saw a little about this in the docs. 6. Does PG cache inserts when using Perl, when using `$statement_handle = $dbh->prepare( "insert into foo (bar) values (?)"); $statement_handle->execute();'? 7. How can one do If-Else chains in PL/PGSQL? If I am being braindead about this one, please ignore it. 8. If I get good answers to this, should I patch the FAQ and send it in for approval? Is there any other way to help (besides writing C, which I don't do very well...)? -- Webb Sprague Programmer O1 Communications