Thread: any way to use indexscan to get last X values with "order by Y limit X" clause?
any way to use indexscan to get last X values with "order by Y limit X" clause?
From
Tomaz Borstnar
Date:
Similar question was http://archives.postgresql.org/pgsql-admin/2002-05/msg00148.php, but google did not have answer for it. Here is the structure: Column | Type | Modifiers -------------+--------------------------+---------------------- id | integer | not null default '0' datestamp | timestamp with time zone | not null thread | integer | not null default '0' parent | integer | not null default '0' author | character(37) | not null default '' subject | character(255) | not null default '' email | character(200) | not null default '' attachment | character(64) | default '' host | character(50) | not null default '' email_reply | character(1) | not null default 'N' approved | character(1) | not null default 'N' msgid | character(100) | not null default '' modifystamp | integer | not null default '0' userid | integer | not null default '0' closed | smallint | default '0' Indexes: tjavendanpri_key primary key btree (id), tjavendan_approved btree (approved), tjavendan_author btree (author), tjavendan_datestamp btree (datestamp), tjavendan_modifystamp btree (modifystamp), tjavendan_msgid btree (msgid), tjavendan_parent btree (parent), tjavendan_subject btree (subject), tjavendan_thread btree (thread), tjavendan_userid btree (userid) Here is the query: SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40 and explain analyze for it: krtjavendan34=> EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=18419.78..18419.88 rows=40 width=12) (actual time=6735.06..6735.69 rows=40 loops=1) -> Sort (cost=18419.78..18441.34 rows=8626 width=12) (actual time=6735.04..6735.25 rows=41 loops=1) Sort Key: modifystamp, thread -> Aggregate (cost=16777.53..17855.84 rows=8626 width=12) (actual time=4605.01..6711.27 rows=2938 loops=1) -> Group (cost=16777.53..17424.52 rows=86265 width=12) (actual time=4604.85..6164.29 rows=86265 loops=1) -> Sort (cost=16777.53..16993.19 rows=86265 width=12) (actual time=4604.82..5130.14 rows=86265 loops=1) Sort Key: thread, modifystamp -> Seq Scan on tjavendan (cost=0.00..9705.31 rows=86265 width=12) (actual time=0.13..3369.28 rows=86265 loops=1) Filter: (approved = 'Y'::bpchar) Total runtime: 6741.12 msec (10 rows) This is on 7.3.3. Having backwards reading of index would really help here. Thanks in advance. Tomaz
Re: any way to use indexscan to get last X values with "order by Y limit X" clause?
From
"Shridhar Daithankar"
Date:
On 15 Jun 2003 at 16:26, Tomaz Borstnar wrote: > > Here is the structure: <snip> > approved | character(1) | not null default 'N' > msgid | character(100) | not null default '' > modifystamp | integer | not null default '0' > userid | integer | not null default '0' > closed | smallint | default '0' > Indexes: tjavendanpri_key primary key btree (id), > tjavendan_approved btree (approved), <snip> > Here is the query: > SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS > latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY > thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40 Question. The field approved seems to have boolean values. If probability of having either of value is 50%, I doubt planner will use index anyway. Even assuming all possible values of a char variable, the choice isn't too much, say if you have 1M row. Correct me if I am wrong. Bye Shridhar -- Either one of us, by himself, is expendable. Both of us are not. -- Kirk, "The Devil in the Dark", stardate 3196.1
At 16:31 15.6.2003, Shridhar Daithankar wrote: >Question. The field approved seems to have boolean values. If probability of >having either of value is 50%, I doubt planner will use index anyway. True. It has Y or N only so index on approved is useless. But using index on ORDER BY part would help a lot since it knows to fetch last X ordered values. >Correct me if I am wrong. Unfortunately you are very right. I am not sure how to stuff modifystamp and thread into WHERE clause to make it use indexes on thread and/or modifystamp. So far I believe this would be the only way to use them, right? Tomaz
On Sun, 15 Jun 2003, Tomaz Borstnar wrote: > Similar question was > http://archives.postgresql.org/pgsql-admin/2002-05/msg00148.php, but google > did not have answer for it. > > Here is the structure: > > Column | Type | Modifiers > -------------+--------------------------+---------------------- > id | integer | not null default '0' > datestamp | timestamp with time zone | not null > thread | integer | not null default '0' > parent | integer | not null default '0' > author | character(37) | not null default '' > subject | character(255) | not null default '' > email | character(200) | not null default '' > attachment | character(64) | default '' > host | character(50) | not null default '' > email_reply | character(1) | not null default 'N' > approved | character(1) | not null default 'N' > msgid | character(100) | not null default '' > modifystamp | integer | not null default '0' > userid | integer | not null default '0' > closed | smallint | default '0' > Indexes: tjavendanpri_key primary key btree (id), > tjavendan_approved btree (approved), > tjavendan_author btree (author), > tjavendan_datestamp btree (datestamp), > tjavendan_modifystamp btree (modifystamp), > tjavendan_msgid btree (msgid), > tjavendan_parent btree (parent), > tjavendan_subject btree (subject), > tjavendan_thread btree (thread), > tjavendan_userid btree (userid) > > Here is the query: > SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS > latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY > thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40 I'm not sure that it'd help since I don't think it'd realize that it doesn't actually need to completely do the group by due to the order by, but in any case, in the above, the sort orders are different for the group by and the order by and you'd really want a two column index on (probably) (modifystamp, thread) in order to get the best results on replacing a scan + sort.
Re: any way to use indexscan to get last X values with "order by Y limit X" clause?
From
Tom Lane
Date:
Tomaz Borstnar <tomaz.borstnar@over.net> writes: > SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS > latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY > thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40 > Having backwards reading of index would really help here. The only way that a fast-start plan is useful is if there is a way to do it with no explicit sort steps at all. A sort step must read its entire input before it can produce any output, so you completely blow the chance of not reading the whole table as soon as there's any sorting. There are a couple of reasons why this query can't be done using only an initial indexscan to sort the data: 1. You don't have a suitable index. Neither an index on modifystamp alone nor an index on thread alone is of any use to produce a two-column ordering; you need a two-column index on (modifystamp, thread). 2. The GROUP BY and ORDER BY steps require different sort orders, and so even if an index satisfied one, there'd still be a sort needed for the other. This is partly your fault (writing the columns in different orders) and partly the system's fault: it's implicitly taking the GROUP BY entries to be equivalent to ORDER BY ASC, which is overspecification. I've applied the attached patch to CVS tip to cure the latter problem. With this, a two-column index, and compatible column ordering in ORDER BY and GROUP BY, I get a reasonable-looking fast-start plan. The patch will not apply exactly against 7.3 because there's a renamed function call in there, but you could make it work with a little effort. regards, tom lane *** src/backend/parser/analyze.c.orig Fri Jun 6 11:04:02 2003 --- src/backend/parser/analyze.c Sun Jun 15 12:05:34 2003 *************** *** 1787,1799 **** */ qry->havingQual = transformWhereClause(pstate, stmt->havingClause); ! qry->groupClause = transformGroupClause(pstate, ! stmt->groupClause, ! qry->targetList); ! qry->sortClause = transformSortClause(pstate, stmt->sortClause, qry->targetList); qry->distinctClause = transformDistinctClause(pstate, stmt->distinctClause, --- 1787,1804 ---- */ qry->havingQual = transformWhereClause(pstate, stmt->havingClause); ! /* ! * Transform sorting/grouping stuff. Do ORDER BY first because both ! * transformGroupClause and transformDistinctClause need the results. ! */ qry->sortClause = transformSortClause(pstate, stmt->sortClause, qry->targetList); + + qry->groupClause = transformGroupClause(pstate, + stmt->groupClause, + qry->targetList, + qry->sortClause); qry->distinctClause = transformDistinctClause(pstate, stmt->distinctClause, *** src/backend/parser/parse_clause.c.orig Fri Jun 6 11:04:02 2003 --- src/backend/parser/parse_clause.c Sun Jun 15 12:19:14 2003 *************** *** 1124,1130 **** * transform a GROUP BY clause */ List * ! transformGroupClause(ParseState *pstate, List *grouplist, List *targetlist) { List *glist = NIL, *gl; --- 1124,1131 ---- * transform a GROUP BY clause */ List * ! transformGroupClause(ParseState *pstate, List *grouplist, ! List *targetlist, List *sortClause) { List *glist = NIL, *gl; *************** *** 1132,1152 **** foreach(gl, grouplist) { TargetEntry *tle; tle = findTargetlistEntry(pstate, lfirst(gl), targetlist, GROUP_CLAUSE); /* avoid making duplicate grouplist entries */ ! if (!targetIsInSortList(tle, glist)) ! { ! GroupClause *grpcl = makeNode(GroupClause); ! ! grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist); ! grpcl->sortop = ordering_oper_opid(tle->resdom->restype); ! ! glist = lappend(glist, grpcl); } } return glist; --- 1133,1173 ---- foreach(gl, grouplist) { TargetEntry *tle; + Oid ordering_op; + GroupClause *grpcl; tle = findTargetlistEntry(pstate, lfirst(gl), targetlist, GROUP_CLAUSE); /* avoid making duplicate grouplist entries */ ! if (targetIsInSortList(tle, glist)) ! continue; ! /* ! * If the GROUP BY clause matches the ORDER BY clause, we want to ! * adopt the ordering operators from the latter rather than using ! * the default ops. This allows "GROUP BY foo ORDER BY foo DESC" to ! * be done with only one sort step. Note we are assuming that any ! * user-supplied ordering operator will bring equal values together, ! * which is all that GROUP BY needs. ! */ ! if (sortClause && ! ((SortClause *) lfirst(sortClause))->tleSortGroupRef == ! tle->resdom->ressortgroupref) ! { ! ordering_op = ((SortClause *) lfirst(sortClause))->sortop; ! sortClause = lnext(sortClause); } + else + { + ordering_op = ordering_oper_opid(tle->resdom->restype); + sortClause = NIL; /* disregard ORDER BY once match fails */ + } + + grpcl = makeNode(GroupClause); + grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist); + grpcl->sortop = ordering_op; + glist = lappend(glist, grpcl); } return glist; *** src/include/parser/parse_clause.h.orig Fri Mar 21 20:49:38 2003 --- src/include/parser/parse_clause.h Sun Jun 15 12:03:13 2003 *************** *** 22,28 **** extern bool interpretInhOption(InhOption inhOpt); extern Node *transformWhereClause(ParseState *pstate, Node *where); extern List *transformGroupClause(ParseState *pstate, List *grouplist, ! List *targetlist); extern List *transformSortClause(ParseState *pstate, List *orderlist, List *targetlist); extern List *transformDistinctClause(ParseState *pstate, List *distinctlist, --- 22,28 ---- extern bool interpretInhOption(InhOption inhOpt); extern Node *transformWhereClause(ParseState *pstate, Node *where); extern List *transformGroupClause(ParseState *pstate, List *grouplist, ! List *targetlist, List *sortClause); extern List *transformSortClause(ParseState *pstate, List *orderlist, List *targetlist); extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,
At 18:53 15.6.2003, you wrote: >I've applied the attached patch to CVS tip to cure the latter problem. >With this, a two-column index, and compatible column ordering in ORDER >BY and GROUP BY, I get a reasonable-looking fast-start plan. The patch >will not apply exactly against 7.3 because there's a renamed function >call in there, but you could make it work with a little effort. You mean this: /* * ordering_oper_opid - convenience routine for oprid(ordering_oper()) * * This was formerly called any_ordering_op() */ A little later... WOW! 100 to 130 times faster on same dataset and additional index on (modifystamp,thread) which was not really useful before this patch! krtjavendan34=> EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS tcount,abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan WHERE approved='Y' GROUP BY modifystamp, thread ORDER BY modifystamp desc, thread desc limit 40; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..97.13 rows=40 width=12) (actual time=1.07..48.71 rows=40 loops=1) -> Aggregate (cost=0.00..20947.38 rows=8626 width=12) (actual time=1.05..48.23 rows=41 loops=1) -> Group (cost=0.00..20516.06 rows=86265 width=12) (actual time=0.35..42.25 rows=843 loops=1) -> Index Scan Backward using tjavendan_modstamp_thrd on tjavendan (cost=0.00..20084.73 rows=86265 width=12) (actual time=0.34..31.29 rows=844 loops=1) Filter: (approved = 'Y'::bpchar) Total runtime: 50.20 msec (6 rows) Used to be between 5800 and 6741 msec before this patch! Thanks!
Here is the 7.3.3 patch as it might help others too... diff -rcN postgresql-7.3.3/src/backend/parser/analyze.c postgresql-7.3.3-grouporderby/src/backend/parser/analyze.c *** postgresql-7.3.3/src/backend/parser/analyze.c Thu Feb 13 23:50:09 2003 --- postgresql-7.3.3-grouporderby/src/backend/parser/analyze.c Mon Jun 16 00:13:05 2003 *************** *** 1667,1679 **** */ qry->havingQual = transformWhereClause(pstate, stmt->havingClause); ! qry->groupClause = transformGroupClause(pstate, ! stmt->groupClause, ! qry->targetList); qry->sortClause = transformSortClause(pstate, stmt->sortClause, qry->targetList); qry->distinctClause = transformDistinctClause(pstate, stmt->distinctClause, --- 1667,1682 ---- */ qry->havingQual = transformWhereClause(pstate, stmt->havingClause); ! /* ! * Transform sorting/grouping stuff. Do ORDER BY first because both ! * transformGroupClause and transformDistinctClause need the results. ! */ qry->sortClause = transformSortClause(pstate, stmt->sortClause, qry->targetList); + + qry->groupClause = transformGroupClause(pstate, stmt->groupClause, qry->targetList, qry->sortClause); qry->distinctClause = transformDistinctClause(pstate, stmt->distinctClause, diff -rcN postgresql-7.3.3/src/backend/parser/parse_clause.c postgresql-7.3.3-grouporderby/src/backend/parser/parse_clause.c *** postgresql-7.3.3/src/backend/parser/parse_clause.c Mon Dec 16 19:39:56 2002 --- postgresql-7.3.3-grouporderby/src/backend/parser/parse_clause.c Mon Jun 16 00:24:58 2003 *************** *** 1145,1151 **** * */ List * ! transformGroupClause(ParseState *pstate, List *grouplist, List *targetlist) { List *glist = NIL, *gl; --- 1145,1151 ---- * */ List * ! transformGroupClause(ParseState *pstate, List *grouplist, List *targetlist, List *sortClause) { List *glist = NIL, *gl; *************** *** 1153,1173 **** foreach(gl, grouplist) { TargetEntry *tle; tle = findTargetlistEntry(pstate, lfirst(gl), targetlist, GROUP_CLAUSE); /* avoid making duplicate grouplist entries */ ! if (!targetIsInSortList(tle, glist)) ! { ! GroupClause *grpcl = makeNode(GroupClause); ! ! grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist); ! ! grpcl->sortop = any_ordering_op(tle->resdom->restype); ! ! glist = lappend(glist, grpcl); ! } } return glist; --- 1153,1193 ---- foreach(gl, grouplist) { TargetEntry *tle; + Oid ordering_op; + GroupClause *grpcl; tle = findTargetlistEntry(pstate, lfirst(gl), targetlist, GROUP_CLAUSE); /* avoid making duplicate grouplist entries */ ! if (targetIsInSortList(tle, glist)) ! continue; ! ! /* ! * If the GROUP BY clause matches the ORDER BY clause, we want to ! * adopt the ordering operators from the latter rather than using ! * the default ops. This allows "GROUP BY foo ORDER BY foo DESC" to ! * be done with only one sort step. Note we are assuming that any ! * user-supplied ordering operator will bring equal values together, ! * which is all that GROUP BY needs. ! */ ! if (sortClause && ! ((SortClause *) lfirst(sortClause))->tleSortGroupRef == ! tle->resdom->ressortgroupref) ! { ! ordering_op = ((SortClause *) lfirst(sortClause))->sortop; ! sortClause = lnext(sortClause); ! } ! else ! { ! ordering_op = any_ordering_op(tle->resdom->restype); ! sortClause = NIL; /* disregard ORDER BY once match fails */ ! } ! ! grpcl = makeNode(GroupClause); ! grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist); ! grpcl->sortop = ordering_op; ! glist = lappend(glist, grpcl); } return glist; diff -rcN postgresql-7.3.3/src/include/parser/parse_clause.h postgresql-7.3.3-grouporderby/src/include/parser/parse_clause.h *** postgresql-7.3.3/src/include/parser/parse_clause.h Thu Jun 20 22:29:51 2002 --- postgresql-7.3.3-grouporderby/src/include/parser/parse_clause.h Mon Jun 16 00:08:43 2003 *************** *** 22,28 **** extern bool interpretInhOption(InhOption inhOpt); extern Node *transformWhereClause(ParseState *pstate, Node *where); extern List *transformGroupClause(ParseState *pstate, List *grouplist, ! List *targetlist); extern List *transformSortClause(ParseState *pstate, List *orderlist, List *targetlist); extern List *transformDistinctClause(ParseState *pstate, List *distinctlist, --- 22,28 ---- extern bool interpretInhOption(InhOption inhOpt); extern Node *transformWhereClause(ParseState *pstate, Node *where); extern List *transformGroupClause(ParseState *pstate, List *grouplist, ! List *targetlist, List *sortClause); extern List *transformSortClause(ParseState *pstate, List *orderlist, List *targetlist); extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,
On 16 Jun 2003 at 0:37, Tomaz Borstnar wrote: > > krtjavendan34=> EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS > tcount,abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan > WHERE approved='Y' GROUP BY modifystamp, thread ORDER BY modifystamp desc, > thread desc limit 40; > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..97.13 rows=40 width=12) (actual time=1.07..48.71 > rows=40 loops=1) > -> Aggregate (cost=0.00..20947.38 rows=8626 width=12) (actual > time=1.05..48.23 rows=41 loops=1) > -> Group (cost=0.00..20516.06 rows=86265 width=12) (actual > time=0.35..42.25 rows=843 loops=1) > -> Index Scan Backward using tjavendan_modstamp_thrd on > tjavendan (cost=0.00..20084.73 rows=86265 width=12) (actual > time=0.34..31.29 rows=844 loops=1) > Filter: (approved = 'Y'::bpchar) > Total runtime: 50.20 msec > (6 rows) > > Used to be between 5800 and 6741 msec before this patch! Good that the patch works for you. But as I see there is an improvement in plan. Not nitpicking but what does actual performance difference between system before patch and after patch? Bye Shridhar -- QOTD: "In the shopping mall of the mind, he's in the toy department."
At 08:15 16.6.2003, Shridhar Daithankar wrote: > > Total runtime: 50.20 msec > > Used to be between 5800 and 6741 msec before this patch! > >Good that the patch works for you. But as I see there is an improvement in >plan. Not nitpicking but what does actual performance difference between >system >before patch and after patch? A lot since this is query to get list of last active threads sorted by last modified date. With times less than 300ms you mostly do not notice slower query as there could be other factors to affect the speed like network delays and such. But people on fast links will notice that it takes a bit long to display list of threads - especially when the system is using PHP accelerator and compression. So this really means major increase of performance for real situation - forum with over 85 000 messages where you get rid of full scan and 2 full sorts to display list of msgs which happens a lot. You can always use some query/page caching things, but then people start to post duplicates, because they think the message did not make it into the database. Tomaz
On 16 Jun 2003 at 9:08, Tomaz Borstnar wrote: > So this really means major increase of performance for real situation - > forum with over 85 000 messages where you get rid of full scan and 2 full > sorts to display list of msgs which happens a lot. You can always use some > query/page caching things, but then people start to post duplicates, > because they think the message did not make it into the database. OTOH, I was thinking of your original problem. If you could have two identical tables, one to store incoming posts and other to store approved posts, that should be lot more simpler. Of course you need to vacuum much more if you are deleting from in queue but the kind of database you are handling, me need not tell you about vacuum.. Just a though.. Bye Shridhar -- Drew's Law of Highway Biology: The first bug to hit a clean windshield lands directly in front of your eyes.