Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1 - Mailing list pgsql-performance
From | Bruce Momjian |
---|---|
Subject | Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1 |
Date | |
Msg-id | 200803121916.m2CJGdQ24439@momjian.us Whole thread Raw |
In response to | Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1 ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
List | pgsql-performance |
Heikki Linnakangas wrote: > Bruce, would you please add this to the 8.4 patch queue so we remember > to look at this later? OK, added to queue, but Tom's patch queue comment is: This is useless since it does not represent a complete patch; the provided code calls a lot of Greenplum-private routines that weren't provided. It's not even reviewable let alone a candidate to apply. --------------------------------------------------------------------------- > > It didn't occur to me that we can do that in the degenerate case when > there's just a single node below the Append. A more general solution > would be to check if the pathkeys of all the child nodes match, and do a > "merge append" similar to a merge join. > > Luke Lonergan wrote: > > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > > it? > > > > This enables the use of index scan of a child table by recognizing sort > > order of the append node. Kurt Harriman did the work. > > > > - Luke > > > > Index: cdb-pg/src/backend/optimizer/path/indxpath.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/path/indxpath.c,v > > diff -u -N -r1.22 -r1.22.2.1 > > --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 > > -0000 1.22 > > +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 > > -0000 1.22.2.1 > > @@ -379,8 +379,51 @@ > > index_pathkeys = build_index_pathkeys(root, index, > > ForwardScanDirection, > > true); > > - useful_pathkeys = truncate_useless_pathkeys(root, rel, > > - index_pathkeys); > > + /* > > + * CDB: For appendrel child, pathkeys contain Var nodes in > > terms > > + * of the child's baserel. Transform the pathkey list to refer > > to > > + * columns of the appendrel. > > + */ > > + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) > > + { > > + AppendRelInfo *appinfo = NULL; > > + RelOptInfo *appendrel = NULL; > > + ListCell *appcell; > > + CdbPathLocus notalocus; > > + > > + /* Find the appendrel of which this baserel is a child. */ > > + foreach(appcell, root->append_rel_list) > > + { > > + appinfo = (AppendRelInfo *)lfirst(appcell); > > + if (appinfo->child_relid == rel->relid) > > + break; > > + } > > + Assert(appinfo); > > + appendrel = find_base_rel(root, appinfo->parent_relid); > > + > > + /* > > + * The pathkey list happens to have the same format as the > > + * partitioning key of a Hashed locus, so by disguising it > > + * we can use cdbpathlocus_pull_above_projection() to do > > the > > + * transformation. > > + */ > > + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); > > + notalocus = > > + cdbpathlocus_pull_above_projection(root, > > + notalocus, > > + rel->relids, > > + rel->reltargetlist, > > + > > appendrel->reltargetlist, > > + appendrel->relid); > > + if (CdbPathLocus_IsHashed(notalocus)) > > + index_pathkeys = truncate_useless_pathkeys(root, > > appendrel, > > + > > notalocus.partkey); > > + else > > + index_pathkeys = NULL; > > + } > > + > > + useful_pathkeys = truncate_useless_pathkeys(root, rel, > > + index_pathkeys); > > } > > else > > useful_pathkeys = NIL; > > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/path/pathkeys.c,v > > diff -u -N -r1.18 -r1.18.2.1 > > --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 > > -0000 1.18 > > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 > > -0000 1.18.2.1 > > @@ -1403,55 +1403,53 @@ > > { > > PathKeyItem *item; > > Expr *newexpr; > > + AttrNumber targetindex; > > > > Assert(pathkey); > > > > - /* Use constant expr if available. Will be at head of list. */ > > - if (CdbPathkeyEqualsConstant(pathkey)) > > + /* Find an expr that we can rewrite to use the projected columns. */ > > + item = cdbpullup_findPathKeyItemInTargetList(pathkey, > > + relids, > > + targetlist, > > + &targetindex); // OUT > > + > > + /* If not found, see if the equiv class contains a constant expr. */ > > + if (!item && > > + CdbPathkeyEqualsConstant(pathkey)) > > { > > item = (PathKeyItem *)linitial(pathkey); > > newexpr = (Expr *)copyObject(item->key); > > } > > > > - /* New vars for old! */ > > - else > > - { > > - AttrNumber targetindex; > > + /* Fail if no usable expr. */ > > + else if (!item) > > + return NULL; > > > > - /* Find an expr that we can rewrite to use the projected columns. > > */ > > - item = cdbpullup_findPathKeyItemInTargetList(pathkey, > > - relids, > > - targetlist, > > - &targetindex); // OUT > > - if (!item) > > - return NULL; > > + /* If found matching targetlist item, make a Var that references it. */ > > + else if (targetindex > 0) > > + newexpr = (Expr *)cdbpullup_makeVar(newrelid, > > + targetindex, > > + newvarlist, > > + (Expr *)item->key); > > > > - /* If found matching targetlist item, make a Var that references > > it. */ > > - if (targetindex > 0) > > - newexpr = (Expr *)cdbpullup_makeVar(newrelid, > > - targetindex, > > - newvarlist, > > - (Expr *)item->key); > > + /* Replace expr's Var nodes with new ones referencing the targetlist. > > */ > > + else > > + newexpr = cdbpullup_expr((Expr *)item->key, > > + targetlist, > > + newvarlist, > > + newrelid); > > > > - /* Replace expr's Var nodes with new ones referencing the > > targetlist. */ > > - else > > - newexpr = cdbpullup_expr((Expr *)item->key, > > - targetlist, > > - newvarlist, > > - newrelid); > > + /* Pull up RelabelType node too, unless tlist expr has right type. */ > > + if (IsA(item->key, RelabelType)) > > + { > > + RelabelType *oldrelabel = (RelabelType *)item->key; > > > > - /* Pull up RelabelType node too, unless tlist expr has right type. > > */ > > - if (IsA(item->key, RelabelType)) > > - { > > - RelabelType *oldrelabel = (RelabelType *)item->key; > > - > > - if (oldrelabel->resulttype != exprType((Node *)newexpr) || > > - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > > - newexpr = (Expr *)makeRelabelType(newexpr, > > - oldrelabel->resulttype, > > - oldrelabel->resulttypmod, > > - > > oldrelabel->relabelformat); > > - } > > + if (oldrelabel->resulttype != exprType((Node *)newexpr) || > > + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > > + newexpr = (Expr *)makeRelabelType(newexpr, > > + oldrelabel->resulttype, > > + oldrelabel->resulttypmod, > > + oldrelabel->relabelformat); > > } > > Insist(newexpr); > > > > Index: cdb-pg/src/backend/optimizer/util/pathnode.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/util/pathnode.c,v > > diff -u -N -r1.52.2.4 -r1.52.2.5 > > --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 > > -0000 1.52.2.4 > > +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 > > -0000 1.52.2.5 > > @@ -1563,7 +1563,15 @@ > > pathnode->path.rescannable = false; > > } > > > > - return pathnode; > > + /* > > + * CDB: If there is exactly one subpath, its ordering is preserved. > > + * Child rel's pathkey exprs are already expressed in terms of the > > + * columns of the parent appendrel. See find_usable_indexes(). > > + */ > > + if (list_length(subpaths) == 1) > > + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; > > + > > + return pathnode; > > } > > > > /* > > > > > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > > > >> Anton wrote: > >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > >>>>> QUERY PLAN > >>>> ---------------------------------------------------------------------------- > >>>> ----------------------------- > >>>>> Limit (cost=824637.69..824637.69 rows=1 width=32) > >>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > >>>>> Sort Key: public.n_traf.date_time > >>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) > >>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > >>>>> -> Seq Scan on n_traf (cost=0.00..22.30 > >>>>> rows=1230 width=32) > >>>>> -> Seq Scan on n_traf_y2007m01 n_traf > >>>>> (cost=0.00..22.30 rows=1230 width=32) > >>> ... > >>>>> -> Seq Scan on n_traf_y2007m12 n_traf > >>>>> (cost=0.00..22.30 rows=1230 width=32) > >>>>> (18 rows) > >>>>> > >>>>> Why it no uses indexes at all? > >>>>> ------------------------------------------- > >>>> I'm no expert but I'd guess that the the planner doesn't know which > >>>> partition holds the latest time so it has to read them all. > >>> Agree. But why it not uses indexes when it reading them? > >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > >> below the append node. Therefore it needs to fetch all rows from all the > >> tables, and the fastest way to do that is a seq scan. > > > > > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-performance by date: