Re: FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1 - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Date
Msg-id 200708281544.l7SFi3l26126@momjian.us
Whole thread Raw
In response to FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-hackers
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Luke Lonergan wrote:
> Below is a patch against Greenplum Database that fixes the problem.
> 
> - Luke
> 
> ------ Forwarded Message
> From: Luke Lonergan <llonergan@greenplum.com>
> Date: Fri, 24 Aug 2007 09:25:53 -0700
> To: Heikki Linnakangas <heikki@enterprisedb.com>, Anton <anton200@gmail.com>
> Cc: <pgsql-performance@postgresql.org>
> Conversation: [PERFORM] partitioned table and ORDER BY indexed_field DESC
> LIMIT 1
> Subject: Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC
> LIMIT 1
> 
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
> 
> This enables the use of index scan of a child table by recognizing sort
> order of the append node.  Kurt Harriman did the work.
> 
> - Luke
> 
> Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> ===================================================================
> RCS file: 
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/indxpath.c,v
> diff -u -N -r1.22 -r1.22.2.1
> --- cdb-pg/src/backend/optimizer/path/indxpath.c    25 Apr 2007 22:07:21
> -0000    1.22
> +++ cdb-pg/src/backend/optimizer/path/indxpath.c    10 Aug 2007 03:41:15
> -0000    1.22.2.1
> @@ -379,8 +379,51 @@
>              index_pathkeys = build_index_pathkeys(root, index,
>                                                    ForwardScanDirection,
>                                                    true);
> -            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> -                                                        index_pathkeys);
> +            /*
> +             * CDB: For appendrel child, pathkeys contain Var nodes in
> terms 
> +             * of the child's baserel.  Transform the pathkey list to refer
> to 
> +             * columns of the appendrel.
> +             */
> +            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> +            {
> +                AppendRelInfo  *appinfo = NULL;
> +                RelOptInfo     *appendrel = NULL;
> +                ListCell       *appcell;
> +                CdbPathLocus    notalocus;
> +
> +                /* Find the appendrel of which this baserel is a child. */
> +                foreach(appcell, root->append_rel_list)
> +                {
> +                    appinfo = (AppendRelInfo *)lfirst(appcell);
> +                    if (appinfo->child_relid == rel->relid)
> +                        break;
> +                }
> +                Assert(appinfo);
> +                appendrel = find_base_rel(root, appinfo->parent_relid);
> +
> +                /*
> +                 * The pathkey list happens to have the same format as the
> +                 * partitioning key of a Hashed locus, so by disguising it
> +                 * we can use cdbpathlocus_pull_above_projection() to do
> the 
> +                 * transformation.
> +                 */
> +                CdbPathLocus_MakeHashed(¬alocus, index_pathkeys);
> +                notalocus =
> +                    cdbpathlocus_pull_above_projection(root,
> +                                                       notalocus,
> +                                                       rel->relids,
> +                                                       rel->reltargetlist,
> +                  
> appendrel->reltargetlist,
> +                                                       appendrel->relid);
> +                if (CdbPathLocus_IsHashed(notalocus))
> +                    index_pathkeys = truncate_useless_pathkeys(root,
> appendrel,
> +                  
> notalocus.partkey);
> +                else
> +                    index_pathkeys = NULL;
> +            }
> +
> +            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> +                                                        index_pathkeys);
>          }
>          else
>              useful_pathkeys = NIL;
> Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> ===================================================================
> RCS file: 
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/pathkeys.c,v
> diff -u -N -r1.18 -r1.18.2.1
> --- cdb-pg/src/backend/optimizer/path/pathkeys.c    30 Apr 2007 05:44:07
> -0000    1.18
> +++ cdb-pg/src/backend/optimizer/path/pathkeys.c    10 Aug 2007 03:41:15
> -0000    1.18.2.1
> @@ -1403,55 +1403,53 @@
>  {
>      PathKeyItem    *item;
>      Expr           *newexpr;
> +    AttrNumber  targetindex;
>  
>      Assert(pathkey);
>  
> -    /* Use constant expr if available.  Will be at head of list. */
> -    if (CdbPathkeyEqualsConstant(pathkey))
> +    /* Find an expr that we can rewrite to use the projected columns. */
> +    item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> +                                                 relids,
> +                                                 targetlist,
> +                                                 &targetindex); // OUT
> +    
> +    /* If not found, see if the equiv class contains a constant expr. */
> +    if (!item &&
> +        CdbPathkeyEqualsConstant(pathkey))
>      {
>          item = (PathKeyItem *)linitial(pathkey);
>          newexpr = (Expr *)copyObject(item->key);
>      }
>  
> -    /* New vars for old! */
> -    else
> -    {
> -        AttrNumber  targetindex;
> +    /* Fail if no usable expr. */
> +    else if (!item)
> +        return NULL;
>  
> -        /* Find an expr that we can rewrite to use the projected columns.
> */
> -        item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> -                                                     relids,
> -                                                     targetlist,
> -                                                     &targetindex); // OUT
> -        if (!item)
> -            return NULL;
> +    /* If found matching targetlist item, make a Var that references it. */
> +    else if (targetindex > 0)
> +        newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> +                                            targetindex,
> +                                            newvarlist,
> +                                            (Expr *)item->key);
>  
> -        /* If found matching targetlist item, make a Var that references
> it. */
> -        if (targetindex > 0)
> -            newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> -                                                targetindex,
> -                                                newvarlist,
> -                                                (Expr *)item->key);
> +    /* Replace expr's Var nodes with new ones referencing the targetlist.
> */
> +    else
> +        newexpr = cdbpullup_expr((Expr *)item->key,
> +                                 targetlist,
> +                                 newvarlist,
> +                                 newrelid);
>  
> -        /* Replace expr's Var nodes with new ones referencing the
> targetlist. */
> -        else
> -            newexpr = cdbpullup_expr((Expr *)item->key,
> -                                     targetlist,
> -                                     newvarlist,
> -                                     newrelid);
> +    /* Pull up RelabelType node too, unless tlist expr has right type. */
> +    if (IsA(item->key, RelabelType))
> +    {
> +        RelabelType    *oldrelabel = (RelabelType *)item->key;
>  
> -        /* Pull up RelabelType node too, unless tlist expr has right type.
> */
> -        if (IsA(item->key, RelabelType))
> -        {
> -            RelabelType    *oldrelabel = (RelabelType *)item->key;
> -
> -            if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> -                oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> -                newexpr = (Expr *)makeRelabelType(newexpr,
> -                                                  oldrelabel->resulttype,
> -                                                  oldrelabel->resulttypmod,
> -                  
> oldrelabel->relabelformat);
> -        }
> +        if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> +            oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> +            newexpr = (Expr *)makeRelabelType(newexpr,
> +                                              oldrelabel->resulttype,
> +                                              oldrelabel->resulttypmod,
> +                                              oldrelabel->relabelformat);
>      }
>      Insist(newexpr);
>  
> Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> ===================================================================
> RCS file: 
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/util/pathnode.c,v
> diff -u -N -r1.52.2.4 -r1.52.2.5
> --- cdb-pg/src/backend/optimizer/util/pathnode.c    5 Aug 2007 23:06:44
> -0000    1.52.2.4
> +++ cdb-pg/src/backend/optimizer/util/pathnode.c    10 Aug 2007 03:41:15
> -0000    1.52.2.5
> @@ -1563,7 +1563,15 @@
>              pathnode->path.rescannable = false;
>      }
>  
> -    return pathnode;
> +    /* 
> +     * CDB: If there is exactly one subpath, its ordering is preserved.
> +     * Child rel's pathkey exprs are already expressed in terms of the
> +     * columns of the parent appendrel.  See find_usable_indexes().
> +     */
> +    if (list_length(subpaths) == 1)
> +        pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> +    
> +    return pathnode;
>  }
>  
>  /*
> 
> 
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
> 
> > Anton wrote:
> >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>>                                                QUERY PLAN
> >>> ----------------------------------------------------------------------------
> >>> -----------------------------
> >>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
> >>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>>          Sort Key: public.n_traf.date_time
> >>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
> >>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
> >>>> rows=1230 width=32)
> >>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >> ...
> >>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>> (18 rows)
> >>>> 
> >>>> Why it no uses indexes at all?
> >>>> -------------------------------------------
> >>> I'm no expert but I'd guess that the the planner doesn't know which
> >>> partition holds the latest time so it has to read them all.
> >> 
> >> Agree. But why it not uses indexes when it reading them?
> > 
> > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> > below the append node. Therefore it needs to fetch all rows from all the
> > tables, and the fastest way to do that is a seq scan.
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
> 
>                 http://www.postgresql.org/about/donate
> 
> ------ End of Forwarded Message
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Dave Page
Date:
Subject: Re: Insufficient attention to security in contrib (mostly)
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: Testing the other tsearch dictionaries