Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1 - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1
Date
Msg-id 46D3F6E0.8010005@enterprisedb.com
Whole thread Raw
In response to Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1
List pgsql-performance
Bruce, would you please add this to the 8.4 patch queue so we remember
to look at this later?

It didn't occur to me that we can do that in the degenerate case when
there's just a single node below the Append. A more general solution
would be to check if the pathkeys of all the child nodes match, and do a
"merge append" similar to a merge join.

Luke Lonergan wrote:
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node.  Kurt Harriman did the work.
>
> - Luke
>
> Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/indxpath.c,v
> diff -u -N -r1.22 -r1.22.2.1
> --- cdb-pg/src/backend/optimizer/path/indxpath.c    25 Apr 2007 22:07:21
> -0000    1.22
> +++ cdb-pg/src/backend/optimizer/path/indxpath.c    10 Aug 2007 03:41:15
> -0000    1.22.2.1
> @@ -379,8 +379,51 @@
>              index_pathkeys = build_index_pathkeys(root, index,
>                                                    ForwardScanDirection,
>                                                    true);
> -            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> -                                                        index_pathkeys);
> +            /*
> +             * CDB: For appendrel child, pathkeys contain Var nodes in
> terms
> +             * of the child's baserel.  Transform the pathkey list to refer
> to
> +             * columns of the appendrel.
> +             */
> +            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> +            {
> +                AppendRelInfo  *appinfo = NULL;
> +                RelOptInfo     *appendrel = NULL;
> +                ListCell       *appcell;
> +                CdbPathLocus    notalocus;
> +
> +                /* Find the appendrel of which this baserel is a child. */
> +                foreach(appcell, root->append_rel_list)
> +                {
> +                    appinfo = (AppendRelInfo *)lfirst(appcell);
> +                    if (appinfo->child_relid == rel->relid)
> +                        break;
> +                }
> +                Assert(appinfo);
> +                appendrel = find_base_rel(root, appinfo->parent_relid);
> +
> +                /*
> +                 * The pathkey list happens to have the same format as the
> +                 * partitioning key of a Hashed locus, so by disguising it
> +                 * we can use cdbpathlocus_pull_above_projection() to do
> the
> +                 * transformation.
> +                 */
> +                CdbPathLocus_MakeHashed(¬alocus, index_pathkeys);
> +                notalocus =
> +                    cdbpathlocus_pull_above_projection(root,
> +                                                       notalocus,
> +                                                       rel->relids,
> +                                                       rel->reltargetlist,
> +
> appendrel->reltargetlist,
> +                                                       appendrel->relid);
> +                if (CdbPathLocus_IsHashed(notalocus))
> +                    index_pathkeys = truncate_useless_pathkeys(root,
> appendrel,
> +
> notalocus.partkey);
> +                else
> +                    index_pathkeys = NULL;
> +            }
> +
> +            useful_pathkeys = truncate_useless_pathkeys(root, rel,
> +                                                        index_pathkeys);
>          }
>          else
>              useful_pathkeys = NIL;
> Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/pathkeys.c,v
> diff -u -N -r1.18 -r1.18.2.1
> --- cdb-pg/src/backend/optimizer/path/pathkeys.c    30 Apr 2007 05:44:07
> -0000    1.18
> +++ cdb-pg/src/backend/optimizer/path/pathkeys.c    10 Aug 2007 03:41:15
> -0000    1.18.2.1
> @@ -1403,55 +1403,53 @@
>  {
>      PathKeyItem    *item;
>      Expr           *newexpr;
> +    AttrNumber  targetindex;
>
>      Assert(pathkey);
>
> -    /* Use constant expr if available.  Will be at head of list. */
> -    if (CdbPathkeyEqualsConstant(pathkey))
> +    /* Find an expr that we can rewrite to use the projected columns. */
> +    item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> +                                                 relids,
> +                                                 targetlist,
> +                                                 &targetindex); // OUT
> +
> +    /* If not found, see if the equiv class contains a constant expr. */
> +    if (!item &&
> +        CdbPathkeyEqualsConstant(pathkey))
>      {
>          item = (PathKeyItem *)linitial(pathkey);
>          newexpr = (Expr *)copyObject(item->key);
>      }
>
> -    /* New vars for old! */
> -    else
> -    {
> -        AttrNumber  targetindex;
> +    /* Fail if no usable expr. */
> +    else if (!item)
> +        return NULL;
>
> -        /* Find an expr that we can rewrite to use the projected columns.
> */
> -        item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> -                                                     relids,
> -                                                     targetlist,
> -                                                     &targetindex); // OUT
> -        if (!item)
> -            return NULL;
> +    /* If found matching targetlist item, make a Var that references it. */
> +    else if (targetindex > 0)
> +        newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> +                                            targetindex,
> +                                            newvarlist,
> +                                            (Expr *)item->key);
>
> -        /* If found matching targetlist item, make a Var that references
> it. */
> -        if (targetindex > 0)
> -            newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> -                                                targetindex,
> -                                                newvarlist,
> -                                                (Expr *)item->key);
> +    /* Replace expr's Var nodes with new ones referencing the targetlist.
> */
> +    else
> +        newexpr = cdbpullup_expr((Expr *)item->key,
> +                                 targetlist,
> +                                 newvarlist,
> +                                 newrelid);
>
> -        /* Replace expr's Var nodes with new ones referencing the
> targetlist. */
> -        else
> -            newexpr = cdbpullup_expr((Expr *)item->key,
> -                                     targetlist,
> -                                     newvarlist,
> -                                     newrelid);
> +    /* Pull up RelabelType node too, unless tlist expr has right type. */
> +    if (IsA(item->key, RelabelType))
> +    {
> +        RelabelType    *oldrelabel = (RelabelType *)item->key;
>
> -        /* Pull up RelabelType node too, unless tlist expr has right type.
> */
> -        if (IsA(item->key, RelabelType))
> -        {
> -            RelabelType    *oldrelabel = (RelabelType *)item->key;
> -
> -            if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> -                oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> -                newexpr = (Expr *)makeRelabelType(newexpr,
> -                                                  oldrelabel->resulttype,
> -                                                  oldrelabel->resulttypmod,
> -
> oldrelabel->relabelformat);
> -        }
> +        if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> +            oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> +            newexpr = (Expr *)makeRelabelType(newexpr,
> +                                              oldrelabel->resulttype,
> +                                              oldrelabel->resulttypmod,
> +                                              oldrelabel->relabelformat);
>      }
>      Insist(newexpr);
>
> Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/util/pathnode.c,v
> diff -u -N -r1.52.2.4 -r1.52.2.5
> --- cdb-pg/src/backend/optimizer/util/pathnode.c    5 Aug 2007 23:06:44
> -0000    1.52.2.4
> +++ cdb-pg/src/backend/optimizer/util/pathnode.c    10 Aug 2007 03:41:15
> -0000    1.52.2.5
> @@ -1563,7 +1563,15 @@
>              pathnode->path.rescannable = false;
>      }
>
> -    return pathnode;
> +    /*
> +     * CDB: If there is exactly one subpath, its ordering is preserved.
> +     * Child rel's pathkey exprs are already expressed in terms of the
> +     * columns of the parent appendrel.  See find_usable_indexes().
> +     */
> +    if (list_length(subpaths) == 1)
> +        pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> +
> +    return pathnode;
>  }
>
>  /*
>
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>
>> Anton wrote:
>>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>>>                                                QUERY PLAN
>>>> ----------------------------------------------------------------------------
>>>> -----------------------------
>>>>> Limit  (cost=824637.69..824637.69 rows=1 width=32)
>>>>>    ->  Sort  (cost=824637.69..838746.44 rows=5643499 width=32)
>>>>>          Sort Key: public.n_traf.date_time
>>>>>          ->  Result  (cost=0.00..100877.99 rows=5643499 width=32)
>>>>>                ->  Append  (cost= 0.00..100877.99 rows=5643499 width=32)
>>>>>                      ->  Seq Scan on n_traf  (cost=0.00..22.30
>>>>> rows=1230 width=32)
>>>>>                      ->  Seq Scan on n_traf_y2007m01 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>> ...
>>>>>                      ->  Seq Scan on n_traf_y2007m12 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>>> (18 rows)
>>>>>
>>>>> Why it no uses indexes at all?
>>>>> -------------------------------------------
>>>> I'm no expert but I'd guess that the the planner doesn't know which
>>>> partition holds the latest time so it has to read them all.
>>> Agree. But why it not uses indexes when it reading them?
>> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
>> below the append node. Therefore it needs to fetch all rows from all the
>> tables, and the fastest way to do that is a seq scan.
>
>


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

pgsql-performance by date:

Previous
From: Willo van der Merwe
Date:
Subject: Re: Performance issue
Next
From: Robins
Date:
Subject: Performance across multiple schemas