Thread: any way to use indexscan to get last X values with "order by Y limit X" clause?

Similar question was
http://archives.postgresql.org/pgsql-admin/2002-05/msg00148.php, but google
did not have answer for it.

Here is the structure:

    Column    |           Type           |      Modifiers
-------------+--------------------------+----------------------
  id          | integer                  | not null default '0'
  datestamp   | timestamp with time zone | not null
  thread      | integer                  | not null default '0'
  parent      | integer                  | not null default '0'
  author      | character(37)            | not null default ''
  subject     | character(255)           | not null default ''
  email       | character(200)           | not null default ''
  attachment  | character(64)            | default ''
  host        | character(50)            | not null default ''
  email_reply | character(1)             | not null default 'N'
  approved    | character(1)             | not null default 'N'
  msgid       | character(100)           | not null default ''
  modifystamp | integer                  | not null default '0'
  userid      | integer                  | not null default '0'
  closed      | smallint                 | default '0'
Indexes: tjavendanpri_key primary key btree (id),
          tjavendan_approved btree (approved),
          tjavendan_author btree (author),
          tjavendan_datestamp btree (datestamp),
          tjavendan_modifystamp btree (modifystamp),
          tjavendan_msgid btree (msgid),
          tjavendan_parent btree (parent),
          tjavendan_subject btree (subject),
          tjavendan_thread btree (thread),
          tjavendan_userid btree (userid)

Here is the query:
SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS
latest, max(id) as maxid FROM tjavendan WHERE approved='Y'  GROUP BY
thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40

and explain analyze for it:

krtjavendan34=> EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS
tcount, abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan
WHERE approved='Y'  GROUP BY thread, modifystamp ORDER BY modifystamp desc,
thread desc limit 40;
                                                                   QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=18419.78..18419.88 rows=40 width=12) (actual
time=6735.06..6735.69 rows=40 loops=1)
    ->  Sort  (cost=18419.78..18441.34 rows=8626 width=12) (actual
time=6735.04..6735.25 rows=41 loops=1)
          Sort Key: modifystamp, thread
          ->  Aggregate  (cost=16777.53..17855.84 rows=8626 width=12)
(actual time=4605.01..6711.27 rows=2938 loops=1)
                ->  Group  (cost=16777.53..17424.52 rows=86265 width=12)
(actual time=4604.85..6164.29 rows=86265 loops=1)
                      ->  Sort  (cost=16777.53..16993.19 rows=86265
width=12) (actual time=4604.82..5130.14 rows=86265 loops=1)
                            Sort Key: thread, modifystamp
                            ->  Seq Scan on tjavendan  (cost=0.00..9705.31
rows=86265 width=12) (actual time=0.13..3369.28 rows=86265 loops=1)
                                  Filter: (approved = 'Y'::bpchar)
  Total runtime: 6741.12 msec
(10 rows)

This is on 7.3.3.

Having backwards reading of index would really help here.

Thanks in advance.

Tomaz



Re: any way to use indexscan to get last X values with "order by Y limit X" clause?

From
"Shridhar Daithankar"
Date:
On 15 Jun 2003 at 16:26, Tomaz Borstnar wrote:

>
> Here is the structure:
<snip>
>   approved    | character(1)             | not null default 'N'
>   msgid       | character(100)           | not null default ''
>   modifystamp | integer                  | not null default '0'
>   userid      | integer                  | not null default '0'
>   closed      | smallint                 | default '0'
> Indexes: tjavendanpri_key primary key btree (id),
>           tjavendan_approved btree (approved),

<snip>

> Here is the query:
> SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS
> latest, max(id) as maxid FROM tjavendan WHERE approved='Y'  GROUP BY
> thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40

Question. The field approved seems to have boolean values. If probability of
having either of value is 50%, I doubt planner will use index anyway.

Even assuming all possible values of a char variable, the choice isn't too
much, say if you have 1M row.

Correct me if I am wrong.

Bye
 Shridhar

--
Either one of us, by himself, is expendable.  Both of us are not.        -- Kirk,
"The Devil in the Dark", stardate 3196.1


Re: any way to use indexscan to get last X values

From
Tomaz Borstnar
Date:
At 16:31 15.6.2003, Shridhar Daithankar wrote:

>Question. The field approved seems to have boolean values. If probability of
>having either of value is 50%, I doubt planner will use index anyway.

True. It has Y or N only so index on approved is useless. But using index
on ORDER BY part would help a lot since it knows to fetch last X ordered
values.

>Correct me if I am wrong.
Unfortunately you are very right.

I am not sure how to stuff modifystamp and thread into WHERE clause to make
it use indexes on thread and/or modifystamp. So far I believe this would be
the only way to use them, right?

Tomaz



Re: any way to use indexscan to get last X values with

From
Stephan Szabo
Date:
On Sun, 15 Jun 2003, Tomaz Borstnar wrote:

> Similar question was
> http://archives.postgresql.org/pgsql-admin/2002-05/msg00148.php, but google
> did not have answer for it.
>
> Here is the structure:
>
>     Column    |           Type           |      Modifiers
> -------------+--------------------------+----------------------
>   id          | integer                  | not null default '0'
>   datestamp   | timestamp with time zone | not null
>   thread      | integer                  | not null default '0'
>   parent      | integer                  | not null default '0'
>   author      | character(37)            | not null default ''
>   subject     | character(255)           | not null default ''
>   email       | character(200)           | not null default ''
>   attachment  | character(64)            | default ''
>   host        | character(50)            | not null default ''
>   email_reply | character(1)             | not null default 'N'
>   approved    | character(1)             | not null default 'N'
>   msgid       | character(100)           | not null default ''
>   modifystamp | integer                  | not null default '0'
>   userid      | integer                  | not null default '0'
>   closed      | smallint                 | default '0'
> Indexes: tjavendanpri_key primary key btree (id),
>           tjavendan_approved btree (approved),
>           tjavendan_author btree (author),
>           tjavendan_datestamp btree (datestamp),
>           tjavendan_modifystamp btree (modifystamp),
>           tjavendan_msgid btree (msgid),
>           tjavendan_parent btree (parent),
>           tjavendan_subject btree (subject),
>           tjavendan_thread btree (thread),
>           tjavendan_userid btree (userid)
>
> Here is the query:
> SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS
> latest, max(id) as maxid FROM tjavendan WHERE approved='Y'  GROUP BY
> thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40

I'm not sure that it'd help since I don't think it'd realize that it
doesn't actually need to completely do the group by due to the order by,
but in any case, in the above, the sort orders are different for the group
by and the order by and you'd really want a two column index on (probably)
(modifystamp, thread) in order to get the best results on replacing a
scan + sort.



Tomaz Borstnar <tomaz.borstnar@over.net> writes:
> SELECT thread, modifystamp, count(id) AS tcount, abstime(modifystamp) AS
> latest, max(id) as maxid FROM tjavendan WHERE approved='Y'  GROUP BY
> thread, modifystamp ORDER BY modifystamp desc, thread desc limit 40

> Having backwards reading of index would really help here.

The only way that a fast-start plan is useful is if there is a way to do
it with no explicit sort steps at all.  A sort step must read its entire
input before it can produce any output, so you completely blow the
chance of not reading the whole table as soon as there's any sorting.

There are a couple of reasons why this query can't be done using only an
initial indexscan to sort the data:

1. You don't have a suitable index.  Neither an index on modifystamp
alone nor an index on thread alone is of any use to produce a two-column
ordering; you need a two-column index on (modifystamp, thread).

2. The GROUP BY and ORDER BY steps require different sort orders, and so
even if an index satisfied one, there'd still be a sort needed for the
other.  This is partly your fault (writing the columns in different
orders) and partly the system's fault: it's implicitly taking the GROUP
BY entries to be equivalent to ORDER BY ASC, which is overspecification.

I've applied the attached patch to CVS tip to cure the latter problem.
With this, a two-column index, and compatible column ordering in ORDER
BY and GROUP BY, I get a reasonable-looking fast-start plan.  The patch
will not apply exactly against 7.3 because there's a renamed function
call in there, but you could make it work with a little effort.

            regards, tom lane


*** src/backend/parser/analyze.c.orig    Fri Jun  6 11:04:02 2003
--- src/backend/parser/analyze.c    Sun Jun 15 12:05:34 2003
***************
*** 1787,1799 ****
       */
      qry->havingQual = transformWhereClause(pstate, stmt->havingClause);

!     qry->groupClause = transformGroupClause(pstate,
!                                             stmt->groupClause,
!                                             qry->targetList);
!
      qry->sortClause = transformSortClause(pstate,
                                            stmt->sortClause,
                                            qry->targetList);

      qry->distinctClause = transformDistinctClause(pstate,
                                                    stmt->distinctClause,
--- 1787,1804 ----
       */
      qry->havingQual = transformWhereClause(pstate, stmt->havingClause);

!     /*
!      * Transform sorting/grouping stuff.  Do ORDER BY first because both
!      * transformGroupClause and transformDistinctClause need the results.
!      */
      qry->sortClause = transformSortClause(pstate,
                                            stmt->sortClause,
                                            qry->targetList);
+
+     qry->groupClause = transformGroupClause(pstate,
+                                             stmt->groupClause,
+                                             qry->targetList,
+                                             qry->sortClause);

      qry->distinctClause = transformDistinctClause(pstate,
                                                    stmt->distinctClause,
*** src/backend/parser/parse_clause.c.orig    Fri Jun  6 11:04:02 2003
--- src/backend/parser/parse_clause.c    Sun Jun 15 12:19:14 2003
***************
*** 1124,1130 ****
   *      transform a GROUP BY clause
   */
  List *
! transformGroupClause(ParseState *pstate, List *grouplist, List *targetlist)
  {
      List       *glist = NIL,
                 *gl;
--- 1124,1131 ----
   *      transform a GROUP BY clause
   */
  List *
! transformGroupClause(ParseState *pstate, List *grouplist,
!                      List *targetlist, List *sortClause)
  {
      List       *glist = NIL,
                 *gl;
***************
*** 1132,1152 ****
      foreach(gl, grouplist)
      {
          TargetEntry *tle;

          tle = findTargetlistEntry(pstate, lfirst(gl),
                                    targetlist, GROUP_CLAUSE);

          /* avoid making duplicate grouplist entries */
!         if (!targetIsInSortList(tle, glist))
!         {
!             GroupClause *grpcl = makeNode(GroupClause);
!
!             grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist);

!             grpcl->sortop = ordering_oper_opid(tle->resdom->restype);
!
!             glist = lappend(glist, grpcl);
          }
      }

      return glist;
--- 1133,1173 ----
      foreach(gl, grouplist)
      {
          TargetEntry *tle;
+         Oid            ordering_op;
+         GroupClause *grpcl;

          tle = findTargetlistEntry(pstate, lfirst(gl),
                                    targetlist, GROUP_CLAUSE);

          /* avoid making duplicate grouplist entries */
!         if (targetIsInSortList(tle, glist))
!             continue;

!         /*
!          * If the GROUP BY clause matches the ORDER BY clause, we want to
!          * adopt the ordering operators from the latter rather than using
!          * the default ops.  This allows "GROUP BY foo ORDER BY foo DESC" to
!          * be done with only one sort step.  Note we are assuming that any
!          * user-supplied ordering operator will bring equal values together,
!          * which is all that GROUP BY needs.
!          */
!         if (sortClause &&
!             ((SortClause *) lfirst(sortClause))->tleSortGroupRef ==
!             tle->resdom->ressortgroupref)
!         {
!             ordering_op = ((SortClause *) lfirst(sortClause))->sortop;
!             sortClause = lnext(sortClause);
          }
+         else
+         {
+             ordering_op = ordering_oper_opid(tle->resdom->restype);
+             sortClause = NIL;    /* disregard ORDER BY once match fails */
+         }
+
+         grpcl = makeNode(GroupClause);
+         grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist);
+         grpcl->sortop = ordering_op;
+         glist = lappend(glist, grpcl);
      }

      return glist;
*** src/include/parser/parse_clause.h.orig    Fri Mar 21 20:49:38 2003
--- src/include/parser/parse_clause.h    Sun Jun 15 12:03:13 2003
***************
*** 22,28 ****
  extern bool interpretInhOption(InhOption inhOpt);
  extern Node *transformWhereClause(ParseState *pstate, Node *where);
  extern List *transformGroupClause(ParseState *pstate, List *grouplist,
!                      List *targetlist);
  extern List *transformSortClause(ParseState *pstate, List *orderlist,
                      List *targetlist);
  extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,
--- 22,28 ----
  extern bool interpretInhOption(InhOption inhOpt);
  extern Node *transformWhereClause(ParseState *pstate, Node *where);
  extern List *transformGroupClause(ParseState *pstate, List *grouplist,
!                      List *targetlist, List *sortClause);
  extern List *transformSortClause(ParseState *pstate, List *orderlist,
                      List *targetlist);
  extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,

Re: any way to use indexscan to get last X values

From
Tomaz Borstnar
Date:
At 18:53 15.6.2003, you wrote:
>I've applied the attached patch to CVS tip to cure the latter problem.
>With this, a two-column index, and compatible column ordering in ORDER
>BY and GROUP BY, I get a reasonable-looking fast-start plan.  The patch
>will not apply exactly against 7.3 because there's a renamed function
>call in there, but you could make it work with a little effort.

You mean this:
/*
  * ordering_oper_opid - convenience routine for oprid(ordering_oper())
  *
  * This was formerly called any_ordering_op()
  */

A little later...

WOW!

100 to 130 times faster on same dataset and additional index on
(modifystamp,thread) which was not really useful before this patch!



krtjavendan34=>  EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS
tcount,abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan
WHERE approved='Y'  GROUP BY modifystamp, thread ORDER BY modifystamp desc,
thread desc limit 40;
                                                                                QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..97.13 rows=40 width=12) (actual time=1.07..48.71
rows=40 loops=1)
    ->  Aggregate  (cost=0.00..20947.38 rows=8626 width=12) (actual
time=1.05..48.23 rows=41 loops=1)
          ->  Group  (cost=0.00..20516.06 rows=86265 width=12) (actual
time=0.35..42.25 rows=843 loops=1)
                ->  Index Scan Backward using tjavendan_modstamp_thrd on
tjavendan  (cost=0.00..20084.73 rows=86265 width=12) (actual
time=0.34..31.29 rows=844 loops=1)
                      Filter: (approved = 'Y'::bpchar)
  Total runtime: 50.20 msec
(6 rows)

Used to be between 5800 and 6741 msec before this patch!

Thanks!





Re: any way to use indexscan to get last X values

From
Tomaz Borstnar
Date:
Here is the 7.3.3 patch as it might help others too...

diff -rcN postgresql-7.3.3/src/backend/parser/analyze.c
postgresql-7.3.3-grouporderby/src/backend/parser/analyze.c
*** postgresql-7.3.3/src/backend/parser/analyze.c    Thu Feb 13 23:50:09 2003
--- postgresql-7.3.3-grouporderby/src/backend/parser/analyze.c    Mon Jun 16
00:13:05 2003
***************
*** 1667,1679 ****
        */
       qry->havingQual = transformWhereClause(pstate, stmt->havingClause);

!     qry->groupClause = transformGroupClause(pstate,
!                                             stmt->groupClause,
!                                             qry->targetList);

       qry->sortClause = transformSortClause(pstate,
                                             stmt->sortClause,
                                             qry->targetList);

       qry->distinctClause = transformDistinctClause(pstate,
                                                     stmt->distinctClause,
--- 1667,1682 ----
        */
       qry->havingQual = transformWhereClause(pstate, stmt->havingClause);

!        /*
!         * Transform sorting/grouping stuff.  Do ORDER BY first because both
!         * transformGroupClause and transformDistinctClause need the results.
!         */

       qry->sortClause = transformSortClause(pstate,
                                             stmt->sortClause,
                                             qry->targetList);
+
+        qry->groupClause = transformGroupClause(pstate, stmt->groupClause,
qry->targetList, qry->sortClause);

       qry->distinctClause = transformDistinctClause(pstate,
                                                     stmt->distinctClause,
diff -rcN postgresql-7.3.3/src/backend/parser/parse_clause.c
postgresql-7.3.3-grouporderby/src/backend/parser/parse_clause.c
*** postgresql-7.3.3/src/backend/parser/parse_clause.c    Mon Dec 16 19:39:56 2002
--- postgresql-7.3.3-grouporderby/src/backend/parser/parse_clause.c    Mon Jun
16 00:24:58 2003
***************
*** 1145,1151 ****
    *
    */
   List *
! transformGroupClause(ParseState *pstate, List *grouplist, List *targetlist)
   {
       List       *glist = NIL,
                  *gl;
--- 1145,1151 ----
    *
    */
   List *
! transformGroupClause(ParseState *pstate, List *grouplist, List
*targetlist, List *sortClause)
   {
       List       *glist = NIL,
                  *gl;
***************
*** 1153,1173 ****
       foreach(gl, grouplist)
       {
           TargetEntry *tle;

           tle = findTargetlistEntry(pstate, lfirst(gl),
                                     targetlist, GROUP_CLAUSE);

           /* avoid making duplicate grouplist entries */
!         if (!targetIsInSortList(tle, glist))
!         {
!             GroupClause *grpcl = makeNode(GroupClause);
!
!             grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist);
!
!             grpcl->sortop = any_ordering_op(tle->resdom->restype);
!
!             glist = lappend(glist, grpcl);
!         }
       }

       return glist;
--- 1153,1193 ----
       foreach(gl, grouplist)
       {
           TargetEntry *tle;
+                Oid  ordering_op;
+                GroupClause *grpcl;

           tle = findTargetlistEntry(pstate, lfirst(gl),
                                     targetlist, GROUP_CLAUSE);

           /* avoid making duplicate grouplist entries */
!                if (targetIsInSortList(tle, glist))
!                        continue;
!
!                /*
!                 * If the GROUP BY clause matches the ORDER BY clause, we
want to
!                 * adopt the ordering operators from the latter rather
than using
!                 * the default ops.  This allows "GROUP BY foo ORDER BY
foo DESC" to
!                 * be done with only one sort step.  Note we are assuming
that any
!                 * user-supplied ordering operator will bring equal values
together,
!                 * which is all that GROUP BY needs.
!                 */
!                if (sortClause &&
!                        ((SortClause *)
lfirst(sortClause))->tleSortGroupRef ==
!                        tle->resdom->ressortgroupref)
!                {
!                        ordering_op = ((SortClause *)
lfirst(sortClause))->sortop;
!                        sortClause = lnext(sortClause);
!                 }
!                else
!                {
!                        ordering_op = any_ordering_op(tle->resdom->restype);
!                        sortClause = NIL;       /* disregard ORDER BY once
match fails */
!                }
!
!                grpcl = makeNode(GroupClause);
!                grpcl->tleSortGroupRef = assignSortGroupRef(tle, targetlist);
!                grpcl->sortop = ordering_op;
!                glist = lappend(glist, grpcl);
       }

       return glist;
diff -rcN postgresql-7.3.3/src/include/parser/parse_clause.h
postgresql-7.3.3-grouporderby/src/include/parser/parse_clause.h
*** postgresql-7.3.3/src/include/parser/parse_clause.h    Thu Jun 20 22:29:51 2002
--- postgresql-7.3.3-grouporderby/src/include/parser/parse_clause.h    Mon Jun
16 00:08:43 2003
***************
*** 22,28 ****
   extern bool interpretInhOption(InhOption inhOpt);
   extern Node *transformWhereClause(ParseState *pstate, Node *where);
   extern List *transformGroupClause(ParseState *pstate, List *grouplist,
!                      List *targetlist);
   extern List *transformSortClause(ParseState *pstate, List *orderlist,
                       List *targetlist);
   extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,
--- 22,28 ----
   extern bool interpretInhOption(InhOption inhOpt);
   extern Node *transformWhereClause(ParseState *pstate, Node *where);
   extern List *transformGroupClause(ParseState *pstate, List *grouplist,
!                      List *targetlist, List *sortClause);
   extern List *transformSortClause(ParseState *pstate, List *orderlist,
                       List *targetlist);
   extern List *transformDistinctClause(ParseState *pstate, List *distinctlist,



Re: any way to use indexscan to get last X values

From
"Shridhar Daithankar"
Date:
On 16 Jun 2003 at 0:37, Tomaz Borstnar wrote:
>
> krtjavendan34=>  EXPLAIN ANALYZE SELECT thread, modifystamp, count(id) AS
> tcount,abstime(modifystamp) AS latest, max(id) as maxid FROM tjavendan
> WHERE approved='Y'  GROUP BY modifystamp, thread ORDER BY modifystamp desc,
> thread desc limit 40;
>                                                                                 QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=0.00..97.13 rows=40 width=12) (actual time=1.07..48.71
> rows=40 loops=1)
>     ->  Aggregate  (cost=0.00..20947.38 rows=8626 width=12) (actual
> time=1.05..48.23 rows=41 loops=1)
>           ->  Group  (cost=0.00..20516.06 rows=86265 width=12) (actual
> time=0.35..42.25 rows=843 loops=1)
>                 ->  Index Scan Backward using tjavendan_modstamp_thrd on
> tjavendan  (cost=0.00..20084.73 rows=86265 width=12) (actual
> time=0.34..31.29 rows=844 loops=1)
>                       Filter: (approved = 'Y'::bpchar)
>   Total runtime: 50.20 msec
> (6 rows)
>
> Used to be between 5800 and 6741 msec before this patch!

Good that the patch works for you. But as I see there is an improvement in
plan. Not nitpicking but what does actual performance difference between system
before patch and after patch?

Bye
 Shridhar

--
QOTD:    "In the shopping mall of the mind, he's in the toy department."


Re: any way to use indexscan to get last X values

From
Tomaz Borstnar
Date:
At 08:15 16.6.2003, Shridhar Daithankar wrote:
> >   Total runtime: 50.20 msec
> > Used to be between 5800 and 6741 msec before this patch!
>
>Good that the patch works for you. But as I see there is an improvement in
>plan. Not nitpicking but what does actual performance difference between
>system
>before patch and after patch?

A lot since this is query to get list of last active threads sorted by last
modified date. With times less than 300ms you mostly do not  notice slower
query as there could be other factors to affect the speed like network
delays and such. But people on fast links will notice that it takes a bit
long to display list of threads - especially when the system is using PHP
accelerator and compression.

So this really means major increase of performance for real situation -
forum with over 85 000 messages where you get rid of full scan and 2 full
sorts to display list of msgs which happens a lot. You can always use some
query/page caching things, but then people start to post duplicates,
because they think the message did not make it into the database.

Tomaz



Re: any way to use indexscan to get last X values

From
"Shridhar Daithankar"
Date:
On 16 Jun 2003 at 9:08, Tomaz Borstnar wrote:
> So this really means major increase of performance for real situation -
> forum with over 85 000 messages where you get rid of full scan and 2 full
> sorts to display list of msgs which happens a lot. You can always use some
> query/page caching things, but then people start to post duplicates,
> because they think the message did not make it into the database.

OTOH, I was thinking of your original problem. If you could have two identical
tables, one to store incoming posts and other to store approved posts, that
should be lot more simpler.

Of course you need to vacuum much more if you are deleting from in queue but
the kind of database you are handling, me need not tell you about vacuum..

Just a though..


Bye
 Shridhar

--
Drew's Law of Highway Biology:    The first bug to hit a clean windshield lands
directly in front    of your eyes.