Thread: [PATCH] add CLUSTER table ORDER BY index

[PATCH] add CLUSTER table ORDER BY index

From
Holger Schurig
Date:
The following table add's a new variant of the CLUSTER command. The old
variants are preserved, as suggested in the TODO entry.

Things I changed:

* The grammar
* psql help text
* psql tab-completion, it favours now CLUSTER table ORDER BY index"
* two uses of CLUSTER in the regression, so that both the old and
  new syntax get checked

Things to consider:

* not yet in the documentation
* psql should probably no longer emit "CLUSTER index ON table"


Index: src/doc/TODO
===================================================================
*** src.orig/doc/TODO    2007-03-27 21:18:01.000000000 +0200
--- src/doc/TODO    2007-03-27 21:18:26.000000000 +0200
***************
*** 624,631 ****

      o %Add VERBOSE option to report tables as they are processed,
        like VACUUM VERBOSE
-     o Add more logical syntax CLUSTER table ORDER BY index;
-       support current syntax for backward compatibility


  * COPY
--- 624,629 ----
Index: src/doc/src/FAQ/TODO.html
===================================================================
*** src.orig/doc/src/FAQ/TODO.html    2007-03-27 21:18:01.000000000 +0200
--- src/doc/src/FAQ/TODO.html    2007-03-27 21:18:26.000000000 +0200
***************
*** 558,565 ****
  </p>
      </li><li>%Add VERBOSE option to report tables as they are processed,
            like VACUUM VERBOSE
-     </li><li>Add more logical syntax CLUSTER table ORDER BY index;
-           support current syntax for backward compatibility
    </li></ul>
    </li><li>COPY
    <ul>
--- 558,563 ----
Index: src/src/backend/parser/gram.y
===================================================================
*** src.orig/src/backend/parser/gram.y    2007-03-27 21:18:01.000000000 +0200
--- src/src/backend/parser/gram.y    2007-03-27 21:18:26.000000000 +0200
***************
*** 209,215 ****

  %type <str>        relation_name copy_file_name
                  database_name access_method_clause access_method attr_name
!                 index_name name file_name

  %type <list>    func_name handler_name qual_Op qual_all_Op subquery_Op
                  opt_class opt_validator
--- 209,215 ----

  %type <str>        relation_name copy_file_name
                  database_name access_method_clause access_method attr_name
!                 index_name name file_name opt_cluster_order_by

  %type <list>    func_name handler_name qual_Op qual_all_Op subquery_Op
                  opt_class opt_validator
***************
*** 5327,5332 ****
--- 5327,5333 ----
   *
   *        QUERY:
   *                cluster <index_name> on <qualified_name>
+  *                cluster <qualified_name> ORDER BY <index_name>
   *                cluster <qualified_name>
   *                cluster
   *
***************
*** 5340,5350 ****
                     n->indexname = $2;
                     $$ = (Node*)n;
                  }
!             | CLUSTER qualified_name
                  {
                     ClusterStmt *n = makeNode(ClusterStmt);
                     n->relation = $2;
!                    n->indexname = NULL;
                     $$ = (Node*)n;
                  }
              | CLUSTER
--- 5341,5351 ----
                     n->indexname = $2;
                     $$ = (Node*)n;
                  }
!             | CLUSTER qualified_name opt_cluster_order_by
                  {
                     ClusterStmt *n = makeNode(ClusterStmt);
                     n->relation = $2;
!                    n->indexname = $3;
                     $$ = (Node*)n;
                  }
              | CLUSTER
***************
*** 5356,5361 ****
--- 5357,5368 ----
                  }
          ;

+ opt_cluster_order_by:
+             ORDER BY index_name            { $$ = $3; }
+             | /*EMPTY*/                { $$ = NULL; }
+         ;
+
+
  /*****************************************************************************
   *
   *        QUERY:
Index: src/src/bin/psql/sql_help.h
===================================================================
*** src.orig/src/bin/psql/sql_help.h    2007-03-27 21:18:01.000000000 +0200
--- src/src/bin/psql/sql_help.h    2007-03-27 21:18:26.000000000 +0200
***************
*** 119,125 ****

      { "CLUSTER",
        N_("cluster a table according to an index"),
!       N_("CLUSTER indexname ON tablename\nCLUSTER tablename\nCLUSTER") },

      { "COMMENT",
        N_("define or change the comment of an object"),
--- 119,125 ----

      { "CLUSTER",
        N_("cluster a table according to an index"),
!       N_("CLUSTER indexname ON tablename\nCLUSTER tablename [ORDER BY indexname]\nCLUSTER") },

      { "COMMENT",
        N_("define or change the comment of an object"),
Index: src/src/bin/psql/tab-complete.c
===================================================================
*** src.orig/src/bin/psql/tab-complete.c    2007-03-27 21:18:01.000000000 +0200
--- src/src/bin/psql/tab-complete.c    2007-03-27 21:18:26.000000000 +0200
***************
*** 822,832 ****

          COMPLETE_WITH_LIST(list_COLUMNALTER);
      }
!     else if (pg_strcasecmp(prev3_wd, "TABLE") == 0 &&
!              pg_strcasecmp(prev_wd, "CLUSTER") == 0)
          COMPLETE_WITH_CONST("ON");
      else if (pg_strcasecmp(prev4_wd, "TABLE") == 0 &&
-              pg_strcasecmp(prev2_wd, "CLUSTER") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
      {
          completion_info_charp = prev3_wd;
--- 822,830 ----

          COMPLETE_WITH_LIST(list_COLUMNALTER);
      }
!     else if (pg_strcasecmp(prev3_wd, "TABLE") == 0)
          COMPLETE_WITH_CONST("ON");
      else if (pg_strcasecmp(prev4_wd, "TABLE") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
      {
          completion_info_charp = prev3_wd;
***************
*** 929,952 ****

      /*
       * If the previous word is CLUSTER and not without produce list of
!      * indexes.
       */
      else if (pg_strcasecmp(prev_wd, "CLUSTER") == 0 &&
               pg_strcasecmp(prev2_wd, "WITHOUT") != 0)
!         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexes, NULL);
!     /* If we have CLUSTER <sth>, then add "ON" */
      else if (pg_strcasecmp(prev2_wd, "CLUSTER") == 0 &&
!              pg_strcasecmp(prev_wd, "ON") != 0)
!         COMPLETE_WITH_CONST("ON");

      /*
!      * If we have CLUSTER <sth> ON, then add the correct tablename as well.
       */
!     else if (pg_strcasecmp(prev3_wd, "CLUSTER") == 0 &&
!              pg_strcasecmp(prev_wd, "ON") == 0)
      {
!         completion_info_charp = prev2_wd;
!         COMPLETE_WITH_QUERY(Query_for_table_owning_index);
      }

  /* COMMENT */
--- 927,952 ----

      /*
       * If the previous word is CLUSTER and not without produce list of
!      * tables
       */
      else if (pg_strcasecmp(prev_wd, "CLUSTER") == 0 &&
               pg_strcasecmp(prev2_wd, "WITHOUT") != 0)
!         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
!     /* If we have CLUSTER <sth>, then add "ORDER BY" */
      else if (pg_strcasecmp(prev2_wd, "CLUSTER") == 0 &&
!              pg_strcasecmp(prev_wd, "ON") != 0) {
!         COMPLETE_WITH_CONST("ORDER BY");
!     }

      /*
!      * If we have CLUSTER <sth> ORDER BY, then add the index as well.
       */
!     else if (pg_strcasecmp(prev4_wd, "CLUSTER") == 0 &&
!              pg_strcasecmp(prev_wd, "BY") == 0 &&
!              pg_strcasecmp(prev2_wd, "ORDER") == 0)
      {
!         completion_info_charp = prev3_wd;
!         COMPLETE_WITH_QUERY(Query_for_index_of_table);
      }

  /* COMMENT */
Index: src/src/test/regress/expected/cluster.out
===================================================================
*** src.orig/src/test/regress/expected/cluster.out    2007-03-27 21:18:01.000000000 +0200
--- src/src/test/regress/expected/cluster.out    2007-03-27 21:18:26.000000000 +0200
***************
*** 329,335 ****
  CLUSTER clstr_2;
  ERROR:  there is no previously clustered index for table "clstr_2"
  CLUSTER clstr_1_pkey ON clstr_1;
! CLUSTER clstr_2_pkey ON clstr_2;
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;
--- 329,335 ----
  CLUSTER clstr_2;
  ERROR:  there is no previously clustered index for table "clstr_2"
  CLUSTER clstr_1_pkey ON clstr_1;
! CLUSTER clstr_2 ORDER BY clstr_2_pkey;
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;
Index: src/src/test/regress/sql/cluster.sql
===================================================================
*** src.orig/src/test/regress/sql/cluster.sql    2007-03-27 21:18:01.000000000 +0200
--- src/src/test/regress/sql/cluster.sql    2007-03-27 21:18:26.000000000 +0200
***************
*** 122,128 ****
  CLUSTER clstr_2;

  CLUSTER clstr_1_pkey ON clstr_1;
! CLUSTER clstr_2_pkey ON clstr_2;
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;
--- 122,128 ----
  CLUSTER clstr_2;

  CLUSTER clstr_1_pkey ON clstr_1;
! CLUSTER clstr_2 ORDER BY clstr_2_pkey;
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;

Re: [PATCH] add CLUSTER table ORDER BY index

From
Gregory Stark
Date:
"Holger Schurig" <holgerschurig@gmx.de> writes:

> * psql tab-completion, it favours now CLUSTER table ORDER BY index"

It occurs to me (sorry that I didn't think of this earlier) that if we're
going to use "ORDER BY" it really ought to take a list columns. It would be
more consistent with what ORDER BY means in queries and one day we may want to
add support for ordering by arbitrary lists of columns even if no index
exists.

It may be reasonable to allow both to coexist and just have an arbitrary rule
that if an index of the specified name exists takes precedence over any
columns. I've never seen anyone name their indexes in a way that would
conflict with a column anyways.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: [PATCH] add CLUSTER table ORDER BY index

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Holger Schurig" <holgerschurig@gmx.de> writes:
>> * psql tab-completion, it favours now CLUSTER table ORDER BY index"

> It occurs to me (sorry that I didn't think of this earlier) that if we're
> going to use "ORDER BY" it really ought to take a list columns.

Surely you jest.  The point is to be ordered the same as the index, no?

            regards, tom lane

Re: [PATCH] add CLUSTER table ORDER BY index

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Holger Schurig" <holgerschurig@gmx.de> writes:
>>> * psql tab-completion, it favours now CLUSTER table ORDER BY index"
>
>> It occurs to me (sorry that I didn't think of this earlier) that if we're
>> going to use "ORDER BY" it really ought to take a list columns.
>
> Surely you jest.  The point is to be ordered the same as the index, no?

There's some narrow corner cases where it makes sense to CLUSTER without
an index:

* You're going to build an index with the same order after clustering.
It's cheaper to sort the data first and then create index, than to build
index, sort data, and rebuild index.

* You're doing a lot of large sort + merge joins. Sorts are cheaper if
the data is already in order. One might ask, though, why don't you just
create an index then...

* You're using CLUSTER as a VACUUM FULL replacement, and there's no
handy index to sort with. (It'd be better if we had a VACUUM FULL that
rewrites the table like CLUSTER, though)

Though I don't think we're implementing "CLUSTER table ORDER BY col1,
col2" anytime soon, ORDER BY does imply that a list of columns is to
follow. How about "CLUSTER table USING index"?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [PATCH] add CLUSTER table ORDER BY index

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Though I don't think we're implementing "CLUSTER table ORDER BY col1,
> col2" anytime soon, ORDER BY does imply that a list of columns is to
> follow. How about "CLUSTER table USING index"?

+1 ... AFAIR there was 0 discussion of the exact syntax before,
so I don't feel wedded to ORDER BY.

            regards, tom lane

Re: [PATCH] add CLUSTER table ORDER BY index

From
Holger Schurig
Date:
> +1 ... AFAIR there was 0 discussion of the exact syntax before,
> so I don't feel wedded to ORDER BY.

A changed patch comes with the next e-mail.

I can not create a patch for "CLUSTER table USING col1,col2,col3",
because I'm not yet deep into postgresql and don't have the time
for that. I just thought that the skill level needed for the TODO-
item was in my range :-)