Home > mailing lists
Re: hashed crosstab - Mailing list pgsql-patches

From	Bruce Momjian
Subject	Re: hashed crosstab
Date	March 17, 2003 19:26:28
Msg-id	200303180026.h2I0QFh00593@candle.pha.pa.us Whole thread Raw
In response to	hashed crosstab (Joe Conway <mail@joeconway.com>)
List	pgsql-patches
Tree view
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Joe Conway wrote:
> Attached is an update to contrib/tablefunc. It implements a new hashed
> version of crosstab. This fixes a major deficiency in real-world use of
> the original version. Easiest to undestand with an illustration:
>
> Data:
> -------------------------------------------------------------------
> select * from cth;
>   id | rowid |        rowdt        |   attribute    |      val
> ----+-------+---------------------+----------------+---------------
>    1 | test1 | 2003-03-01 00:00:00 | temperature    | 42
>    2 | test1 | 2003-03-01 00:00:00 | test_result    | PASS
>    3 | test1 | 2003-03-01 00:00:00 | volts          | 2.6987
>    4 | test2 | 2003-03-02 00:00:00 | temperature    | 53
>    5 | test2 | 2003-03-02 00:00:00 | test_result    | FAIL
>    6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
>    7 | test2 | 2003-03-02 00:00:00 | volts          | 3.1234
> (7 rows)
>
> Original crosstab:
> -------------------------------------------------------------------
> SELECT * FROM crosstab(
>    'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
> AS c(rowid text, temperature text, test_result text, test_startdate
> text, volts text);
>   rowid | temperature | test_result | test_startdate | volts
> -------+-------------+-------------+----------------+--------
>   test1 | 42          | PASS        | 2.6987         |
>   test2 | 53          | FAIL        | 01 March 2003  | 3.1234
> (2 rows)
>
> Hashed crosstab:
> -------------------------------------------------------------------
> SELECT * FROM crosstab(
>    'SELECT rowid, attribute, val FROM cth ORDER BY 1',
>    'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> AS c(rowid text, temperature int4, test_result text, test_startdate
> timestamp, volts float8);
>   rowid | temperature | test_result |   test_startdate    | volts
> -------+-------------+-------------+---------------------+--------
>   test1 |          42 | PASS        |                     | 2.6987
>   test2 |          53 | FAIL        | 2003-03-01 00:00:00 | 3.1234
> (2 rows)
>
> Notice that the original crosstab slides data over to the left in the
> result tuple when it encounters missing data. In order to work around
> this you have to be make your source sql do all sorts of contortions
> (cartesian join of distinct rowid with distinct attribute; left join
> that back to the real source data). The new version avoids this by
> building a hash table using a second distinct attribute query.
>
> The new version also allows for "extra" columns (see the README) and
> allows the result columns to be coerced into differing datatypes if they
> are suitable (as shown above).
>
> In testing a "real-world" data set (69 distinct rowid's, 27 distinct
> categories/attributes, multiple missing data points) I saw about a
> 5-fold improvement in execution time (from about 2200 ms old, to 440 ms
> new).
>
> I left the original version intact because: 1) BC, 2) it is probably
> slightly faster if you know that you have no missing attributes.
>
> README and regression test adjustments included. If there are no
> objections, please apply.
>
> Thanks,
>
> Joe

> Index: contrib/tablefunc/README.tablefunc
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/README.tablefunc,v
> retrieving revision 1.5
> diff -c -r1.5 README.tablefunc
> *** contrib/tablefunc/README.tablefunc    23 Nov 2002 01:54:09 -0000    1.5
> --- contrib/tablefunc/README.tablefunc    3 Mar 2003 03:37:39 -0000
> ***************
> *** 333,338 ****
> --- 333,457 ----
>   ==================================================================
>   Name
>
> + crosstab(text, text) - returns a set of row_name, extra, and
> +                       category value columns
> +
> + Synopsis
> +
> + crosstab(text source_sql, text category_sql)
> +
> + Inputs
> +
> +   source_sql
> +
> +     A SQL statement which produces the source set of data. The SQL statement
> +     must return one row_name column, one category column, and one value
> +     column. It may also have one or more "extra" columns.
> +
> +     The row_name column must be first. The category and value columns
> +     must be the last two columns, in that order. "extra" columns must be
> +     columns 2 through (N - 2), where N is the total number of columns.
> +
> +     The "extra" columns are assumed to be the same for all rows with the
> +     same row_name. The values returned are copied from the first row
> +     with a given row_name and subsequent values of these columns are ignored
> +     until row_name changes.
> +
> +     e.g. source_sql must produce a set something like:
> +          SELECT row_name, extra_col, cat, value FROM foo;
> +
> +              row_name    extra_col   cat    value
> +             ----------+------------+-----+---------
> +               row1         extra1    cat1    val1
> +               row1         extra1    cat2    val2
> +               row1         extra1    cat4    val4
> +               row2         extra2    cat1    val5
> +               row2         extra2    cat2    val6
> +               row2         extra2    cat3    val7
> +               row2         extra2    cat4    val8
> +
> +   category_sql
> +
> +     A SQL statement which produces the distinct set of categories. The SQL
> +     statement must return one category column only. category_sql must produce
> +     at least one result row or an error will be generated. category_sql
> +     must not produce duplicate categories or an error will be generated.
> +
> +     e.g. SELECT DISTINCT cat FROM foo;
> +
> +               cat
> +             -------
> +               cat1
> +               cat2
> +               cat3
> +               cat4
> +
> + Outputs
> +
> +   Returns setof record, which must be defined with a column definition
> +   in the FROM clause of the SELECT statement, e.g.:
> +
> +     SELECT * FROM crosstab(source_sql, cat_sql)
> +     AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
> +
> +     the example crosstab function produces a set something like:
> +                       <== values  columns ==>
> +            row_name   extra   cat1   cat2   cat3   cat4
> +            ---------+-------+------+------+------+------
> +              row1     extra1  val1   val2          val4
> +              row2     extra2  val5   val6   val7   val8
> +
> + Notes
> +
> +   1. source_sql must be ordered by row_name (column 1).
> +
> +   2. The number of values columns is determined at run-time. The
> +      column definition provided in the FROM clause must provide for
> +      the correct number of columns of the proper data types.
> +
> +   3. Missing values (i.e. not enough adjacent rows of same row_name to
> +      fill the number of result values columns) are filled in with nulls.
> +
> +   4. Extra values (i.e. source rows with category not found in category_sql
> +      result) are skipped.
> +
> +   5. Rows with a null row_name column are skipped.
> +
> +
> + Example usage
> +
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> +
> + SELECT * FROM crosstab
> + (
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1'
> + )
> + AS
> + (
> +        rowid text,
> +        rowdt timestamp,
> +        temperature int4,
> +        test_result text,
> +        test_startdate timestamp,
> +        volts float8
> + );
> +  rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
> + -------+--------------------------+-------------+-------------+--------------------------+--------
> +  test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
> +  test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
> + (2 rows)
> +
> + ==================================================================
> + Name
> +
>   connectby(text, text, text, text, int[, text]) - returns a set
>       representing a hierarchy (tree structure)
>
> Index: contrib/tablefunc/tablefunc.c
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.c,v
> retrieving revision 1.11
> diff -c -r1.11 tablefunc.c
> *** contrib/tablefunc/tablefunc.c    23 Nov 2002 01:54:09 -0000    1.11
> --- contrib/tablefunc/tablefunc.c    3 Mar 2003 02:57:50 -0000
> ***************
> *** 39,44 ****
> --- 39,49 ----
>
>   #include "tablefunc.h"
>
> + static int load_categories_hash(char *cats_sql, MemoryContext per_query_ctx);
> + static Tuplestorestate *get_crosstab_tuplestore(char *sql,
> +                                                 int num_categories,
> +                                                 TupleDesc tupdesc,
> +                                                 MemoryContext per_query_ctx);
>   static void validateConnectbyTupleDesc(TupleDesc tupdesc, bool show_branch);
>   static bool compatCrosstabTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
>   static bool compatConnectbyTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
> ***************
> *** 95,100 ****
> --- 100,166 ----
>   /* sign, 10 digits, '\0' */
>   #define INT32_STRLEN    12
>
> + /* hash table support */
> + static HTAB *crosstab_HashTable;
> +
> + /* The information we cache about loaded procedures */
> + typedef struct crosstab_cat_desc
> + {
> +     char       *catname;
> +     int            attidx;    /* zero based */
> + }    crosstab_cat_desc;
> +
> + #define MAX_CATNAME_LEN            NAMEDATALEN
> + #define INIT_CATS                64
> +
> + #define crosstab_HashTableLookup(CATNAME, CATDESC) \
> + do { \
> +     crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \
> +     \
> +     MemSet(key, 0, MAX_CATNAME_LEN); \
> +     snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \
> +     hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> +                                          key, HASH_FIND, NULL); \
> +     if (hentry) \
> +         CATDESC = hentry->catdesc; \
> +     else \
> +         CATDESC = NULL; \
> + } while(0)
> +
> + #define crosstab_HashTableInsert(CATDESC) \
> + do { \
> +     crosstab_HashEnt *hentry; bool found; char key[MAX_CATNAME_LEN]; \
> +     \
> +     MemSet(key, 0, MAX_CATNAME_LEN); \
> +     snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATDESC->catname); \
> +     hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> +                                          key, HASH_ENTER, &found); \
> +     if (hentry == NULL) \
> +         elog(ERROR, "out of memory in crosstab_HashTable"); \
> +     if (found) \
> +         elog(ERROR, "trying to use a category name more than once"); \
> +     hentry->catdesc = CATDESC; \
> + } while(0)
> +
> + #define crosstab_HashTableDelete(CATNAME) \
> + do { \
> +     crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \
> +     \
> +     MemSet(key, 0, MAX_CATNAME_LEN); \
> +     snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \
> +     hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \
> +                                          key, HASH_REMOVE, NULL); \
> +     if (hentry == NULL) \
> +         elog(WARNING, "trying to delete function name that does not exist."); \
> + } while(0)
> +
> + /* hash table */
> + typedef struct crosstab_hashent
> + {
> +     char                internal_catname[MAX_CATNAME_LEN];
> +     crosstab_cat_desc  *catdesc;
> + } crosstab_HashEnt;
> +
>   /*
>    * normal_rand - return requested number of random values
>    * with a Gaussian (Normal) distribution.
> ***************
> *** 593,598 ****
> --- 659,999 ----
>   }
>
>   /*
> +  * crosstab_hash - reimplement crosstab as materialized function and
> +  * properly deal with missing values (i.e. don't pack remaining
> +  * values to the left)
> +  *
> +  * crosstab - create a crosstab of rowids and values columns from a
> +  * SQL statement returning one rowid column, one category column,
> +  * and one value column.
> +  *
> +  * e.g. given sql which produces:
> +  *
> +  *            rowid    cat        value
> +  *            ------+-------+-------
> +  *            row1    cat1    val1
> +  *            row1    cat2    val2
> +  *            row1    cat4    val4
> +  *            row2    cat1    val5
> +  *            row2    cat2    val6
> +  *            row2    cat3    val7
> +  *            row2    cat4    val8
> +  *
> +  * crosstab returns:
> +  *                    <===== values columns =====>
> +  *            rowid    cat1    cat2    cat3    cat4
> +  *            ------+-------+-------+-------+-------
> +  *            row1    val1    val2    null    val4
> +  *            row2    val5    val6    val7    val8
> +  *
> +  * NOTES:
> +  * 1. SQL result must be ordered by 1.
> +  * 2. The number of values columns depends on the tuple description
> +  *      of the function's declared return type.
> +  * 2. Missing values (i.e. missing category) are filled in with nulls.
> +  * 3. Extra values (i.e. not in category results) are skipped.
> +  */
> + PG_FUNCTION_INFO_V1(crosstab_hash);
> + Datum
> + crosstab_hash(PG_FUNCTION_ARGS)
> + {
> +     char               *sql = GET_STR(PG_GETARG_TEXT_P(0));
> +     char               *cats_sql = GET_STR(PG_GETARG_TEXT_P(1));
> +     ReturnSetInfo       *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +     TupleDesc            tupdesc;
> +     MemoryContext        per_query_ctx;
> +     MemoryContext        oldcontext;
> +     int                    num_categories;
> +
> +     /* check to see if caller supports us returning a tuplestore */
> +     if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
> +         elog(ERROR, "crosstab: materialize mode required, but it is not "
> +              "allowed in this context");
> +
> +     per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> +     oldcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> +     /* get the requested return tuple description */
> +     tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
> +
> +     /*
> +      * Check to make sure we have a reasonable tuple descriptor
> +      *
> +      * Note we will attempt to coerce the values into whatever
> +      * the return attribute type is and depend on the "in"
> +      * function to complain if needed.
> +      */
> +     if (tupdesc->natts < 2)
> +         elog(ERROR, "crosstab: query-specified return tuple and " \
> +                     "crosstab function are not compatible");
> +
> +     /* load up the categories hash table */
> +     num_categories = load_categories_hash(cats_sql, per_query_ctx);
> +
> +     /* let the caller know we're sending back a tuplestore */
> +     rsinfo->returnMode = SFRM_Materialize;
> +
> +     /* now go build it */
> +     rsinfo->setResult = get_crosstab_tuplestore(sql,
> +                                                 num_categories,
> +                                                 tupdesc,
> +                                                 per_query_ctx);
> +
> +     /*
> +      * SFRM_Materialize mode expects us to return a NULL Datum. The actual
> +      * tuples are in our tuplestore and passed back through
> +      * rsinfo->setResult. rsinfo->setDesc is set to the tuple description
> +      * that we actually used to build our tuples with, so the caller can
> +      * verify we did what it was expecting.
> +      */
> +     rsinfo->setDesc = tupdesc;
> +     MemoryContextSwitchTo(oldcontext);
> +
> +     return (Datum) 0;
> + }
> +
> + /*
> +  * load up the categories hash table
> +  */
> + static int
> + load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
> + {
> +     HASHCTL            ctl;
> +     int                ret;
> +     int                proc;
> +     MemoryContext    SPIcontext;
> +     int                num_categories = 0;
> +
> +     /* initialize the category hash table */
> +     ctl.keysize = MAX_CATNAME_LEN;
> +     ctl.entrysize = sizeof(crosstab_HashEnt);
> +
> +     /*
> +      * use INIT_CATS, defined above as a guess of how
> +      * many hash table entries to create, initially
> +      */
> +     crosstab_HashTable = hash_create("crosstab hash", INIT_CATS, &ctl, HASH_ELEM);
> +
> +     /* Connect to SPI manager */
> +     if ((ret = SPI_connect()) < 0)
> +         elog(ERROR, "load_categories_hash: SPI_connect returned %d", ret);
> +
> +     /* Retrieve the category name rows */
> +     ret = SPI_exec(cats_sql, 0);
> +     num_categories = proc = SPI_processed;
> +
> +     /* Check for qualifying tuples */
> +     if ((ret == SPI_OK_SELECT) && (proc > 0))
> +     {
> +         SPITupleTable  *spi_tuptable = SPI_tuptable;
> +         TupleDesc        spi_tupdesc = spi_tuptable->tupdesc;
> +         int                i;
> +
> +         /*
> +          * The provided categories SQL query must always return one column:
> +          * category - the label or identifier for each column
> +          */
> +         if (spi_tupdesc->natts != 1)
> +             elog(ERROR, "load_categories_hash: provided categories SQL must " \
> +                         "return 1 column of at least one row");
> +
> +         for (i = 0; i < proc; i++)
> +         {
> +             crosstab_cat_desc  *catdesc;
> +             char               *catname;
> +             HeapTuple            spi_tuple;
> +
> +             /* get the next sql result tuple */
> +             spi_tuple = spi_tuptable->vals[i];
> +
> +             /* get the category from the current sql result tuple */
> +             catname = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
> +
> +             SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> +             catdesc = (crosstab_cat_desc *) palloc(sizeof(crosstab_cat_desc));
> +             catdesc->catname = catname;
> +             catdesc->attidx = i;
> +
> +             /* Add the proc description block to the hashtable */
> +             crosstab_HashTableInsert(catdesc);
> +
> +             MemoryContextSwitchTo(SPIcontext);
> +         }
> +     }
> +     else
> +     {
> +         /* no qualifying tuples */
> +         SPI_finish();
> +         elog(ERROR, "load_categories_hash: provided categories SQL must " \
> +                     "return 1 column of at least one row");
> +     }
> +
> +     if (SPI_finish() != SPI_OK_FINISH)
> +         elog(ERROR, "load_categories_hash: SPI_finish() failed");
> +
> +     return num_categories;
> + }
> +
> + /*
> +  * create and populate the crosstab tuplestore using the provided source query
> +  */
> + static Tuplestorestate *
> + get_crosstab_tuplestore(char *sql,
> +                         int num_categories,
> +                         TupleDesc tupdesc,
> +                         MemoryContext per_query_ctx)
> + {
> +     Tuplestorestate       *tupstore;
> +     AttInMetadata       *attinmeta = TupleDescGetAttInMetadata(tupdesc);
> +     char              **values;
> +     HeapTuple            tuple;
> +     int                    ret;
> +     int                    proc;
> +     MemoryContext        SPIcontext;
> +
> +     /* initialize our tuplestore */
> +     tupstore = tuplestore_begin_heap(true, SortMem);
> +
> +     /* Connect to SPI manager */
> +     if ((ret = SPI_connect()) < 0)
> +         elog(ERROR, "get_crosstab_tuplestore: SPI_connect returned %d", ret);
> +
> +     /* Now retrieve the crosstab source rows */
> +     ret = SPI_exec(sql, 0);
> +     proc = SPI_processed;
> +
> +     /* Check for qualifying tuples */
> +     if ((ret == SPI_OK_SELECT) && (proc > 0))
> +     {
> +         SPITupleTable  *spi_tuptable = SPI_tuptable;
> +         TupleDesc        spi_tupdesc = spi_tuptable->tupdesc;
> +         int                ncols = spi_tupdesc->natts;
> +         char           *rowid;
> +         char           *lastrowid = NULL;
> +         int                i, j;
> +         int                result_ncols;
> +
> +         /*
> +          * The provided SQL query must always return at least three columns:
> +          *
> +          * 1. rowname    the label for each row - column 1 in the final result
> +          * 2. category  the label for each value-column in the final result
> +          * 3. value        the values used to populate the value-columns
> +          *
> +          * If there are more than three columns, the last two are taken as
> +          * "category" and "values". The first column is taken as "rowname".
> +          * Additional columns (2 thru N-2) are assumed the same for the same
> +          * "rowname", and are copied into the result tuple from the first
> +          * time we encounter a particular rowname.
> +          */
> +         if (ncols < 3)
> +             elog(ERROR, "get_crosstab_tuplestore: provided source SQL must " \
> +                         "return at least 3 columns; a rowid, a category, " \
> +                         "and a values column");
> +
> +         result_ncols = (ncols - 2) + num_categories;
> +
> +         /* Recheck to make sure we tuple descriptor still looks reasonable */
> +         if (tupdesc->natts != result_ncols)
> +             elog(ERROR, "get_crosstab_tuplestore: query-specified return " \
> +                         "tuple has %d columns but crosstab returns %d",
> +                          tupdesc->natts, result_ncols);
> +
> +         /* allocate space */
> +         values = (char **) palloc(result_ncols * sizeof(char *));
> +
> +         /* and make sure it's clear */
> +         memset(values, '\0', result_ncols * sizeof(char *));
> +
> +         for (i = 0; i < proc; i++)
> +         {
> +             HeapTuple            spi_tuple;
> +             crosstab_cat_desc  *catdesc;
> +             char               *catname;
> +
> +             /* get the next sql result tuple */
> +             spi_tuple = spi_tuptable->vals[i];
> +
> +             /* get the rowid from the current sql result tuple */
> +             rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
> +
> +             /* if rowid is null, skip this tuple entirely */
> +             if (rowid == NULL)
> +                 continue;
> +
> +             /*
> +              * if we're on a new output row, grab the column values up to
> +              * column N-2 now
> +              */
> +             if ((lastrowid == NULL) || (strcmp(rowid, lastrowid) != 0))
> +             {
> +                 /*
> +                  * a new row means we need to flush the old one first,
> +                  * unless we're on the very first row
> +                  */
> +                 if (lastrowid != NULL)
> +                 {
> +                     /* switch to appropriate context while storing the tuple */
> +                     SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> +                     /* rowid changed, flush the previous output row */
> +                     tuple = BuildTupleFromCStrings(attinmeta, values);
> +                     tuplestore_puttuple(tupstore, tuple);
> +                     for (j = 0; j < result_ncols; j++)
> +                         xpfree(values[j]);
> +
> +                     /* now reset the context */
> +                     MemoryContextSwitchTo(SPIcontext);
> +                 }
> +
> +                 values[0] = rowid;
> +                 for (j = 1; j < ncols - 2; j++)
> +                     values[j] = SPI_getvalue(spi_tuple, spi_tupdesc, j + 1);
> +             }
> +
> +             /* look up the category and fill in the appropriate column */
> +             catname = SPI_getvalue(spi_tuple, spi_tupdesc, ncols - 1);
> +
> +             if (catname != NULL)
> +             {
> +                 crosstab_HashTableLookup(catname, catdesc);
> +
> +                 if (catdesc)
> +                     values[catdesc->attidx + ncols - 2] =
> +                         SPI_getvalue(spi_tuple, spi_tupdesc, ncols);
> +             }
> +
> +             xpfree(lastrowid);
> +             lastrowid = pstrdup(rowid);
> +         }
> +
> +         /* switch to appropriate context while storing the tuple */
> +         SPIcontext = MemoryContextSwitchTo(per_query_ctx);
> +
> +         /* flush the last output row */
> +         tuple = BuildTupleFromCStrings(attinmeta, values);
> +         tuplestore_puttuple(tupstore, tuple);
> +
> +         /* now reset the context */
> +         MemoryContextSwitchTo(SPIcontext);
> +
> +     }
> +     else
> +     {
> +         /* no qualifying tuples */
> +         SPI_finish();
> +     }
> +
> +     if (SPI_finish() != SPI_OK_FINISH)
> +         elog(ERROR, "get_crosstab_tuplestore: SPI_finish() failed");
> +
> +     tuplestore_donestoring(tupstore);
> +
> +     return tupstore;
> + }
> +
> + /*
>    * connectby_text - produce a result set from a hierarchical (parent/child)
>    * table.
>    *
> ***************
> *** 668,674 ****
>       attinmeta = TupleDescGetAttInMetadata(tupdesc);
>
>       /* check to see if caller supports us returning a tuplestore */
> !     if (!rsinfo->allowedModes & SFRM_Materialize)
>           elog(ERROR, "connectby requires Materialize mode, but it is not "
>                "allowed in this context");
>
> --- 1069,1075 ----
>       attinmeta = TupleDescGetAttInMetadata(tupdesc);
>
>       /* check to see if caller supports us returning a tuplestore */
> !     if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
>           elog(ERROR, "connectby requires Materialize mode, but it is not "
>                "allowed in this context");
>
> Index: contrib/tablefunc/tablefunc.h
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.h,v
> retrieving revision 1.4
> diff -c -r1.4 tablefunc.h
> *** contrib/tablefunc/tablefunc.h    4 Sep 2002 20:31:08 -0000    1.4
> --- contrib/tablefunc/tablefunc.h    2 Mar 2003 22:32:15 -0000
> ***************
> *** 34,39 ****
> --- 34,40 ----
>    */
>   extern Datum normal_rand(PG_FUNCTION_ARGS);
>   extern Datum crosstab(PG_FUNCTION_ARGS);
> + extern Datum crosstab_hash(PG_FUNCTION_ARGS);
>   extern Datum connectby_text(PG_FUNCTION_ARGS);
>
>   #endif   /* TABLEFUNC_H */
> Index: contrib/tablefunc/tablefunc.sql.in
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.sql.in,v
> retrieving revision 1.5
> diff -c -r1.5 tablefunc.sql.in
> *** contrib/tablefunc/tablefunc.sql.in    18 Oct 2002 18:41:21 -0000    1.5
> --- contrib/tablefunc/tablefunc.sql.in    2 Mar 2003 22:32:23 -0000
> ***************
> *** 52,57 ****
> --- 52,62 ----
>   AS 'MODULE_PATHNAME','crosstab'
>   LANGUAGE 'C' STABLE STRICT;
>
> + CREATE OR REPLACE FUNCTION crosstab(text,text)
> + RETURNS setof record
> + AS 'MODULE_PATHNAME','crosstab_hash'
> + LANGUAGE 'C' STABLE STRICT;
> +
>   CREATE OR REPLACE FUNCTION connectby(text,text,text,text,int,text)
>   RETURNS setof record
>   AS 'MODULE_PATHNAME','connectby_text'
> Index: contrib/tablefunc/expected/tablefunc.out
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/expected/tablefunc.out,v
> retrieving revision 1.5
> diff -c -r1.5 tablefunc.out
> *** contrib/tablefunc/expected/tablefunc.out    23 Nov 2002 01:54:09 -0000    1.5
> --- contrib/tablefunc/expected/tablefunc.out    3 Mar 2003 02:58:22 -0000
> ***************
> *** 123,128 ****
> --- 123,201 ----
>    test2 | val5 | val6 | val7 | val8
>   (2 rows)
>
> + --
> + -- hash based crosstab
> + --
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + NOTICE:  CREATE TABLE will create implicit sequence 'cth_id_seq' for SERIAL column 'cth.id'
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + -- the next line is intentionally left commented and is therefore a "missing" attribute
> + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> + -- return attributes as plain text
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text);
> +  rowid |          rowdt           | temperature | test_result | test_startdate | volts
> + -------+--------------------------+-------------+-------------+----------------+--------
> +  test1 | Sat Mar 01 00:00:00 2003 | 42          | PASS        |                | 2.6987
> +  test2 | Sun Mar 02 00:00:00 2003 | 53          | FAIL        | 01 March 2003  | 3.1234
> + (2 rows)
> +
> + -- this time without rowdt
> + SELECT * FROM crosstab(
> +   'SELECT rowid, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text);
> +  rowid | temperature | test_result | test_startdate | volts
> + -------+-------------+-------------+----------------+--------
> +  test1 | 42          | PASS        |                | 2.6987
> +  test2 | 53          | FAIL        | 01 March 2003  | 3.1234
> + (2 rows)
> +
> + -- convert attributes to specific datatypes
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +  rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
> + -------+--------------------------+-------------+-------------+--------------------------+--------
> +  test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
> +  test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
> + (2 rows)
> +
> + -- source query and category query out of sync
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER
BY1') 
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp);
> +  rowid |          rowdt           | temperature | test_result |      test_startdate
> + -------+--------------------------+-------------+-------------+--------------------------
> +  test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |
> +  test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003
> + (2 rows)
> +
> + -- if category query generates no rows, get expected error
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> + ERROR:  load_categories_hash: provided categories SQL must return 1 column of at least one row
> + -- if category query generates more than one column, get expected error
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> + ERROR:  load_categories_hash: provided categories SQL must return 1 column of at least one row
> + --
> + -- connectby
> + --
>   -- test connectby with text based hierarchy
>   CREATE TABLE connectby_text(keyid text, parent_keyid text);
>   \copy connectby_text from 'data/connectby_text.data'
> Index: contrib/tablefunc/sql/tablefunc.sql
> ===================================================================
> RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/sql/tablefunc.sql,v
> retrieving revision 1.6
> diff -c -r1.6 tablefunc.sql
> *** contrib/tablefunc/sql/tablefunc.sql    23 Nov 2002 01:54:09 -0000    1.6
> --- contrib/tablefunc/sql/tablefunc.sql    3 Mar 2003 02:51:45 -0000
> ***************
> *** 38,43 ****
> --- 38,98 ----
>   SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 3) AS
c(rowidtext, att1 text, att2 text, att3 text); 
>   SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 4) AS
c(rowidtext, att1 text, att2 text, att3 text, att4 text); 
>
> + --
> + -- hash based crosstab
> + --
> + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
> + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42');
> + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS');
> + -- the next line is intentionally left commented and is therefore a "missing" attribute
> + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003');
> + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987');
> + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL');
> + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
> + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234');
> +
> + -- return attributes as plain text
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text);
> +
> + -- this time without rowdt
> + SELECT * FROM crosstab(
> +   'SELECT rowid, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text);
> +
> + -- convert attributes to specific datatypes
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> + -- source query and category query out of sync
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER
BY1') 
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp);
> +
> + -- if category query generates no rows, get expected error
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> + -- if category query generates more than one column, get expected error
> + SELECT * FROM crosstab(
> +   'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
> +   'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2')
> + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8);
> +
> +
> + --
> + -- connectby
> + --
> +
>   -- test connectby with text based hierarchy
>   CREATE TABLE connectby_text(keyid text, parent_keyid text);
>   \copy connectby_text from 'data/connectby_text.data'

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
pgsql-patches by date:
From: Bruce Momjian
Date: 17 March 2003, 19:26:14
Subject: Re: [GENERAL] postgres error reporting
From: Bruce Momjian
Date: 17 March 2003, 19:26:39
Subject: Re: ALTER SEQUENCE
Re: hashed crosstab - Mailing list pgsql-patches

Previous

Next