Re: hashed crosstab - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: hashed crosstab |
Date | |
Msg-id | 200303180026.h2I0QFh00593@candle.pha.pa.us Whole thread Raw |
In response to | hashed crosstab (Joe Conway <mail@joeconway.com>) |
List | pgsql-patches |
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Joe Conway wrote: > Attached is an update to contrib/tablefunc. It implements a new hashed > version of crosstab. This fixes a major deficiency in real-world use of > the original version. Easiest to undestand with an illustration: > > Data: > ------------------------------------------------------------------- > select * from cth; > id | rowid | rowdt | attribute | val > ----+-------+---------------------+----------------+--------------- > 1 | test1 | 2003-03-01 00:00:00 | temperature | 42 > 2 | test1 | 2003-03-01 00:00:00 | test_result | PASS > 3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987 > 4 | test2 | 2003-03-02 00:00:00 | temperature | 53 > 5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL > 6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003 > 7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234 > (7 rows) > > Original crosstab: > ------------------------------------------------------------------- > SELECT * FROM crosstab( > 'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4) > AS c(rowid text, temperature text, test_result text, test_startdate > text, volts text); > rowid | temperature | test_result | test_startdate | volts > -------+-------------+-------------+----------------+-------- > test1 | 42 | PASS | 2.6987 | > test2 | 53 | FAIL | 01 March 2003 | 3.1234 > (2 rows) > > Hashed crosstab: > ------------------------------------------------------------------- > SELECT * FROM crosstab( > 'SELECT rowid, attribute, val FROM cth ORDER BY 1', > 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > AS c(rowid text, temperature int4, test_result text, test_startdate > timestamp, volts float8); > rowid | temperature | test_result | test_startdate | volts > -------+-------------+-------------+---------------------+-------- > test1 | 42 | PASS | | 2.6987 > test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234 > (2 rows) > > Notice that the original crosstab slides data over to the left in the > result tuple when it encounters missing data. In order to work around > this you have to be make your source sql do all sorts of contortions > (cartesian join of distinct rowid with distinct attribute; left join > that back to the real source data). The new version avoids this by > building a hash table using a second distinct attribute query. > > The new version also allows for "extra" columns (see the README) and > allows the result columns to be coerced into differing datatypes if they > are suitable (as shown above). > > In testing a "real-world" data set (69 distinct rowid's, 27 distinct > categories/attributes, multiple missing data points) I saw about a > 5-fold improvement in execution time (from about 2200 ms old, to 440 ms > new). > > I left the original version intact because: 1) BC, 2) it is probably > slightly faster if you know that you have no missing attributes. > > README and regression test adjustments included. If there are no > objections, please apply. > > Thanks, > > Joe > Index: contrib/tablefunc/README.tablefunc > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/README.tablefunc,v > retrieving revision 1.5 > diff -c -r1.5 README.tablefunc > *** contrib/tablefunc/README.tablefunc 23 Nov 2002 01:54:09 -0000 1.5 > --- contrib/tablefunc/README.tablefunc 3 Mar 2003 03:37:39 -0000 > *************** > *** 333,338 **** > --- 333,457 ---- > ================================================================== > Name > > + crosstab(text, text) - returns a set of row_name, extra, and > + category value columns > + > + Synopsis > + > + crosstab(text source_sql, text category_sql) > + > + Inputs > + > + source_sql > + > + A SQL statement which produces the source set of data. The SQL statement > + must return one row_name column, one category column, and one value > + column. It may also have one or more "extra" columns. > + > + The row_name column must be first. The category and value columns > + must be the last two columns, in that order. "extra" columns must be > + columns 2 through (N - 2), where N is the total number of columns. > + > + The "extra" columns are assumed to be the same for all rows with the > + same row_name. The values returned are copied from the first row > + with a given row_name and subsequent values of these columns are ignored > + until row_name changes. > + > + e.g. source_sql must produce a set something like: > + SELECT row_name, extra_col, cat, value FROM foo; > + > + row_name extra_col cat value > + ----------+------------+-----+--------- > + row1 extra1 cat1 val1 > + row1 extra1 cat2 val2 > + row1 extra1 cat4 val4 > + row2 extra2 cat1 val5 > + row2 extra2 cat2 val6 > + row2 extra2 cat3 val7 > + row2 extra2 cat4 val8 > + > + category_sql > + > + A SQL statement which produces the distinct set of categories. The SQL > + statement must return one category column only. category_sql must produce > + at least one result row or an error will be generated. category_sql > + must not produce duplicate categories or an error will be generated. > + > + e.g. SELECT DISTINCT cat FROM foo; > + > + cat > + ------- > + cat1 > + cat2 > + cat3 > + cat4 > + > + Outputs > + > + Returns setof record, which must be defined with a column definition > + in the FROM clause of the SELECT statement, e.g.: > + > + SELECT * FROM crosstab(source_sql, cat_sql) > + AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text); > + > + the example crosstab function produces a set something like: > + <== values columns ==> > + row_name extra cat1 cat2 cat3 cat4 > + ---------+-------+------+------+------+------ > + row1 extra1 val1 val2 val4 > + row2 extra2 val5 val6 val7 val8 > + > + Notes > + > + 1. source_sql must be ordered by row_name (column 1). > + > + 2. The number of values columns is determined at run-time. The > + column definition provided in the FROM clause must provide for > + the correct number of columns of the proper data types. > + > + 3. Missing values (i.e. not enough adjacent rows of same row_name to > + fill the number of result values columns) are filled in with nulls. > + > + 4. Extra values (i.e. source rows with category not found in category_sql > + result) are skipped. > + > + 5. Rows with a null row_name column are skipped. > + > + > + Example usage > + > + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text); > + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42'); > + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS'); > + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987'); > + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003'); > + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234'); > + > + SELECT * FROM crosstab > + ( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1' > + ) > + AS > + ( > + rowid text, > + rowdt timestamp, > + temperature int4, > + test_result text, > + test_startdate timestamp, > + volts float8 > + ); > + rowid | rowdt | temperature | test_result | test_startdate | volts > + -------+--------------------------+-------------+-------------+--------------------------+-------- > + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987 > + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234 > + (2 rows) > + > + ================================================================== > + Name > + > connectby(text, text, text, text, int[, text]) - returns a set > representing a hierarchy (tree structure) > > Index: contrib/tablefunc/tablefunc.c > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.c,v > retrieving revision 1.11 > diff -c -r1.11 tablefunc.c > *** contrib/tablefunc/tablefunc.c 23 Nov 2002 01:54:09 -0000 1.11 > --- contrib/tablefunc/tablefunc.c 3 Mar 2003 02:57:50 -0000 > *************** > *** 39,44 **** > --- 39,49 ---- > > #include "tablefunc.h" > > + static int load_categories_hash(char *cats_sql, MemoryContext per_query_ctx); > + static Tuplestorestate *get_crosstab_tuplestore(char *sql, > + int num_categories, > + TupleDesc tupdesc, > + MemoryContext per_query_ctx); > static void validateConnectbyTupleDesc(TupleDesc tupdesc, bool show_branch); > static bool compatCrosstabTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2); > static bool compatConnectbyTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2); > *************** > *** 95,100 **** > --- 100,166 ---- > /* sign, 10 digits, '\0' */ > #define INT32_STRLEN 12 > > + /* hash table support */ > + static HTAB *crosstab_HashTable; > + > + /* The information we cache about loaded procedures */ > + typedef struct crosstab_cat_desc > + { > + char *catname; > + int attidx; /* zero based */ > + } crosstab_cat_desc; > + > + #define MAX_CATNAME_LEN NAMEDATALEN > + #define INIT_CATS 64 > + > + #define crosstab_HashTableLookup(CATNAME, CATDESC) \ > + do { \ > + crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \ > + \ > + MemSet(key, 0, MAX_CATNAME_LEN); \ > + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \ > + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \ > + key, HASH_FIND, NULL); \ > + if (hentry) \ > + CATDESC = hentry->catdesc; \ > + else \ > + CATDESC = NULL; \ > + } while(0) > + > + #define crosstab_HashTableInsert(CATDESC) \ > + do { \ > + crosstab_HashEnt *hentry; bool found; char key[MAX_CATNAME_LEN]; \ > + \ > + MemSet(key, 0, MAX_CATNAME_LEN); \ > + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATDESC->catname); \ > + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \ > + key, HASH_ENTER, &found); \ > + if (hentry == NULL) \ > + elog(ERROR, "out of memory in crosstab_HashTable"); \ > + if (found) \ > + elog(ERROR, "trying to use a category name more than once"); \ > + hentry->catdesc = CATDESC; \ > + } while(0) > + > + #define crosstab_HashTableDelete(CATNAME) \ > + do { \ > + crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \ > + \ > + MemSet(key, 0, MAX_CATNAME_LEN); \ > + snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \ > + hentry = (crosstab_HashEnt*) hash_search(crosstab_HashTable, \ > + key, HASH_REMOVE, NULL); \ > + if (hentry == NULL) \ > + elog(WARNING, "trying to delete function name that does not exist."); \ > + } while(0) > + > + /* hash table */ > + typedef struct crosstab_hashent > + { > + char internal_catname[MAX_CATNAME_LEN]; > + crosstab_cat_desc *catdesc; > + } crosstab_HashEnt; > + > /* > * normal_rand - return requested number of random values > * with a Gaussian (Normal) distribution. > *************** > *** 593,598 **** > --- 659,999 ---- > } > > /* > + * crosstab_hash - reimplement crosstab as materialized function and > + * properly deal with missing values (i.e. don't pack remaining > + * values to the left) > + * > + * crosstab - create a crosstab of rowids and values columns from a > + * SQL statement returning one rowid column, one category column, > + * and one value column. > + * > + * e.g. given sql which produces: > + * > + * rowid cat value > + * ------+-------+------- > + * row1 cat1 val1 > + * row1 cat2 val2 > + * row1 cat4 val4 > + * row2 cat1 val5 > + * row2 cat2 val6 > + * row2 cat3 val7 > + * row2 cat4 val8 > + * > + * crosstab returns: > + * <===== values columns =====> > + * rowid cat1 cat2 cat3 cat4 > + * ------+-------+-------+-------+------- > + * row1 val1 val2 null val4 > + * row2 val5 val6 val7 val8 > + * > + * NOTES: > + * 1. SQL result must be ordered by 1. > + * 2. The number of values columns depends on the tuple description > + * of the function's declared return type. > + * 2. Missing values (i.e. missing category) are filled in with nulls. > + * 3. Extra values (i.e. not in category results) are skipped. > + */ > + PG_FUNCTION_INFO_V1(crosstab_hash); > + Datum > + crosstab_hash(PG_FUNCTION_ARGS) > + { > + char *sql = GET_STR(PG_GETARG_TEXT_P(0)); > + char *cats_sql = GET_STR(PG_GETARG_TEXT_P(1)); > + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; > + TupleDesc tupdesc; > + MemoryContext per_query_ctx; > + MemoryContext oldcontext; > + int num_categories; > + > + /* check to see if caller supports us returning a tuplestore */ > + if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize)) > + elog(ERROR, "crosstab: materialize mode required, but it is not " > + "allowed in this context"); > + > + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; > + oldcontext = MemoryContextSwitchTo(per_query_ctx); > + > + /* get the requested return tuple description */ > + tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc); > + > + /* > + * Check to make sure we have a reasonable tuple descriptor > + * > + * Note we will attempt to coerce the values into whatever > + * the return attribute type is and depend on the "in" > + * function to complain if needed. > + */ > + if (tupdesc->natts < 2) > + elog(ERROR, "crosstab: query-specified return tuple and " \ > + "crosstab function are not compatible"); > + > + /* load up the categories hash table */ > + num_categories = load_categories_hash(cats_sql, per_query_ctx); > + > + /* let the caller know we're sending back a tuplestore */ > + rsinfo->returnMode = SFRM_Materialize; > + > + /* now go build it */ > + rsinfo->setResult = get_crosstab_tuplestore(sql, > + num_categories, > + tupdesc, > + per_query_ctx); > + > + /* > + * SFRM_Materialize mode expects us to return a NULL Datum. The actual > + * tuples are in our tuplestore and passed back through > + * rsinfo->setResult. rsinfo->setDesc is set to the tuple description > + * that we actually used to build our tuples with, so the caller can > + * verify we did what it was expecting. > + */ > + rsinfo->setDesc = tupdesc; > + MemoryContextSwitchTo(oldcontext); > + > + return (Datum) 0; > + } > + > + /* > + * load up the categories hash table > + */ > + static int > + load_categories_hash(char *cats_sql, MemoryContext per_query_ctx) > + { > + HASHCTL ctl; > + int ret; > + int proc; > + MemoryContext SPIcontext; > + int num_categories = 0; > + > + /* initialize the category hash table */ > + ctl.keysize = MAX_CATNAME_LEN; > + ctl.entrysize = sizeof(crosstab_HashEnt); > + > + /* > + * use INIT_CATS, defined above as a guess of how > + * many hash table entries to create, initially > + */ > + crosstab_HashTable = hash_create("crosstab hash", INIT_CATS, &ctl, HASH_ELEM); > + > + /* Connect to SPI manager */ > + if ((ret = SPI_connect()) < 0) > + elog(ERROR, "load_categories_hash: SPI_connect returned %d", ret); > + > + /* Retrieve the category name rows */ > + ret = SPI_exec(cats_sql, 0); > + num_categories = proc = SPI_processed; > + > + /* Check for qualifying tuples */ > + if ((ret == SPI_OK_SELECT) && (proc > 0)) > + { > + SPITupleTable *spi_tuptable = SPI_tuptable; > + TupleDesc spi_tupdesc = spi_tuptable->tupdesc; > + int i; > + > + /* > + * The provided categories SQL query must always return one column: > + * category - the label or identifier for each column > + */ > + if (spi_tupdesc->natts != 1) > + elog(ERROR, "load_categories_hash: provided categories SQL must " \ > + "return 1 column of at least one row"); > + > + for (i = 0; i < proc; i++) > + { > + crosstab_cat_desc *catdesc; > + char *catname; > + HeapTuple spi_tuple; > + > + /* get the next sql result tuple */ > + spi_tuple = spi_tuptable->vals[i]; > + > + /* get the category from the current sql result tuple */ > + catname = SPI_getvalue(spi_tuple, spi_tupdesc, 1); > + > + SPIcontext = MemoryContextSwitchTo(per_query_ctx); > + > + catdesc = (crosstab_cat_desc *) palloc(sizeof(crosstab_cat_desc)); > + catdesc->catname = catname; > + catdesc->attidx = i; > + > + /* Add the proc description block to the hashtable */ > + crosstab_HashTableInsert(catdesc); > + > + MemoryContextSwitchTo(SPIcontext); > + } > + } > + else > + { > + /* no qualifying tuples */ > + SPI_finish(); > + elog(ERROR, "load_categories_hash: provided categories SQL must " \ > + "return 1 column of at least one row"); > + } > + > + if (SPI_finish() != SPI_OK_FINISH) > + elog(ERROR, "load_categories_hash: SPI_finish() failed"); > + > + return num_categories; > + } > + > + /* > + * create and populate the crosstab tuplestore using the provided source query > + */ > + static Tuplestorestate * > + get_crosstab_tuplestore(char *sql, > + int num_categories, > + TupleDesc tupdesc, > + MemoryContext per_query_ctx) > + { > + Tuplestorestate *tupstore; > + AttInMetadata *attinmeta = TupleDescGetAttInMetadata(tupdesc); > + char **values; > + HeapTuple tuple; > + int ret; > + int proc; > + MemoryContext SPIcontext; > + > + /* initialize our tuplestore */ > + tupstore = tuplestore_begin_heap(true, SortMem); > + > + /* Connect to SPI manager */ > + if ((ret = SPI_connect()) < 0) > + elog(ERROR, "get_crosstab_tuplestore: SPI_connect returned %d", ret); > + > + /* Now retrieve the crosstab source rows */ > + ret = SPI_exec(sql, 0); > + proc = SPI_processed; > + > + /* Check for qualifying tuples */ > + if ((ret == SPI_OK_SELECT) && (proc > 0)) > + { > + SPITupleTable *spi_tuptable = SPI_tuptable; > + TupleDesc spi_tupdesc = spi_tuptable->tupdesc; > + int ncols = spi_tupdesc->natts; > + char *rowid; > + char *lastrowid = NULL; > + int i, j; > + int result_ncols; > + > + /* > + * The provided SQL query must always return at least three columns: > + * > + * 1. rowname the label for each row - column 1 in the final result > + * 2. category the label for each value-column in the final result > + * 3. value the values used to populate the value-columns > + * > + * If there are more than three columns, the last two are taken as > + * "category" and "values". The first column is taken as "rowname". > + * Additional columns (2 thru N-2) are assumed the same for the same > + * "rowname", and are copied into the result tuple from the first > + * time we encounter a particular rowname. > + */ > + if (ncols < 3) > + elog(ERROR, "get_crosstab_tuplestore: provided source SQL must " \ > + "return at least 3 columns; a rowid, a category, " \ > + "and a values column"); > + > + result_ncols = (ncols - 2) + num_categories; > + > + /* Recheck to make sure we tuple descriptor still looks reasonable */ > + if (tupdesc->natts != result_ncols) > + elog(ERROR, "get_crosstab_tuplestore: query-specified return " \ > + "tuple has %d columns but crosstab returns %d", > + tupdesc->natts, result_ncols); > + > + /* allocate space */ > + values = (char **) palloc(result_ncols * sizeof(char *)); > + > + /* and make sure it's clear */ > + memset(values, '\0', result_ncols * sizeof(char *)); > + > + for (i = 0; i < proc; i++) > + { > + HeapTuple spi_tuple; > + crosstab_cat_desc *catdesc; > + char *catname; > + > + /* get the next sql result tuple */ > + spi_tuple = spi_tuptable->vals[i]; > + > + /* get the rowid from the current sql result tuple */ > + rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1); > + > + /* if rowid is null, skip this tuple entirely */ > + if (rowid == NULL) > + continue; > + > + /* > + * if we're on a new output row, grab the column values up to > + * column N-2 now > + */ > + if ((lastrowid == NULL) || (strcmp(rowid, lastrowid) != 0)) > + { > + /* > + * a new row means we need to flush the old one first, > + * unless we're on the very first row > + */ > + if (lastrowid != NULL) > + { > + /* switch to appropriate context while storing the tuple */ > + SPIcontext = MemoryContextSwitchTo(per_query_ctx); > + > + /* rowid changed, flush the previous output row */ > + tuple = BuildTupleFromCStrings(attinmeta, values); > + tuplestore_puttuple(tupstore, tuple); > + for (j = 0; j < result_ncols; j++) > + xpfree(values[j]); > + > + /* now reset the context */ > + MemoryContextSwitchTo(SPIcontext); > + } > + > + values[0] = rowid; > + for (j = 1; j < ncols - 2; j++) > + values[j] = SPI_getvalue(spi_tuple, spi_tupdesc, j + 1); > + } > + > + /* look up the category and fill in the appropriate column */ > + catname = SPI_getvalue(spi_tuple, spi_tupdesc, ncols - 1); > + > + if (catname != NULL) > + { > + crosstab_HashTableLookup(catname, catdesc); > + > + if (catdesc) > + values[catdesc->attidx + ncols - 2] = > + SPI_getvalue(spi_tuple, spi_tupdesc, ncols); > + } > + > + xpfree(lastrowid); > + lastrowid = pstrdup(rowid); > + } > + > + /* switch to appropriate context while storing the tuple */ > + SPIcontext = MemoryContextSwitchTo(per_query_ctx); > + > + /* flush the last output row */ > + tuple = BuildTupleFromCStrings(attinmeta, values); > + tuplestore_puttuple(tupstore, tuple); > + > + /* now reset the context */ > + MemoryContextSwitchTo(SPIcontext); > + > + } > + else > + { > + /* no qualifying tuples */ > + SPI_finish(); > + } > + > + if (SPI_finish() != SPI_OK_FINISH) > + elog(ERROR, "get_crosstab_tuplestore: SPI_finish() failed"); > + > + tuplestore_donestoring(tupstore); > + > + return tupstore; > + } > + > + /* > * connectby_text - produce a result set from a hierarchical (parent/child) > * table. > * > *************** > *** 668,674 **** > attinmeta = TupleDescGetAttInMetadata(tupdesc); > > /* check to see if caller supports us returning a tuplestore */ > ! if (!rsinfo->allowedModes & SFRM_Materialize) > elog(ERROR, "connectby requires Materialize mode, but it is not " > "allowed in this context"); > > --- 1069,1075 ---- > attinmeta = TupleDescGetAttInMetadata(tupdesc); > > /* check to see if caller supports us returning a tuplestore */ > ! if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize)) > elog(ERROR, "connectby requires Materialize mode, but it is not " > "allowed in this context"); > > Index: contrib/tablefunc/tablefunc.h > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.h,v > retrieving revision 1.4 > diff -c -r1.4 tablefunc.h > *** contrib/tablefunc/tablefunc.h 4 Sep 2002 20:31:08 -0000 1.4 > --- contrib/tablefunc/tablefunc.h 2 Mar 2003 22:32:15 -0000 > *************** > *** 34,39 **** > --- 34,40 ---- > */ > extern Datum normal_rand(PG_FUNCTION_ARGS); > extern Datum crosstab(PG_FUNCTION_ARGS); > + extern Datum crosstab_hash(PG_FUNCTION_ARGS); > extern Datum connectby_text(PG_FUNCTION_ARGS); > > #endif /* TABLEFUNC_H */ > Index: contrib/tablefunc/tablefunc.sql.in > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.sql.in,v > retrieving revision 1.5 > diff -c -r1.5 tablefunc.sql.in > *** contrib/tablefunc/tablefunc.sql.in 18 Oct 2002 18:41:21 -0000 1.5 > --- contrib/tablefunc/tablefunc.sql.in 2 Mar 2003 22:32:23 -0000 > *************** > *** 52,57 **** > --- 52,62 ---- > AS 'MODULE_PATHNAME','crosstab' > LANGUAGE 'C' STABLE STRICT; > > + CREATE OR REPLACE FUNCTION crosstab(text,text) > + RETURNS setof record > + AS 'MODULE_PATHNAME','crosstab_hash' > + LANGUAGE 'C' STABLE STRICT; > + > CREATE OR REPLACE FUNCTION connectby(text,text,text,text,int,text) > RETURNS setof record > AS 'MODULE_PATHNAME','connectby_text' > Index: contrib/tablefunc/expected/tablefunc.out > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/expected/tablefunc.out,v > retrieving revision 1.5 > diff -c -r1.5 tablefunc.out > *** contrib/tablefunc/expected/tablefunc.out 23 Nov 2002 01:54:09 -0000 1.5 > --- contrib/tablefunc/expected/tablefunc.out 3 Mar 2003 02:58:22 -0000 > *************** > *** 123,128 **** > --- 123,201 ---- > test2 | val5 | val6 | val7 | val8 > (2 rows) > > + -- > + -- hash based crosstab > + -- > + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text); > + NOTICE: CREATE TABLE will create implicit sequence 'cth_id_seq' for SERIAL column 'cth.id' > + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42'); > + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS'); > + -- the next line is intentionally left commented and is therefore a "missing" attribute > + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003'); > + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987'); > + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003'); > + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234'); > + -- return attributes as plain text > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text); > + rowid | rowdt | temperature | test_result | test_startdate | volts > + -------+--------------------------+-------------+-------------+----------------+-------- > + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987 > + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | 01 March 2003 | 3.1234 > + (2 rows) > + > + -- this time without rowdt > + SELECT * FROM crosstab( > + 'SELECT rowid, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text); > + rowid | temperature | test_result | test_startdate | volts > + -------+-------------+-------------+----------------+-------- > + test1 | 42 | PASS | | 2.6987 > + test2 | 53 | FAIL | 01 March 2003 | 3.1234 > + (2 rows) > + > + -- convert attributes to specific datatypes > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + rowid | rowdt | temperature | test_result | test_startdate | volts > + -------+--------------------------+-------------+-------------+--------------------------+-------- > + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987 > + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234 > + (2 rows) > + > + -- source query and category query out of sync > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER BY1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp); > + rowid | rowdt | temperature | test_result | test_startdate > + -------+--------------------------+-------------+-------------+-------------------------- > + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | > + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 > + (2 rows) > + > + -- if category query generates no rows, get expected error > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + ERROR: load_categories_hash: provided categories SQL must return 1 column of at least one row > + -- if category query generates more than one column, get expected error > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + ERROR: load_categories_hash: provided categories SQL must return 1 column of at least one row > + -- > + -- connectby > + -- > -- test connectby with text based hierarchy > CREATE TABLE connectby_text(keyid text, parent_keyid text); > \copy connectby_text from 'data/connectby_text.data' > Index: contrib/tablefunc/sql/tablefunc.sql > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/sql/tablefunc.sql,v > retrieving revision 1.6 > diff -c -r1.6 tablefunc.sql > *** contrib/tablefunc/sql/tablefunc.sql 23 Nov 2002 01:54:09 -0000 1.6 > --- contrib/tablefunc/sql/tablefunc.sql 3 Mar 2003 02:51:45 -0000 > *************** > *** 38,43 **** > --- 38,98 ---- > SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 3) AS c(rowidtext, att1 text, att2 text, att3 text); > SELECT * FROM crosstab('SELECT rowid, attribute, val FROM ct where rowclass = ''group1'' ORDER BY 1,2;', 4) AS c(rowidtext, att1 text, att2 text, att3 text, att4 text); > > + -- > + -- hash based crosstab > + -- > + create table cth(id serial, rowid text, rowdt timestamp, attribute text, val text); > + insert into cth values(DEFAULT,'test1','01 March 2003','temperature','42'); > + insert into cth values(DEFAULT,'test1','01 March 2003','test_result','PASS'); > + -- the next line is intentionally left commented and is therefore a "missing" attribute > + -- insert into cth values(DEFAULT,'test1','01 March 2003','test_startdate','28 February 2003'); > + insert into cth values(DEFAULT,'test1','01 March 2003','volts','2.6987'); > + insert into cth values(DEFAULT,'test2','02 March 2003','temperature','53'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_result','FAIL'); > + insert into cth values(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003'); > + insert into cth values(DEFAULT,'test2','02 March 2003','volts','3.1234'); > + > + -- return attributes as plain text > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature text, test_result text, test_startdate text, volts text); > + > + -- this time without rowdt > + SELECT * FROM crosstab( > + 'SELECT rowid, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, temperature text, test_result text, test_startdate text, volts text); > + > + -- convert attributes to specific datatypes > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + > + -- source query and category query out of sync > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth WHERE attribute IN (''temperature'',''test_result'',''test_startdate'') ORDER BY1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp); > + > + -- if category query generates no rows, get expected error > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT attribute FROM cth WHERE attribute = ''a'' ORDER BY 1') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + > + -- if category query generates more than one column, get expected error > + SELECT * FROM crosstab( > + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', > + 'SELECT DISTINCT rowdt, attribute FROM cth ORDER BY 2') > + AS c(rowid text, rowdt timestamp, temperature int4, test_result text, test_startdate timestamp, volts float8); > + > + > + -- > + -- connectby > + -- > + > -- test connectby with text based hierarchy > CREATE TABLE connectby_text(keyid text, parent_keyid text); > \copy connectby_text from 'data/connectby_text.data' > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-patches by date: