Thread: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Boszormenyi Zoltan
Date:
Hi, attached is the WIP patch for cross-column statistics and extra expression statistics. My question is that why pg_node_tree is unusable as syscache attribute? I attempted to alias it as text in the patch but I get the following error if I try to use it by setting USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c. Directly using the underlying pg_statistic3 doesn't cause an error. zozo=# select * from t1 where i+1 = 5; ERROR: could not determine which collation to use for string comparison HINT: Use the COLLATE clause to set the collation explicitly. The table looks like this: create table t1 (id serial primary key, t text, i integer); Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Attachment
Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Tom Lane
Date:
Boszormenyi Zoltan <zb@cybertec.at> writes: > My question is that why pg_node_tree is unusable as > syscache attribute? I attempted to alias it as text in the patch > but I get the following error if I try to use it by setting > USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c. > Directly using the underlying pg_statistic3 doesn't cause an error. I'm not sure what you're running into, but it doesn't matter because the design would be unworkable anyway. Expression text representations could be extremely long, too long to be usable as index keys. I don't believe either of the proposed indexes on the new catalogs are workable, actually, and the catalog definitions themselves seem a bit outre. Why are you setting it up so that stats on expressions and cross-column stats are mutually exclusive? The idea that's used currently is that we only compute stats on expressions that are indexed, so the OID/attnum of the index column can be used as a reference in pg_statistic. I don't see a strong need to deviate from that approach. regards, tom lane
Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Alvaro Herrera
Date:
Excerpts from Boszormenyi Zoltan's message of jue abr 28 11:03:56 -0300 2011: > Hi, > > attached is the WIP patch for cross-column statistics and > extra expression statistics. > > My question is that why pg_node_tree is unusable as > syscache attribute? I attempted to alias it as text in the patch > but I get the following error if I try to use it by setting > USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c. > Directly using the underlying pg_statistic3 doesn't cause an error. Two comments: 1. it seems that expression stats are mostly separate from cross-column stats; does it really make sense to submit the two in the same patch? 2. there are almost no code comments anywhere 3. (bonus) if you're going to copy/paste pg_attribute.h verbatim into the new files, please remove the bits you currently have in "#if 0". (Not to mention the fact that the new catalogs seem rather poorly named). -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Alvaro Herrera
Date:
Excerpts from Boszormenyi Zoltan's message of jue abr 28 11:03:56 -0300 2011: > My question is that why pg_node_tree is unusable as > syscache attribute? I attempted to alias it as text in the patch > but I get the following error if I try to use it by setting > USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c. > Directly using the underlying pg_statistic3 doesn't cause an error. > > zozo=# select * from t1 where i+1 = 5; > ERROR: could not determine which collation to use for string comparison > HINT: Use the COLLATE clause to set the collation explicitly. Maybe the pg_node_tree problem is a bug with the collation feature. If you could reproduce it in unpatched master, I'm sure it'd find a quick death. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Boszormenyi Zoltan's message of jue abr 28 11:03:56 -0300 2011: >> ERROR: could not determine which collation to use for string comparison >> HINT: Use the COLLATE clause to set the collation explicitly. > Maybe the pg_node_tree problem is a bug with the collation feature. If > you could reproduce it in unpatched master, I'm sure it'd find a quick > death. Actually, I rather imagine it comes from this choice in catcache.c: /* Currently, there are no catcaches on collation-aware data types */ cache->cc_skey[i].sk_collation = InvalidOid; I'd be more worried about that if I thought it made any sense to use a pg_node_tree column as an index key, but I don't ... regards, tom lane
New WIP patch for cross column statistics Re: TEXT vs PG_NODE_TREE in system columns (cross column and expression statistics patch)
From
Boszormenyi Zoltan
Date:
Hi, 2011-04-28 17:20 keltezéssel, Alvaro Herrera írta: > Excerpts from Boszormenyi Zoltan's message of jue abr 28 11:03:56 -0300 2011: >> Hi, >> >> attached is the WIP patch for cross-column statistics and >> extra expression statistics. >> >> My question is that why pg_node_tree is unusable as >> syscache attribute? I attempted to alias it as text in the patch >> but I get the following error if I try to use it by setting >> USE_SYSCACHE_FOR_SEARCH to 1 in selfuncs.c. >> Directly using the underlying pg_statistic3 doesn't cause an error. > Two comments: > 1. it seems that expression stats are mostly separate from cross-column > stats; does it really make sense to submit the two in the same patch? > > 2. there are almost no code comments anywhere > > 3. (bonus) if you're going to copy/paste pg_attribute.h verbatim into > the new files, please remove the bits you currently have in "#if 0". > (Not to mention the fact that the new catalogs seem rather poorly > named). OK, we went to a different route this time. Here is what we came up with. Attached are two patches. attnum-int2vector.patch implements: - int2vector support routines and catalog entries for them - pg_statistic is modified so "staattnum int2" it converted to "staattnums int2vector". RemoveStatistics() is modified to take an array of AttrNumber and the length of it. - pg_attribute.attstattarget is moved to pg_statistic.statarget, pg_statistic gains a new "stavalid" bool field. Two support routines are added: AddStatistics() and InvalidateStatistics(). Entries in pg_statistic for table columns are created upon table creation and ALTER TABLE ADD COLUMN and maintained for the lifetime of the column. Exceptions are system tables: calling AddStatistics() for them during initdb is a Catch-22 when pg_statistic doesn't yet exist. For these, ANALYZE creates the pg_statistic record just as before. ALTER TABLE ALTER COLUMN SET DATA TYPE only invalidates the record by setting "stavalid" to false. - Factor out common code for getting the statistics tuple into a new function called validate_statistics(). cross-col-syntax.patch builds on the first patch and implements: CREATE CROSS COLUMN STATISTICS ON TABLE tabname (col, ...) [ WITH ( statistics_target ) ] ; DROP CROSS COLUMN STATISTICS ON TABLE tabname (col, ...) ; CREATE CROSS COLUMN STATISTICS ON INDEX idxname [ WITH ( statistics_target ) ] ; DROP CROSS COLUMN STATISTICS ON INDEX idxname ; and puts new records into pg_statistic with array_length(staattnums, 1) > 1. Note: this patch should record dependencies on the respective table or index and the fields but doesn't. The data structure for storing the N-dimension histogram is not yet decided. Comments? Best regards, Zoltán böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/