[HACKERS] user-defined numeric data types triggering ERROR: unsupported type - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Msg-id: b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
Responses: Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
List: pgsql-hackers
Hi,

while testing a custom data type FIXEDDECIMAL [1], implementing a numeric-like data type with limited range, I ran into several issues that I suspect may not be entirely intentional / expected behavior.

[1] https://github.com/2ndQuadrant/fixeddecimal

Attached is a minimal subset of the extension SQL definition, which may be more convenient when looking into the issue.

The most important issue is that when planning a simple query, the estimation of range queries on a column with the custom data type fails like this:

test=# create table t (a fixeddecimal);
CREATE TABLE
test=# insert into t select random() from generate_series(1,100000);
INSERT 0 100000
test=# analyze t;
ANALYZE
test=# select * from t where a > 0.9;
ERROR: unsupported type: 16385

The error message comes from convert_numeric_to_scalar, which gets called during histogram processing (ineq_histogram_selectivity) while interpolating within a histogram bucket. convert_to_scalar does this:

    switch (valuetypid)
    {
        ...
        case NUMERICOID:
        ...
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;
        ...
    }

The first call works fine, as the constant really is numeric (valuetypid=1700). But the histogram boundaries use the custom data type, causing the error, because convert_numeric_to_scalar only handles a fixed set of hard-coded data types. So it's pretty much guaranteed to fail with any user-defined data type. This seems a bit unfortunate :-(

One solution would be to implement a custom estimation function, replacing scalarltsel/scalargtsel. But that seems rather unnecessary, especially considering there is an implicit cast from fixeddecimal to numeric. Another thing is that when there's just an MCV list, the estimation works just fine.

So I see two basic ways to fix this:

* Make convert_numeric_to_scalar smarter, so that it checks whether there is an implicit cast to numeric, and fails only if it does not find one.

* Make convert_to_scalar smarter, so that it returns false for unexpected data types, making ineq_histogram_selectivity use the default estimate of 0.5 (for that one bucket). A rough sketch of this is below.

Both options seem more favorable than what's happening currently. Is there anything I missed that would make those fixes unacceptable?

If anything, the fact that MCV estimates work while histogram estimates do not makes this somewhat unpredictable - a change in the data distribution (or perhaps even just a different sample in ANALYZE) may result in sudden failures.
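To illustrate the second option, here is a rough, untested sketch against the PostgreSQL 10 sources. The helper name is made up; it merely enumerates the types that convert_numeric_to_scalar already hard-codes:

/*
 * Hypothetical helper: true if convert_numeric_to_scalar knows how to
 * turn a value of this type into a double.
 */
static bool
is_numeric_scalar_type(Oid typid)
{
    switch (typid)
    {
        case BOOLOID:
        case INT2OID:
        case INT4OID:
        case INT8OID:
        case FLOAT4OID:
        case FLOAT8OID:
        case NUMERICOID:
        case OIDOID:
        case REGPROCOID:
        case REGPROCEDUREOID:
        case REGOPEROID:
        case REGOPERATOROID:
        case REGCLASSOID:
        case REGTYPEOID:
            return true;
    }
    return false;
}

The numeric branch of convert_to_scalar would then bail out early instead of erroring:

        case NUMERICOID:
            /*
             * If the histogram bounds are of a type we can't convert
             * (e.g. a user-defined type like fixeddecimal), give up and
             * return false, so the caller falls back to its default
             * estimate instead of throwing an ERROR.
             */
            if (!is_numeric_scalar_type(boundstypid))
                return false;
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;

(The string and time-related branches make the same hard-coded assumptions, so they would presumably need the same treatment.)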
I ran into one additional strange thing while investigating this. The attached SQL script defines two operator classes - fixeddecimal_ops and fixeddecimal_numeric_ops, providing (fixeddecimal,fixeddecimal) and (fixeddecimal,numeric) operators. Dropping one of those operator classes changes the estimates in somewhat suspicious ways.

When only fixeddecimal_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1943.00 rows=33333 width=8)
   Filter: ((a)::numeric > 0.1)
(2 rows)

That is, we get the default estimate for inequality clauses, 33%.

But when only fixeddecimal_numeric_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=50000 width=8)
   Filter: (a > 0.1)
(2 rows)

That is, we get a 50% estimate, because that's what scalarineqsel uses when ineq_histogram_selectivity can't compute the selectivity from a histogram for some reason.

Wouldn't it make more sense to use the default 33% estimate here?
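For reference, the 50% comes from the no-histogram fallback in scalarineqsel, which in the PostgreSQL 10 sources looks roughly like this (paraphrased, so details may differ):

    hist_selec = ineq_histogram_selectivity(root, vardata, &opproc, isgt,
                                            constval, consttype);

    /* merge the MCV and histogram contributions */
    selec = 1.0 - stats->stanullfrac - sumcommon;

    if (hist_selec >= 0.0)
        selec *= hist_selec;
    else
    {
        /*
         * If no histogram but there are values not accounted for by
         * MCV, arbitrarily assume half of them will match.
         */
        selec *= 0.5;
    }

One possible reading of the question above is to use DEFAULT_INEQ_SEL (1/3) instead of the hard-coded 0.5 in that branch, which would at least make the two operator class setups produce consistent estimates.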
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services