[HACKERS] user-defined numeric data types triggering ERROR: unsupported type - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Msg-id: b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
Responses: Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
List: pgsql-hackers
Hi,

while testing a custom data type FIXEDDECIMAL [1], implementing a numeric-like data type with limited range, I ran into several issues that I suspect may not be entirely intentional / expected behavior.

[1] https://github.com/2ndQuadrant/fixeddecimal

Attached is a minimal subset of the extension SQL definition, which may be more convenient when looking into the issue.

The most important issue is that when planning a simple query, the estimation of range queries on a column with the custom data type fails like this:

test=# create table t (a fixeddecimal);
CREATE TABLE
test=# insert into t select random() from generate_series(1,100000);
INSERT 0 100000
test=# analyze t;
ANALYZE
test=# select * from t where a > 0.9;
ERROR: unsupported type: 16385

The error message comes from convert_numeric_to_scalar, which gets called during histogram processing (ineq_histogram_selectivity) while interpolating within a histogram bucket. convert_to_scalar does this:

    switch (valuetypid)
    {
        ...
        case NUMERICOID:
        ...
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;
        ...
    }

The first call works fine, as the constant really is numeric (valuetypid=1700). But the histogram boundaries use the custom data type, causing the error, because convert_numeric_to_scalar only handles a fixed set of hard-coded data types. So it's pretty much guaranteed to fail with any user-defined data type. This seems a bit unfortunate :-(

One solution would be to implement a custom estimation function, replacing scalarltsel/scalargtsel. But that seems rather unnecessary, especially considering there is an implicit cast from fixeddecimal to numeric. Another thing is that when there's just an MCV list, the estimation works just fine.

So I see two basic ways to fix this:

* Make convert_numeric_to_scalar smarter, so that it checks whether there is an implicit cast to numeric, and fails only if it does not find one.

* Make convert_to_scalar smarter, so that it returns false for unexpected data types, making ineq_histogram_selectivity use the default estimate of 0.5 (for that one bucket). A rough sketch of this is below.

Both options seem more favorable than what's happening currently. Is there anything I missed that would make those fixes unacceptable?

If anything, the fact that MCV estimates work while histogram estimates do not makes this somewhat unpredictable - a change in the data distribution (or perhaps even just a different sample in ANALYZE) may result in sudden failures.
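To illustrate the second option, here is a rough, untested sketch against the PostgreSQL 10 sources. The helper name is made up; it merely enumerates the types that convert_numeric_to_scalar already hard-codes:

/*
 * Hypothetical helper: true if convert_numeric_to_scalar knows how to
 * turn a value of this type into a double.
 */
static bool
is_numeric_scalar_type(Oid typid)
{
    switch (typid)
    {
        case BOOLOID:
        case INT2OID:
        case INT4OID:
        case INT8OID:
        case FLOAT4OID:
        case FLOAT8OID:
        case NUMERICOID:
        case OIDOID:
        case REGPROCOID:
        case REGPROCEDUREOID:
        case REGOPEROID:
        case REGOPERATOROID:
        case REGCLASSOID:
        case REGTYPEOID:
            return true;
    }
    return false;
}

The numeric branch of convert_to_scalar would then bail out early instead of erroring:

        case NUMERICOID:
            /*
             * If the histogram bounds are of a type we can't convert
             * (e.g. a user-defined type like fixeddecimal), give up and
             * return false, so the caller falls back to its default
             * estimate instead of throwing an ERROR.
             */
            if (!is_numeric_scalar_type(boundstypid))
                return false;
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;

(The string and time-related branches make the same hard-coded assumptions, so they would presumably need the same treatment.)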
I ran into one additional strange thing while investigating this. The attached SQL script defines two operator classes - fixeddecimal_ops and fixeddecimal_numeric_ops, providing (fixeddecimal,fixeddecimal) and (fixeddecimal,numeric) operators. Dropping one of those operator classes changes the estimates in somewhat suspicious ways.

When only fixeddecimal_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1943.00 rows=33333 width=8)
   Filter: ((a)::numeric > 0.1)
(2 rows)

That is, we get the default estimate for inequality clauses, 33%.

But when only fixeddecimal_numeric_ops is defined, we get this:

test=# explain select * from t where a > 0.1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=50000 width=8)
   Filter: (a > 0.1)
(2 rows)

That is, we get a 50% estimate, because that's what scalarineqsel uses when ineq_histogram_selectivity can't compute the selectivity from a histogram for some reason.

Wouldn't it make more sense to use the default 33% estimate here?
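For reference, the 50% comes from the no-histogram fallback in scalarineqsel, which in the PostgreSQL 10 sources looks roughly like this (paraphrased, so details may differ):

    hist_selec = ineq_histogram_selectivity(root, vardata, &opproc, isgt,
                                            constval, consttype);

    /* merge the MCV and histogram contributions */
    selec = 1.0 - stats->stanullfrac - sumcommon;

    if (hist_selec >= 0.0)
        selec *= hist_selec;
    else
    {
        /*
         * If no histogram but there are values not accounted for by
         * MCV, arbitrarily assume half of them will match.
         */
        selec *= 0.5;
    }

One possible reading of the question above is to use DEFAULT_INEQ_SEL (1/3) instead of the hard-coded 0.5 in that branch, which would at least make the two operator class setups produce consistent estimates.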
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services