Thread: BUG #16510: Count Distinct with non distinct column in combination with string constants throws error

The following bug has been logged on the website:

Bug reference:      16510
Logged by:          Aceonline
Email address:      aceonline@gmx.de
PostgreSQL version: 11.7
Operating system:   Unix (bitnami/postgresql:11.7.0 image)
Description:

Minimal Reproduction:
CREATE TABLE public.testtable (
column1 int4 NOT NULL,
column2 varchar NULL
);

insert into public.testtable values(1, 'test1');
insert into public.testtable values(2, 'test1');

SELECT
COUNT( DISTINCT (testtable.column2, 'blub') )
FROM
public.testtable;


> On 25 Jun 2020, at 15:02, PG Bug reporting form <noreply@postgresql.org> wrote:

> SELECT
> COUNT( DISTINCT (testtable.column2, 'blub') )
> FROM
> public.testtable;

PostgreSQL doesn't know which datatype you expect 'blub' to be, as it isn't
related to the testtable relation in your query.  If you cast to the datatype
of your choice you will get the expected result.

postgres=# SELECT COUNT(DISTINCT(testtable.column2, 'blub')) FROM public.testtable;
ERROR:  could not identify a comparison function for type unknown
postgres=# SELECT COUNT(DISTINCT(testtable.column2, 'blub'::varchar)) FROM public.testtable;
 count
-------
     1
(1 row)

cheers ./daniel


Daniel Gustafsson <daniel@yesql.se> writes:
> PostgreSQL doesn't know which datatype you expect 'blub' to be, as it isn't
> related to the testtable relation in your query.  If you cast to the datatype
> of your choice you will get the expected result.

> postgres=# SELECT COUNT(DISTINCT(testtable.column2, 'blub')) FROM public.testtable;
> ERROR:  could not identify a comparison function for type unknown

The reason for this might be a little more obvious if you wrote the
implicit row constructor explicitly, ie

SELECT COUNT(DISTINCT ROW(testtable.column2, 'blub')) FROM public.testtable;

The row's datatype is indeterminate as-specified.

Perhaps there's room to argue that we should allow 'unknown' to decay to
'text' automatically in this context, but I'm not in a big hurry to do
that.  It seems better to make people be explicit about which datatype
they intend inside such complex, infrequently-used constructs.

            regards, tom lane



> On 25 Jun 2020, at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The reason for this might be a little more obvious if you wrote the
> implicit row constructor explicitly, ie
>
> SELECT COUNT(DISTINCT ROW(testtable.column2, 'blub')) FROM public.testtable;

That's a good point, that's a clearer explanation.

> Perhaps there's room to argue that we should allow 'unknown' to decay to
> 'text' automatically in this context, but I'm not in a big hurry to do
> that.  It seems better to make people be explicit about which datatype
> they intend inside such complex, infrequently-used constructs.

Agreed, it sounds like something that will just work in most cases but run the
risk of introducing subtle bugs in the cases where it doesn't.

cheers ./daniel