Re: [sqlsmith] Failed assertion in joinrels.c - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [sqlsmith] Failed assertion in joinrels.c
Msg-id 2086.1438444757@sss.pgh.pa.us
In response to Re: [sqlsmith] Failed assertion in joinrels.c  (Andreas Seltenreich <seltenreich@gmx.de>)
Responses Re: [sqlsmith] Failed assertion in joinrels.c  (Andreas Seltenreich <seltenreich@gmx.de>)
Re: [sqlsmith] Failed assertion in joinrels.c  (Piotr Stefaniak <postgres@piotr-stefaniak.me>)
[sqlsmith] subplan variable reference / unassigned NestLoopParams (was: [sqlsmith] Failed assertion in joinrels.c)  (Andreas Seltenreich <seltenreich@gmx.de>)
List pgsql-hackers
Andreas Seltenreich <seltenreich@gmx.de> writes:
> Tom Lane writes:
>> What concerns me more is that what you're finding is only cases that trip
>> an assertion sanity check.  It seems likely that you're also managing to
>> trigger other bugs with less drastic consequences, such as "could not
>> devise a query plan" failures or just plain wrong answers.

> Ja, some of these are logged as well[1], but most of them are really as
> undrastic as can get, and I was afraid reporting them would be more of a
> nuisance.

Well, I certainly think all of these represent bugs:

>      3 | ERROR:  plan should not reference subplan's variable
>      2 | ERROR:  failed to assign all NestLoopParams to plan nodes
>      1 | ERROR:  could not find pathkey item to sort

This I'm not sure about; it could be that the query gave conflicting
collation specifiers, but on the other hand we've definitely had bugs
with people forgetting to run assign_query_collations on subexpressions:

>   4646 | ERROR:  could not determine which collation to use for string comparison

This one's pretty darn odd, because 2619 is pg_statistic and not an index
at all:

>      4 | ERROR:  cache lookup failed for index 2619

These seem likely to be bugs as well, though maybe they are race
conditions during a DROP and not worth fixing:

>   1171 | ERROR:  cache lookup failed for index 16862
>    172 | ERROR:  cache lookup failed for index 257148
>     84 | ERROR:  could not find member 1(34520,34520) of opfamily 1976
>     55 | ERROR:  missing support function 1(34516,34516) in opfamily 1976
>     13 | ERROR:  could not find commutator for operator 34538
>      2 | ERROR:  cache lookup failed for index 12322

I would say anything of the sort that is repeatable definitely deserves
investigation, because even if it's an expectable error condition, we
should be throwing a more user-friendly error message.
        regards, tom lane
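
[Editorial sketch, not part of the original message.] To judge which of the errors quoted above are repeatable as a class rather than tied to a single OID, the tallies can be regrouped after masking the numeric identifiers. The following is a minimal Python sketch under that assumption; the counts and messages are copied from the message, while the names `TALLIES`, `normalize`, and `group_by_class` are hypothetical and the digit-masking rule is only one possible way to define an "error class".

```python
import re
from collections import Counter

# Tallies copied from the message above: (count, error text).
TALLIES = [
    (3, "plan should not reference subplan's variable"),
    (2, "failed to assign all NestLoopParams to plan nodes"),
    (1, "could not find pathkey item to sort"),
    (4646, "could not determine which collation to use for string comparison"),
    (4, "cache lookup failed for index 2619"),
    (1171, "cache lookup failed for index 16862"),
    (172, "cache lookup failed for index 257148"),
    (84, "could not find member 1(34520,34520) of opfamily 1976"),
    (55, "missing support function 1(34516,34516) in opfamily 1976"),
    (13, "could not find commutator for operator 34538"),
    (2, "cache lookup failed for index 12322"),
]


def normalize(message):
    """Replace every run of digits with 'N' so OID-specific messages collapse."""
    return re.sub(r"\d+", "N", message)


def group_by_class(tallies):
    """Sum the counts of all messages that share a normalized form."""
    totals = Counter()
    for count, message in tallies:
        totals[normalize(message)] += count
    return totals


if __name__ == "__main__":
    # Print classes from most to least frequent, in the same "count | text" layout.
    for error_class, total in group_by_class(TALLIES).most_common():
        print(f"{total:6d} | {error_class}")
```

Grouping this way folds the four "cache lookup failed for index" lines into one class, which makes it easier to see which kinds of failure recur across different objects.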


