Thread: [sqlsmith] Failed assertion in joinrels.c
Hi,

sqlsmith triggered the following assertion in master (c188204).

TRAP: FailedAssertion("!(!bms_overlap(joinrelids, sjinfo->min_lefthand))", File: "joinrels.c", Line: 500)

As usual, the query is against the regression database. It is rather unwieldy… I wonder if I should stop working on new grammar rules and instead work on some post-processing that prunes the AST as much as possible while maintaining the failure mode.

regards,
andreas

select subq_647409.c0 as c0, subq_647409.c0 as c1 from public.customer as rel_4116461 left join public.clstr_tst_s as rel_4116555 left join information_schema.columns as rel_4116556 on (rel_4116555.rf_a = rel_4116556.ordinal_position) right join pg_catalog.pg_roles as rel_4116557 on (rel_4116556.maximum_cardinality = rel_4116557.rolconnlimit) on (rel_4116461.passwd = rel_4116557.rolpassword) left join (select subq_647410.c8 as c0 from public.char_tbl as rel_4116611, lateral (select rel_4116612.name as c0, rel_4116612.comment as c1, rel_4116612.nslots as c2, rel_4116612.comment as c3, rel_4116612.nslots as c4, rel_4116612.nslots as c5, rel_4116612.comment as c6, rel_4116612.comment as c7, rel_4116612.nslots as c8, rel_4116612.nslots as c9, rel_4116612.nslots as c10 from public.hub as rel_4116612 where rel_4116612.comment ~>=~ rel_4116612.comment fetch first 116 rows only) as subq_647410 where (subq_647410.c7 !~~* subq_647410.c7) or ((subq_647410.c3 = subq_647410.c3) and ((subq_647410.c7 ~* subq_647410.c6) and (subq_647410.c7 @@ subq_647410.c7))) fetch first 152 rows only) as subq_647409 inner join public.int4_tbl as rel_4116661 inner join public.shoe as rel_4116662 on (rel_4116661.f1 = rel_4116662.sh_avail) inner join public.rtest_vview3 as rel_4116663 on (rel_4116661.f1 = rel_4116663.a) on (subq_647409.c0 = rel_4116662.sh_avail) on (rel_4116555.b = rel_4116661.f1) where ((rel_4116557.rolvaliduntil is NULL) or (rel_4116663.b !~ rel_4116461.name)) or (rel_4116661.f1 is not NULL) fetch first 80 rows only;
Andreas Seltenreich <seltenreich@gmx.de> writes:
> sqlsmith triggered the following assertion in master (c188204).
> TRAP: FailedAssertion("!(!bms_overlap(joinrelids, sjinfo->min_lefthand))", File: "joinrels.c", Line: 500)

Cool, I'll take a look.

> As usual, the query is against the regression database. It is rather
> unwieldy… I wonder if I should stop working on new grammar rules and
> instead work on some post-processing that prunes the AST as much as
> possible while maintaining the failure mode.

Probably not really worth the trouble; I find it's usually easy to produce a minimized test case after the failure cause is understood.

What concerns me more is that what you're finding is only cases that trip an assertion sanity check. It seems likely that you're also managing to trigger other bugs with less drastic consequences, such as "could not devise a query plan" failures or just plain wrong answers. I'm not sure how we could identify wrong answers automatically :-( but it might be worth checking for XX000 SQLSTATE responses, since generally that should be a can't-happen case. (Or if it can happen, we need to change the errcode.)

regards, tom lane
Tom Lane writes:
> What concerns me more is that what you're finding is only cases that trip
> an assertion sanity check. It seems likely that you're also managing to
> trigger other bugs with less drastic consequences, such as "could not
> devise a query plan" failures or just plain wrong answers.

Ja, some of these are logged as well[1], but most of them are really as undrastic as can get, and I was afraid reporting them would be more of a nuisance. I analysed a couple of the cache lookup failures, and they all had a severity similar to the example in the README[2]. The operator ones I analysed seem due to intentionally broken operators in the regression db. The NestLoopParams and subplan reference ones sound interesting though…

> I'm not sure how we could identify wrong answers automatically :-(

Csmith isn't doing this either. They do discuss differential testing in their papers though, i.e., comparing the results of different products. Maybe a simple metric like the number of rows returned might already be valuable for correctness checks. I also thought about doing some sampling on the data, simulating the relational operations, and checking for witness tuples, but it is probably not appropriate to start implementing a mini-RDBMS on the client side.

> but it might be worth checking for XX000 SQLSTATE responses, since
> generally that should be a can't-happen case. (Or if it can happen,
> we need to change the errcode.)

The sqlstate is currently missing in the reports because libpqxx is not putting it in its exceptions :-/.

regards,
Andreas

Footnotes:
[1] smith=# select * from report24h;
 count | error
-------+--------------------------------------------------------------------------
 43831 | ERROR: unsupported XML feature
 39496 | ERROR: invalid regular expression: quantifier operand invalid
 27261 | ERROR: canceling statement due to statement timeout
 21386 | ERROR: operator does not exist: point = point
  8580 | ERROR: cannot compare arrays of different element types
  5019 | ERROR: invalid regular expression: brackets [] not balanced
  4646 | ERROR: could not determine which collation to use for string comparison
  2583 | ERROR: invalid regular expression: nfa has too many states
  2248 | ERROR: operator does not exist: xml = xml
  1198 | ERROR: operator does not exist: polygon = polygon
  1171 | ERROR: cache lookup failed for index 16862
   677 | ERROR: invalid regular expression: parentheses () not balanced
   172 | ERROR: cache lookup failed for index 257148
    84 | ERROR: could not find member 1(34520,34520) of opfamily 1976
    55 | ERROR: missing support function 1(34516,34516) in opfamily 1976
    42 | ERROR: operator does not exist: city_budget = city_budget
    13 | ERROR: could not find commutator for operator 34538
    10 | ERROR: could not identify a comparison function for type xid
     4 | Connection to database failed
     4 | ERROR: cache lookup failed for index 2619
     3 | ERROR: plan should not reference subplan's variable
     2 | ERROR: cache lookup failed for index 12322
     2 | ERROR: failed to assign all NestLoopParams to plan nodes
     2 | ERROR: invalid regular expression: invalid character range
     1 | ERROR: could not find pathkey item to sort
(25 rows)

Time: 1158,990 ms

[2] https://github.com/anse1/sqlsmith/blob/master/README.org
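For reference, the SQLSTATE of a failed query is available from plain libpq even when the client library on top of it doesn't pass it along: PQresultErrorField() with PG_DIAG_SQLSTATE returns the five-character code, so a driver loop can flag XX000 responses directly. A minimal sketch, assuming a bare libpq client; the connection string and query below are placeholders, not anything from sqlsmith:

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Placeholder connection string; the real harness will differ. */
	PGconn	   *conn = PQconnectdb("dbname=regression");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Placeholder query standing in for a generated statement. */
	res = PQexec(conn, "select 1");

	if (PQresultStatus(res) == PGRES_FATAL_ERROR)
	{
		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

		/* XX000 is ERRCODE_INTERNAL_ERROR, i.e. "shouldn't happen" */
		if (sqlstate != NULL && strcmp(sqlstate, "XX000") == 0)
			fprintf(stderr, "internal error: %s", PQresultErrorMessage(res));
	}

	PQclear(res);
	PQfinish(conn);
	return 0;
}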
Andreas Seltenreich <seltenreich@gmx.de> writes:
> Tom Lane writes:
>> What concerns me more is that what you're finding is only cases that trip
>> an assertion sanity check. It seems likely that you're also managing to
>> trigger other bugs with less drastic consequences, such as "could not
>> devise a query plan" failures or just plain wrong answers.

> Ja, some of these are logged as well[1], but most of them are really as
> undrastic as can get, and I was afraid reporting them would be more of a
> nuisance.

Well, I certainly think all of these represent bugs:

> 3 | ERROR: plan should not reference subplan's variable
> 2 | ERROR: failed to assign all NestLoopParams to plan nodes
> 1 | ERROR: could not find pathkey item to sort

This I'm not sure about; it could be that the query gave conflicting collation specifiers, but on the other hand we've definitely had bugs with people forgetting to run assign_query_collations on subexpressions:

> 4646 | ERROR: could not determine which collation to use for string comparison

This one's pretty darn odd, because 2619 is pg_statistic and not an index at all:

> 4 | ERROR: cache lookup failed for index 2619

These seem likely to be bugs as well, though maybe they are race conditions during a DROP and not worth fixing:

> 1171 | ERROR: cache lookup failed for index 16862
> 172 | ERROR: cache lookup failed for index 257148
> 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976
> 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976
> 13 | ERROR: could not find commutator for operator 34538
> 2 | ERROR: cache lookup failed for index 12322

I would say anything of the sort that is repeatable definitely deserves investigation, because even if it's an expectable error condition, we should be throwing a more user-friendly error message.

regards, tom lane
Tom Lane writes:
> Well, I certainly think all of these represent bugs:
>
> [...]

thanks for prioritizing them. I'll try to digest them somewhat before posting.

> This one's pretty darn odd, because 2619 is pg_statistic and not an index
> at all:
>
>> 4 | ERROR: cache lookup failed for index 2619

This is actually the one from the README :-). Quoting to spare media discontinuity:

--8<---------------cut here---------------start------------->8---
Taking a closer look at it reveals that it happens when you query a certain catalog view like this:

self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
ERROR:  cache lookup failed for index 2619

This is because the planner then puts pg_get_indexdef(oid) in a context where it sees non-index OIDs, which causes it to croak:

                                      QUERY PLAN
------------------------------------------------------------------------------------
 Hash Join  (cost=17.60..30.65 rows=9 width=4)
   Hash Cond: (i.oid = x.indexrelid)
   ->  Seq Scan on pg_class i  (cost=0.00..12.52 rows=114 width=8)
         Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char"))
   ->  Hash  (cost=17.31..17.31 rows=23 width=4)
         ->  Hash Join  (cost=12.52..17.31 rows=23 width=4)
               Hash Cond: (x.indrelid = c.oid)
               ->  Seq Scan on pg_index x  (cost=0.00..4.13 rows=113 width=8)
               ->  Hash  (cost=11.76..11.76 rows=61 width=8)
                     ->  Seq Scan on pg_class c  (cost=0.00..11.76 rows=61 width=8)
                           Filter: (relkind = ANY ('{r,m}'::"char"[]))
--8<---------------cut here---------------end--------------->8---

thanks,
Andreas
On 08/01/2015 05:59 PM, Tom Lane wrote:
> Well, I certainly think all of these represent bugs:
>
>> 1 | ERROR: could not find pathkey item to sort

sqlsmith was able to find these two queries that trigger the error on my machine:

ERROR: could not find pathkey item to sort
STATEMENT: select rel_16564180.id1 as c0 from public.bprime as rel_16564178 left join (select rel_16564179.b as c0, rel_16564179.c as c1 from public.clstr_tst as rel_16564179 where rel_16564179.c < rel_16564179.d) as subq_2610919 inner join public.num_result as rel_16564180 on (subq_2610919.c0 = rel_16564180.id1) on (rel_16564178.thousand = subq_2610919.c0) where subq_2610919.c1 = subq_2610919.c1 fetch first 122 rows only;

STATEMENT: select subq_1991714.c0 as c0, subq_1991712.c0 as c1, rel_12624817.a as c2 from (select rel_12624653.id2 as c0, rel_12624653.id2 as c1 from public.num_exp_add as rel_12624653 where rel_12624653.expected >= rel_12624653.expected fetch first 137 rows only) as subq_1991697 inner join (select rel_12624805.z as c0 from public.insert_tbl as rel_12624805 where rel_12624805.y <= rel_12624805.y fetch first 108 rows only) as subq_1991712 left join public.clstr_tst as rel_12624817 left join public.main_table as rel_12624818 on (rel_12624817.b = rel_12624818.a) on (subq_1991712.c0 = rel_12624818.a) on (subq_1991697.c0 = rel_12624818.a), lateral (select rel_12624819.tgdeferrable as c0 from pg_catalog.pg_trigger as rel_12624819 where (rel_12624819.tgconstraint >= rel_12624819.tgrelid) and ((rel_12624819.tgconstrindid <= rel_12624819.tgrelid) or (EXISTS (select rel_12624820.grantor as c0, rel_12624820.grantee as c1, rel_12624820.is_grantable as c2, rel_12624820.routine_schema as c3, rel_12624820.specific_schema as c4, rel_12624820.grantee as c5, rel_12624820.routine_catalog as c6, rel_12624820.routine_schema as c7, rel_12624820.specific_name as c8 from information_schema.role_routine_grants as rel_12624820 where 22 >= 29 fetch first 138 rows only))) fetch first 93 rows only) as subq_1991714 where rel_12624817.d = rel_12624817.c fetch first 97 rows only;
On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> sqlsmith triggered the following assertion in master (c188204).

Thanks for writing sqlsmith. It seems like a great tool.

I wonder, are you just running the tool with assertions enabled when PostgreSQL is built? If so, it might make sense to make various problems more readily detected. As you may know, Clang has a pretty decent option called AddressSanitizer that can detect memory errors as they occur with an overhead that is not excessive. One might use the following configure arguments when building PostgreSQL to use AddressSanitizer:

./configure CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert

Of course, it remains to be seen if this pays for itself. Apparently the tool has about a 2x overhead [1]. I'm really not sure that you'll find any more bugs this way, but it's certainly possible that you'll find a lot more. Given your success in finding bugs without using AddressSanitizer, introducing it may be premature.

[1] http://clang.llvm.org/docs/AddressSanitizer.html#introduction
--
Peter Geoghegan
[sqlsmith] subplan variable reference / unassigned NestLoopParams (was: [sqlsmith] Failed assertion in joinrels.c)
Tom Lane writes:
> Well, I certainly think all of these represent bugs:
>
>> 3 | ERROR: plan should not reference subplan's variable
>> 2 | ERROR: failed to assign all NestLoopParams to plan nodes

These appear to be related. The following query produces the former, but if you replace the very last reference of provider with the literal 'bar', it raises the latter error.

select 1 from pg_catalog.pg_shseclabel as rel_09 inner join public.rtest_view2 as rel_32 left join pg_catalog.pg_roles as rel_33 on (rel_32.a = rel_33.rolconnlimit) on (rel_09.provider = rel_33.rolpassword) left join pg_catalog.pg_user as rel_35 on (rel_33.rolconfig = rel_35.useconfig) where (((rel_09.provider ~<~ 'foo') and (rel_35.usename ~* rel_09.provider)));

,----[ FWIW: git bisect run ]
| first bad commit: [e83bb10d6dcf05a666d4ada00d9788c7974ad378]
| Adjust definition of cheapest_total_path to work better with LATERAL.
`----

regards,
Andreas
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> On 08/01/2015 05:59 PM, Tom Lane wrote:
>> Well, I certainly think all of these represent bugs:
>>> 1 | ERROR: could not find pathkey item to sort

> sqlsmith was able to find these two queries that trigger the error on my
> machine:

Hmm ... I see no error with these queries as of today's HEAD or back-branch tips. I surmise that this was triggered by one of the other recently-fixed bugs, though the connection isn't obvious offhand.

regards, tom lane
On 08/02/2015 10:18 PM, Tom Lane wrote:
> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
>> On 08/01/2015 05:59 PM, Tom Lane wrote:
>>> Well, I certainly think all of these represent bugs:
>>>> 1 | ERROR: could not find pathkey item to sort
>
>> sqlsmith was able to find these two queries that trigger the error
>> on my machine:
>
> Hmm ... I see no error with these queries as of today's HEAD or
> back-branch tips. I surmise that this was triggered by one of the
> other recently-fixed bugs, though the connection isn't obvious
> offhand.

Yeah, I'm no longer able to reproduce it either.
On 08/01/2015 05:59 PM, Tom Lane wrote:
> Well, I certainly think all of these represent bugs:

How about this one?

1 ERROR: could not find RelOptInfo for given relids

It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query and the attached one:

select 1 from public.nums as r5006875 inner join information_schema.domains as r5006876 on (r5006875.n = r5006876.character_maximum_length) inner join (select r5006878.z as c5 from testxmlschema.test2 as r5006878 where r5006878.q >= r5006878.q) as subq_771817 on (r5006875.n = subq_771817.c5) right join public.shoelace_arrive as r5006879 on (r5006875.n = r5006879.arr_quant) left join pg_catalog.pg_tablespace as r5006966 left join pg_catalog.pg_stat_user_functions as r5006967 on (r5006966.spcowner = r5006967.funcid) inner join pg_catalog.pg_authid as r5006968 on (r5006967.funcname = r5006968.rolname) on (subq_771817.c5 = r5006968.rolconnlimit) where r5006875.n is NULL;
Attachment
Peter Geoghegan writes:
> On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> sqlsmith triggered the following assertion in master (c188204).
>
> Thanks for writing sqlsmith. It seems like a great tool.
>
> I wonder, are you just running the tool with assertions enabled when
> PostgreSQL is built?

Right. I have to admit my testing setup is still more tailored towards testing sqlsmith than postgres.

> If so, it might make sense to make various problems more readily
> detected. As you may know, Clang has a pretty decent option called
> AddressSanitizer that can detect memory errors as they occur with an
> overhead that is not excessive.

I didn't know this clang feature yet, thanks for pointing it out. I considered running some instances under valgrind to detect these, but the performance penalty seemed not worth it.

> One might use the following configure arguments when building
> PostgreSQL to use AddressSanitizer:
>
> ./configure CC=clang CFLAGS='-O1 -g -fsanitize=address
> -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert

A quick attempt to sneak these in made my ansible playbooks unhappy due to "make check" failures and other generated noise. I'll try to have an instance with the AddressSanitizer active soon though.

> Of course, it remains to be seen if this pays for itself. Apparently
> the tool has about a 2x overhead [1]. I'm really not sure that you'll
> find any more bugs this way, but it's certainly possible that you'll
> find a lot more. Given your success in finding bugs without using
> AddressSanitizer, introducing it may be premature.

Piotr also suggested on IRC to run coverage tests w/ sqlsmith. This could yield valuable hints in which direction to extend sqlsmith's grammar.

Thanks,
Andreas
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> How about this one?
> 1 ERROR: could not find RelOptInfo for given relids

That would be a bug, for sure ...

> It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query
> and the attached one:

... but I can't reproduce it on HEAD with either of these queries. Not clear why you're getting different results.

regards, tom lane
On 08/03/2015 09:18 PM, Tom Lane wrote:
> ... but I can't reproduce it on HEAD with either of these queries.
> Not clear why you're getting different results.

I'm terribly sorry, but I didn't notice that postgresql.conf was modified... Set join_collapse_limit = 32 and you should see the error.
Re: [sqlsmith] subplan variable reference / unassigned NestLoopParams (was: [sqlsmith] Failed assertion in joinrels.c)
Andreas Seltenreich <seltenreich@gmx.de> writes:
> Tom Lane writes:
>> Well, I certainly think all of these represent bugs:
>>> 3 | ERROR: plan should not reference subplan's variable
>>> 2 | ERROR: failed to assign all NestLoopParams to plan nodes

> These appear to be related. The following query produces the former,
> but if you replace the very last reference of provider with the literal
> 'bar', it raises the latter error.

Fixed that, thanks for the test case!

> ,----[ FWIW: git bisect run ]
> | first bad commit: [e83bb10d6dcf05a666d4ada00d9788c7974ad378]
> | Adjust definition of cheapest_total_path to work better with LATERAL.
> `----

There's still something fishy about your git bisect results; they don't have much to do with what seems to me to be the triggering condition. I suspect the problem is that git bisect doesn't allow for the possibility that the symptom might appear and disappear over time, ie it might have been visible at some early stage of the LATERAL work but been fixed later, and then reintroduced by still-later optimization efforts.

regards, tom lane
leak:save_ps_display_args
leak:__GI___strdup
$ export LSAN_OPTIONS=suppressions=leak.supp
AST stats (avg): height = 7.29877 nodes = 37.8156
296 ERROR: canceling statement due to statement timeout
166 ERROR: invalid regular expression: quantifier operand invalid
26 ERROR: could not determine which collation to use for string comparison
23 ERROR: cannot compare arrays of different element types
12 ERROR: invalid regular expression: brackets [] not balanced
5 ERROR: cache lookup failed for index 2619
2 ERROR: invalid regular expression: parentheses () not balanced
error rate: 0.0104921
Suppressions used:
count bytes template
1 520 save_ps_display_args
1 10 __GI___strdup
-----------------------------------------------------
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> On 08/03/2015 09:18 PM, Tom Lane wrote:
>> ... but I can't reproduce it on HEAD with either of these queries.
>> Not clear why you're getting different results.

> I'm terribly sorry, but I didn't notice that postgresql.conf was modified...
> Set join_collapse_limit = 32 and you should see the error.

Ah ... now I get that error on the smaller query, but the larger one (that you put in an attachment) still doesn't show any problem. Do you have any other nondefault settings?

regards, tom lane
On 08/05/2015 02:24 AM, Tom Lane wrote:
> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
>> On 08/03/2015 09:18 PM, Tom Lane wrote:
>>> ... but I can't reproduce it on HEAD with either of these queries.
>>> Not clear why you're getting different results.
>
>> I'm terribly sorry, but I didn't notice that postgresql.conf was modified...
>> Set join_collapse_limit = 32 and you should see the error.
>
> Ah ... now I get that error on the smaller query, but the larger one
> (that you put in an attachment) still doesn't show any problem.
> Do you have any other nondefault settings?

Sorry, that needs from_collapse_limit = 32 as well.
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> On 08/05/2015 02:24 AM, Tom Lane wrote:
>> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
>>> Set join_collapse_limit = 32 and you should see the error.
>> Ah ... now I get that error on the smaller query, but the larger one
>> (that you put in an attachment) still doesn't show any problem.
>> Do you have any other nondefault settings?

> Sorry, that needs from_collapse_limit = 32 as well.

Yeah, I assumed as much, but it still doesn't happen for me. Possibly something platform-dependent in statistics, or something like that.

Anyway, I fixed the problem exposed by the smaller query; would you check that the larger one is okay for you now?

regards, tom lane
Tom Lane writes:
>> On 08/01/2015 05:59 PM, Tom Lane wrote:
>>> Well, I certainly think all of these represent bugs:
>>>> 1 | ERROR: could not find pathkey item to sort
>
> Hmm ... I see no error with these queries as of today's HEAD or
> back-branch tips. I surmise that this was triggered by one of the other
> recently-fixed bugs, though the connection isn't obvious offhand.

I still see this error in master as of b8cbe43, but the queries are indeed a pita to reproduce. The one below is the only one so far that is robust against running ANALYZE on the regression db, and also reproduces when I run it as an EXTRA_TEST with make check.

regards,
Andreas

select rel_217088662.a as c0, rel_217088554.a as c1, rel_217088662.b as c2, subq_34235266.c0 as c3, rel_217088660.id2 as c4, rel_217088660.id2 as c5 from public.clstr_tst as rel_217088554 inner join (select rel_217088628.a as c0 from public.rtest_vview3 as rel_217088628 where (rel_217088628.b !~ rel_217088628.b) and ((((rel_217088628.b ~~* rel_217088628.b) or (rel_217088628.b ~* rel_217088628.b)) or (rel_217088628.b <> rel_217088628.b)) or (rel_217088628.b = rel_217088628.b))) as subq_34235266 inner join public.num_exp_mul as rel_217088660 inner join public.onek2 as rel_217088661 on (rel_217088660.id1 = rel_217088661.unique1) on (subq_34235266.c0 = rel_217088660.id1) inner join public.main_table as rel_217088662 on (rel_217088661.unique2 = rel_217088662.a) on (rel_217088554.b = rel_217088660.id1) where rel_217088554.d = rel_217088554.c fetch first 94 rows only;
On 08/05/2015 08:43 PM, Tom Lane wrote:
> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
>> On 08/05/2015 02:24 AM, Tom Lane wrote:
>>> Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
>>>> Set join_collapse_limit = 32 and you should see the error.
>
>>> Ah ... now I get that error on the smaller query, but the larger one
>>> (that you put in an attachment) still doesn't show any problem.
>>> Do you have any other nondefault settings?
>
>> Sorry, that needs from_collapse_limit = 32 as well.
>
> Yeah, I assumed as much, but it still doesn't happen for me. Possibly
> something platform-dependent in statistics, or something like that.
>
> Anyway, I fixed the problem exposed by the smaller query; would you
> check that the larger one is okay for you now?

I can't reproduce the error in either case, so that seems to be fixed now. Thanks!
Andreas Seltenreich <seltenreich@gmx.de> writes:
> Tom Lane writes:
>> On 08/01/2015 05:59 PM, Tom Lane wrote:
>>> Well, I certainly think all of these represent bugs:
>>> 1 | ERROR: could not find pathkey item to sort

>> Hmm ... I see no error with these queries as of today's HEAD or
>> back-branch tips. I surmise that this was triggered by one of the other
>> recently-fixed bugs, though the connection isn't obvious offhand.

> I still see this error in master as of b8cbe43, but the queries are
> indeed a pita to reproduce. The one below is the only one so far that
> is robust against running ANALYZE on the regression db, and also
> reproduces when I run it as an EXTRA_TEST with make check.

Got that one, thanks! But we've seen this error message before in all sorts of weird situations, so I wouldn't assume that all the cases you've come across had the same cause.

regards, tom lane
Re: [sqlsmith] subplan variable reference / unassigned NestLoopParams
Tom Lane writes:
> Andreas Seltenreich <seltenreich@gmx.de> writes:
>> Tom Lane writes:
>>> Well, I certainly think all of these represent bugs:
>>>> 3 | ERROR: plan should not reference subplan's variable
>>>> 2 | ERROR: failed to assign all NestLoopParams to plan nodes
>
>> These appear to be related. The following query produces the former,
>> but if you replace the very last reference of provider with the literal
>> 'bar', it raises the latter error.
>
> Fixed that, thanks for the test case!

I haven't seen the former since your commit, but there recently were some instances of the latter. The attached queries all throw the error against master at 8752bbb.

regards,
Andreas
Attachment
Andreas Seltenreich <seltenreich@gmx.de> writes:
> Tom Lane writes:
>> Andreas Seltenreich <seltenreich@gmx.de> writes:
>>> Tom Lane writes:
>>>> Well, I certainly think all of these represent bugs:
>>>> 3 | ERROR: plan should not reference subplan's variable
>>>> 2 | ERROR: failed to assign all NestLoopParams to plan nodes

>>> These appear to be related. The following query produces the former,
>>> but if you replace the very last reference of provider with the literal
>>> 'bar', it raises the latter error.

>> Fixed that, thanks for the test case!

> I haven't seen the former since your commit, but there recently were
> some instances of the latter. The attached queries all throw the error
> against master at 8752bbb.

Turns out my patch only addressed some cases of the underlying issue. I've gone back for a second swipe at it; thanks for the continued testing!

BTW, the commit hashes you've been mentioning don't seem to correspond to anything in the master PG repo ... are you working with modified sources?

regards, tom lane
On Fri, Aug 7, 2015 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andreas Seltenreich <seltenreich@gmx.de> writes:
>> Tom Lane writes:
>>> On 08/01/2015 05:59 PM, Tom Lane wrote:
>>>> Well, I certainly think all of these represent bugs:
>>>> 1 | ERROR: could not find pathkey item to sort
>
>>> Hmm ... I see no error with these queries as of today's HEAD or
>>> back-branch tips. I surmise that this was triggered by one of the other
>>> recently-fixed bugs, though the connection isn't obvious offhand.
>
>> I still see this error in master as of b8cbe43, but the queries are
>> indeed a pita to reproduce. The one below is the only one so far that
>> is robust against running ANALYZE on the regression db, and also
>> reproduces when I run it as an EXTRA_TEST with make check.
>
> Got that one, thanks! But we've seen this error message before in all
> sorts of weird situations, so I wouldn't assume that all the cases you've
> come across had the same cause.

(resurrecting an old thread)

This is still failing:

=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
ERROR: XX000: cache lookup failed for index 2619
LOCATION: pg_get_indexdef_worker, ruleutils.c:1054

Do we want to improve at least the error message?
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> This is still failing:
> =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
> ERROR: XX000: cache lookup failed for index 2619
> LOCATION: pg_get_indexdef_worker, ruleutils.c:1054
> Do we want to improve at least the error message?

Maybe we should just make the function silently return NULL if the OID doesn't refer to an index relation. There's precedent for that sort of behavior ...

regards, tom lane
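The sort of change being floated here amounts to probing the syscache explicitly in the SQL-callable function and returning NULL instead of letting the lookup fail with an internal error. A rough sketch of that approach, for illustration only (not necessarily how the eventually committed patch is structured inside ruleutils.c):

Datum
pg_get_indexdef(PG_FUNCTION_ARGS)
{
	Oid			indexrelid = PG_GETARG_OID(0);

	/*
	 * If the OID is unknown or not an index, just say NULL instead of
	 * erroring with "cache lookup failed for index %u".
	 */
	if (!SearchSysCacheExists1(INDEXRELID, ObjectIdGetDatum(indexrelid)))
		PG_RETURN_NULL();

	/* ... continue with the existing deparsing code ... */
}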
On Sun, Jun 5, 2016 at 12:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> This is still failing:
>> =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
>> ERROR: XX000: cache lookup failed for index 2619
>> LOCATION: pg_get_indexdef_worker, ruleutils.c:1054
>> Do we want to improve at least the error message?
>
> Maybe we should just make the function silently return NULL if the OID
> doesn't refer to an index relation. There's precedent for that sort
> of behavior ...

For views it simply returns "Not a view", and for rules that's "-". All the others behave similarly to indexes:

=# select pg_get_constraintdef(0);
ERROR: XX000: cache lookup failed for constraint 0
=# select pg_get_triggerdef(0);
ERROR: XX000: could not find tuple for trigger 0
=# select pg_get_functiondef(0);
ERROR: XX000: cache lookup failed for function 0
=# select pg_get_viewdef(0);
 pg_get_viewdef
----------------
 Not a view
(1 row)
=# select pg_get_ruledef(0);
 pg_get_ruledef
----------------
 -
(1 row)

So yes, we could just return NULL for all of them, or make them throw an error. Anyway, my vote goes for a HEAD-only change, and just returning NULL... This way an application still has the option to fall back to something else with a CASE at SQL level without using plpgsql to catch the error.

Also, I have not monitored the code much regarding some code paths that rely on the existing rules, but I would imagine that there are things depending on the current behavior as well.
--
Michael
On Sun, Jun 5, 2016 at 4:28 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Jun 5, 2016 at 12:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> This is still failing:
>>> =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
>>> ERROR: XX000: cache lookup failed for index 2619
>>> LOCATION: pg_get_indexdef_worker, ruleutils.c:1054
>>> Do we want to improve at least the error message?
>>
>> Maybe we should just make the function silently return NULL if the OID
>> doesn't refer to an index relation. There's precedent for that sort
>> of behavior ...
>
> For views it simply returns "Not a view", and for rules that's "-".
> All the others behave similarly to indexes:
>
> =# select pg_get_constraintdef(0);
> ERROR: XX000: cache lookup failed for constraint 0
> =# select pg_get_triggerdef(0);
> ERROR: XX000: could not find tuple for trigger 0
> =# select pg_get_functiondef(0);
> ERROR: XX000: cache lookup failed for function 0
> =# select pg_get_viewdef(0);
>  pg_get_viewdef
> ----------------
>  Not a view
> (1 row)
> =# select pg_get_ruledef(0);
>  pg_get_ruledef
> ----------------
>  -
> (1 row)
>
> So yes, we could just return NULL for all of them, or make them throw
> an error. Anyway, my vote goes for a HEAD-only change, and just
> returning NULL... This way an application still has the option to fall
> back to something else with a CASE at SQL level without using plpgsql
> to catch the error.
>
> Also, I have not monitored the code much regarding some code paths
> that rely on the existing rules, but I would imagine that there are
> things depending on the current behavior as well.

Still, on back branches I think that it would be a good idea to have a better error message for the ones that already throw an error, like "trigger with OID %u does not exist". Switching from elog to ereport would be a good idea, but wouldn't it be a problem to change what is user-visible?
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> Still, on back branches I think that it would be a good idea to have a
> better error message for the ones that already throw an error, like
> "trigger with OID %u does not exist". Switching from elog to ereport
> would be a good idea, but wouldn't it be a problem to change what is
> user-visible?

If we're going to touch this behavior in the back branches, I would vote for changing it the same as we do in HEAD. I do not think that making a user-visible behavior change in a minor release and then a different change in the next major is good strategy.

But, given the relative shortage of complaints from the field, I'd be more inclined to just do nothing in back branches. There might be people out there with code depending on the current behavior, and they'd be annoyed if we change it in a minor release.

regards, tom lane
On Sun, Jun 5, 2016 at 11:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Still, on back branches I think that it would be a good idea to have a
>> better error message for the ones that already throw an error, like
>> "trigger with OID %u does not exist". Switching from elog to ereport
>> would be a good idea, but wouldn't it be a problem to change what is
>> user-visible?
>
> If we're going to touch this behavior in the back branches, I would
> vote for changing it the same as we do in HEAD. I do not think that
> making a user-visible behavior change in a minor release and then a
> different change in the next major is good strategy.

So, I have finished with the patch attached that changes all the *def() functions to return NULL when an object does not exist. Some callers of the index and constraint definitions still expect a cache lookup error if the object does not exist, and this patch is careful about that. I think that it would be nice to get that in 9.6, but I won't fight hard for it either. So I am parking it for now in the upcoming CF.

> But, given the relative shortage of complaints from the field, I'd
> be more inclined to just do nothing in back branches. There might
> be people out there with code depending on the current behavior,
> and they'd be annoyed if we change it in a minor release.

Sure. That's the safest position. Thinking about the existing behavior for some of the index and constraint callers, even just changing the error message does not bring much, as some of them really expect to fail with a cache lookup error.
--
Michael
Attachment
On Mon, Jun 6, 2016 at 3:30 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Jun 5, 2016 at 11:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> Still, on back branches I think that it would be a good idea to have a
>>> better error message for the ones that already throw an error, like
>>> "trigger with OID %u does not exist". Switching from elog to ereport
>>> would be a good idea, but wouldn't it be a problem to change what is
>>> user-visible?
>>
>> If we're going to touch this behavior in the back branches, I would
>> vote for changing it the same as we do in HEAD. I do not think that
>> making a user-visible behavior change in a minor release and then a
>> different change in the next major is good strategy.
>
> So, I have finished with the patch attached that changes all the
> *def() functions to return NULL when an object does not exist. Some
> callers of the index and constraint definitions still expect a cache
> lookup error if the object does not exist, and this patch is careful
> about that. I think that it would be nice to get that in 9.6, but I
> won't fight hard for it either. So I am parking it for now in the
> upcoming CF.
>
>> But, given the relative shortage of complaints from the field, I'd
>> be more inclined to just do nothing in back branches. There might
>> be people out there with code depending on the current behavior,
>> and they'd be annoyed if we change it in a minor release.
>
> Sure. That's the safest position. Thinking about the existing behavior
> for some of the index and constraint callers, even just changing the
> error message does not bring much, as some of them really expect to
> fail with a cache lookup error.

Committed with minor kibitzing: you don't need an "else" after a statement that transfers control out of the function.

Shouldn't pg_get_function_arguments, pg_get_function_identity_arguments, pg_get_function_result, and pg_get_function_arg_default get the same treatment?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 27, 2016 at 5:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Committed with minor kibitzing: you don't need an "else" after a
> statement that transfers control out of the function.

Thanks. Right, I forgot that.

> Shouldn't
> pg_get_function_arguments, pg_get_function_identity_arguments,
> pg_get_function_result, and pg_get_function_arg_default get the same
> treatment?

Changing all of them makes sense. Please see attached.

While looking at the series of functions pg_get_*, I have noticed as well that pg_get_userbyid() returns "unknown (OID=%u)" when it does not know a user. Perhaps we'd want to change that to NULL for consistency with the rest?
--
Michael
Attachment
On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 5:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Committed with minor kibitzing: you don't need an "else" after a
>> statement that transfers control out of the function.
>
> Thanks. Right, I forgot that.
>
>> Shouldn't
>> pg_get_function_arguments, pg_get_function_identity_arguments,
>> pg_get_function_result, and pg_get_function_arg_default get the same
>> treatment?
>
> Changing all of them makes sense. Please see attached.

Committed.

> While looking at the series of functions pg_get_*, I have noticed as
> well that pg_get_userbyid() returns "unknown (OID=%u)" when it does
> not know a user. Perhaps we'd want to change that to NULL for
> consistency with the rest?

That's probably correct in theory, but it's a little less closely related, and I'm not entirely sure how far we want to go with this. Remember, the original purpose was to avoid having an internal error (cache lookup failed, XX000) exposed as a user-visible error message. Are we at risk of veering from actual bug-fixing off into useless tinkering? Not sure.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> While looking at the series of functions pg_get_*, I have noticed as
>> well that pg_get_userbyid() returns "unknown (OID=%u)" when it does
>> not know a user. Perhaps we'd want to change that to NULL for
>> consistency with the rest?

> That's probably correct in theory, but it's a little less closely
> related, and I'm not entirely sure how far we want to go with this.
> Remember, the original purpose was to avoid having an internal error
> (cache lookup failed, XX000) exposed as a user-visible error message.
> Are we at risk of veering from actual bug-fixing off into useless
> tinkering? Not sure.

I'd vote for leaving that one alone; yeah, it's a bit inconsistent now, but no one has complained about its behavior.

regards, tom lane
On Sat, Jul 30, 2016 at 1:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> While looking at the series of functions pg_get_*, I have noticed as
>>> well that pg_get_userbyid() returns "unknown (OID=%u)" when it does
>>> not know a user. Perhaps we'd want to change that to NULL for
>>> consistency with the rest?
>
>> That's probably correct in theory, but it's a little less closely
>> related, and I'm not entirely sure how far we want to go with this.
>> Remember, the original purpose was to avoid having an internal error
>> (cache lookup failed, XX000) exposed as a user-visible error message.
>> Are we at risk of veering from actual bug-fixing off into useless
>> tinkering? Not sure.
>
> I'd vote for leaving that one alone; yeah, it's a bit inconsistent
> now, but no one has complained about its behavior.

OK for me. Thanks for the commit.
--
Michael
There are many more such exposed functions, which can throw cache lookup failure error if we pass wrong value. So the actual list is as below:

record_in
domain_in
fmgr_c_validator
edb_get_objecttypeheaddef
plpgsql_validator
pg_procedure_is_visible

Are we planning to change these also? Below is one example (from sqlsmith):

ERROR: cache lookup failed for procedure 0

postgres=# select
postgres-# pg_catalog.record_in(
postgres(# cast(pg_catalog.numerictypmodout(
postgres(# cast(98 as integer)) as cstring),
postgres(# cast(1 as oid),
postgres(# cast(35 as integer));
ERROR: cache lookup failed for type 1
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> There are many more such exposed functions, which can throw cache lookup
> failure error if we pass wrong value.
>
> i.e.
> record_in
> domain_in
> fmgr_c_validator
> plpgsql_validator
> pg_procedure_is_visible
>
> Are we planning to change these also..

I think if they are SQL-callable functions that users can invoke from a SQL prompt, they shouldn't throw "cache lookup failed" errors or, for that matter, any other error that is reported with elog(). Now, I don't necessarily think that these functions should return NULL in that case the way we did with the others; that's not a sensible behavior for a type-input function AFAIK. But we should emit a better error.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> There are many more such exposed functions, which can throw cache lookup
>> failure error if we pass wrong value.
>>
>> i.e.
>> record_in
>> domain_in
>> fmgr_c_validator
>> plpgsql_validator
>> pg_procedure_is_visible
>>
>> Are we planning to change these also..

> I think if they are SQL-callable functions that users can invoke from
> a SQL prompt, they shouldn't throw "cache lookup failed" errors or,
> for that matter, any other error that is reported with elog().

FWIW, I would restrict that to "functions that users are *intended* to call from a SQL prompt". If you are fooling around calling internal functions with garbage input, you should not be surprised to get an internal error back. So of the above list, only pg_procedure_is_visible seems particularly worth changing to me. And for it, returning NULL (ie, "unknown") seems reasonable enough.

Actually, it looks to me like we already fixed that for the built-in is_visible functions:

# select pg_function_is_visible(0) is null;
 ?column?
----------
 t
(1 row)

There's no such animal as pg_procedure_is_visible. I suspect that's an EDB-ism that hasn't caught up with commit 66bb74dbe8206a35.

regards, tom lane
On Tue, Aug 2, 2016 at 7:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> There are many more such exposed functions, which can throw cache lookup
>>> failure error if we pass wrong value.
>>>
>>> i.e.
>>> record_in
>>> domain_in
>>> fmgr_c_validator
>>> plpgsql_validator
>>> pg_procedure_is_visible
>>>
>>> Are we planning to change these also..
>
>> I think if they are SQL-callable functions that users can invoke from
>> a SQL prompt, they shouldn't throw "cache lookup failed" errors or,
>> for that matter, any other error that is reported with elog().
>
> FWIW, I would restrict that to "functions that users are *intended* to
> call from a SQL prompt". If you are fooling around calling internal
> functions with garbage input, you should not be surprised to get an
> internal error back. So of the above list, only pg_procedure_is_visible
> seems particularly worth changing to me. And for it, returning NULL
> (ie, "unknown") seems reasonable enough.

I think that's just making life difficult. If nothing else, sqlsmith hunts around for functions it can call that return internal errors, and if we refuse to fix all of them to return user-facing errors, then it's just crap for the people running sqlsmith to sift through and it's a judgment call whether to fix each particular case. Even aside from that, I think it's much better to have a clear and unambiguous rule that elog is only for can't-happen things, not we-don't-recommend-it things.

> There's no such animal as pg_procedure_is_visible. I suspect that's an
> EDB-ism that hasn't caught up with commit 66bb74dbe8206a35.

Yep.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 2, 2016 at 5:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's just making life difficult. If nothing else, sqlsmith
> hunts around for functions it can call that return internal errors,
> and if we refuse to fix all of them to return user-facing errors, then
> it's just crap for the people running sqlsmith to sift through and
> it's a judgment call whether to fix each particular case. Even aside
> from that, I think it's much better to have a clear and unambiguous
> rule that elog is only for can't-happen things, not
> we-don't-recommend-it things.

+1. This also has value in the context of automatically surfacing situations where "can't happen" errors do in fact happen at scale.
--
Peter Geoghegan
On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's just making life difficult. If nothing else, sqlsmith
> hunts around for functions it can call that return internal errors,
> and if we refuse to fix all of them to return user-facing errors, then
> it's just crap for the people running sqlsmith to sift through and
> it's a judgment call whether to fix each particular case. Even aside
> from that, I think it's much better to have a clear and unambiguous
> rule that elog is only for can't-happen things, not
> we-don't-recommend-it things.

I have changed all these functions to report a more appropriate error with ereport.

I used the ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors. I think this is the closest error code among all existing error codes; other options can be (ERRCODE_WRONG_OBJECT_TYPE).
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Aug 8, 2016 at 4:58 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that's just making life difficult. If nothing else, sqlsmith
>> hunts around for functions it can call that return internal errors,
>> and if we refuse to fix all of them to return user-facing errors, then
>> it's just crap for the people running sqlsmith to sift through and
>> it's a judgment call whether to fix each particular case. Even aside
>> from that, I think it's much better to have a clear and unambiguous
>> rule that elog is only for can't-happen things, not
>> we-don't-recommend-it things.
>
> I have changed all these functions to report a more appropriate error
> with ereport.
>
> I used the ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors.
> I think this is the closest error code among all existing error codes;
> other options can be (ERRCODE_WRONG_OBJECT_TYPE).

Your other options and the option you choose are the same.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 8, 2016 at 5:11 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Aug 8, 2016 at 4:58 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I think that's just making life difficult. If nothing else, sqlsmith
>>> hunts around for functions it can call that return internal errors,
>>> and if we refuse to fix all of them to return user-facing errors, then
>>> it's just crap for the people running sqlsmith to sift through and
>>> it's a judgment call whether to fix each particular case. Even aside
>>> from that, I think it's much better to have a clear and unambiguous
>>> rule that elog is only for can't-happen things, not
>>> we-don't-recommend-it things.
>>
>> I have changed all these functions to report a more appropriate error
>> with ereport.
>>
>> I used the ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors.
>> I think this is the closest error code among all existing error codes;
>> other options can be (ERRCODE_WRONG_OBJECT_TYPE).
>
> Your other options and the option you choose are the same.

Did you consider using ERRCODE_UNDEFINED_COLUMN with error messages like "type %u does not exist" or "type id %u does not exist"? The errcode ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure cases at certain places in the code.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 8, 2016 at 5:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Your other options and the option you choose are the same.
>
> Did you consider using ERRCODE_UNDEFINED_COLUMN with error messages
> like "type %u does not exist" or "type id %u does not exist"? The errcode
> ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure
> cases at certain places in the code.

But I think OBJECT will be more appropriate than COLUMN at this place. However, I can change the error message to "type id %u does not exist" if this seems better?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 8, 2016 at 6:00 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Mon, Aug 8, 2016 at 5:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Did you consider using ERRCODE_UNDEFINED_COLUMN with error messages
>> like "type %u does not exist" or "type id %u does not exist"? The errcode
>> ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure
>> cases at certain places in the code.
>
> But I think OBJECT will be more appropriate than COLUMN at this place.

Okay, then how about ERRCODE_UNDEFINED_OBJECT? It seems to be used in almost similar cases.

> However, I can change the error message to "type id %u does not exist"
> if this seems better?

I think that is better.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 9, 2016 at 6:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, then how about ERRCODE_UNDEFINED_OBJECT? It seems to be used in
> almost similar cases.

This seems better. After checking other places, I found that for an invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done it the same way.

Updated patch attached.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> This seems better. After checking other places, I found that for an
> invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid
> functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done it
> the same way.
>
> Updated patch attached.

I found some more places where we can get a similar error and updated them in my v3 patch. I don't claim that it fixes all errors of this kind, but at least it fixes the errors reported by sqlsmith plus some additional ones I found in the same area.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Aug 10, 2016 at 11:27 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> This seems better. After checking other places, I found that for an
>> invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid
>> functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done it
>> the same way.
>>
>> Updated patch attached.
>
> I found some more places where we can get a similar error and updated
> them in my v3 patch.

@@ -412,7 +412,9 @@ plpgsql_validator(PG_FUNCTION_ARGS)
 	/* Get the new function's pg_proc entry */
 	tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid));
 	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "cache lookup failed for function %u", funcoid);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_FUNCTION),
+				 errmsg("function with OID %u does not exist", funcoid)));

If you are making changes in plpgsql_validator(), then shouldn't we make changes in plperl_validator() or plpython_validator()? I see that you have made changes to function CheckFunctionValidatorAccess() which seems to be getting called from *_validator() (* indicates plpgsql/plperl/plpython) functions. Is there a reason for changing the *_validator() function?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 17, 2016 at 7:25 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Aug 10, 2016 at 11:27 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> This seems better. After checking other places, I found that for an
>>> invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid
>>> functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done it
>>> the same way.
>>>
>>> Updated patch attached.
>>
>> I found some more places where we can get a similar error and updated
>> them in my v3 patch.
>
> @@ -412,7 +412,9 @@ plpgsql_validator(PG_FUNCTION_ARGS)
>  	/* Get the new function's pg_proc entry */
>  	tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid));
>  	if (!HeapTupleIsValid(tuple))
> -		elog(ERROR, "cache lookup failed for function %u", funcoid);
> +		ereport(ERROR,
> +				(errcode(ERRCODE_UNDEFINED_FUNCTION),
> +				 errmsg("function with OID %u does not exist", funcoid)));
>
> If you are making changes in plpgsql_validator(), then shouldn't we
> make changes in plperl_validator() or plpython_validator()? I see
> that you have made changes to function CheckFunctionValidatorAccess()
> which seems to be getting called from *_validator() (* indicates
> plpgsql/plperl/plpython) functions. Is there a reason for changing
> the *_validator() function?

Yeah, when I glanced briefly at this patch, I found myself wondering whether all of these call sites were actually reachable (and why the patch contained no test cases to prove it).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 17, 2016 at 6:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If you are making changes in plpgsql_validator(), then shouldn't we
>> make changes in plperl_validator() or plpython_validator()? I see
>> that you have made changes to function CheckFunctionValidatorAccess()
>> which seems to be getting called from *_validator() (* indicates
>> plpgsql/plperl/plpython) functions. Is there a reason for changing
>> the *_validator() function?

Yes, you are right: making changes in CheckFunctionValidatorAccess is sufficient; there is no need to make changes in the *_validator (* indicates plpgsql/plperl/plpython) functions. I have changed this in the attached patch.

> Yeah, when I glanced briefly at this patch, I found myself wondering
> whether all of these call sites were actually reachable (and why the
> patch contained no test cases to prove it).

I have also added test cases to cover all my changes.

Basically, this patch changes the error at three places:

1. getBaseTypeAndTypmod: This is being called from the domain_in exposed function (domain_in -> domain_state_setup -> getBaseTypeAndTypmod). This function is also called from many other places which are not exposed functions, but I don't think this will have any impact, will it?

2. lookup_type_cache: This is being called from record_in (record_in -> lookup_rowtype_tupdesc -> lookup_rowtype_tupdesc_internal -> lookup_type_cache).

3. CheckFunctionValidatorAccess: This is being called from all language validator functions.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 7, 2016 at 8:52 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> I reviewed and tested the patch.  The changes are fine.
> This patch provides better error messages than before.
>
> Marked the patch as "Ready for committer" in the commitfest.

Thanks for the review!

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Dilip Kumar <dilipbalaut@gmail.com> writes:
> Basically this patch changes the error in three places:

> 1. getBaseTypeAndTypmod: this is reached from the exposed function
> domain_in (domain_in -> domain_state_setup -> getBaseTypeAndTypmod).
> The function is also called from many places that are not exposed
> functions, but I don't think that will have any impact; will it?

I really don't like this solution.  Changing this one function, out of the
dozens just like it in lsyscache.c, seems utterly random and arbitrary.

I'm actually not especially convinced that passing domain_in a random
OID needs to not produce an XX000 error.  That isn't a user-facing
function and people shouldn't be calling it by hand at all.  The same
goes for record_in.  But if we do want those cases to avoid this,
I think it's appropriate to fix it in those functions, not decree
that essentially-random other parts of the system should bear the
responsibility.  (Thought experiment: if we changed the order of
operations in domain_in so that getBaseTypeAndTypmod were no longer
the first place to fail, would we undo this change and then change
wherever the failure had moved to?  That's pretty messy.)

In the case of record_in, it seems like it'd be easy enough to use
lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc()
and then throw a user-facing error right there.  Not sure about a
nice solution for domain_in.  We might have to resort to an extra
syscache lookup there, but even if we did, it should happen only
once per query so that's not very awful.

> 3. CheckFunctionValidatorAccess: this is called from all the language
> validator functions.

This part seems reasonable, since the validator functions are documented
as something users might call, and CheckFunctionValidatorAccess seems
like an apropos place to handle it.

			regards, tom lane
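A sketch of that suggestion as applied to record_in(), assuming its existing tupType/tupTypmod variables; the message wording is invented, and (as the reply below notes) the underlying lookup_type_cache() call would still throw the internal error first, so this alone would not be enough:

    /* Use the no-error tupdesc lookup and report a user-facing error here. */
    tupdesc = lookup_rowtype_tupdesc_noerror(tupType, tupTypmod, true);
    if (tupdesc == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("record type with OID %u does not exist", tupType)));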
On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I really don't like this solution.  Changing this one function, out of the
> dozens just like it in lsyscache.c, seems utterly random and arbitrary.
>
> I'm actually not especially convinced that passing domain_in a random
> OID needs to not produce an XX000 error.  That isn't a user-facing
> function and people shouldn't be calling it by hand at all.  The same
> goes for record_in.  But if we do want those cases to avoid this,
> I think it's appropriate to fix it in those functions, not decree
> that essentially-random other parts of the system should bear the
> responsibility.  (Thought experiment: if we changed the order of
> operations in domain_in so that getBaseTypeAndTypmod were no longer
> the first place to fail, would we undo this change and then change
> wherever the failure had moved to?  That's pretty messy.)
>
> In the case of record_in, it seems like it'd be easy enough to use
> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc()
> and then throw a user-facing error right there.

Actually, when we pass an invalid type to record_in, the error is
thrown in lookup_type_cache(), and that function does not take a
"no_error" input (that is limited to lookup_rowtype_tupdesc_internal).

One option is to add a "no_error" parameter to lookup_type_cache() as
well, and throw the error only in record_in.  But the problem is that
lookup_type_cache() has a lot of callers, and would it be a good idea
to throw one generic error instead of multiple specific errors?

Another option is to do for record_in the same thing you have suggested
for domain_in (an extra syscache lookup)?

> Not sure about a
> nice solution for domain_in.  We might have to resort to an extra
> syscache lookup there, but even if we did, it should happen only
> once per query so that's not very awful.
>
>> 3. CheckFunctionValidatorAccess: this is called from all the language
>> validator functions.
>
> This part seems reasonable, since the validator functions are documented
> as something users might call, and CheckFunctionValidatorAccess seems
> like an apropos place to handle it.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
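For illustration, a sketch of the flag-based option mentioned here; TYPECACHE_NOERROR is an invented name with no counterpart in the tree, and the next message sets this approach aside in favor of promoting the elog to an ereport:

    /* Hypothetical: inside lookup_type_cache(), while loading a new entry. */
    tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_id));
    if (!HeapTupleIsValid(tp))
    {
        if (flags & TYPECACHE_NOERROR)      /* invented flag bit */
            return NULL;                    /* caller reports its own error */
        elog(ERROR, "cache lookup failed for type %u", type_id);
    }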
Dilip Kumar <dilipbalaut@gmail.com> writes:
> On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In the case of record_in, it seems like it'd be easy enough to use
>> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc()
>> and then throw a user-facing error right there.

> Actually, when we pass an invalid type to record_in, the error is
> thrown in lookup_type_cache(), and that function does not take a
> "no_error" input (that is limited to lookup_rowtype_tupdesc_internal).

Ah, should have dug a bit deeper.

> One option is to add a "no_error" parameter to lookup_type_cache() as
> well, and throw the error only in record_in.  But the problem is that
> lookup_type_cache() has a lot of callers, and would it be a good idea
> to throw one generic error instead of multiple specific errors?

We could make it work like that without breaking the ABI if we were
to add a NOERROR bit to the allowed "flags".  However, after looking
around a bit I'm no longer convinced what I said above is a good idea.
In particular, if we put the responsibility of reporting the error on
callers then we'll lose the ability to distinguish "no such pg_type OID"
from "type exists but it's only a shell".  So I'm now thinking it's okay
to promote lookup_type_cache's elog to a full ereport, especially since
it's already using ereport for some of the related error cases such as
the shell-type failure.

That still leaves us with what to do for domain_in.  A really simple
fix would be to move the InitDomainConstraintRef call before the
getBaseTypeAndTypmod call, because the former fetches the typcache
entry and would then benefit from lookup_type_cache's ereport.
But that seems a bit magic.

I'm tempted to propose that domain_state_setup start with

    typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO);
    if (typentry->typtype != TYPTYPE_DOMAIN)
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("type %s is not a domain",
                        format_type_be(domainType))));

removing the need to verify that after getBaseTypeAndTypmod.

The cache loading is basically free because InitDomainConstraintRef
would do it anyway; so the extra cost here is only one dynahash search.
You could imagine buying back those cycles by teaching the typcache
to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful
that we really care.  This whole setup sequence only happens once per
query anyway.

			regards, tom lane
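A sketch of the elog-to-ereport promotion in lookup_type_cache() being proposed here; the SQLSTATE and message wording are assumptions rather than the committed text:

    /* Report a regular undefined-object error when the pg_type row is missing. */
    tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_id));
    if (!HeapTupleIsValid(tp))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("type with OID %u does not exist", type_id)));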
On Thu, Sep 8, 2016 at 7:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could make it work like that without breaking the ABI if we were
> to add a NOERROR bit to the allowed "flags".  However, after looking
> around a bit I'm no longer convinced what I said above is a good idea.
> In particular, if we put the responsibility of reporting the error on
> callers then we'll lose the ability to distinguish "no such pg_type OID"
> from "type exists but it's only a shell".  So I'm now thinking it's okay
> to promote lookup_type_cache's elog to a full ereport, especially since
> it's already using ereport for some of the related error cases such as
> the shell-type failure.

Okay.

> That still leaves us with what to do for domain_in.  A really simple
> fix would be to move the InitDomainConstraintRef call before the
> getBaseTypeAndTypmod call, because the former fetches the typcache
> entry and would then benefit from lookup_type_cache's ereport.
> But that seems a bit magic.
>
> I'm tempted to propose that domain_state_setup start with
>
>     typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO);
>     if (typentry->typtype != TYPTYPE_DOMAIN)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DATATYPE_MISMATCH),
>                  errmsg("type %s is not a domain",
>                         format_type_be(domainType))));
>
> removing the need to verify that after getBaseTypeAndTypmod.

That seems like a better option; I have done it this way and am
attaching the modified patch.

> The cache loading is basically free because InitDomainConstraintRef
> would do it anyway; so the extra cost here is only one dynahash search.
> You could imagine buying back those cycles by teaching the typcache
> to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful
> that we really care.  This whole setup sequence only happens once per
> query anyway.

Agreed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Dilip Kumar <dilipbalaut@gmail.com> writes:
> That seems like a better option; I have done it this way and am
> attaching the modified patch.

Pushed with cosmetic adjustments --- mostly, I thought we needed some
comments about the topic.

			regards, tom lane
On Fri, Sep 9, 2016 at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pushed with cosmetic adjustments --- mostly, I thought we needed some
> comments about the topic.

Okay, thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com