Re: Missing dependency tracking for TableFunc nodes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Missing dependency tracking for TableFunc nodes
Date
Msg-id 20191114004631.z4mg5nucvrxiewxe@development
In response to Re: Missing dependency tracking for TableFunc nodes  (Mark Dilger <hornschnorter@gmail.com>)
Responses Re: Missing dependency tracking for TableFunc nodes
List pgsql-hackers
On Wed, Nov 13, 2019 at 03:00:03PM -0800, Mark Dilger wrote:
>
>
>On 11/11/19 1:41 PM, Tom Lane wrote:
>>I happened to notice that find_expr_references_walker has not
>>been taught anything about TableFunc nodes, which means it will
>>miss the type and collation OIDs embedded in such a node.
>>
>>This can be demonstrated to be a problem by the attached script,
>>which will end up with a "cache lookup failed for type NNNNN"
>>error because we allow dropping a type the XMLTABLE construct
>>references.
>>
>>This isn't hard to fix, as per the attached patch, but it makes
>>me nervous.  I wonder what other dependencies we might be missing.
>
>I can consistently generate errors like the following in master:
>
>  ERROR:  cache lookup failed for statistics object 31041
>
>This happens in a stress test in which multiple processes are making 
>changes to the schema.  So far, all the sessions that report this 
>cache lookup error do so when performing one of ANALYZE, VACUUM 
>ANALYZE, UPDATE, DELETE or EXPLAIN ANALYZE on a table that has MCV 
>statistics. All processes running simultaneously are running the same 
>set of functions, which create and delete tables, indexes, and 
>statistics objects, insert, update, and delete rows in those tables, 
>etc.
>
>The fact that the statistics are of the MCV type might not be 
>relevant; I'm creating those on tables as part of testing Tomas 
>Vondra's MCV statistics patch, so all the tables have statistics of 
>that kind on them.
>

Hmmm, I don't know the details of the test, but this looks a bit like
we're trying to use the stats during estimation, but they got dropped
in the meantime. If that's the case, it probably affects all stats types,
not just MCV lists: there should be no significant difference between
the statistics types in this respect, I think.

I've managed to reproduce this with a stress test, and I do get these
failures with both functional dependencies and MCV stats, although in
slightly different places.

And I think I see the issue: when dropping the statistics, we call
RemoveObjects, which does not acquire any lock on the table. So we get
the list of stats (without the serialized data), but before we get to
load the contents, someone drops the statistics object and the lookup
fails. If that's the root cause, it's been there since PG 10.

I'm not sure what the right solution is. A straightforward option would
be to lock the relation, but will that still work once we add support
for statistics on joins, which involve more than one relation? An
alternative would be to just ignore those failures, but that somewhat
breaks the estimation (we should have picked a different statistics
object in this case).

>I can try to distill my test case a bit, but first I'd like to know if 
>you are interested.  Currently, the patch is over 2.2MB, gzip'd.  I'll 
>only bother distilling it if you don't already know about these cache 
>lookup failures.
>

Not sure, but I do wonder whether we're seeing the same issue.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


