Re: Missing dependency tracking for TableFunc nodes - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Missing dependency tracking for TableFunc nodes |
Date | |
Msg-id | 20191114004631.z4mg5nucvrxiewxe@development |
In response to | Re: Missing dependency tracking for TableFunc nodes (Mark Dilger <hornschnorter@gmail.com>) |
Responses | Re: Missing dependency tracking for TableFunc nodes |
List | pgsql-hackers |
On Wed, Nov 13, 2019 at 03:00:03PM -0800, Mark Dilger wrote:
>
>
>On 11/11/19 1:41 PM, Tom Lane wrote:
>>I happened to notice that find_expr_references_walker has not
>>been taught anything about TableFunc nodes, which means it will
>>miss the type and collation OIDs embedded in such a node.
>>
>>This can be demonstrated to be a problem by the attached script,
>>which will end up with a "cache lookup failed for type NNNNN"
>>error because we allow dropping a type the XMLTABLE construct
>>references.
>>
>>This isn't hard to fix, as per the attached patch, but it makes
>>me nervous. I wonder what other dependencies we might be missing.
>
>I can consistently generate errors like the following in master:
>
>  ERROR: cache lookup failed for statistics object 31041
>
>This happens in a stress test in which multiple processes are making
>changes to the schema. So far, all the sessions that report this
>cache lookup error do so when performing one of ANALYZE, VACUUM
>ANALYZE, UPDATE, DELETE or EXPLAIN ANALYZE on a table that has MCV
>statistics. All processes running simultaneously are running the same
>set of functions, which create and delete tables, indexes, and
>statistics objects, insert, update, and delete rows in those tables,
>etc.
>
>The fact that the statistics are of the MCV type might not be
>relevant; I'm creating those on tables as part of testing Tomas
>Vondra's MCV statistics patch, so all the tables have statistics of
>that kind on them.
>

Hmmm, I don't know the details of the test, but this looks a bit like
we're trying to use the stats during estimation after they got dropped
in the meantime. If that's the case, it probably affects all stats
types, not just MCV lists - there should be no significant difference
between the statistics types, I think.

I've managed to reproduce this with a stress test, and I do get these
failures with both dependencies and mcv stats, although in slightly
different places.

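The suspected sequence (the statistics object is visible when the list is built, then dropped before its contents are loaded) can be illustrated with a small deterministic Python sketch. This is a toy model under stated assumptions, not PostgreSQL code: `Catalog`, `estimate` and `between_steps` are hypothetical names invented for the illustration, and the concurrent session is simulated by a callback rather than a real second process.

```python
# Toy model only: every name here is illustrative, not a PostgreSQL internal.


class CacheLookupError(Exception):
    """Raised when a listed object no longer exists at load time."""


class Catalog:
    """Maps statistics-object OIDs to their serialized contents."""

    def __init__(self):
        self._stats = {}

    def create(self, oid, data):
        self._stats[oid] = data

    def drop(self, oid):
        self._stats.pop(oid, None)

    def list_oids(self):
        # Step 1: collect the OIDs only, without the serialized data.
        return sorted(self._stats)

    def load(self, oid):
        # Step 2: fetch the contents for an OID that was listed earlier.
        try:
            return self._stats[oid]
        except KeyError:
            raise CacheLookupError(
                f"cache lookup failed for statistics object {oid}"
            ) from None


def estimate(catalog, between_steps=lambda: None):
    """List the statistics first, then load each one in a second step."""
    oids = catalog.list_oids()
    between_steps()  # window in which another session may drop an object
    return [catalog.load(oid) for oid in oids]


catalog = Catalog()
catalog.create(31041, "mcv")

try:
    # Simulate a concurrent DROP STATISTICS landing between the two steps.
    estimate(catalog, between_steps=lambda: catalog.drop(31041))
    outcome = "ok"
except CacheLookupError as exc:
    outcome = str(exc)

print(outcome)
```

With no drop in the window, `estimate` simply returns the loaded contents. The error appears only when the object disappears between the two steps, which is why nothing in the sketch depends on the kind of statistics involved.
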
And I think I see the issue - when dropping the statistics, we call
RemoveObjects, which however does not acquire any lock on the table.
So during estimation we get the list of stats (without the serialized
data), but before we get to load the contents, someone drops them.

If that's the root cause, it's been there since PG 10. I'm not sure
what the right solution is. A straightforward option would be to lock
the relation, but will that work after adding support for stats on
joins? An alternative would be to just ignore those failures, but that
kinda breaks the estimation (we should have picked different stats in
this case).

>I can try to distill my test case a bit, but first I'd like to know if
>you are interested. Currently, the patch is over 2.2MB, gzip'd. I'll
>only bother distilling it if you don't already know about these cache
>lookup failures.
>

Not sure. But I do wonder if we see the same issue.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services