Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references. - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.
Msg-id: 1935.1442329129@sss.pgh.pa.us
In response to: Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.
List: pgsql-hackers
I wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
>> definitely is caused by this commit. Do you want to back it out for the
>> time being? Kevin is on vacation right now.

> I'll take a look today, and either fix it if I can find the problem
> quickly or back it out.

The answer is "back it out", because this patch is fundamentally and irretrievably broken. You can't just clobber the hashtable like that, because there are very possibly active uses of hashtable entries in outer function call levels. In the particular case exhibited in

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-09-15%2004%3A20%3A00

namely

TRAP: FailedAssertion("!(riinfo->constraint_id == constraintOid)", File: "/pgbuild/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/ri_triggers.c", Line: 2838)

what's evidently happened is that a cache flush occurred during the syscache fetch at line 2828, and the patch chose that moment to destroy the hashtable, in particular the entry already created at line 2817. Hashtable resets occurring later in a particular RI trigger's execution would cause other symptoms.

The pre-existing coding is safe, despite the lack of any reference counting on hashtable entries, because the cache flush callback never actually destroys any entries. It just cautiously marks them invalid, which won't actually have any effect until the next ri_LoadConstraintInfo() call on that constraint OID.

AFAICS, there's no way to make this work reliably without changing the cache API somehow. Reference counts would make it possible to tell whether it's safe to delete a particular entry, but they would be fairly complicated to maintain (because of the need to make sure they get decremented after an elog(ERROR)). Given that the entries are just flat chunks of memory, another answer is to have cache lookups copy the entry data into caller-provided storage rather than rely on the cache to stay valid.
Not sure offhand whether that would be unreasonably expensive; sizeof(RI_ConstraintInfo) is about 600 bytes on my machine, which seems like kind of a lot, but I guess it's not enormous. For that matter, if we're gonna do it like that, it's not very clear that we need the ri_constraint_cache at all, rather than just having ri_LoadConstraintInfo fill the data from the pg_constraint syscache row every time.

In any case, this patch seems like a pretty ugly and brute-force way to limit the cache size, since it destroys not only old and uninteresting entries but also valid, up-to-date, recently used entries. So I'm not very excited about thinking of some band-aid way to let this work; I don't like the approach of destroying the entire hash table in the first place. I'm going to revert it and send it back for rework.

			regards, tom lane