Thread: Adding foreign key constraint holds exclusive lock for too long (on production database)
Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Ben Hoyt
Date:
Hi folks, We're adding a foreign key constraint to a 20-million row table on our production database, and it's taking about 7 minutes. Because it's an ALTER TABLE, Postgres acquires an ACCESS EXCLUSIVE lock that prevents any reads/writes (though this particular table is very write-heavy, so even a read lock wouldn't help here). For context: we do this whenever we deploy our site, because our database is split across two schemas ("live" and "content"), and the "content" schema we dump from our office database and restore into our production database. To achieve this we restore it as "contentnew" into the production db, then rename the "content" schema to "contentold" and the "contentnew" schema to "content". This completes the actual deployment, however, now our live-to-content foreign keys are pointing to "contentold", so the final step is to go through and drop all the live-to-content foreign keys and recreate them (against the new content schema). Most of the tables are small and re-adding the constraint is quick, except for this one table, which is 20M rows and basically pauses our live website for 7 minutes. A couple of questions about the ADD CONSTRAINT. The foreign key column on the local table is indexed, and there are only ~50 unique values, so the db *could* come up with the unique values pretty quickly and then check them. Or, even if it needs to do a full scan of the 20M-big table ("ratesrequests") and join with the referenced table ("provider") on the foreign key, which is I think the most it should have to do to check the foreign key, the following query only takes ~20s, not 7 minutes: select p.name from ratesrequests r join provider p on r.providerid = p.providerid I'm guessing the ADD CONSTRAINT logic bypasses some of the query optimization used for SELECT queries. So I suppose my questions are: 1) Are there ways to speed up adding the constraint? Just speeding it up a little bit won't really help -- for this purpose it'll need to be an order of magnitude or so. I'm aware of a couple of possibilities: a) Upgrade to Postgres 9.1 and use ADD CONSTRAINT NOT VALID. However, this doesn't really help, as you need to run VALIDATE CONSTRAINT at some later stage, which still grabs the exclusive lock. b) Delete old rows from the table so it's not so big. Feels a bit hacky just to fix this issue. c) Get rid of this foreign key constraint entirely and just check it in code when we insert. Pragmatic solution, but not ideal. 2) Is there a better way to do the "content" schema dump/restore that avoids dropping and recreating the inter-schema foreign keys? Other notes and research: * We're running "PostgreSQL 9.0.2, compiled by Visual C++ build 1500, 64-bit" on 64-bit Windows Server 2008 SP1 (6.0.6001) * The "ratesrequests" table has two text columns, one of which often contains a few hundred to a couple of KB of data in the field. It is added to rapidly. We regularly VACCUM ANALYZE it. * As expected, the ADD CONSTRAINT has gotten slower over time as this table grew. However -- I'm not 100% sure of this, but it seems to have jumped recently (from 3-4 minutes to 7 minutes). * http://www.postgresql.org/message-id/20030323112241.W14634-100000@megazone23.bigpanda.com -- indicates that ADD CONSTRAINT isn't optimized as well as it could be * http://www.postgresql.org/message-id/51A11C97.90209@iol.ie -- indicates that the db ignores the index when add constraints Thanks, Ben.
Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
David Johnston
Date:
Ben Hoyt wrote > * http://www.postgresql.org/message-id/ > 51A11C97.90209@ > -- > indicates that the db ignores the index when add constraints As noted in the referenced thread (and never contradicted) the current algorithm is "for each record does the value in the FK column exist in the PK table?" not "do all of the values currently found on the FK table exist in the PK table?". The later question being seemingly much faster (if table statistics imply a small-ish number of bins and the presence of an index on the column) to answer during a bulk ALTER TABLE but the former being the more common question - when simply adding a single row. You need to figure out some way to avoid continually evaluating the FK constraint on all 20M row - of which most of them already were previously confirmed. Most commonly people simply perform an incremental update of a live table and insert/update/delete only the records that are changing instead of replacing an entire table with a new one. If you are generally happy with your current procedure I would probably continue on with your "live" and "content" schemas but move this table into a "bulk_content" schema and within that have a "live" table and a "staging" table. You can drop/replace the staging table from your office database and then write a routine to incrementally update the live table. The FK references in live and content would then persistently reference the "live" table and only the subset of changes introduced would need to be checked. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Adding-foreign-key-constraint-holds-exclusive-lock-for-too-long-on-production-database-tp5776313p5776315.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Tom Lane
Date:
David Johnston <polobo@yahoo.com> writes: > As noted in the referenced thread (and never contradicted) the current > algorithm is "for each record does the value in the FK column exist in the > PK table?" not "do all of the values currently found on the FK table exist > in the PK table?". Well, apparently nobody who knows the code was paying attention, because that hasn't been true for some time. ALTER TABLE ADD FOREIGN KEY will actually validate the constraint using a query constructed like this (cf RI_Initial_Check() in ri_triggers.c): * SELECT fk.keycols FROM ONLY relname fk * LEFT OUTER JOIN ONLY pkrelname pk * ON (pk.pkkeycol1=fk.keycol1 [AND ...]) * WHERE pk.pkkeycol1 IS NULL AND * For MATCH SIMPLE: * (fk.keycol1 IS NOT NULL [AND ...]) * For MATCH FULL: * (fk.keycol1 IS NOT NULL [OR ...]) It appears the possible explanations for Ben's problem are: 1. For some reason this query is a lot slower than the one he came up with; 2. The code isn't using this query but is falling back to a row-at-a-time check. Case 2 would apply if the user attempting to do the ALTER TABLE doesn't have read permission on both tables ... though that seems rather unlikely. regards, tom lane
Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Josh Berkus
Date:
Ben, > A couple of questions about the ADD CONSTRAINT. The foreign key column > on the local table is indexed, and there are only ~50 unique values, > so the db *could* come up with the unique values pretty quickly and > then check them. This would indeed be a nice optimization, especially now that we have index-only scans; you could do a VACUUM FREEZE on the tables and then add the constraint. > b) Delete old rows from the table so it's not so big. Feels a bit > hacky just to fix this issue. > > c) Get rid of this foreign key constraint entirely and just check it > in code when we insert. Pragmatic solution, but not ideal. d) add a trigger instead of an actual FK. Slower to execute on subsequent updates/inserts, but doesn't need to be checked. e) do something (slony, scripts, whatever) so that you're incrementally updating this table instead of recreating it from scratch each time. > * We're running "PostgreSQL 9.0.2, compiled by Visual C++ build 1500, > 64-bit" on 64-bit Windows Server 2008 SP1 (6.0.6001) I will point out that you are missing a whole ton of bug fixes, including two critical security patches. > * The "ratesrequests" table has two text columns, one of which often > contains a few hundred to a couple of KB of data in the field. It is > added to rapidly. We regularly VACCUM ANALYZE it. > * As expected, the ADD CONSTRAINT has gotten slower over time as this > table grew. However -- I'm not 100% sure of this, but it seems to have > jumped recently (from 3-4 minutes to 7 minutes). > * http://www.postgresql.org/message-id/20030323112241.W14634-100000@megazone23.bigpanda.com Probably the table just got larger than RAM. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Ben Hoyt
Date:
Thanks, Tom (and David and Josh). > Well, apparently nobody who knows the code was paying attention, because > that hasn't been true for some time. ALTER TABLE ADD FOREIGN KEY will > actually validate the constraint using a query constructed like this > (cf RI_Initial_Check() in ri_triggers.c): This was a very helpful pointer, and interesting to me, because I did a quick look for the source that handled that but didn't find it (not knowing the Postgres codebase at all). It was kinda weird to me at first that the way it implements this is by building an SQL string and then executing that -- at first I would have thought it'd call the internal functions to do the job. But on second thoughts, this makes total sense, as that way it gets all the advantages of the query planner/optimizer for this too. > It appears the possible explanations for Ben's problem are: > > 1. For some reason this query is a lot slower than the one he came up > with; > > 2. The code isn't using this query but is falling back to a row-at-a-time > check. Anyway, it's definitely #1 that's happening, as I build the RI_Initial_Check() query by hand, and it takes just as long as the ADD CONSTRAINT. I'll probably hack around it -- in fact, for now I've just dropped the contraint entirely, as it's not really necessary on this table. So I guess this is really a side effect of the quirky way we're dumping and restoring only one schema, and dropping/re-adding constraints on deployment because of this. Is this a really strange thing to do -- deploying only one schema (the "static" data) and dropping/re-adding constraints -- or are there better practices here? Relatedly, what about best practices regarding inter-schema foreign keys? -Ben
Re: Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Tom Lane
Date:
Ben Hoyt <benhoyt@gmail.com> writes: >> It appears the possible explanations for Ben's problem are: >> 1. For some reason this query is a lot slower than the one he came up >> with; > Anyway, it's definitely #1 that's happening, as I build the > RI_Initial_Check() query by hand, and it takes just as long as the ADD > CONSTRAINT. Huh. Maybe an optimizer failing? Could we see the full text of both queries and EXPLAIN ANALYZE results for them? > So I guess this is really a side effect of the quirky way we're > dumping and restoring only one schema, and dropping/re-adding > constraints on deployment because of this. Is this a really strange > thing to do -- deploying only one schema (the "static" data) and > dropping/re-adding constraints -- or are there better practices here? Doesn't seem unreasonable. One thought is that maybe you need to insert a manual ANALYZE after reloading the data? regards, tom lane
Re: Re: Adding foreign key constraint holds exclusive lock for too long (on production database)
From
Ben Hoyt
Date:
Hmm, weird -- now the RI_Initial_Check() query is much quicker (20s). We do ANALYZE the data every few nights, so maybe that's what changed it. I'll keep that in mind. -Ben
On Fri, Nov 1, 2013 at 3:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ben Hoyt <benhoyt@gmail.com> writes:
>> It appears the possible explanations for Ben's problem are:
>> 1. For some reason this query is a lot slower than the one he came up
>> with;> Anyway, it's definitely #1 that's happening, as I build theHuh. Maybe an optimizer failing? Could we see the full text of both
> RI_Initial_Check() query by hand, and it takes just as long as the ADD
> CONSTRAINT.
queries and EXPLAIN ANALYZE results for them?Doesn't seem unreasonable. One thought is that maybe you need to insert a
> So I guess this is really a side effect of the quirky way we're
> dumping and restoring only one schema, and dropping/re-adding
> constraints on deployment because of this. Is this a really strange
> thing to do -- deploying only one schema (the "static" data) and
> dropping/re-adding constraints -- or are there better practices here?
manual ANALYZE after reloading the data?
regards, tom lane