Thread: Re: BUG #4204: COPY to table with FK has memory leak
[moving to -hackers]

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> "Tomasz Rybak" <bogomips@post.pl> writes:
>> I tried to use COPY to import 27M rows to table:
>> CREATE TABLE sputnik.ccc24 (
>>     station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
>>     moment INTEGER NOT NULL,
>>     flags INTEGER NOT NULL
>> ) INHERITS (sputnik.sputnik);
>> COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
>> FROM '/tmp/24c3' WITH DELIMITER AS ' ';
>
> This is expected to take lots of memory because each row-requiring-check
> generates an entry in the pending trigger event list.  Even if you had
> not exhausted memory, the actual execution of the retail checks would
> have taken an unreasonable amount of time.  The recommended way to do
> this sort of thing is to add the REFERENCES constraint *after* you load
> all the data; that'll be a lot faster in most cases because the checks
> are done "in bulk" using a JOIN rather than one-at-a-time.

Hm, it occurs to me that we could still do a join against the pending
event trigger list... I wonder how feasible it would be to store the
pending trigger event list in a temporary table instead of in ram.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
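For concreteness, the load-then-constrain approach Tom recommends would look
something like this (a sketch based on the table definition above; the
constraint name is illustrative, and the remaining columns are assumed to be
inherited from sputnik.sputnik):

    -- Create the table without the REFERENCES clause...
    CREATE TABLE sputnik.ccc24 (
        station CHARACTER(4) NOT NULL,   -- no FK here yet
        moment INTEGER NOT NULL,
        flags INTEGER NOT NULL
    ) INHERITS (sputnik.sputnik);

    -- ...bulk-load the data, queueing no per-row trigger events...
    COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
    FROM '/tmp/24c3' WITH DELIMITER AS ' ';

    -- ...then add the constraint, which validates all rows in one pass.
    ALTER TABLE sputnik.ccc24
        ADD CONSTRAINT ccc24_station_fkey
        FOREIGN KEY (station) REFERENCES sputnik.station24 (id);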
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> This is expected to take lots of memory because each row-requiring-check
>> generates an entry in the pending trigger event list.

> Hm, it occurs to me that we could still do a join against the pending event
> trigger list... I wonder how feasible it would be to store the pending
> trigger event list in a temporary table instead of in ram.

We could make that list spill to disk, but the problem remains that
verifying the rows one at a time will take forever.

The idea that's been kicked around occasionally is that once you get
past N pending events, throw them all away and instead queue a single
operation to do a bulk verify (just like initial establishment of the
FK constraint).  I'm not sure how to do the queue management for this
though.

			regards, tom lane
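The bulk verify Tom mentions is essentially the anti-join used when an FK
constraint is first established. In terms of the tables from the original
report, it would be roughly:

    -- every row returned is a violation; an empty result means the
    -- constraint holds for the whole table
    SELECT fk.station
    FROM ONLY sputnik.ccc24 fk
        LEFT OUTER JOIN ONLY sputnik.station24 pk ON pk.id = fk.station
    WHERE pk.id IS NULL AND fk.station IS NOT NULL;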
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> "Tom Lane" <tgl@sss.pgh.pa.us> writes: >>> This is expected to take lots of memory because each row-requiring-check >>> generates an entry in the pending trigger event list. > >> Hm, it occurs to me that we could still do a join against the pending event >> trigger list... I wonder how feasible it would be to store the pending trigger >> event list in a temporary table instead of in ram. > > We could make that list spill to disk, but the problem remains that > verifying the rows one at a time will take forever. Well I was thinking if we did a join between a temporary table and the fk target then it wouldn't have to be a one-by-one operation. It could be a merge join if the planner thought that was better. How to get accurate stats into the planner at that point would be a missing detail though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> This is expected to take lots of memory because each row-requiring-check
>>> generates an entry in the pending trigger event list.
>
>> Hm, it occurs to me that we could still do a join against the pending event
>> trigger list... I wonder how feasible it would be to store the pending
>> trigger event list in a temporary table instead of in ram.
>
> We could make that list spill to disk, but the problem remains that
> verifying the rows one at a time will take forever.
>
> The idea that's been kicked around occasionally is that once you get
> past N pending events, throw them all away and instead queue a single
> operation to do a bulk verify (just like initial establishment of the
> FK constraint).  I'm not sure how to do the queue management for this
> though.

Neither of those approaches is really suitable. Just spilling to disk is
O(N) in the number of rows loaded, and the bulk verify is O(N) at least
in the number of rows loaded plus existing. The bulk verify doesn't help
either: if the table had been empty you'd have added the FK afterwards,
so we must assume there are already rows in there, and in most cases the
rows already loaded will exceed those being added by the bulk operation.

AFAICS we must aggregate the trigger checks. We would need a special
property of triggers that allowed them to be aggregated when two similar
checks arrived. We can then use hash aggregation to accumulate them. We
might conceivably need to spill to disk also, since the aggregation may
not always be effective. But in most cases the tables against which FK
checks are made are significantly smaller than the tables being loaded.
Once we have hash aggregated them, that is then the first part of a hash
join to the target table.

We certainly need a TODO item for "improve RI checks during bulk
operations".

-- 
 Simon Riggs
 www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
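In SQL terms, the aggregation Simon describes amounts to collapsing the
queued events to one check per distinct key before probing the referenced
table (again using the hypothetical pending_fk_events spill table):

    -- 27M queued events may collapse to only a handful of distinct keys
    SELECT k.station
    FROM (SELECT DISTINCT station
          FROM pending_fk_events) k        -- the hash-aggregation step
        LEFT OUTER JOIN sputnik.station24 pk ON pk.id = k.station
    WHERE pk.id IS NULL;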
"Simon Riggs" <simon@2ndquadrant.com> writes: > AFAICS we must aggregate the trigger checks. We would need a special > property of triggers that allowed them to be aggregated when two similar > checks arrived. We can then use hash aggregation to accumulate them. We > might conceivably need to spill to disk also, since the aggregation may > not always be effective. But in most cases the tables against which FK > checks are made are significantly smaller than the tables being loaded. > Once we have hash aggregated them, that is then the first part of a hash > join to the target table. Well we can't aggregate them as they're created because later modifications could delete or update the original records. The SQL spec requires that FK checks be effective at the end of the command. I admit off the top of my head I can't actually come up with any situations which would be covered by the spec. All the instances I can think of involve either Postgres's UPDATE FROM or plpgsql functions or some other postgres specific functionality. But I do seem to recall there were some situations where it mattered. But we could aggregate them when it comes time to actually check them. Or we could hash the FK keys and scan the event list. Or we could sort the two and merge join them.... > We certainly need a TODO item for "improve RI checks during bulk > operations". I have a feeling it's already there. Hm. There's a whole section on RI triggers but the closest I see is this, neither of the links appear to refer to bulk operations: Optimize referential integrity checks http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.phphttp://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
Gregory Stark <stark@enterprisedb.com> writes:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> We certainly need a TODO item for "improve RI checks during bulk
>> operations".

> I have a feeling it's already there. Hm. There's a whole section on RI
> triggers, but the closest I see is this, and neither of the links appears
> to refer to bulk operations:
> Optimize referential integrity checks
> http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php

No, both of those are talking about the same thing, ie, (1) making the
are-the-keys-unchanged optimization work when NULLs are present, and
(2) not testing for this case twice.  There's an entry in the Triggers
section

    * Add deferred trigger queue file

      Right now all deferred trigger information is stored in backend
      memory.  This could exhaust memory for very large trigger queues.
      This item involves dumping large queues into files.

but as already noted, this is a pretty myopic answer (at least for RI
triggers).

			regards, tom lane
On Wed, 2008-05-28 at 18:17 -0400, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> AFAICS we must aggregate the trigger checks. We would need a special
>> property of triggers that allowed them to be aggregated when two similar
>> checks arrived. We can then use hash aggregation to accumulate them. We
>> might conceivably need to spill to disk also, since the aggregation may
>> not always be effective. But in most cases the tables against which FK
>> checks are made are significantly smaller than the tables being loaded.
>> Once we have hash aggregated them, that is then the first part of a hash
>> join to the target table.
>
> Well, we can't aggregate them as they're created, because later
> modifications could delete or update the original records. The SQL spec
> requires that FK checks be effective at the end of the command.

Well, that's what we need to do. We just need to find a way...

Currently we store trigger entries by htid. I guess we need to aggregate
them on the actual values looked up. The SQL spec also says that the
contents of the FK check table should be taken as at the start of the
command, so we should be safe to aggregate the values prior to the check.

As already suggested in work on Read Only Tables, we could optimise them
away to being constraint checks.

-- 
 Simon Riggs
 www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
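Keying the queue on the values looked up rather than on tuple ids, as
Simon suggests, would make the spilled structure look more like a keyed
counter than a list of per-row events. A hypothetical sketch of what such
a value-keyed queue could hold:

    -- one entry per distinct key, no matter how many rows referenced it;
    -- a duplicate arrival increments n instead of adding a new entry
    CREATE TEMP TABLE pending_fk_checks (
        station CHARACTER(4) PRIMARY KEY,
        n BIGINT NOT NULL      -- number of queued events collapsed here
    );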
On Wed, 2008-05-28 at 22:45 +0100, Simon Riggs wrote:
> On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
>> Gregory Stark <stark@enterprisedb.com> writes:
>>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>>> This is expected to take lots of memory because each row-requiring-check
>>>> generates an entry in the pending trigger event list.
>>
>>> Hm, it occurs to me that we could still do a join against the pending
>>> event trigger list... I wonder how feasible it would be to store the
>>> pending trigger event list in a temporary table instead of in ram.
>>
>> We could make that list spill to disk, but the problem remains that
>> verifying the rows one at a time will take forever.
>>
>> The idea that's been kicked around occasionally is that once you get
>> past N pending events, throw them all away and instead queue a single
>> operation to do a bulk verify (just like initial establishment of the
>> FK constraint).  I'm not sure how to do the queue management for this
>> though.
>
> Neither of those approaches is really suitable. Just spilling to disk is
> O(N) in the number of rows loaded, and the bulk verify is O(N) at least
> in the number of rows loaded plus existing. The bulk verify doesn't help
> either: if the table had been empty you'd have added the FK afterwards,
> so we must assume there are already rows in there, and in most cases the
> rows already loaded will exceed those being added by the bulk operation.
>
> AFAICS we must aggregate the trigger checks. We would need a special
> property of triggers that allowed them to be aggregated when two similar
> checks arrived. We can then use hash aggregation to accumulate them. We
> might conceivably need to spill to disk also, since the aggregation may
> not always be effective.

Can't we just do the checks for the FKs accumulated at the point they
don't fit in memory, instead of spilling to disk?

> But in most cases the tables against which FK
> checks are made are significantly smaller than the tables being loaded.
> Once we have hash aggregated them, that is then the first part of a hash
> join to the target table.
>
> We certainly need a TODO item for "improve RI checks during bulk
> operations".

Agreed.

----------------
Hannu
On May 28, 2008, at 1:22 PM, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> "Tomasz Rybak" <bogomips@post.pl> writes:
>>> I tried to use COPY to import 27M rows to table:
>>> CREATE TABLE sputnik.ccc24 (
>>>     station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
>>>     moment INTEGER NOT NULL,
>>>     flags INTEGER NOT NULL
>>> ) INHERITS (sputnik.sputnik);
>>> COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
>>> FROM '/tmp/24c3' WITH DELIMITER AS ' ';
>>
>> This is expected to take lots of memory because each row-requiring-check
>> generates an entry in the pending trigger event list.  Even if you had
>> not exhausted memory, the actual execution of the retail checks would
>> have taken an unreasonable amount of time.  The recommended way to do
>> this sort of thing is to add the REFERENCES constraint *after* you load
>> all the data; that'll be a lot faster in most cases because the checks
>> are done "in bulk" using a JOIN rather than one-at-a-time.
>
> Hm, it occurs to me that we could still do a join against the pending
> event trigger list... I wonder how feasible it would be to store the
> pending trigger event list in a temporary table instead of in ram.

Related to that, I really wish that our statement-level triggers provided
NEW and OLD recordsets like some other databases do. That would allow RI
triggers to be done on a per-statement basis, and they could aggregate
keys to be checked.

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
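What Jim is wishing for corresponds to the SQL standard's transition
tables, which PostgreSQL did not have at the time of this thread (they
eventually arrived in version 10). A sketch of a set-based RI check with
that feature (the trigger and function names are illustrative):

    CREATE FUNCTION check_station_fk() RETURNS trigger AS $$
    BEGIN
        -- one join per statement, not one lookup per row
        IF EXISTS (SELECT 1
                   FROM new_rows n
                       LEFT JOIN sputnik.station24 pk ON pk.id = n.station
                   WHERE pk.id IS NULL) THEN
            RAISE EXCEPTION 'rows in ccc24 violate FK on station';
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER ccc24_station_ri
        AFTER INSERT ON sputnik.ccc24
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT
        EXECUTE PROCEDURE check_station_fk();

Duplicate keys in new_rows are collapsed by the join, which is exactly
the aggregation of keys-to-be-checked that Jim describes.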