Thread: partitioning and locking problems
We have a large database system designed around partitioning. Our application is characterized with - terabytes of data - billions of rows in dozens of base tables (and 100s of paritions) - 24x7 insert load of new data that cannot be stopped, data is time sensitive. - periodic reports that can have long running queries with query times measured in hours We have 2 classes of "maintenance" activities that are causing us problems: - periodically we need to change an insert rule on a view to point to a different partition. - periodically we need to delete data that is no longer needed. Performed via truncate. Under both these circumstances (truncate and create / replace rule) the locking behaviour of these commands can cause locking problems for us. The scenario is best illustrated as a series of steps: 1- long running report is running on view 2- continuous inserters into view into a table via a rule 3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish. 4- new reports and inserters must now wait for #3. 5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application). Would like to understand the implications of changing postgres' code/locking for rule changes and truncate to not require locking out select statements? The following is a simplified schema to help illustrate the problem. create table a_1 ( pkey int primary key ); create table a_2 ( pkey int primary key ); create view a as select * from a_1 union all select * from a_2; create function change_rule(int) returns void as ' begin execute ''create or replace rule insert as on insert to a do instead insert into a_''||$1||''(pkey) values(NEW.pkey)''; end; ' language plpgsql; -- change rule, execute something like the following periodically select change_rule(1); We've looked at the code and the rule changes appear "easy" but we are concerned about the required changes for truncate. Thanks Marc
"Marc Morin" <marc@sandvine.com> writes: > Would like to understand the implications of changing postgres' > code/locking for rule changes and truncate to not require locking out > select statements? It won't work... regards, tom lane
Marc Morin wrote: > Under both these circumstances (truncate and create / replace rule) the > locking behaviour of these commands can cause locking problems for us. > The scenario is best illustrated as a series of steps: > > > 1- long running report is running on view > 2- continuous inserters into view into a table via a rule > 3- truncate or rule change occurs, taking an exclusive lock. > Must wait for #1 to finish. > 4- new reports and inserters must now wait for #3. > 5- now everyone is waiting for a single query in #1. Results > in loss of insert data granularity (important for our application). How much would you get from splitting the view into two: reporting and inserting? -- Richard Huxton Archonet Ltd
Tom, Do you mean it would be impossible to change the code so that existing selects continue to use the pre-truncated table until they commit? Or just require a more extensive change? The update/insert rule change appears to be more more doable? No? > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, February 01, 2006 12:50 AM > To: Marc Morin > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] partitioning and locking problems > > "Marc Morin" <marc@sandvine.com> writes: > > Would like to understand the implications of changing postgres' > > code/locking for rule changes and truncate to not require > locking out > > select statements? > > It won't work... > > regards, tom lane > >
"Marc Morin" <marc@sandvine.com> writes: > Do you mean it would be impossible to change the code so that existing > selects continue to use the pre-truncated table until they commit? Yes, because that table won't exist any more (as in the file's been unlinked) once the TRUNCATE commits. > The update/insert rule change appears to be more more doable? No? You've still got race conditions there: do onlooker transactions see the old set of rules, or the new set, or some unholy mixture? Removing the lock as you suggest would make it possible for the rule rewriter to pick up non-self-consistent data from the system catalogs, leading to arbitrarily bad behavior ... if you're lucky, it'll just crash, if you're not lucky the incorrect rule will do a fandango on your data. regards, tom lane
Hi, Marc, Marc Morin wrote: > 1- long running report is running on view > 2- continuous inserters into view into a table via a rule > 3- truncate or rule change occurs, taking an exclusive lock. > Must wait for #1 to finish. > 4- new reports and inserters must now wait for #3. > 5- now everyone is waiting for a single query in #1. Results > in loss of insert data granularity (important for our application). Apart from having two separate views (one for report, one for insert) as Richard suggested: If you have fixed times for #3, don't start any #1 that won't finish before it's time for #3. You could also use the LOCK command on an empty lock table at the beginning of each #1 or #3 transaction to prevent #3 from getting the view lock before #1 is finished. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Using a separate lock table is what we've decided to do in this particular case to serialize #1 and #3. Inserters don't take this lock and as such will not be stalled. > -----Original Message----- > From: Markus Schaber [mailto:schabi@logix-tt.com] > Sent: Thursday, February 02, 2006 7:44 AM > To: Marc Morin > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] partitioning and locking problems > > Hi, Marc, > > Marc Morin wrote: > > > 1- long running report is running on view > > 2- continuous inserters into view into a table via a rule > > 3- truncate or rule change occurs, taking an exclusive lock. > > Must wait for #1 to finish. > > 4- new reports and inserters must now wait for #3. > > 5- now everyone is waiting for a single query in #1. Results > > in loss of insert data granularity (important for our application). > > Apart from having two separate views (one for report, one for > insert) as Richard suggested: > > If you have fixed times for #3, don't start any #1 that won't > finish before it's time for #3. > > You could also use the LOCK command on an empty lock table at > the beginning of each #1 or #3 transaction to prevent #3 from > getting the view lock before #1 is finished. > > > HTH, > Markus > > -- > Markus Schaber | Logical Tracking&Tracing International AG > Dipl. Inf. | Software Development GIS > > Fight against software patents in EU! www.ffii.org > www.nosoftwarepatents.org > >
On Wed, Feb 01, 2006 at 10:20:21AM -0500, Tom Lane wrote: > "Marc Morin" <marc@sandvine.com> writes: > > Do you mean it would be impossible to change the code so that existing > > selects continue to use the pre-truncated table until they commit? > > Yes, because that table won't exist any more (as in the file's been > unlinked) once the TRUNCATE commits. Is there a reason the truncate must happen in 'real time'? If TRUNCATE marked a table as "truncated as of tid, cid" and created a new set of empty objects to be used by all transactions after that, then it should be possible to truncate without waiting on existing selects. Unfortunately, I can't think of any way to avoid blocking new inserters, but in the partitioning case that shouldn't matter. > > The update/insert rule change appears to be more more doable? No? > > You've still got race conditions there: do onlooker transactions see the > old set of rules, or the new set, or some unholy mixture? Removing the > lock as you suggest would make it possible for the rule rewriter to pick > up non-self-consistent data from the system catalogs, leading to > arbitrarily bad behavior ... if you're lucky, it'll just crash, if > you're not lucky the incorrect rule will do a fandango on your data. Where can one read about why the catalogs can't/don't use MVCC (I'm assuming that's why this won't work...) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, 2006-02-02 at 11:27 -0500, Marc Morin wrote: > > > 1- long running report is running on view > > > 2- continuous inserters into view into a table via a rule > > > 3- truncate or rule change occurs, taking an exclusive lock. > > > Must wait for #1 to finish. > > > 4- new reports and inserters must now wait for #3. > > > 5- now everyone is waiting for a single query in #1. Results > > > in loss of insert data granularity (important for our application). > Using a separate lock table is what we've decided to do in this > particular case to serialize #1 and #3. Inserters don't take this lock > and as such will not be stalled. Would it not be simpler to have the Inserters change from one table to another either upon command, on a fixed timing cycle or even better based upon one of the inserted values (Logdate?) (or all 3?). (Requires changes in the application layer: 3GL or db functions). The truncates can wait until the data has stopped being used. I'd be disinclined to using the locking system as a scheduling tool. Best Regards, Simon Riggs
At 05:09 PM 2/7/2006, Simon Riggs wrote: >I'd be disinclined to using the locking system as a scheduling tool. I Agree with Simon. Using the locking system for scheduling feels like a form of Programming by Side Effect. Ron
All good ideas, unfortunately, we can't change the inserting applicatin code easily. > -----Original Message----- > From: Simon Riggs [mailto:simon@2ndquadrant.com] > Sent: Tuesday, February 07, 2006 5:09 PM > To: Marc Morin > Cc: Markus Schaber; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] partitioning and locking problems > > On Thu, 2006-02-02 at 11:27 -0500, Marc Morin wrote: > > > > > 1- long running report is running on view > > > > 2- continuous inserters into view into a table > via a rule > > > > 3- truncate or rule change occurs, taking an > exclusive lock. > > > > Must wait for #1 to finish. > > > > 4- new reports and inserters must now wait for #3. > > > > 5- now everyone is waiting for a single query > in #1. Results > > > > in loss of insert data granularity (important for our > application). > > > Using a separate lock table is what we've decided to do in this > > particular case to serialize #1 and #3. Inserters don't take this > > lock and as such will not be stalled. > > Would it not be simpler to have the Inserters change from one > table to another either upon command, on a fixed timing cycle > or even better based upon one of the inserted values > (Logdate?) (or all 3?). (Requires changes in the application > layer: 3GL or db functions). > > The truncates can wait until the data has stopped being used. > > I'd be disinclined to using the locking system as a scheduling tool. > > Best Regards, Simon Riggs > > >
On Tue, Feb 07, 2006 at 10:09:02PM +0000, Simon Riggs wrote: > On Thu, 2006-02-02 at 11:27 -0500, Marc Morin wrote: > > > > > 1- long running report is running on view > > > > 2- continuous inserters into view into a table via a rule > > > > 3- truncate or rule change occurs, taking an exclusive lock. > > > > Must wait for #1 to finish. > > > > 4- new reports and inserters must now wait for #3. > > > > 5- now everyone is waiting for a single query in #1. Results > > > > in loss of insert data granularity (important for our application). > > > Using a separate lock table is what we've decided to do in this > > particular case to serialize #1 and #3. Inserters don't take this lock > > and as such will not be stalled. > > Would it not be simpler to have the Inserters change from one table to > another either upon command, on a fixed timing cycle or even better > based upon one of the inserted values (Logdate?) (or all 3?). (Requires > changes in the application layer: 3GL or db functions). Unfortunately, AFAIK rule changes would suffer from the exact same problem, which will be a serious issue for table partitioning. If you try and add a new partition while a long report is running you'll end up blocking everything. ALso, IIRC the OP was trying *not* to have the locking system impose scheduling. I believe the intention is that either 1 not block 3 or 3 not block 4. I'm honestly somewhat surprised someone hasn't run into this problem with partitioning yet; or maybe everyone who needs to do long transactions just shoves those off to slony slaves... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Tue, 2006-02-07 at 18:59 -0600, Jim C. Nasby wrote: > I'm honestly somewhat surprised someone hasn't run into this problem > with partitioning yet; or maybe everyone who needs to do long > transactions just shoves those off to slony slaves... All DDL takes locks, on all DBMS. Best Regards, Simon Riggs