Thread: WIP patch for parallel pg_dump
This is the second patch for parallel pg_dump, now the actual part that
parallelizes the whole thing. More precisely, it adds parallel backup/restore
to pg_dump/pg_restore for the directory archive format and keeps the parallel
restore part of the custom archive format. My directory archive format patch
also includes a prototype of liblzf compression, so you can combine that
compression with any of the backup/restore scenarios just mentioned. This
patch applies on top of the previous directory patch.
You would add a regular parallel dump with
$ pg_dump -j 4 -Fd -f out.dir dbname
In previous discussions there was a request to add support for multiple
directories, which I have done as well, so that you can also run
$ pg_dump -j 4 -Fd -f dir1:dir2:dir3 dbname
to equally distribute the data among those three directories (we can still
discuss the syntax, I am not all that happy with the colon either...)
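To make the multi-directory idea concrete, here is a minimal sketch in Python. It is not taken from the patch: the post does not say how the data is balanced, so the greedy "put the next table into the least-loaded directory" rule, the function name `distribute`, and the table sizes are all assumptions for illustration only.

```python
def distribute(target_spec, relpages):
    """Assign tables to the directories named in a colon-separated spec.

    Hypothetical balancing rule: walk the tables from largest to smallest
    and put each one into the directory holding the fewest pages so far.
    """
    dirs = target_spec.split(":")
    load = {d: 0 for d in dirs}
    placement = {}
    for table, pages in sorted(relpages.items(), key=lambda kv: kv[1], reverse=True):
        target = min(dirs, key=lambda d: load[d])  # least-loaded directory
        placement[table] = target
        load[target] += pages
    return placement, load


# Made-up table sizes in pages, standing in for pg_class.relpages.
sizes = {"a": 500, "b": 300, "c": 200, "d": 200, "e": 100}
placement, load = distribute("dir1:dir2:dir3", sizes)
print(placement)
print(load)
```

Whatever rule the patch really uses, the point of the colon syntax is the same: each directory can sit on a different mount point, so the writes are spread across disk subsystems.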
The dump would always start with the largest objects, by looking at the
relpages column of pg_class which should give a good estimate. The order of the
objects to restore is determined by the dependencies among the objects (which
is already used in the parallel restore of the custom archive type).
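The effect of starting with the largest objects can be sketched with a small simulation. This is an illustration in Python, not the patch's code: the table sizes are invented, dump time is assumed to be proportional to relpages, and each worker is assumed to take the next table from the list as soon as it becomes free.

```python
import heapq


def makespan(sizes, workers):
    """Total dump time when free workers take tables in list order."""
    finish_times = [0] * workers          # when each worker becomes free
    heapq.heapify(finish_times)
    for size in sizes:
        free_at = heapq.heappop(finish_times)   # earliest free worker
        heapq.heappush(finish_times, free_at + size)
    return max(finish_times)


# Hypothetical relpages values: one big table and three small ones.
relpages = [300, 300, 300, 900]

arbitrary_order = makespan(relpages, workers=2)
largest_first = makespan(sorted(relpages, reverse=True), workers=2)
print(arbitrary_order, largest_first)  # 1200 900
```

With the big table started first, the total time drops to the time needed for that one table, because the small tables are handled by the other worker in the meantime.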
The file test.sh includes some example commands that I have run here as a kind
of regression test that should give you an impression of how to call it from the
command line.
One thing that is currently missing is proper support for Windows; this is the
next thing that I will be working on. Also, this version still prints quite a
bit of debug information about what the processes are doing, so don't try to
pipe the pg_dump output anywhere (even when not run in parallel); it will
probably just not work...
The missing part that would make parallel pg_dump work with no strings attached
is snapshot synchronization. As long as there are no synchronized snapshots,
you would need to stop writing to your database before starting the parallel
pg_dump. However it turns out that most often when you are especially concerned
about a fast dump, you have shut down your applications anyway (which is the
reason why you are so concerned about speed in the first place). These cases
are typically database migrations from one host/platform to another or database
upgrades without pg_migrator.
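Why writes have to stop without synchronized snapshots can be shown with a toy model. This is a sketch in Python under simplifying assumptions: the "database" is a plain dict, the two "workers" read one table each, and a concurrent transaction commits between the two reads. None of the names come from pg_dump.

```python
def transfer(db, amount):
    """A concurrent transaction moving value from table a to table b."""
    db["a"] -= amount
    db["b"] += amount


def dump_without_shared_snapshot(db, concurrent_write):
    dumped = {"a": db["a"]}      # worker 1 reads under its own snapshot
    concurrent_write(db)         # another session commits in between
    dumped["b"] = db["b"]        # worker 2's snapshot is taken later
    return dumped


def dump_with_shared_snapshot(db, concurrent_write):
    snapshot = dict(db)          # both workers see the same frozen state
    dumped = {"a": snapshot["a"]}
    concurrent_write(db)
    dumped["b"] = snapshot["b"]
    return dumped


inconsistent = dump_without_shared_snapshot({"a": 100, "b": 0}, lambda db: transfer(db, 40))
consistent = dump_with_shared_snapshot({"a": 100, "b": 0}, lambda db: transfer(db, 40))
print(sum(inconsistent.values()), sum(consistent.values()))  # 140 100
```

The first dump contains 140 units although the database never held more than 100, which is the kind of inconsistency that either a quiesced database or a shared snapshot avoids.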
Joachim
Attachment
On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland <joe@mcknight.de> wrote:
> You would add a regular parallel dump with
>
> $ pg_dump -j 4 -Fd -f out.dir dbname

So this is an updated series of patches for my parallel pg_dump WIP
patch. Most importantly it now runs on Windows once you get it to
compile there (I have added the new files to the respective project of
Mkvcbuild.pm but I wondered why the other archive formats do not need
to be defined in that file...).

So far nobody has volunteered to review this patch. It would be great
if people could at least check it out, run it and let me know if it
works and if they have any comments.

I have put all four patches in a tar archive, the patches must be
applied sequentially:

1. pg_dump_compression-refactor.diff
2. pg_dump_directory.diff
3. pg_dump_directory_parallel.diff
4. pg_dump_directory_parallel_lzf.diff

The compression-refactor patch does not include Heikki's latest changes yet.

And the last of the four patches adds LZF compression for whoever
wants to try that out. You need to link against an already installed
liblzf and call it with --compress-lzf.

Joachim
Attachment
On 02.12.2010 07:39, Joachim Wieland wrote: > On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland<joe@mcknight.de> wrote: >> You would add a regular parallel dump with >> >> $ pg_dump -j 4 -Fd -f out.dir dbname > > So this is an updated series of patches for my parallel pg_dump WIP > patch. Most importantly it now runs on Windows once you get it to > compile there (I have added the new files to the respective project of > Mkvcbuild.pm but I wondered why the other archive formats do not need > to be defined in that file...). > > So far nobody has volunteered to review this patch. It would be great > if people could at least check it out, run it and let me know if it > works and if they have any comments. That's a big patch.. I don't see the point of the sort-by-relpages code. The order the objects are dumped should be irrelevant, as long as you obey the restrictions dictated by dependencies. Or is it only needed for the multiple-target-dirs feature? Frankly I don't see the point of that, so it would be good to cull it out at least in this first stage. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I don't see the point of the sort-by-relpages code. The order the objects > are dumped should be irrelevant, as long as you obey the restrictions > dictated by dependencies. Or is it only needed for the multiple-target-dirs > feature? Frankly I don't see the point of that, so it would be good to cull > it out at least in this first stage. From the talk at CHAR(10), and provided memory serves, it's an optimisation so that you're doing largest file in a process and all the little file in other processes. In lots of case the total pg_dump duration is then reduced to about the time to dump the biggest files. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Dec 2, 2010 at 6:19 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> I don't see the point of the sort-by-relpages code. The order the objects
> are dumped should be irrelevant, as long as you obey the restrictions
> dictated by dependencies. Or is it only needed for the multiple-target-dirs
> feature? Frankly I don't see the point of that, so it would be good to cull
> it out at least in this first stage.

A guy called Dimitri Fontaine actually proposed the several-directories
feature here and other people liked the idea.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01061.php :-)

The code doesn't change much with or without it, and if people are no
longer in favour of it, I have no problem with taking it out.

As Dimitri has already pointed out, the relpages sorting is there to
start with the largest table(s) first.

Joachim
Joachim Wieland <joe@mcknight.de> writes:
> A guy called Dimitri Fontaine actually proposed the
> several-directories feature here and other people liked the idea.

Hehe :)

Reading that now, it could be that I didn't know at the time that given
a powerful enough disk subsystem there's no way to saturate it with one
CPU.

So the use case of parallel dump in a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time. Not sure how relevant it is.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:
> So the use case of parallel dump in a bunch of user-given locations
> would be to use different mount points (disk subsystems) at the same
> time. Not sure how relevant it is.

I think it will complicate this feature unnecessarily for 9.1.
Personally, I need this patch so much I'm thinking of backporting it.
However, having all the data go to one directory/mount wouldn't trouble
me at all.

Now, if only I could think of some way to write a parallel dump to a set
of pipes, I'd be in heaven.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 12/02/2010 12:56 PM, Josh Berkus wrote:
> On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:
>> So the use case of parallel dump in a bunch of user-given locations
>> would be to use different mount points (disk subsystems) at the same
>> time. Not sure how relevant it is.
>
> I think it will complicate this feature unnecessarily for 9.1.
> Personally, I need this patch so much I'm thinking of backporting it.
> However, having all the data go to one directory/mount wouldn't
> trouble me at all.
>
> Now, if only I could think of some way to write a parallel dump to a
> set of pipes, I'd be in heaven.

The only way I can see that working sanely would be to have a program
gathering stuff at the other end of the pipes, and ensuring it was all
coherent. That would be a huge growth in scope for this, and I seriously
doubt it's worth it.

cheers

andrew
>> Now, if only I could think of some way to write a parallel dump to a >> set of pipes, I'd be in heaven. > > The only way I can see that working sanely would be to have a program > gathering stuff at the other end of the pipes, and ensuring it was all > coherent. That would be a huge growth in scope for this, and I seriously > doubt it's worth it. Oh, no question. And there's workarounds ... sshfs, for example. I'm just thinking of the ad-hoc parallel backup I'm running today, which relies heavily on pipes. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, Dec 2, 2010 at 12:56 PM, Josh Berkus <josh@agliodbs.com> wrote: > Now, if only I could think of some way to write a parallel dump to a set of > pipes, I'd be in heaven. What exactly are you trying to accomplish with the pipes? Joachim
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > That's a big patch.. Not nearly big enough :-( In the past, proposals for this have always been rejected on the grounds that it's impossible to assure a consistent dump if different connections are used to read different tables. I fail to understand why that consideration can be allowed to go by the wayside now. regards, tom lane
On 12/02/2010 05:01 PM, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> That's a big patch.. > Not nearly big enough :-( > > In the past, proposals for this have always been rejected on the grounds > that it's impossible to assure a consistent dump if different > connections are used to read different tables. I fail to understand > why that consideration can be allowed to go by the wayside now. > > Well, snapshot cloning should allow that objection to be overcome, no? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/02/2010 05:01 PM, Tom Lane wrote: >> In the past, proposals for this have always been rejected on the grounds >> that it's impossible to assure a consistent dump if different >> connections are used to read different tables. I fail to understand >> why that consideration can be allowed to go by the wayside now. > Well, snapshot cloning should allow that objection to be overcome, no? Possibly, but we need to see that patch first not second. (I'm not actually convinced that snapshot cloning is the only problem here; locking could be an issue too, if there are concurrent processes trying to take locks that will conflict with pg_dump's. But the snapshot issue is definitely a showstopper.) regards, tom lane
Dimitri Fontaine wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > I don't see the point of the sort-by-relpages code. The order the objects
> > are dumped should be irrelevant, as long as you obey the restrictions
> > dictated by dependencies. Or is it only needed for the multiple-target-dirs
> > feature? Frankly I don't see the point of that, so it would be good to cull
> > it out at least in this first stage.
>
> From the talk at CHAR(10), and provided memory serves, it's an
> optimisation so that you're doing largest file in a process and all the
> little file in other processes. In lots of case the total pg_dump
> duration is then reduced to about the time to dump the biggest files.

Seems there should be a comment in the code explaining why this is being
done.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On 12/02/2010 05:32 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 12/02/2010 05:01 PM, Tom Lane wrote: >>> In the past, proposals for this have always been rejected on the grounds >>> that it's impossible to assure a consistent dump if different >>> connections are used to read different tables. I fail to understand >>> why that consideration can be allowed to go by the wayside now. >> Well, snapshot cloning should allow that objection to be overcome, no? > Possibly, but we need to see that patch first not second. Yes, I agree with that. > (I'm not actually convinced that snapshot cloning is the only problem > here; locking could be an issue too, if there are concurrent processes > trying to take locks that will conflict with pg_dump's. But the > snapshot issue is definitely a showstopper.) > > Why is that more an issue with parallel pg_dump? cheers andrew
On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 12/02/2010 05:01 PM, Tom Lane wrote: >>> In the past, proposals for this have always been rejected on the grounds >>> that it's impossible to assure a consistent dump if different >>> connections are used to read different tables. I fail to understand >>> why that consideration can be allowed to go by the wayside now. > >> Well, snapshot cloning should allow that objection to be overcome, no? > > Possibly, but we need to see that patch first not second. Yes, by all means let's allow the perfect to be the enemy of the good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/02/2010 07:13 PM, Robert Haas wrote: > On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> On 12/02/2010 05:01 PM, Tom Lane wrote: >>>> In the past, proposals for this have always been rejected on the grounds >>>> that it's impossible to assure a consistent dump if different >>>> connections are used to read different tables. I fail to understand >>>> why that consideration can be allowed to go by the wayside now. >>> Well, snapshot cloning should allow that objection to be overcome, no? >> Possibly, but we need to see that patch first not second. > Yes, by all means let's allow the perfect to be the enemy of the good. > That seems like a bit of an easy shot. Requiring that parallel pg_dump produce a dump that is as consistent as non-parallel pg_dump currently produces isn't unreasonable. It's not stopping us moving forward, it's just not wanting to go backwards. And it shouldn't be terribly hard. IIRC Joachim has already done some work on it. cheers andrew
On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>>>> In the past, proposals for this have always been rejected on the >>>>> grounds >>>>> that it's impossible to assure a consistent dump if different >>>>> connections are used to read different tables. I fail to understand >>>>> why that consideration can be allowed to go by the wayside now. >>>> Well, snapshot cloning should allow that objection to be overcome, no? >>> Possibly, but we need to see that patch first not second. >> Yes, by all means let's allow the perfect to be the enemy of the good. >> > > That seems like a bit of an easy shot. Requiring that parallel pg_dump > produce a dump that is as consistent as non-parallel pg_dump currently > produces isn't unreasonable. It's not stopping us moving forward, it's just > not wanting to go backwards. I certainly agree that would be nice. But if Joachim thought the patch were useless without that, perhaps he wouldn't have bothered writing it at this point. In fact, he doesn't think that, and he mentioned the use cases he sees in his original post. But even supposing you wouldn't personally find this useful in those situations, how can you possibly say that HE wouldn't find it useful in those situations? I understand that people sometimes show up here and ask for ridiculous things, but I don't think we should be too quick to attribute ridiculousness to regular contributors. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/02/2010 07:48 PM, Robert Haas wrote: > On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >>>>>> In the past, proposals for this have always been rejected on the >>>>>> grounds >>>>>> that it's impossible to assure a consistent dump if different >>>>>> connections are used to read different tables. I fail to understand >>>>>> why that consideration can be allowed to go by the wayside now. >>>>> Well, snapshot cloning should allow that objection to be overcome, no? >>>> Possibly, but we need to see that patch first not second. >>> Yes, by all means let's allow the perfect to be the enemy of the good. >>> >> That seems like a bit of an easy shot. Requiring that parallel pg_dump >> produce a dump that is as consistent as non-parallel pg_dump currently >> produces isn't unreasonable. It's not stopping us moving forward, it's just >> not wanting to go backwards. > I certainly agree that would be nice. But if Joachim thought the > patch were useless without that, perhaps he wouldn't have bothered > writing it at this point. In fact, he doesn't think that, and he > mentioned the use cases he sees in his original post. But even > supposing you wouldn't personally find this useful in those > situations, how can you possibly say that HE wouldn't find it useful > in those situations? I understand that people sometimes show up here > and ask for ridiculous things, but I don't think we should be too > quick to attribute ridiculousness to regular contributors. Umm, nobody has attributed ridiculousness to anyone. Please don't put words in my mouth. But I think this is a perfectly reasonable discussion to have. Nobody gets to come along and get the features they want without some sort of consensus, not me, not you, not Joachim, not Tom. cheers andrew
On Dec 2, 2010, at 8:11 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Umm, nobody has attributed ridiculousness to anyone. Please don't put words in my mouth. But I think this is a perfectly reasonable discussion to have. Nobody gets to come along and get the features they want without some sort of consensus, not me, not you, not Joachim, not Tom.

I'm not disputing that we COULD reject the patch. I AM disputing that we've made a cogent argument for doing so.

...Robert
Andrew Dunstan <andrew@dunslane.net> writes:
> On 12/02/2010 05:32 PM, Tom Lane wrote:
>> (I'm not actually convinced that snapshot cloning is the only problem
>> here; locking could be an issue too, if there are concurrent processes
>> trying to take locks that will conflict with pg_dump's. But the
>> snapshot issue is definitely a showstopper.)

> Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.

3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2 is blocked behind process 1
which is waiting for 3 to complete. Can you say "undetectable deadlock"?

regards, tom lane
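The "undetectable" part of this scenario can be sketched as a wait-for graph. The following Python is an illustration only, with made-up process labels. It assumes the server can see lock waits between sessions, but not that the pg_dump parent is waiting on its child at the application level.

```python
def find_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle or None."""
    path = []
    node = start
    while node in waits_for and node not in path:
        path.append(node)
        node = waits_for[node]
    if node in path:
        return path[path.index(node):] + [node]
    return None


# Lock waits visible to the server's deadlock detector.
server_view = {
    "pg_dump child": "other session",    # AccessShareLock queued behind the exclusive request
    "other session": "pg_dump parent",   # AccessExclusiveLock blocked by the parent's lock
}

# The parent waiting for its child is only known on the client side.
full_view = {**server_view, "pg_dump parent": "pg_dump child"}

print(find_cycle(server_view, "pg_dump child"))  # None
print(find_cycle(full_view, "pg_dump child"))
```

The cycle only shows up once the client-side edge is added, so a detector that sees lock waits alone has nothing to break.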
On 12/02/2010 09:09 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 12/02/2010 05:32 PM, Tom Lane wrote: >>> (I'm not actually convinced that snapshot cloning is the only problem >>> here; locking could be an issue too, if there are concurrent processes >>> trying to take locks that will conflict with pg_dump's. But the >>> snapshot issue is definitely a showstopper.) >> Why is that more an issue with parallel pg_dump? > The scenario that bothers me is > > 1. pg_dump parent process AccessShareLocks everything to be dumped. > > 2. somebody else tries to acquire AccessExclusiveLock on table foo. > hmm. > 3. pg_dump child process is told to dump foo, tries to acquire > AccessShareLock. > > Now, process 3 is blocked behind process 2 is blocked behind process 1 > which is waiting for 3 to complete. Can you say "undetectable deadlock"? > > Hmm. Yeah. Maybe we could get around it if we prefork the workers and they all acquire locks on everything to be dumped up front in nowait mode, right after the parent, and if they can't the whole dump fails. Or something along those lines. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Umm, nobody has attributed ridiculousness to anyone. Please don't put > words in my mouth. But I think this is a perfectly reasonable discussion > to have. Nobody gets to come along and get the features they want > without some sort of consensus, not me, not you, not Joachim, not Tom. In particular, this issue *has* been discussed before, and there was a consensus that preserving dump consistency was a requirement. I don't think that Joachim gets to bypass that decision just by submitting a patch that ignores it. regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes: > On 12/02/2010 09:09 PM, Tom Lane wrote: >> Now, process 3 is blocked behind process 2 is blocked behind process 1 >> which is waiting for 3 to complete. Can you say "undetectable deadlock"? > Hmm. Yeah. Maybe we could get around it if we prefork the workers and > they all acquire locks on everything to be dumped up front in nowait > mode, right after the parent, and if they can't the whole dump fails. Or > something along those lines. [ thinks for a bit... ] Actually it might be good enough if a child simply takes the lock it needs in nowait mode, and reports failure on error. We know the parent already has that lock, so the only way that the child's request can fail is if something conflicting with AccessShareLock is queued up behind the parent's lock. So failure to get the child lock immediately proves that the deadlock case applies. regards, tom lane
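The nowait idea can be tried out on a toy lock queue. This is a sketch in Python, not PostgreSQL's lock manager: it models only the two lock modes from the scenario, and it assumes a new request has to wait behind any conflicting request that is already queued. The class and function names are hypothetical.

```python
from collections import deque

ACCESS_SHARE = "AccessShareLock"
ACCESS_EXCLUSIVE = "AccessExclusiveLock"


class LockNotAvailable(Exception):
    """Raised when a nowait request cannot be granted immediately."""


def conflicts(mode_a, mode_b):
    # Of these two modes, only share/share is compatible.
    return ACCESS_EXCLUSIVE in (mode_a, mode_b)


class ToyTableLock:
    def __init__(self):
        self.granted = []      # (process, mode) pairs holding the lock
        self.queue = deque()   # (process, mode) pairs waiting in order

    def acquire(self, process, mode, nowait=False):
        """Return True if granted, False if queued; raise if nowait fails."""
        pending = list(self.granted) + list(self.queue)
        if not any(conflicts(mode, other) for _, other in pending):
            self.granted.append((process, mode))
            return True
        if nowait:
            raise LockNotAvailable(f"{process} could not get {mode}")
        self.queue.append((process, mode))
        return False


def child_can_dump(lock):
    """The child takes its lock in nowait mode and reports failure."""
    try:
        return lock.acquire("child", ACCESS_SHARE, nowait=True)
    except LockNotAvailable:
        return False


quiet = ToyTableLock()
quiet.acquire("parent", ACCESS_SHARE)

contended = ToyTableLock()
contended.acquire("parent", ACCESS_SHARE)
contended.acquire("other session", ACCESS_EXCLUSIVE)  # queued behind the parent

print(child_can_dump(quiet), child_can_dump(contended))  # True False
```

Since the parent already holds its lock, the child's request can only fail when something conflicting is queued behind it, so the failure is a direct signal for the deadlock case and the dump can abort instead of hanging.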
On 12/02/2010 09:41 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 12/02/2010 09:09 PM, Tom Lane wrote: >>> Now, process 3 is blocked behind process 2 is blocked behind process 1 >>> which is waiting for 3 to complete. Can you say "undetectable deadlock"? >> Hmm. Yeah. Maybe we could get around it if we prefork the workers and >> they all acquire locks on everything to be dumped up front in nowait >> mode, right after the parent, and if they can't the whole dump fails. Or >> something along those lines. > [ thinks for a bit... ] Actually it might be good enough if a child > simply takes the lock it needs in nowait mode, and reports failure on > error. We know the parent already has that lock, so the only way that > the child's request can fail is if something conflicting with > AccessShareLock is queued up behind the parent's lock. So failure to > get the child lock immediately proves that the deadlock case applies. > > Yeah, that would be a whole lot simpler. It would avoid the deadlock, but it would have lots more chances for failure. But it would at least be a good place to start. cheers andrew
On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > In particular, this issue *has* been discussed before, and there was a > consensus that preserving dump consistency was a requirement. I don't > think that Joachim gets to bypass that decision just by submitting a > patch that ignores it. I am not trying to bypass anything here :) Regarding the locking issue I probably haven't done sufficient research, at least I managed to miss the emails that mentioned it. Anyway, that seems to be solved now fortunately, I'm going to implement your idea over the weekend. Regarding snapshot cloning and dump consistency, I brought this up already several months ago and asked if the feature is considered useful even without snapshot cloning. And actually it was you who motivated me to work on it even without having snapshot consistency... http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php In my patch pg_dump emits a warning when called with -j, if you feel better with an extra option --i-know-that-i-have-no-synchronized-snapshots, fine with me :-) In the end we provide a tool with limitations, it might not serve all use cases but there are use cases that would benefit a lot. I personally think this is better than to provide no tool at all... Joachim
On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Umm, nobody has attributed ridiculousness to anyone. Please don't put
>> words in my mouth. But I think this is a perfectly reasonable discussion
>> to have. Nobody gets to come along and get the features they want
>> without some sort of consensus, not me, not you, not Joachim, not Tom.
>
> In particular, this issue *has* been discussed before, and there was a
> consensus that preserving dump consistency was a requirement. I don't
> think that Joachim gets to bypass that decision just by submitting a
> patch that ignores it.

Well, the discussion that Joachim linked to certainly doesn't have any
sort of clear consensus that that's the only way to go. In fact, it
seems to be much closer to the opposite consensus. Perhaps there is
some OTHER time that this has been discussed where "synchronization is
a hard requirement" was the consensus. There's an old saw that the
nice thing about standards is there are so many to choose from, and
the same thing can certainly be said about -hackers discussions on any
particular topic.

I actually think that the phrase "this has been discussed before and
rejected" should be permanently removed from our list of excuses for
rejecting a patch. Or if we must use that excuse, then I think a link
to the relevant discussion is a must, and the relevant discussion had
better reflect the fact that $TOPIC was in fact rejected. It seems to
me that in at least 50% of cases, someone comes back and says one of
the following things:

1. I searched the archives and could find no discussion along those lines.
2. I read that discussion and it doesn't appear to me that it reflects
a rejection of this idea. Instead what people seemed to be saying was
X.
3. At the time that might have been true, but what has changed in the
meanwhile is X.

In short, the problem with referring to previous discussions is that our memories grow fuzzy over time. We remember that an idea was not adopted, but not exactly why it wasn't adopted. We reject a new patch with a good implementation of $FEATURE because an old patch was badly done, or fell down on some peripheral issue, or just never got done. Veteran backend hackers understand the inevitable necessity of arguing about what consensus is actually reflected in the archives and whether it's still relevant, but new people can be (and frequently are) put off by it; and even for experienced contributors, it does little to advance the dialogue. Hmm, according to so-and-so's memory, sometime in the fourteen-year-history of the project someone didn't like this idea, or maybe a similar one. Whee, time to start Googling. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/02/2010 11:44 PM, Joachim Wieland wrote: > On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> In particular, this issue *has* been discussed before, and there was a >> consensus that preserving dump consistency was a requirement. I don't >> think that Joachim gets to bypass that decision just by submitting a >> patch that ignores it. > I am not trying to bypass anything here :) Regarding the locking > issue I probably haven't done sufficient research, at least I managed > to miss the emails that mentioned it. Anyway, that seems to be solved > now fortunately, I'm going to implement your idea over the weekend. > > Regarding snapshot cloning and dump consistency, I brought this up > already several months ago and asked if the feature is considered > useful even without snapshot cloning. And actually it was you who > motivated me to work on it even without having snapshot consistency... > > http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php > > In my patch pg_dump emits a warning when called with -j, if you feel > better with an extra option > --i-know-that-i-have-no-synchronized-snapshots, fine with me :-) > > In the end we provide a tool with limitations, it might not serve all > use cases but there are use cases that would benefit a lot. I > personally think this is better than to provide no tool at all... > > > I think Tom's statement there: > I think migration to a new server version (that's too incompatible for > PITR or pg_migrate migration) is really the only likely use case. is just wrong. Say you have a site that's open 24/7. But there is a window of, say, 6 hours, each day, when it's almost but not quite quiet. You want to be able to make your disaster recovery dump within that window, and the low level of traffic means you can afford the degraded performance that might result from a parallel dump. 
Or say you have a hot standby machine from which you want to make the dump but want to set the max_standby_*_delay as low as possible. These are both cases where you might want parallel dump and yet you want dump consistency. I have a client currently considering the latter setup, and the timing tolerances are a little tricky. The times in which the system is in a state that we want dumped are fixed, and we want to be sure that the dump is finished by the next time such a time rolls around. (This is a system that in effect makes one giant state change at a time.) If we can't complete the dump in that time then there will be a delay introduced to the system's critical path. Parallel dump will be very useful in helping us avoid such a situation, but only if it's properly consistent. I think Josh Berkus' comments in the thread you mentioned are correct: > Actually, I'd say that there's a broad set of cases of people who want > to do a parallel pg_dump while their system is active. Parallel pg_dump > on a stopped system will help some people (for migration, particularly) > but parallel pg_dump with snapshot cloning will help a lot more people. cheers andrew
On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > I think Josh Berkus' comments in the thread you mentioned are correct: > >> Actually, I'd say that there's a broad set of cases of people who want >> to do a parallel pg_dump while their system is active. Parallel pg_dump >> on a stopped system will help some people (for migration, particularly) >> but parallel pg_dump with snapshot cloning will help a lot more people. But you failed to quote the rest of what he said: > So: if parallel dump in single-user mode is what you can get done, then > do it. We can always improve it later, and we have to start somewhere. > But we will eventually need parallel pg_dump on active systems, and > that should remain on the TODO list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/03/2010 11:23 AM, Robert Haas wrote: > On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan<andrew@dunslane.net> wrote: >> I think Josh Berkus' comments in the thread you mentioned are correct: >> >>> Actually, I'd say that there's a broad set of cases of people who want >>> to do a parallel pg_dump while their system is active. Parallel pg_dump >>> on a stopped system will help some people (for migration, particularly) >>> but parallel pg_dump with snapshot cloning will help a lot more people. > But you failed to quote the rest of what he said: > >> So: if parallel dump in single-user mode is what you can get done, then >> do it. We can always improve it later, and we have to start somewhere. >> But we will eventually need parallel pg_dump on active systems, and >> that should remain on the TODO list. Right, and the reason I don't think that's right is that it seems to me like a serious potential footgun. But in any case, the reason I quoted Josh was in answer to a different point, namely Tom's statement about the limited potential uses. cheers andrew
On Fri, Dec 3, 2010 at 11:40 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 12/03/2010 11:23 AM, Robert Haas wrote: >> >> On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan<andrew@dunslane.net> >> wrote: >>> >>> I think Josh Berkus' comments in the thread you mentioned are correct: >>> >>>> Actually, I'd say that there's a broad set of cases of people who want >>>> to do a parallel pg_dump while their system is active. Parallel pg_dump >>>> on a stopped system will help some people (for migration, particularly) >>>> but parallel pg_dump with snapshot cloning will help a lot more people. >> >> But you failed to quote the rest of what he said: >> >>> So: if parallel dump in single-user mode is what you can get done, then >>> do it. We can always improve it later, and we have to start somewhere. >>> But we will eventually need parallel pg_dump on active systems, and >>> that should remain on the TODO list. > > Right, and the reason I don't think that's right is that it seems to me like > a serious potential footgun. > > But in any case, the reason I quoted Josh was in answer to a different > point, namely Tom's statement about the limited potential uses. I know the use cases are limited, but I think it's still useful on its own. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010: > I know the use cases are limited, but I think it's still useful on its own. I don't understand what's so difficult about starting with the snapshot cloning patch. AFAIR it's already been written anyway, no? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 12/03/2010 12:17 PM, Alvaro Herrera wrote: > Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010: > >> I know the use cases are limited, but I think it's still useful on its own. > I don't understand what's so difficult about starting with the snapshot > cloning patch. AFAIR it's already been written anyway, no? Yeah. If we can do it then this whole argument becomes moot. Like you I don't see why we can't. cheers andrew
Joachim Wieland wrote: > Regarding snapshot cloning and dump consistency, I brought this up > already several months ago and asked if the feature is considered > useful even without snapshot cloning. In addition, Joachim submitted a synchronized snapshot patch that looks to me like it slipped through the cracks without being fully explored. Since it's split in the official archives the easiest way to read the thread is at http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg143866.html Or you can use these two: http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php http://archives.postgresql.org/pgsql-hackers/2010-02/msg00363.php That never made it into a CommitFest proper that I can see, it just picked up review mainly from Markus. The way I read that thread, there were two objections: 1) This mechanism isn't general enough for all use-cases outside of pg_dump, which doesn't make it wrong when the question is how to get parallel pg_dump running 2) Running as superuser is excessive. Running as the database owner was suggested as likely to be good enough for pg_dump purposes. Ultimately I think that stalled because without a client that needed it the code wasn't so interesting yet. But now there is one; should that get revived again? It seems like all of the pieces needed to build what's really desired here are available, it's just the always non-trivial task of integrating them together the right way that's needed. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> writes: > In addition, Joachim submitted a synchronized snapshot patch that looks > to me like it slipped through the cracks without being fully explored. > ... > The way I read that thread, there were two objections: > 1) This mechanism isn't general enough for all use-cases outside of > pg_dump, which doesn't make it wrong when the question is how to get > parallel pg_dump running > 2) Running as superuser is excessive. Running as the database owner was > suggested as likely to be good enough for pg_dump purposes. IIRC, in old discussions of this problem we first considered allowing clients to pull down an explicit representation of their snapshot (which actually is an existing feature now, txid_current_snapshot()) and then upload that again to become the active snapshot in another connection. That was rejected on the grounds that you could cause all kinds of mischief by uploading a bad snapshot; so we decided to think about providing a server-side-only means to clone another backend's current snapshot. Which is essentially what Joachim's above-mentioned patch provides. However, as was discussed in that thread, that approach is far from being ideal either. I'm wondering if we should reconsider the pass-it-through-the-client approach, because if we could make that work it would be more general and it wouldn't need any special privileges. The trick seems to be to apply sufficient sanity testing to the snapshot proposed to be installed in the subsidiary transaction. 
I think the requirements would basically be (1) xmin <= any listed XIDs < xmax (2) xmin not so old as to cause GlobalXmin to decrease (3) xmax not beyond current XID counter (4) XID list includes all still-running XIDs in the given range One tricky part would be ensuring GlobalXmin doesn't decrease when the snap is installed, but I think that could be made to work if we take ProcArrayLock exclusively and insist on observing some other running transaction with xmin <= proposed xmin. For the pg_dump case this would certainly hold since xmin would be the parent pg_dump's xmin. Given the checks stated above, it would be possible for someone to install a snapshot that corresponds to no actual state of the database, eg it shows some T1 as running and T2 as committed when actually T1 committed before T2. I don't see any simple way for the installation function to detect that, but I'm not sure whether it matters. The user might see inconsistent data, but do we care? Perhaps as a safety measure we should only allow snapshot installation in read-only transactions, so that even if the xact does observe inconsistent data it can't possibly corrupt the database state thereby. This'd be no skin off pg_dump's nose, obviously. Or compromise on "only superusers can do it in non-read-only transactions". Thoughts? regards, tom lane
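The four requirements listed above can be sketched as a small validation routine. This is only an illustration of the checks as stated in the message, not server code: the `Snapshot` class, the `check_snapshot` function, and its parameters are hypothetical names, XID wraparound is ignored (XIDs are treated as plain integers), and the locking needed to keep GlobalXmin stable while checking is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical client-supplied snapshot (wraparound ignored)."""
    xmin: int             # oldest XID considered still running
    xmax: int             # first XID not yet assigned at snapshot time
    xip: FrozenSet[int]   # XIDs in [xmin, xmax) that were in progress


def check_snapshot(snap: Snapshot, global_xmin: int, next_xid: int,
                   running_xids: Iterable[int]) -> List[str]:
    """Return the list of violated requirements; empty means acceptable."""
    problems = []
    # (1) xmin <= any listed XID < xmax
    if any(not (snap.xmin <= x < snap.xmax) for x in snap.xip):
        problems.append("listed XID outside [xmin, xmax)")
    # (2) xmin not so old that GlobalXmin would have to decrease
    if snap.xmin < global_xmin:
        problems.append("xmin older than GlobalXmin")
    # (3) xmax not beyond the current XID counter
    if snap.xmax > next_xid:
        problems.append("xmax beyond current XID counter")
    # (4) every still-running XID inside the range must be listed
    in_range = {x for x in running_xids if snap.xmin <= x < snap.xmax}
    if in_range - snap.xip:
        problems.append("running XIDs missing from list")
    return problems
```

As the message notes, passing all four checks does not prove the snapshot corresponds to a state the database was ever actually in; the checks only rule out structurally impossible or unsafe snapshots.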
On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm wondering if we should reconsider the pass-it-through-the-client > approach, because if we could make that work it would be more general and > it wouldn't need any special privileges. The trick seems to be to apply > sufficient sanity testing to the snapshot proposed to be installed in > the subsidiary transaction. I think the requirements would basically be > (1) xmin <= any listed XIDs < xmax > (2) xmin not so old as to cause GlobalXmin to decrease > (3) xmax not beyond current XID counter > (4) XID list includes all still-running XIDs in the given range > > Thoughts? I think this is too ugly to live. I really think it's a very bad idea for database clients to need to explicitly know anywhere near this many details about how the server represents snapshots. It's not impossible we might want to change this in the future, and even if we don't, it seems to me to be exposing a whole lot of unnecessary internal grottiness. How about just pg_publish_snapshot(), returning a token that is only valid until the end of the transaction in which it was called, and pg_subscribe_snapshot(token)? The implementation can be that the publisher writes its snapshot to a temp file and returns the name of the temp file, setting an at-commit hook to remove the temp file. The subscriber reads the temp file and sets the contents as its transaction snapshot. If security is a concern, one could also save the publisher's role OID to the file and require the subscriber's to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
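A rough model of the publish/subscribe idea described above, with the temp file name serving as the token, might look like the following. Everything here is an assumption made for illustration: the class and function names, the JSON file format, and the use of Python's `tempfile` module stand in for whatever the server would really do, and the `end()` method plays the role of the at-commit hook.

```python
import json
import os
import tempfile


class PublishingTransaction:
    """Hypothetical stand-in for the backend that publishes its snapshot."""

    def __init__(self, role_oid, snapshot):
        self.role_oid = role_oid
        self.snapshot = snapshot
        self._published = []

    def publish_snapshot(self):
        # Write the snapshot plus the publisher's role to a fresh temp file.
        fd, path = tempfile.mkstemp(prefix="snap_", suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump({"role_oid": self.role_oid, "snapshot": self.snapshot}, f)
        self._published.append(path)
        return path  # the token is simply the file name

    def end(self):
        # Models the end-of-transaction hook: tokens stop being valid here.
        for path in self._published:
            if os.path.exists(path):
                os.remove(path)
        self._published.clear()


def subscribe_snapshot(token, role_oid):
    """Read a published snapshot; the subscriber's role must match."""
    try:
        with open(token) as f:
            data = json.load(f)
    except FileNotFoundError:
        raise LookupError("snapshot token is no longer valid") from None
    if data["role_oid"] != role_oid:
        raise PermissionError("role does not match the publisher")
    return data["snapshot"]
```

In this model the token's lifetime is tied to the publishing transaction, which is the property the proposal relies on; cleanup after a crash is not covered by the sketch.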
On 12/05/2010 08:55 PM, Robert Haas wrote: > On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I'm wondering if we should reconsider the pass-it-through-the-client >> approach, because if we could make that work it would be more general and >> it wouldn't need any special privileges. The trick seems to be to apply >> sufficient sanity testing to the snapshot proposed to be installed in >> the subsidiary transaction. I think the requirements would basically be >> (1) xmin<= any listed XIDs< xmax >> (2) xmin not so old as to cause GlobalXmin to decrease >> (3) xmax not beyond current XID counter >> (4) XID list includes all still-running XIDs in the given range >> >> Thoughts? > I think this is too ugly to live. I really think it's a very bad idea > for database clients to need to explicitly know anywhere near this > many details about how the server represents snapshots. It's not > impossible we might want to change this in the future, and even if we > don't, it seems to me to be exposing a whole lot of unnecessary > internal grottiness. > > How about just pg_publish_snapshot(), returning a token that is only > valid until the end of the transaction in which it was called, and > pg_subscribe_snapshot(token)? The implementation can be that the > publisher writes its snapshot to a temp file and returns the name of > the temp file, setting an at-commit hook to remove the temp file. The > subscriber reads the temp file and sets the contents as its > transaction snapshot. If security is a concern, one could also save > the publisher's role OID to the file and require the subscriber's to > match. Why not just say give me the snapshot currently held by process nnnn? And please, not temp files if possible. cheers andrew
On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Why not just say give me the snapshot currently held by process nnnn? > > And please, not temp files if possible. As far as I'm aware, the full snapshot doesn't normally exist in shared memory, hence the need for publication of some sort. We could dedicate a shared memory region for publication but then you have to decide how many slots to allocate, and any number you pick will be too many for some people and not enough for others, not to mention that shared memory is a fairly precious resource. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> Why not just say give me the snapshot currently held by process nnnn? >> >> And please, not temp files if possible. > > As far as I'm aware, the full snapshot doesn't normally exist in > shared memory, hence the need for publication of some sort. We could > dedicate a shared memory region for publication but then you have to > decide how many slots to allocate, and any number you pick will be too > many for some people and not enough for others, not to mention that > shared memory is a fairly precious resource. So here is a patch that I have been playing with in the past. I wrote it a while back, and thanks go to Koichi Suzuki for his helpful comments. I have not published it earlier because I haven't worked on it recently, and from the discussion that I brought up in March I got the feeling that people are fine with having a first version of parallel dump without synchronized snapshots. I am not really sure that what the patch does is sufficient, nor whether it does it in the right way, but I hope that it can serve as a basis to collect ideas (and doubts). My idea is pretty much similar to Robert's about publishing snapshots and subscribing to them; the patch even uses these words. Basically the idea is that a transaction in isolation level serializable can publish a snapshot and as long as this transaction is alive, its snapshot can be adopted by other transactions. Requiring the publishing transaction to be serializable guarantees that the copy of the snapshot in shared memory is always current. When the transaction ends, the copy of the snapshot is also invalidated and cannot be adopted anymore. So instead of doing explicit checks, the patch aims at always having a reference transaction around that guarantees validity of the snapshot information in shared memory. 
The patch currently creates a new area in shared memory to store snapshot information but we can certainly discuss this... I had a GUC in mind that can control the number of available "slots", similar to max_prepared_transactions. Snapshot information can become quite large, especially with a high number of max_connections. Known limitations: the patch is lacking awareness of prepared transactions completely and doesn't check if both backends belong to the same user. Joachim
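The slot-based scheme described in this message can be modelled roughly as below. This is a sketch of the idea only, not the attached patch: the `SnapshotSlots` class, its method names, and the `max_slots` parameter (standing in for the GUC that the message has in mind) are all hypothetical.

```python
class SnapshotSlots:
    """Hypothetical fixed-size registry of published snapshots."""

    def __init__(self, max_slots):
        # max_slots plays the role of the proposed GUC.
        self._slots = [None] * max_slots

    def publish(self, xid, isolation, snapshot):
        # Only a serializable transaction keeps one snapshot for its whole
        # lifetime, so only such a transaction may publish.
        if isolation != "serializable":
            raise ValueError("only a serializable transaction may publish")
        for i, slot in enumerate(self._slots):
            if slot is None:
                self._slots[i] = (xid, snapshot)
                return i
        raise RuntimeError("no free snapshot slots")

    def adopt(self, slot_id):
        slot = self._slots[slot_id]
        if slot is None:
            raise LookupError("snapshot is no longer published")
        return slot[1]

    def end_transaction(self, xid):
        # Ending the publisher invalidates everything it published.
        for i, slot in enumerate(self._slots):
            if slot is not None and slot[0] == xid:
                self._slots[i] = None
```

A real design would presumably also have to guard against a freed slot being reused by another publisher while a subscriber still holds the old slot number, and against the limitations the message lists (prepared transactions, user checks); none of that is modelled here.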
Thank you Joachim; Yes, and the current patch requires that the original (publisher) transaction stay alive, to prevent RecentXmin from being updated. I hope this restriction is acceptable if publishing/subscribing is provided via functions, not statements. Cheers; ---------- Koichi Suzuki 2010/12/6 Joachim Wieland <joe@mcknight.de>: > On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> Why not just say give me the snapshot currently held by process nnnn? >>> >>> And please, not temp files if possible. >> >> As far as I'm aware, the full snapshot doesn't normally exist in >> shared memory, hence the need for publication of some sort. We could >> dedicate a shared memory region for publication but then you have to >> decide how many slots to allocate, and any number you pick will be too >> many for some people and not enough for others, not to mention that >> shared memory is a fairly precious resource. > > So here is a patch that I have been playing with in the past, I have > done it a while back and thanks go to Koichi Suzuki for his helpful > comments. I have not published it earlier because I haven't worked on > it recently and from the discussion that I brought up in march I got > the feeling that people are fine with having a first version of > parallel dump without synchronized snapshots. > > I am not really sure that what the patch does is sufficient nor if it > does it in the right way but I hope that it can serve as a basis to > collect ideas (and doubt). > > My idea is pretty much similar to Robert's about publishing snapshots > and subscribing to them, the patch even uses these words. > > Basically the idea is that a transaction in isolation level > serializable can publish a snapshot and as long as this transaction is > alive, its snapshot can be adopted by other transactions. 
Requiring > the publishing transaction to be serializable guarantees that the copy > of the snapshot in shared memory is always current. When the > transaction ends, the copy of the snapshot is also invalidated and > cannot be adopted anymore. So instead of doing explicit checks, the > patch aims at always having a reference transaction around that > guarantees validity of the snapshot information in shared memory. > > The patch currently creates a new area in shared memory to store > snapshot information but we can certainly discuss this... I had a GUC > in mind that can control the number of available "slots", similar to > max_prepared_transactions. Snapshot information can become quite > large, especially with a high number of max_connections. > > Known limitations: the patch is lacking awareness of prepared > transactions completely and doesn't check if both backends belong to > the same user. > > > Joachim >
On 06.12.2010 02:55, Robert Haas wrote: > On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I'm wondering if we should reconsider the pass-it-through-the-client >> approach, because if we could make that work it would be more general and >> it wouldn't need any special privileges. The trick seems to be to apply >> sufficient sanity testing to the snapshot proposed to be installed in >> the subsidiary transaction. I think the requirements would basically be >> (1) xmin<= any listed XIDs< xmax >> (2) xmin not so old as to cause GlobalXmin to decrease >> (3) xmax not beyond current XID counter >> (4) XID list includes all still-running XIDs in the given range >> >> Thoughts? > > I think this is too ugly to live. I really think it's a very bad idea > for database clients to need to explicitly know anywhere near this > many details about how the server represents snapshots. It's not > impossible we might want to change this in the future, and even if we > don't, it seems to me to be exposing a whole lot of unnecessary > internal grottiness. The client doesn't need to know anything about the snapshot blob that the server gives it. It just needs to pass it back to the server through the other connection. To the client, it's just an opaque chunk of bytes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 06.12.2010 02:55, Robert Haas wrote: >> >> On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> >>> I'm wondering if we should reconsider the pass-it-through-the-client >>> approach, because if we could make that work it would be more general and >>> it wouldn't need any special privileges. The trick seems to be to apply >>> sufficient sanity testing to the snapshot proposed to be installed in >>> the subsidiary transaction. I think the requirements would basically be >>> (1) xmin<= any listed XIDs< xmax >>> (2) xmin not so old as to cause GlobalXmin to decrease >>> (3) xmax not beyond current XID counter >>> (4) XID list includes all still-running XIDs in the given range >>> >>> Thoughts? >> >> I think this is too ugly to live. I really think it's a very bad idea >> for database clients to need to explicitly know anywhere near this >> many details about how the server represents snapshots. It's not >> impossible we might want to change this in the future, and even if we >> don't, it seems to me to be exposing a whole lot of unnecessary >> internal grottiness. > > The client doesn't need to know anything about the snapshot blob that the > server gives it. It just needs to pass it back to the server through the > other connection. To the client, it's just an opaque chunk of bytes. I suppose that would work, but I still think it's a bad idea. We made this mistake with expression trees. Any oversight in the code that validates the chunk of bytes when it (or a modified version) is sent back to the server turns into a security hole. I think it's a whole lot simpler and cleaner to keep the representation details private to the server. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06.12.2010 14:57, Robert Haas wrote: > On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> The client doesn't need to know anything about the snapshot blob that the >> server gives it. It just needs to pass it back to the server through the >> other connection. To the client, it's just an opaque chunk of bytes. > > I suppose that would work, but I still think it's a bad idea. We made > this mistake with expression trees. Any oversight in the code that > validates the chunk of bytes when it (or a modified version) is sent > back to the server turns into a security hole. True, but a snapshot is a lot simpler than an expression tree. It's pretty much impossible to plug all the holes in the expression-tree reading functions, and keep them hole-free in the future. The expression tree format is constantly in flux. A snapshot, however, is a fairly isolated small data structure that rarely changes. > I think it's a whole > lot simpler and cleaner to keep the representation details private to > the server. Well, then you need some sort of cross-backend communication, which is always a bit clumsy. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 06.12.2010 14:57, Robert Haas wrote: >> >> On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> The client doesn't need to know anything about the snapshot blob that the >>> server gives it. It just needs to pass it back to the server through the >>> other connection. To the client, it's just an opaque chunk of bytes. >> >> I suppose that would work, but I still think it's a bad idea. We made >> this mistake with expression trees. Any oversight in the code that >> validates the chunk of bytes when it (or a modified version) is sent >> back to the server turns into a security hole. > > True, but a snapshot is a lot simpler than an expression tree. It's pretty > much impossible to plug all the holes in the expression-tree reading > functions, and keep them hole-free in the future. The expression tree format > is constantly in flux. A snapshot, however, is a fairly isolated small data > structure that rarely changes. I guess. It still seems far too much like exposing the server's guts for my taste. It might not be as bad as the expression tree stuff, but there's nothing particularly good about it either. >> I think it's a whole >> lot simpler and cleaner to keep the representation details private to >> the server. > > Well, then you need some sort of cross-backend communication, which is > always a bit clumsy. A temp file seems quite sufficient, and not at all difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06.12.2010 15:53, Robert Haas wrote: > I guess. It still seems far too much like exposing the server's guts > for my taste. It might not be as bad as the expression tree stuff, > but there's nothing particularly good about it either. Note that we already have txid_current_snapshot() function, which exposes all that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
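For reference, txid_current_snapshot() returns a value whose text form is xmin:xmax:xip_list, for example 10:20:10,14,15. A client-side parser for that text form could be sketched as follows; the function name `parse_txid_snapshot` is made up for this illustration.

```python
def parse_txid_snapshot(text):
    """Split the 'xmin:xmax:xip_list' text form into its three parts."""
    xmin_s, xmax_s, xip_s = text.split(":")
    # The in-progress list is empty when no transaction was running.
    xip = [int(x) for x in xip_s.split(",")] if xip_s else []
    return int(xmin_s), int(xmax_s), xip
```

This shows how little structure there is to expose: two bounds and a list of in-progress transaction IDs.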
On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 06.12.2010 15:53, Robert Haas wrote: >> >> I guess. It still seems far too much like exposing the server's guts >> for my taste. It might not be as bad as the expression tree stuff, >> but there's nothing particularly good about it either. > > Note that we already have txid_current_snapshot() function, which exposes > all that. Fair enough, and I think that's actually useful for Slony &c. But I don't think we should shy away from providing a cleaner API here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/06/2010 10:22 AM, Robert Haas wrote: > On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 06.12.2010 15:53, Robert Haas wrote: >>> I guess. It still seems far too much like exposing the server's guts >>> for my taste. It might not be as bad as the expression tree stuff, >>> but there's nothing particularly good about it either. >> Note that we already have txid_current_snapshot() function, which exposes >> all that. > Fair enough, and I think that's actually useful for Slony&c. But I > don't think we should shy away of providing a cleaner API here. > Just don't let the perfect get in the way of the good :P cheers andrew
On Mon, Dec 6, 2010 at 10:35 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 12/06/2010 10:22 AM, Robert Haas wrote: >> >> On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> On 06.12.2010 15:53, Robert Haas wrote: >>>> >>>> I guess. It still seems far too much like exposing the server's guts >>>> for my taste. It might not be as bad as the expression tree stuff, >>>> but there's nothing particularly good about it either. >>> >>> Note that we already have txid_current_snapshot() function, which exposes >>> all that. >> >> Fair enough, and I think that's actually useful for Slony&c. But I >> don't think we should shy away of providing a cleaner API here. >> > > Just don't let the perfect get in the way of the good :P I'll keep that in mind. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Well, then you need some sort of cross-backend communication, which is >> always a bit clumsy. > A temp file seems quite sufficient, and not at all difficult. "Not at all difficult" is nonsense. To do that, you need to invent some mechanism for sender and receivers to identify which temp file they want to use, and you need to think of some way to clean up the files when the client forgets to tell you to do so. That's going to be at least as ugly as anything else. And I think it's unproven that this approach would be security-hole-free either. For instance, what about some other session overwriting pg_dump's snapshot temp file? regards, tom lane
On Mon, Dec 6, 2010 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> Well, then you need some sort of cross-backend communication, which is >>> always a bit clumsy. > >> A temp file seems quite sufficient, and not at all difficult. > > "Not at all difficult" is nonsense. To do that, you need to invent some > mechanism for sender and receivers to identify which temp file they want > to use, Why is this even remotely hard? That's the whole point of having the "publish" operation return a token. The token either is, or uniquely identifies, the file name. > and you need to think of some way to clean up the files when the > client forgets to tell you to do so. That's going to be at least as > ugly as anything else. Backends don't forget to call their end-of-transaction hooks, do they? They might crash, but we already have code to remove temp files on server restart. At most it would need minor adjustment. > And I think it's unproven that this approach > would be security-hole-free either. For instance, what about some other > session overwriting pg_dump's snapshot temp file? Why would this be any different from any other temp file? We surely must have a mechanism in place to ensure that the temporary files used by sorts or hash joins don't get overwritten by some other session, or the system would be totally unstable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/06/2010 10:40 AM, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> Well, then you need some sort of cross-backend communication, which is >>> always a bit clumsy. >> A temp file seems quite sufficient, and not at all difficult. > "Not at all difficult" is nonsense. To do that, you need to invent some > mechanism for sender and receivers to identify which temp file they want > to use, and you need to think of some way to clean up the files when the > client forgets to tell you to do so. That's going to be at least as > ugly as anything else. And I think it's unproven that this approach > would be security-hole-free either. For instance, what about some other > session overwriting pg_dump's snapshot temp file? > > Yeah. I'm still not convinced that using shared memory is a bad way to pass these around. Surely we're not talking about large numbers of them. What am I missing here? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Yeah. I'm still not convinced that using shared memory is a bad way to > pass these around. Surely we're not talking about large numbers of them. > What am I missing here? They're not of a very predictable size. Robert's idea of publish() returning a temp file identifier, which then gets removed at transaction end, might work all right. regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes: > Why not just say give me the snapshot currently held by process nnnn? There's not a unique snapshot held by a particular process. Also, we don't want to expend the overhead to fully publish every snapshot. I think it's really necessary that the "sending" process take some deliberate action to publish a snapshot. > And please, not temp files if possible. Barring the cleanup issue, I don't see why not. This is a relatively low-usage feature, I think, so I wouldn't be much in favor of dedicating shmem to it even if the space requirement were predictable. regards, tom lane
On 12/06/2010 12:28 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> Yeah. I'm still not convinced that using shared memory is a bad way to >> pass these around. Surely we're not talking about large numbers of them. >> What am I missing here? > They're not of a very predictable size. > > Ah. Ok. cheers andrew
Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm still not convinced that using shared memory is a bad way to >> pass these around. Surely we're not talking about large numbers >> of them. What am I missing here? > > They're not of a very predictable size. Surely you can predict that any snapshot is no larger than a fairly small fixed portion plus sizeof(TransactionId) * MaxBackends? So, for example, if you're configured for 100 connections, you'd be limited to something under 1kB, maximum? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm still not convinced that using shared memory is a bad way to >>> pass these around. Surely we're not talking about large numbers >>> of them. What am I missing here? >> >> They're not of a very predictable size. > Surely you can predict that any snapshot is no larger than a fairly > small fixed portion plus sizeof(TransactionId) * MaxBackends? No. See subtransactions. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Surely you can predict that any snapshot is no larger than a fairly >> small fixed portion plus sizeof(TransactionId) * MaxBackends? > > No. See subtransactions. Subtransactions are included in snapshots? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> No. See subtransactions. > Subtransactions are included in snapshots? Sure, see GetSnapshotData(). You could avoid it by setting suboverflowed, but that comes at a nontrivial performance cost. regards, tom lane
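To put rough numbers on the exchange above, here is a back-of-the-envelope sketch in Python. It assumes 4-byte TransactionIds, a per-backend cache of 64 subtransaction XIDs (beyond which a snapshot would be marked suboverflowed), and a hypothetical 64-byte fixed header. None of these figures are stated in the thread, so treat them as illustrative only.

```python
# Rough upper bound on snapshot size; every constant here is an assumption.
XID_BYTES = 4              # sizeof(TransactionId), assumed to be 32 bits
CACHED_SUBXIDS = 64        # assumed per-backend subtransaction XID cache
FIXED_HEADER_BYTES = 64    # hypothetical size of the fixed portion


def snapshot_bound(max_backends: int, include_subxids: bool) -> int:
    """Return an upper bound, in bytes, for a single snapshot."""
    xids = max_backends                       # one top-level XID per backend
    if include_subxids:
        xids += max_backends * CACHED_SUBXIDS
    return FIXED_HEADER_BYTES + xids * XID_BYTES


if __name__ == "__main__":
    print(snapshot_bound(100, include_subxids=False))  # top-level XIDs only
    print(snapshot_bound(100, include_subxids=True))   # plus cached subxids
```

Under these assumptions the top-level-only figure for 100 backends stays below Kevin's 1kB estimate, while the figure that includes subtransaction XIDs is more than an order of magnitude larger.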
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> No. See subtransactions. > >> Subtransactions are included in snapshots? > > Sure, see GetSnapshotData(). You could avoid it by setting > suboverflowed, but that comes at a nontrivial performance cost. Yeah, sorry for blurting like that before I checked. I was somewhat panicked that I'd missed something important for SSI, because my XidIsConcurrent check just uses xmin, xmax, and xip; I was afraid what I have would fall down in the face of subtransactions. But on review I found that I'd thought that through and (discussion is in the archives) I always wanted to associate the locks and conflicts with the top level transaction; so that was already identified before checking for overlap, and it was therefore more efficient to just check that. Sorry for the "senior moment". :-/ Perhaps a line or two of comments about that in the SSI patch would be a good idea. And maybe some tests involving subtransactions.... -Kevin
On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > IIRC, in old discussions of this problem we first considered allowing > clients to pull down an explicit representation of their snapshot (which > actually is an existing feature now, txid_current_snapshot()) and then > upload that again to become the active snapshot in another connection. Could a hot standby use such a snapshot representation? I.e. same snapshot on the master and the standby? Greetings Marcin Mańk
On 06.12.2010 21:48, marcin mank wrote: > On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> IIRC, in old discussions of this problem we first considered allowing >> clients to pull down an explicit representation of their snapshot (which >> actually is an existing feature now, txid_current_snapshot()) and then >> upload that again to become the active snapshot in another connection. > > Could a hot standby use such a snapshot representation? I.e. same > snapshot on the master and the standby? Hmm, I suppose it could. That's an interesting idea, you could run parallel pg_dump or something else against master and/or multiple hot standby servers, all working on the same snapshot. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
marcin mank <marcin.mank@gmail.com> writes: > On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> IIRC, in old discussions of this problem we first considered allowing >> clients to pull down an explicit representation of their snapshot (which >> actually is an existing feature now, txid_current_snapshot()) and then >> upload that again to become the active snapshot in another connection. > Could a hot standby use such a snapshot representation? I.e. same > snapshot on the master and the standby? Hm, that's a good question. It seems like it's at least possibly workable, but I'm not sure if there are any showstoppers. The other proposal of publish-a-snapshot would presumably NOT support this, since we'd not want to ship the snapshot temp files down the WAL stream. However, if you were doing something like parallel pg_dump you could just run the parent and child instances all against the slave, so the pg_dump scenario doesn't seem to offer much of a supporting use-case for worrying about this. When would you really need to be able to do it? regards, tom lane
> However, if you were doing something like parallel pg_dump you could > just run the parent and child instances all against the slave, so the > pg_dump scenario doesn't seem to offer much of a supporting use-case for > worrying about this. When would you really need to be able to do it? If you had several standbys, you could distribute the work of the pg_dump among them. This would be a huge speedup for a large database, potentially, thanks to parallelization of I/O and network. Imagine doing a pg_dump of a 300GB database in 10min. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> However, if you were doing something like parallel pg_dump you could >> just run the parent and child instances all against the slave, so the >> pg_dump scenario doesn't seem to offer much of a supporting use-case for >> worrying about this. When would you really need to be able to do it? > If you had several standbys, you could distribute the work of the > pg_dump among them. This would be a huge speedup for a large database, > potentially, thanks to parallelization of I/O and network. Imagine > doing a pg_dump of a 300GB database in 10min. That does sound kind of attractive. But to do that I think we'd have to go with the pass-the-snapshot-through-the-client approach. Shipping internal snapshot files through the WAL stream doesn't seem attractive to me. While I see Robert's point about preferring not to expose the snapshot contents to clients, I don't think it outweighs all other considerations here; and every other one is pointing to doing it the other way. regards, tom lane
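As an illustration of what the pass-the-snapshot-through-the-client approach would hand to the client, here is a minimal Python sketch that parses the text form returned by txid_current_snapshot(), which as I understand it is xmin:xmax:xip_list, and applies the usual visibility rule. The class and function names are hypothetical and are not part of any patch in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxidSnapshot:
    xmin: int          # every txid below this had finished
    xmax: int          # every txid at or above this had not yet finished
    xip: frozenset     # txids between xmin and xmax still in progress


def parse_txid_snapshot(text: str) -> TxidSnapshot:
    """Parse the 'xmin:xmax:xip_list' text form, e.g. '10:20:10,14,15'."""
    xmin_s, xmax_s, xip_s = text.split(":")
    xip = frozenset(int(x) for x in xip_s.split(",") if x)
    return TxidSnapshot(int(xmin_s), int(xmax_s), xip)


def is_visible(txid: int, snap: TxidSnapshot) -> bool:
    """True if txid had already finished when the snapshot was taken."""
    if txid < snap.xmin:
        return True
    if txid >= snap.xmax:
        return False
    return txid not in snap.xip
```

A second connection that was given the same string would, in principle, reach the same visibility decisions. How the server would validate and install an uploaded snapshot is the open question in this thread and is not modelled here.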
We may need other means to ensure that the snapshot is available on the slave. It could be a bit too early to use the snapshot on the slave depending upon the delay of WAL replay. ---------- Koichi Suzuki 2010/12/7 Tom Lane <tgl@sss.pgh.pa.us>: > marcin mank <marcin.mank@gmail.com> writes: >> On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> IIRC, in old discussions of this problem we first considered allowing >>> clients to pull down an explicit representation of their snapshot (which >>> actually is an existing feature now, txid_current_snapshot()) and then >>> upload that again to become the active snapshot in another connection. > >> Could a hot standby use such a snapshot representation? I.e. same >> snapshot on the master and the standby? > > Hm, that's a good question. It seems like it's at least possibly > workable, but I'm not sure if there are any showstoppers. The other > proposal of publish-a-snapshot would presumably NOT support this, since > we'd not want to ship the snapshot temp files down the WAL stream. > > However, if you were doing something like parallel pg_dump you could > just run the parent and child instances all against the slave, so the > pg_dump scenario doesn't seem to offer much of a supporting use-case for > worrying about this. When would you really need to be able to do it? > > regards, tom lane
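One possible form of the check Koichi is asking for, sketched in Python: the client records the master's WAL location when the snapshot is taken and refuses to use the snapshot on a standby until replay has reached that location. This is only an assumption about how such a guard might look; the function names are hypothetical, and whether a replay-position test alone is sufficient is not settled in this thread.

```python
def parse_lsn(text: str) -> tuple:
    """Parse a WAL location such as '16/B374D848' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def standby_can_use_snapshot(snapshot_lsn: str, replayed_lsn: str) -> bool:
    """True once the standby has replayed at least up to snapshot_lsn."""
    return parse_lsn(replayed_lsn) >= parse_lsn(snapshot_lsn)
```

Comparing the two halves as a tuple avoids any assumption about how they combine into a single number.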
On 12/07/2010 01:22 AM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >>> However, if you were doing something like parallel pg_dump you could >>> just run the parent and child instances all against the slave, so the >>> pg_dump scenario doesn't seem to offer much of a supporting use-case for >>> worrying about this. When would you really need to be able to do it? > >> If you had several standbys, you could distribute the work of the >> pg_dump among them. This would be a huge speedup for a large database, >> potentially, thanks to parallelization of I/O and network. Imagine >> doing a pg_dump of a 300GB database in 10min. > > That does sound kind of attractive. But to do that I think we'd have to > go with the pass-the-snapshot-through-the-client approach. Shipping > internal snapshot files through the WAL stream doesn't seem attractive > to me. this kind of functionality would also be very useful/interesting for connection poolers/loadbalancers that are trying to distribute load across multiple hosts and could use that to at least give some sort of consistency guarantee. Stefan
> On 12/07/2010 01:22 AM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>>> However, if you were doing something like parallel pg_dump you could >>>> just run the parent and child instances all against the slave, so the >>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for >>>> worrying about this. When would you really need to be able to do it? >> >>> If you had several standbys, you could distribute the work of the >>> pg_dump among them. This would be a huge speedup for a large database, >>> potentially, thanks to parallelization of I/O and network. Imagine >>> doing a pg_dump of a 300GB database in 10min. >> >> That does sound kind of attractive. But to do that I think we'd have to >> go with the pass-the-snapshot-through-the-client approach. Shipping >> internal snapshot files through the WAL stream doesn't seem attractive >> to me. > > this kind of functionality would also be very useful/interesting for > connection poolers/loadbalancers that are trying to distribute load > across multiple hosts and could use that to at least give some sort of > consistency guarantee. In addition to this, that will greatly help query based replication tools such as pgpool-II. Sounds great. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
This is what Postgres-XC is doing between a coordinator and a datanode. Coordinator may correspond to poolers/loadbalancers. Does anyone think it makes sense to extract XC implementation of snapshot shipping to PostgreSQL itself? Cheers; ---------- Koichi Suzuki 2010/12/7 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>: > On 12/07/2010 01:22 AM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>>> However, if you were doing something like parallel pg_dump you could >>>> just run the parent and child instances all against the slave, so the >>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for >>>> worrying about this. When would you really need to be able to do it? >> >>> If you had several standbys, you could distribute the work of the >>> pg_dump among them. This would be a huge speedup for a large database, >>> potentially, thanks to parallelization of I/O and network. Imagine >>> doing a pg_dump of a 300GB database in 10min. >> >> That does sound kind of attractive. But to do that I think we'd have to >> go with the pass-the-snapshot-through-the-client approach. Shipping >> internal snapshot files through the WAL stream doesn't seem attractive >> to me. > > this kind of functionality would also be very useful/interesting for > connection poolers/loadbalancers that are trying to distribute load > across multiple hosts and could use that to at least give some sort of > consistency guarantee. > > Stefan
On 12/07/2010 09:23 AM, Koichi Suzuki wrote: > This is what Postgres-XC is doing between a coordinator and a > datanode. Coordinator may correspond to poolers/loadbalancers. > Does anyone think it makes sense to extract XC implementation of > snapshot shipping to PostgreSQL itself? Well, if there is a preceding implementation of that, it would certainly be of interest to see it, but before you go and extract the code maybe you could tell us how exactly it works? Stefan
On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi.szk@gmail.com> wrote: > This is what Postgres-XC is doing between a coordinator and a > datanode. Coordinator may correspond to poolers/loadbalancers. > Does anyone think it makes sense to extract XC implementation of > snapshot shipping to PostgreSQL itself? Perhaps, though of course it would need to be re-licensed. I'd be happy to see us pursue a snapshot cloning framework, wherever it comes from. I remain unconvinced that it should be made a hard requirement for parallel pg_dump, but of course if we can get it implemented then the point becomes moot. Let's not let this fall on the floor. Someone should pursue this, whether it's Joachim or Koichi or someone else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert; Thank you very much for your advice. Indeed, I'm considering to change the license to PostgreSQL's one. It may take a bit more though... ---------- Koichi Suzuki 2010/12/15 Robert Haas <robertmhaas@gmail.com>: > On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi.szk@gmail.com> wrote: >> This is what Postgres-XC is doing between a coordinator and a >> datanode. Coordinator may correspond to poolers/loadbalancers. >> Does anyone think it makes sense to extract XC implementation of >> snapshot shipping to PostgreSQL itself? > > Perhaps, though of course it would need to be re-licensed. I'd be > happy to see us pursue a snapshot cloning framework, wherever it comes > from. I remain unconvinced that it should be made a hard requirement > for parallel pg_dump, but of course if we can get it implemented then > the point becomes moot. > > Let's not let this fall on the floor. Someone should pursue this, > whether it's Joachim or Koichi or someone else. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
On Tue, Dec 14, 2010 at 7:06 PM, Koichi Suzuki <koichi.szk@gmail.com> wrote: > Thank you very much for your advice. Indeed, I'm considering to > change the license to PostgreSQL's one. It may take a bit more > though... You wouldn't necessarily need to relicense all of Postgres-XC (although that would be cool, too, at least IMO), just the portion you were proposing for commit to PostgreSQL. Or it doesn't sound like it would be infeasible for someone to code this up from scratch. But we should try to make something good happen here! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > I actually think that the phrase "this has been discussed before and > rejected" should be permanently removed from our list of excuses for > rejecting a patch. Or if we must use that excuse, then I think a link > to the relevant discussion is a must, and the relevant discussion had > better reflect the fact that $TOPIC was in fact rejected. It seems to > me that in at least 50% of cases, someone comes back and says one of > the following things: > > 1. I searched the archives and could find no discussion along those lines. > 2. I read that discussion and it doesn't appear to me that it reflects > a rejection of this idea. Instead what people seemed to be saying was > X. > 3. At the time that might have been true, but what has changed in the > meanwhile is X. Agreed. Perhaps we need an anti-TODO that lists things we don't want in more detail. The TODO has that for a few items, but scaling things up there will be cumbersome. I agree that having the person saying it was rejected find the email discussion is ideal --- if they can't find it, odds are the patch person will not be able to find it either. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
> meanwhile is X. > > Agreed. Perhaps we need an anti-TODO that lists things we don't want in > more detail. The TODO has that for a few items, but scaling things up > there will be cumbersome. > Well there is a problem with this too. A good example is hints. A lot of the community wants hints. A lot of the community doesn't. The community changes as we get more mature and more hackers. It isn't hard to point to dozens of items we have now that would have been on that list 5 years ago. > I agree that having the person saying it was rejected find the email > discussion is ideal --- if they can't find it, odds are the patch person > will not be able to find it either. I would have to agree here. The idea that we have to search email is bad enough (issue/bug/feature tracker anyone?) but to have someone say, search the archives? That is just plain rude and anti-community. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > I would have to agree here. The idea that we have to search email is bad > enough (issue/bug/feature tracker anyone?) but to have someone say, > search the archives? That is just plain rude and anti-community. Saying "search the bugtracker" is no less rude than "search the archives"... And most of the bugtrackers I've had to search have way *less* ease-of-use for searching than a good mailing list archive (I tend to keep going back to gmane's search) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On 12/24/2010 06:26 PM, Aidan Van Dyk wrote: > On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd@commandprompt.com> wrote: > >> I would have to agree here. The idea that we have to search email is bad >> enough (issue/bug/feature tracker anyone?) but to have someone say, >> search the archives? That is just plain rude and anti-community. > Saying "search the bugtracker" is no less rude than "search the archives"... > > And most of the bugtrackers I've had to search have way *less* > ease-of-use for searching than a good mailing list archive (I tend to > keep going back to gmane's search) > > It's deja vu all over again. See mailing list archives for details. cheers andrew
On Fri, 2010-12-24 at 18:26 -0500, Aidan Van Dyk wrote: > On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > > > I would have to agree here. The idea that we have to search email is bad > > enough (issue/bug/feature tracker anyone?) but to have someone say, > > search the archives? That is just plain rude and anti-community. > > Saying "search the bugtracker" is no less rude than "search the archives"... > > And most of the bugtrackers I've had to search have way *less* > ease-of-use for searching than a good mailing list archive (I tend to > keep going back to gmane's search) I think you kind of missed my point. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Dec 24, 2010, at 10:52 AM, Bruce Momjian <bruce@momjian.us> wrote: > Agreed. Perhaps we need an anti-TODO that lists things we don't want in > more detail. The TODO has that for a few items, but scaling things up > there will be cumbersome. I don't really think that'd be much better. What might be of some value is summaries of previous discussions, *with citations*. Foo seems like it would be useful [1,2,3] but there are concerns about bar [4,5] and baz[6]. ...Robert
On Fri, Dec 24, 2010 at 06:37:26PM -0500, Andrew Dunstan wrote: > On 12/24/2010 06:26 PM, Aidan Van Dyk wrote: > >On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd@commandprompt.com> wrote: > > > >>I would have to agree here. The idea that we have to search email > >>is bad enough (issue/bug/feature tracker anyone?) but to have > >>someone say, search the archives? That is just plain rude and > >>anti-community. > >Saying "search the bugtracker" is no less rude than "search the > >archives"... > > > >And most of the bugtrackers I've had to search have way *less* > >ease-of-use for searching than a good mailing list archive (I tend > >to keep going back to gmane's search) > > It's deja vu all over again. See mailing list archives for details. LOL! Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Dec 6, 2010 at 7:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> However, if you were doing something like parallel pg_dump you could
>>> just run the parent and child instances all against the slave, so the
>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>>> worrying about this. When would you really need to be able to do it?
>> If you had several standbys, you could distribute the work of the
>> pg_dump among them. This would be a huge speedup for a large database,
>> potentially, thanks to parallelization of I/O and network. Imagine
>> doing a pg_dump of a 300GB database in 10min.
> That does sound kind of attractive. But to do that I think we'd have to
> go with the pass-the-snapshot-through-the-client approach. Shipping
> internal snapshot files through the WAL stream doesn't seem attractive
> to me.
>
> While I see Robert's point about preferring not to expose the snapshot
> contents to clients, I don't think it outweighs all other considerations
> here; and every other one is pointing to doing it the other way.
How about this: the publishing transaction puts the snapshot in a (new) system table and passes a UUID to its children, and the joining transactions look up that UUID in the system table with a dirty snapshot (SnapshotAny), through a security-definer function owned by a superuser?
No shared memory is used, and if the table is WAL-logged, the snapshot would get to the slaves too.
I realize SnapshotAny wouldn't be sufficient, since we want the tuple to become invisible when the publishing transaction ends (commit/rollback); hence something akin to a (new) HeapTupleSatisfiesStillRunning() would be needed.
Regards,
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
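The visibility rule in Gurjeet's proposal above can be modelled with a toy in-memory registry in Python. This is only a sketch of the intended semantics (a published row is readable by token, but only while the publishing transaction is still running); the class and method names are hypothetical, and nothing here reflects how a real system table or HeapTupleSatisfiesStillRunning() would be implemented.

```python
import uuid


class SnapshotRegistry:
    """Toy in-memory stand-in for the proposed system table."""

    def __init__(self):
        self._rows = {}          # token -> (publisher_xid, snapshot_text)
        self._running = set()    # XIDs of transactions still in progress

    def begin(self, xid: int) -> None:
        self._running.add(xid)

    def end(self, xid: int) -> None:
        """Commit or rollback: either way the publisher stops running."""
        self._running.discard(xid)

    def publish(self, xid: int, snapshot_text: str) -> str:
        """Store the snapshot and return the token handed to the children."""
        token = str(uuid.uuid4())
        self._rows[token] = (xid, snapshot_text)
        return token

    def lookup(self, token: str):
        """Return the snapshot only while its publisher is still running."""
        row = self._rows.get(token)
        if row is None or row[0] not in self._running:
            return None
        return row[1]
```

The row is never deleted in this model; it simply stops being visible once the publisher ends, which mirrors the "still running" test described in the message.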