Thread: Changing pg_dump default file format
Hello, For as long as I can remember I have heard complaints about the fact that pg_dump doesn't use a backup format that is supported by the pg_restore program. At every single talk that I give that talks about pg_dump, the people in the majority of the audience agrees that this is rather silly and a barrier to usability. The only counter argument I have heard is that it adds an extra step for developers to get the SQL that generates a database. That to me is a weak argument as pg_dump is a production/admin too, first, not a developer tool. I humbly request on behalf of those who manage production postgresql instances that we change the default backup file format from -Fp to -Fc. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > > Hello, > > For as long as I can remember I have heard complaints about the fact that > pg_dump doesn't use a backup format that is supported by the pg_restore > program. At every single talk that I give that talks about pg_dump, the > people in the majority of the audience agrees that this is rather silly and > a barrier to usability. > > The only counter argument I have heard is that it adds an extra step for > developers to get the SQL that generates a database. That to me is a weak > argument as pg_dump is a production/admin too, first, not a developer tool. > > I humbly request on behalf of those who manage production postgresql > instances that we change the default backup file format from -Fp to -Fc. The main reason for keeping plain I guess goes with the fact that we sent it to stdout by default, which really doesn't lend itself to the custom format. I'm a bit worried about it breaking peoples scripts as well, if they pipe the output. In particular when doing a partial dump of say a sigle table or such, and in my experience that's mostly done in pipe-mode, which would break. Though I have to say I like the idea in general :) Maybe we could consider changing the default format if -f is specified to write it directly to a file. OTOH, hat might just make it confusing.... -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On 11/07/2013 11:02 AM, Joshua D. Drake wrote: > > Hello, > > For as long as I can remember I have heard complaints about the fact > that pg_dump doesn't use a backup format that is supported by the > pg_restore program. At every single talk that I give that talks about > pg_dump, the people in the majority of the audience agrees that this > is rather silly and a barrier to usability. > > The only counter argument I have heard is that it adds an extra step > for developers to get the SQL that generates a database. That to me is > a weak argument as pg_dump is a production/admin too, first, not a > developer tool. > > I humbly request on behalf of those who manage production postgresql > instances that we change the default backup file format from -Fp to -Fc. At least it gives you a nice message now instead of the rather cryptic message it used to give: [andrew@emma inst.5715]$ bin/pg_dump -f foo.sql [andrew@emma inst.5715]$ bin/pg_restore foo.sql pg_restore: [archiver]input file appears to be a text format dump. Please use psql. cheers andrew
Joshua D. Drake <jd@commandprompt.com> wrote: > For as long as I can remember I have heard complaints about the fact > that pg_dump doesn't use a backup format that is supported by the > pg_restore program. At every single talk that I give that talks about > pg_dump, the people in the majority of the audience agrees that this is > rather silly and a barrier to usability. > > The only counter argument I have heard is that it adds an extra step for > developers to get the SQL that generates a database. That to me is a > weak argument as pg_dump is a production/admin too, first, not a > developer tool. > > I humbly request on behalf of those who manage production postgresql > instances that we change the default backup file format from -Fp to -Fc. -1 in the case that output goes to stdout. It might make sense for when -f is specified. As a DBA, 90% of my usage of pg_dump was piped to psql, with the rest split between data-only dumps for individual tables and schema-only dumps. I think I used -Fc about twice -- before I figured out how to do PITR-style backups. This, for example, is a valid and useful construct: pg_dump test | psql test2 ... or: pg_dump -s test | psql test3 A lot of scripts will be broken by changing the default format for piped output. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Magnus Hagander <magnus@hagander.net> writes: > On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote: >> I humbly request on behalf of those who manage production postgresql >> instances that we change the default backup file format from -Fp to -Fc. > I'm a bit worried about it breaking peoples scripts as well, if they > pipe the output. Whether they pipe the output or not, this will break people's scripts. I agree it's a legacy behavior that we wouldn't choose again, but you could say that about an awful lot of things including much of the SQL standard. I think it's too late to consider this now. regards, tom lane
On 11/07/2013 08:17 AM, Andrew Dunstan wrote: > > At least it gives you a nice message now instead of the rather cryptic > message it used to give: > > [andrew@emma inst.5715]$ bin/pg_dump -f foo.sql > [andrew@emma inst.5715]$ bin/pg_restore foo.sql > pg_restore: [archiver] input file appears to be a text format dump. > Please use psql. > > cheers > True :) but still... > andrew > > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 11/07/2013 08:21 AM, Kevin Grittner wrote: >> I humbly request on behalf of those who manage production postgresql >> instances that we change the default backup file format from -Fp to -Fc. > > -1 in the case that output goes to stdout. It might make sense for > when -f is specified. As a DBA, 90% of my usage of pg_dump was > piped to psql, with the rest split between data-only dumps for > individual tables and schema-only dumps. I think I used -Fc about > twice -- before I figured out how to do PITR-style backups. > PITR is wonderful, I use it all the time. It has exactly zero to do with what we are talking about. We are talking about pg_dump and its usefulness. Are you suggesting that we tell people to not use pg_dump and instead use PITR? > This, for example, is a valid and useful construct: > > pg_dump test | psql test2 > > ... or: > > pg_dump -s test | psql test3 > pg_dump -s test -fp| psql test3 > A lot of scripts will be broken by changing the default format for > piped output. For the sake of sanity... really? You just proved my point. This isn't about developers. Developers now how to code. They can change a simple script. This is about production DBA/admins, the 98% of people using postgresql. Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On Thu, Nov 7, 2013 at 8:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote:>> I humbly request on behalf of those who manage production postgresql
>> instances that we change the default backup file format from -Fp to -Fc.> I'm a bit worried about it breaking peoples scripts as well, if theyWhether they pipe the output or not, this will break people's scripts.
> pipe the output.
I agree it's a legacy behavior that we wouldn't choose again, but you
could say that about an awful lot of things including much of the SQL
standard. I think it's too late to consider this now.
Just a thought: If it's the right thing to do, why not do it in small steps?
For example, couldn't we deprecate the behavior on 9.4, and switch over in
9.5? By deprecate I mean a warning message on STDERR making it clear
they're relying on behavior that is scheduled to change.
I don't think there's any precedence in postgres or it's tooling of this kind of
deprecation process, but it seems we need to define something like it.
Thoughts?
-H
On 11/07/2013 08:26 AM, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote: >>> I humbly request on behalf of those who manage production postgresql >>> instances that we change the default backup file format from -Fp to -Fc. > >> I'm a bit worried about it breaking peoples scripts as well, if they >> pipe the output. > > Whether they pipe the output or not, this will break people's scripts. > I agree it's a legacy behavior that we wouldn't choose again, but you > could say that about an awful lot of things including much of the SQL > standard. I think it's too late to consider this now. We have certainly done worse in the past. 8.3 comes to mind. I am trying to consider the future here, as we continue to grow. Why leave thorns for the sake of leaving thorns? Note: Perfectly happy to spit out a, "NOTICE: The default format of pg_dump will be changing in release X" for a release. Joshua D. Drake > > regards, tom lane > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On Thu, Nov 7, 2013 at 5:53 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > > On 11/07/2013 08:21 AM, Kevin Grittner wrote: > >>> I humbly request on behalf of those who manage production postgresql >>> instances that we change the default backup file format from -Fp to -Fc. >> >> >> -1 in the case that output goes to stdout. It might make sense for >> when -f is specified. As a DBA, 90% of my usage of pg_dump was >> piped to psql, with the rest split between data-only dumps for >> individual tables and schema-only dumps. I think I used -Fc about >> twice -- before I figured out how to do PITR-style backups. >> > > PITR is wonderful, I use it all the time. It has exactly zero to do with > what we are talking about. We are talking about pg_dump and its usefulness. > Are you suggesting that we tell people to not use pg_dump and instead use > PITR? For backups, absolutely. That is, assuming you mean "pg_basebackup -x", and not full archive_command/full pitr setup. That's still a separate argument, of course. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On 11/07/2013 09:25 AM, Magnus Hagander wrote: >> PITR is wonderful, I use it all the time. It has exactly zero to do with >> what we are talking about. We are talking about pg_dump and its usefulness. >> Are you suggesting that we tell people to not use pg_dump and instead use >> PITR? > > For backups, absolutely. > > That is, assuming you mean "pg_basebackup -x", and not full > archive_command/full pitr setup. > > That's still a separate argument, of course. Please, for the love of God start another thread for that :P. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 11/07/2013 08:26 AM, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote: >>> I humbly request on behalf of those who manage production postgresql >>> instances that we change the default backup file format from -Fp to -Fc. > >> I'm a bit worried about it breaking peoples scripts as well, if they >> pipe the output. > > Whether they pipe the output or not, this will break people's scripts. > I agree it's a legacy behavior that we wouldn't choose again, but you > could say that about an awful lot of things including much of the SQL > standard. I think it's too late to consider this now. I don't agree that it's too late, personally. However, I do agree that it would be problematic to change the default format since the default is to send to STDOUT. If we wanted to change the default format, we would need to make "filename" a required parameter. If we wanted to change the defaults, I think it would be easier to create a separate bin name (e.g. pg_backup) than to change the existing parameters for pg_dump. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/07/2013 10:00 AM, Josh Berkus wrote: > > On 11/07/2013 08:26 AM, Tom Lane wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>> On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> wrote: >>>> I humbly request on behalf of those who manage production postgresql >>>> instances that we change the default backup file format from -Fp to -Fc. >> >>> I'm a bit worried about it breaking peoples scripts as well, if they >>> pipe the output. >> >> Whether they pipe the output or not, this will break people's scripts. >> I agree it's a legacy behavior that we wouldn't choose again, but you >> could say that about an awful lot of things including much of the SQL >> standard. I think it's too late to consider this now. > > I don't agree that it's too late, personally. However, I do agree that > it would be problematic to change the default format since the default > is to send to STDOUT. If we wanted to change the default format, we > would need to make "filename" a required parameter. > > If we wanted to change the defaults, I think it would be easier to > create a separate bin name (e.g. pg_backup) than to change the existing > parameters for pg_dump. I am not opposed to that. Allow pg_dump to be what it is, and create a pg_backup? JD > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On Thu, Nov 7, 2013 at 7:08 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > > On 11/07/2013 10:00 AM, Josh Berkus wrote: >> >> >> On 11/07/2013 08:26 AM, Tom Lane wrote: >>> >>> Magnus Hagander <magnus@hagander.net> writes: >>>> >>>> On Thu, Nov 7, 2013 at 5:02 PM, Joshua D. Drake <jd@commandprompt.com> >>>> wrote: >>>>> >>>>> I humbly request on behalf of those who manage production postgresql >>>>> instances that we change the default backup file format from -Fp to >>>>> -Fc. >>> >>> >>>> I'm a bit worried about it breaking peoples scripts as well, if they >>>> pipe the output. >>> >>> >>> Whether they pipe the output or not, this will break people's scripts. >>> I agree it's a legacy behavior that we wouldn't choose again, but you >>> could say that about an awful lot of things including much of the SQL >>> standard. I think it's too late to consider this now. >> >> >> I don't agree that it's too late, personally. However, I do agree that >> it would be problematic to change the default format since the default >> is to send to STDOUT. If we wanted to change the default format, we >> would need to make "filename" a required parameter. >> >> If we wanted to change the defaults, I think it would be easier to >> create a separate bin name (e.g. pg_backup) than to change the existing >> parameters for pg_dump. > > > I am not opposed to that. Allow pg_dump to be what it is, and create a > pg_backup? If we want to go down that route, please let's pick a different name. pg_backup is going to cause a *lot* of confusion in relation to pg_basebackup for people. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On 2013-11-07 10:00:53 -0800, Josh Berkus wrote: > If we wanted to change the defaults, I think it would be easier to > create a separate bin name (e.g. pg_backup) than to change the existing > parameters for pg_dump. Provided somebody is actually willing to put in the not inconsiderable amount of work to make this smoother for users, something like pg_backup would be a great idea. It should have an interface that allows to dump global and local object and it should be able to dump multiple databases in one invocation. Explaining users they have to use pg_dumpall for global objects but that they also shouldn't dare to use it for anything else is, well, difficult. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Josh Berkus <josh@agliodbs.com> writes: > If we wanted to change the defaults, I think it would be easier to > create a separate bin name (e.g. pg_backup) than to change the existing > parameters for pg_dump. That might work. However, just to be contrary: what about the functionality of pg_dumpall? We've heard repeated complaints about pg_dump's lack of ability to dump multiple databases or global settings, and just institutionalizing the existing -Fc format as the default will make it harder not easier to fix that. I'd be happier about inventing a pg_backup if it were a replacement for pg_dumpall and not just a cosmetic wrapper. regards, tom lane
On Thu, Nov 7, 2013 at 8:53 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
PITR is wonderful, I use it all the time. It has exactly zero to do with what we are talking about. We are talking about pg_dump and its usefulness. Are you suggesting that we tell people to not use pg_dump and instead use PITR?
On 11/07/2013 08:21 AM, Kevin Grittner wrote:I humbly request on behalf of those who manage production postgresql
instances that we change the default backup file format from -Fp to -Fc.
-1 in the case that output goes to stdout. It might make sense for
when -f is specified. As a DBA, 90% of my usage of pg_dump was
piped to psql, with the rest split between data-only dumps for
individual tables and schema-only dumps. I think I used -Fc about
twice -- before I figured out how to do PITR-style backups.
That depends on what they are using it for. If they are using pg_dump to do backups that are better done with PITR, then they should use PITR. When I use pg_dump, it is because I want to see what it is that is getting dumped--if I didn't want it be human readable, I'd use a binary backup instead of a binary dump.
One of the things that first attracted me to PostgreSQL was that you use the tool provided to dump the data, and what you get by default is identifiably your data, rather than something that looks like it was designed to lock you in to some proprietary tool and never let you out again.
(Another thing that attracted me to it was that it didn't run around renaming things between every release.)
pg_dump -s test -fp| psql test3This, for example, is a valid and useful construct:
pg_dump test | psql test2
... or:
pg_dump -s test | psql test3For the sake of sanity... really? You just proved my point. This isn't about developers. Developers now how to code. They can change a simple script. This is about production DBA/admins, the 98% of people using postgresql.A lot of scripts will be broken by changing the default format for
piped output.
I don't think that adding a command line option constitutes coding. If they can't figure out how to do that for pg_dump, how are they ever going to use pg_restore in the first place?
Cheers,
Jeff
On Thu, Nov 07, 2013 at 01:45:20PM -0500, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > If we wanted to change the defaults, I think it would be easier to > > create a separate bin name (e.g. pg_backup) than to change the > > existing parameters for pg_dump. > > That might work. However, just to be contrary: what about the > functionality of pg_dumpall? We've heard repeated complaints about > pg_dump's lack of ability to dump multiple databases or global > settings, and just institutionalizing the existing -Fc format as the > default will make it harder not easier to fix that. > > I'd be happier about inventing a pg_backup if it were a replacement > for pg_dumpall and not just a cosmetic wrapper. That the two are still separate is idiotic. If it were me waving this wand, I'd see to it that -Fd (directory) rather than -Fc (custom), was the default for pg_backup, as it makes the choice to use -j (>1 core) trivial. I'd also make sure that the directory format could handle the entirety of the backed-up database in some comprehensible way. Maybe a globals/ and a databases/ directory up top... Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Joshua D. Drake <jd@commandprompt.com> wrote: > PITR is wonderful, I use it all the time. It has exactly zero to > do with what we are talking about. We are talking about pg_dump > and its usefulness. Are you suggesting that we tell people to not > use pg_dump and instead use PITR? Yes. (Well, actually, any of the WAL-based techniques.) pg_dump is not a good tool for primary backups, for many reasons. At least on databases over 50GB or so or which have more than a few data modifications per second. It is OK on a tiny 10GB database with daily (or less frequent) batch modifications, but how often do you see something like that? > This is about production DBA/admins, the 98% of people using > postgresql. That is who I am thinking of. A DBA team may have hundreds of databases to manage, each with many scripts which have been running nicely for years. A change like this is bound to break some of those crontab scripts they may not even remember they are running -- like ones which dump a key table to INSERT statements to feed into some other database product, which will now choke when it gets a custom format file instead of the text file it has been getting for years. Requiring the DBA team to track all these scripts down to add -Fp would be annoying, to put it mildly. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > If we wanted to change the defaults, I think it would be easier to > > create a separate bin name (e.g. pg_backup) than to change the existing > > parameters for pg_dump. > > That might work. However, just to be contrary: what about the > functionality of pg_dumpall? We've heard repeated complaints about > pg_dump's lack of ability to dump multiple databases or global settings, > and just institutionalizing the existing -Fc format as the default > will make it harder not easier to fix that. > > I'd be happier about inventing a pg_backup if it were a replacement for > pg_dumpall and not just a cosmetic wrapper. Maybe we could provide a master controller program (pg_backup or whatever name we agree on), which could receive commands much like pg_ctl. $ pg_backup --help pg_backup is a backup handler program for PostgreSQL. Usage: pg_backup backup [-p PATTERN] DATABASE [...] pg_backup restore [-f PATH] pg_backup backup_globals ... The "pattern" stuff would let you specify many databases in the command line and have it dump each DB on a separate file, and perhaps do strftime-style replacements, etc. Initially we could have this just call pg_dump underneath, but eventually it might grow superpowers of its own. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Am 07.11.2013 19:08, schrieb Joshua D. Drake:> > On 11/07/2013 10:00 AM, Josh Berkus wrote: >> If we wanted to change the defaults, I think it would be easier to >> create a separate bin name (e.g. pg_backup) than to change the existing >> parameters for pg_dump. > > I am not opposed to that. Allow pg_dump to be what it is, and create a > pg_backup? > > JD I would definitely agree to having "one" backup utility and making -Fc the default for SQL dumps. One could even argue if the functionality of pg_basebackup should be part of that too. But I would be fine with having two distinct utilities (one for file level backups and one for logical/SQL level backups), too. Btw, how hard would it be, to have pg_restore and now also pg_dump run with -j option do some ordering of work by size of e.g. the tables? E.g. if you run with -j4 it would make sense to start working on the largest tables (and it's indexes) first and continue by descending in t´size to keep all available "slots" filled as good as possible. Just at though. Jan
On 11/07/2013 11:01 AM, Alvaro Herrera wrote: > Maybe we could provide a master controller program (pg_backup or > whatever name we agree on), which could receive commands much like > pg_ctl. > > $ pg_backup --help > pg_backup is a backup handler program for PostgreSQL. > > Usage: > pg_backup backup [-p PATTERN] DATABASE [...] > pg_backup restore [-f PATH] > pg_backup backup_globals > ... > > The "pattern" stuff would let you specify many databases in the command > line and have it dump each DB on a separate file, and perhaps do > strftime-style replacements, etc. > > Initially we could have this just call pg_dump underneath, but > eventually it might grow superpowers of its own. > +1 -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 11/7/13, 1:00 PM, Josh Berkus wrote: > If we wanted to change the defaults, I think it would be easier to > create a separate bin name (e.g. pg_backup) than to change the existing > parameters for pg_dump. Note the following code in pg_dump.c: /* Set default options based on progname */ if (strcmp(progname, "pg_backup") == 0) format = "c";
On 11/07/2013 10:54 AM, Kevin Grittner wrote: > That is who I am thinking of. A DBA team may have hundreds of > databases to manage, each with many scripts which have been running > nicely for years. A change like this is bound to break some of > those crontab scripts they may not even remember they are running > -- like ones which dump a key table to INSERT statements to feed > into some other database product, which will now choke when it gets > a custom format file instead of the text file it has been getting > for years. Requiring the DBA team to track all these scripts down > to add -Fp would be annoying, to put it mildly. Only because they don't take the time to properly document or put into some form of automation/tracking/configfile management mechanism. Don't get me wrong, I would be in the same boat but I don't think it is realistic to manage the communities expectations with the lackluster operations performance of fellow DBAs. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 11/07/2013 02:27 PM, Jan Lentfer wrote: > > Btw, how hard would it be, to have pg_restore and now also pg_dump run > with -j option do some ordering of work by size of e.g. the tables? E.g. > if you run with -j4 it would make sense to start working on the largest > tables (and it's indexes) first and continue by descending in t´size to > keep all available "slots" filled as good as possible. Just at though. > Please start another thread on this - it's really completely separate from the issue at hand. Hijacking threads like this is not a good idea. cheers andrew
All, I'm amused at how this has become a "let's pile on everything which has ever been missing in pg_dump into one thread". Agree with Tom that if we're going to create a new program name, we should fix the pg_dumpall issue as well. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/07/2013 12:42 PM, Josh Berkus wrote: > > All, > > I'm amused at how this has become a "let's pile on everything which has > ever been missing in pg_dump into one thread". Well it has been broken longer than most of our utilities. Sorry... not broken but certainly not complete. It is to be expected. > > Agree with Tom that if we're going to create a new program name, we > should fix the pg_dumpall issue as well. > All I want to start is this simple fix. I don't know who is going to step up to work on pg_dump/all considering it is the bastard step child of the project. We only work on it when we absolutely have to (and CMD is just as guilty as the rest). JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 11/07/2013 06:33 PM, Joshua D. Drake wrote: > > On 11/07/2013 12:42 PM, Josh Berkus wrote: >> >> All, >> >> I'm amused at how this has become a "let's pile on everything which has >> ever been missing in pg_dump into one thread". > > Well it has been broken longer than most of our utilities. Sorry... > not broken but certainly not complete. It is to be expected. > >> >> Agree with Tom that if we're going to create a new program name, we >> should fix the pg_dumpall issue as well. >> > > All I want to start is this simple fix. I don't know who is going to > step up to work on pg_dump/all considering it is the bastard step > child of the project. We only work on it when we absolutely have to > (and CMD is just as guilty as the rest). > > What you're asking for isn't a simple fix. Calling it one doesn't make it so. cheers andrew
On 11/08/2013 12:53 AM, Harold Giménez wrote: > > > Just a thought: If it's the right thing to do, why not do it in small > steps? > For example, couldn't we deprecate the behavior on 9.4, and switch over in > 9.5? By deprecate I mean a warning message on STDERR making it clear > they're relying on behavior that is scheduled to change. I used to like that idea too, but I've since learned the hard way that: (a) Lots of people only upgrade every two, three, or even more major versions. I'm dealing with clients on 8.3, and people still pop up on Stack Overflow with 8.1 sometimes! These people don't ever see the deprecated phase. (b) People routinely ignore cron job output. Even important job output. They won't see the messages, and won't act on them if they do until something actually breaks. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2013 12:55 AM, Joshua D. Drake wrote: > > We have certainly done worse in the past. 8.3 comes to mind. I am trying > to consider the future here, as we continue to grow. Why leave thorns > for the sake of leaving thorns? 8.3 was _bad_. It's been a major upgrade barrier, and a huge pain. I was also very unhappy with the way that the bytea_output change worked out. I didn't have any idea it'd work out that way, but it caused a fair bit of data corruption for real-world users because some client drivers liked to interpret 'hex' format data as 'escape', then re-escape it when writing it back to the server. PHP's `pg' driver was notably at fault here - I'm not sure it supports 'hex' even now. It's mostly deprecated in favour of PDO, but it's still in heavy use even in new code (you know, PHP users). It also took quite some time for Ruby's "Pg" gem to stop mangling round-tripped bytea data. If we changed the default, it would be absolutely vital to modify the pg_restore output and psql's behaviour so that this didn't happen: psql:/home/craig/tmp/rt-db-2013-07-09.dump:1: ERROR: syntax error at or near "PGDMP" LINE 1: PGDMP^A^L^A^A^A^AREVOKE ALL ON SCHEMA londiste FROM dbadmin; ^ psql:/home/craig/tmp/rt-db-2013-07-09.dump:2: ERROR: schema "londiste" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:3: ERROR: schema "londiste" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:4: ERROR: syntax error at or near "^A" LINE 1: ^A^AREVOKE ALL ON SCHEMA pgq FROM dbadmin; ^ psql:/home/craig/tmp/rt-db-2013-07-09.dump:5: ERROR: schema "pgq" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:6: ERROR: schema "pgq" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:7: ERROR: syntax error at or near "^A" LINE 1: ^A^AREVOKE ALL ON SCHEMA pgq_ext FROM dbadmin; ^ psql:/home/craig/tmp/rt-db-2013-07-09.dump:8: ERROR: schema "pgq_ext" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:9: ERROR: schema "pgq_ext" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:10: ERROR: syntax error at or near "^A" LINE 1: ^A^AREVOKE ALL ON SCHEMA pgq_node FROM dbadmin; ^ psql:/home/craig/tmp/rt-db-2013-07-09.dump:11: ERROR: schema "pgq_node" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:12: ERROR: schema "pgq_node" does not exist psql:/home/craig/tmp/rt-db-2013-07-09.dump:13: ERROR: syntax error at or near "^A" LINE 1: ^A^AREVOKE ALL ON SCHEMA public FROM postgres; ^ -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2013 07:33 AM, Joshua D. Drake wrote: >> > > All I want to start is this simple fix. I don't know who is going to > step up to work on pg_dump/all considering it is the bastard step child > of the project. We only work on it when we absolutely have to (and CMD > is just as guilty as the rest). The change its self is simple; the consequences aren't. We learned that with bytea_output. There are two-way compat issues to deal with: - old pg_dump with new pg_restore - new pg_dump with old pg_restore - old pg_dump with new psql - new pg_dump with old psql and all of pg_dump, pg_restore, and psql are relevant because of the way they handle errors and consume pg_dump output. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 7, 2013 at 7:01 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
(a) Lots of people only upgrade every two, three, or even more major
versions. I'm dealing with clients on 8.3, and people still pop up on
Stack Overflow with 8.1 sometimes! These people don't ever see the
deprecated phase.
Interesting that they'd never update their clients either. I guess if that's true
there's a mentality of if it ain't broke don't fix it going on here.
Would they read change logs before upgrading, or just cross their fingers?
(b) People routinely ignore cron job output. Even important job output.
They won't see the messages, and won't act on them if they do until
something actually breaks.
How common is this?
On 11/08/2013 11:41 AM, Harold Giménez wrote: > > > > On Thu, Nov 7, 2013 at 7:01 PM, Craig Ringer <craig@2ndquadrant.com > <mailto:craig@2ndquadrant.com>> wrote: > > > (a) Lots of people only upgrade every two, three, or even more major > versions. I'm dealing with clients on 8.3, and people still pop up on > Stack Overflow with 8.1 sometimes! These people don't ever see the > deprecated phase. > > > Interesting that they'd never update their clients either. I guess if > that's true > there's a mentality of if it ain't broke don't fix it going on here. I've seen all sorts of combinations of client and server versions in the wild. Ancient JDBC, ODBC, etc drivers are also common. > Would they read change logs before upgrading, or just cross their fingers? > > (b) People routinely ignore cron job output. Even important job output. > They won't see the messages, and won't act on them if they do until > something actually breaks. > > > How common is this? I couldn't possibly say, I can only go by what I see in communication with clients, in private mail, in the mailing lists, and on Stack Overflow. I do see a couple of different groups: * People who upgrade casually without looking at release notes etc, then wonder why everything just broke. Typically people running PostgreSQL in development environments. I'm not too fussed about this group, they do it to themselves. On the other hand they're a *big* group (think Ruby on Rails developers, etc) and the more of them who whine about how PostgreSQL is painful to upgrade and always breaks, the worse it is for general uptake of Pg. * Those who don't upgrade because they don't know or care to. That box has been sitting there doing its thing for ages, and until they hit some key limitation, trigger an old bug, or finally need to do something different they don't realise that life would've been easier if they hadn't stayed on 7.1. These folks seem to be the most likely ones to unthinkingly upgrade from 7.something-low or 8.x straight to 9.3 without reading the release notes then wonder why things don't work, especially since they'll tend to do it in a hurry as a knee-jerk to try to fix a problem. * People who want to upgrade but are choked by bureaucracy and change management processes that move on a tectonic time scale. They're still on 8.something-low because it's just too painful to change. They're the ones who'll ask you to backport synchronous replication into 9.0 because they're not allowed to upgrade to 9.1/9.2, despite the obvious insanity of running custom-patched and rebuilt software instead of a well-tested public release. When they upgrade they do it in giant leaps despite the pain involved, just because it's so hard to upgrade at all. They're likely to plan carefully and do it right. * People who'd love to upgrade, but have an old database running a 24x7 mission critical system with enough data that they just can't get a dump/reload window, and they're on something older than 8.4 so they can't pg_upgrade. When they do upgrade they tend to plan it in detail, test carefully first, etc, so again they're pretty OK. In general, people are busy and the database isn't all they care about. They probably read the release notes about as well as you read the release notes on a new distro upgrade or something else that you care moderately about. If you're doing a major upgrade from an old Pg to a very new one there's a lot to take in and a lot of rel notes to read. One of the things that might help here is if we had (on the wiki, maybe) a single document that kept a running list of compatibility affecting changes and upgrades that need special action, along with a recommended upgrade path. That way people wouldn't have to read 6 major versions of release notes, get half way, and give up. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
I don't want to hijack this thread any further, but Craig, thanks for your insight.
-Harold
On Thu, Nov 7, 2013 at 8:35 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 11/08/2013 11:41 AM, Harold Giménez wrote:
>
>
>
> On Thu, Nov 7, 2013 at 7:01 PM, Craig Ringer <craig@2ndquadrant.com> <mailto:craig@2ndquadrant.com>> wrote:I've seen all sorts of combinations of client and server versions in the
>
>
> (a) Lots of people only upgrade every two, three, or even more major
> versions. I'm dealing with clients on 8.3, and people still pop up on
> Stack Overflow with 8.1 sometimes! These people don't ever see the
> deprecated phase.
>
>
> Interesting that they'd never update their clients either. I guess if
> that's true
> there's a mentality of if it ain't broke don't fix it going on here.
wild. Ancient JDBC, ODBC, etc drivers are also common.I couldn't possibly say, I can only go by what I see in communication
> Would they read change logs before upgrading, or just cross their fingers?
>
> (b) People routinely ignore cron job output. Even important job output.
> They won't see the messages, and won't act on them if they do until
> something actually breaks.
>
>
> How common is this?
with clients, in private mail, in the mailing lists, and on Stack Overflow.
I do see a couple of different groups:
* People who upgrade casually without looking at release notes etc, then
wonder why everything just broke. Typically people running PostgreSQL in
development environments. I'm not too fussed about this group, they do
it to themselves. On the other hand they're a *big* group (think Ruby on
Rails developers, etc) and the more of them who whine about how
PostgreSQL is painful to upgrade and always breaks, the worse it is for
general uptake of Pg.
* Those who don't upgrade because they don't know or care to. That box
has been sitting there doing its thing for ages, and until they hit some
key limitation, trigger an old bug, or finally need to do something
different they don't realise that life would've been easier if they
hadn't stayed on 7.1. These folks seem to be the most likely ones to
unthinkingly upgrade from 7.something-low or 8.x straight to 9.3 without
reading the release notes then wonder why things don't work, especially
since they'll tend to do it in a hurry as a knee-jerk to try to fix a
problem.
* People who want to upgrade but are choked by bureaucracy and change
management processes that move on a tectonic time scale. They're still
on 8.something-low because it's just too painful to change. They're the
ones who'll ask you to backport synchronous replication into 9.0 because
they're not allowed to upgrade to 9.1/9.2, despite the obvious insanity
of running custom-patched and rebuilt software instead of a well-tested
public release. When they upgrade they do it in giant leaps despite the
pain involved, just because it's so hard to upgrade at all. They're
likely to plan carefully and do it right.
* People who'd love to upgrade, but have an old database running a 24x7
mission critical system with enough data that they just can't get a
dump/reload window, and they're on something older than 8.4 so they
can't pg_upgrade. When they do upgrade they tend to plan it in detail,
test carefully first, etc, so again they're pretty OK.
In general, people are busy and the database isn't all they care about.
They probably read the release notes about as well as you read the
release notes on a new distro upgrade or something else that you care
moderately about.
If you're doing a major upgrade from an old Pg to a very new one there's
a lot to take in and a lot of rel notes to read. One of the things that
might help here is if we had (on the wiki, maybe) a single document that
kept a running list of compatibility affecting changes and upgrades that
need special action, along with a recommended upgrade path. That way
people wouldn't have to read 6 major versions of release notes, get half
way, and give up.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/07/2013 08:35 PM, Craig Ringer wrote: > I've seen all sorts of combinations of client and server versions in the > wild. Ancient JDBC, ODBC, etc drivers are also common. There's a security compromise out there for the 8.1 JDBC driver which was discovered last year, and folks *keep* reporting to pgsql-security.Even though the 8.1 driver had been officially deprecatedfor 2 years before the security issue was discovered. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Thu, Nov 7, 2013 at 02:40:17PM -0500, Peter Eisentraut wrote: > On 11/7/13, 1:00 PM, Josh Berkus wrote: > > If we wanted to change the defaults, I think it would be easier to > > create a separate bin name (e.g. pg_backup) than to change the existing > > parameters for pg_dump. > > Note the following code in pg_dump.c: > > /* Set default options based on progname */ > if (strcmp(progname, "pg_backup") == 0) > format = "c"; Wow, when was that added? git blame says 2002: 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 +0000 387) /* Set default options based on progname */ 9f0ae0c8 (TomLane 2002-05-10 22:36:27 +0000 388) if (strcmp(progname, "pg_backup") == 0) 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 +0000 389) format = "c"; However, pggit log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not show that line being added by the diff. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Nov 18, 2013 at 3:21 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Nov 7, 2013 at 02:40:17PM -0500, Peter Eisentraut wrote: >> On 11/7/13, 1:00 PM, Josh Berkus wrote: >> > If we wanted to change the defaults, I think it would be easier to >> > create a separate bin name (e.g. pg_backup) than to change the existing >> > parameters for pg_dump. >> >> Note the following code in pg_dump.c: >> >> /* Set default options based on progname */ >> if (strcmp(progname, "pg_backup") == 0) >> format = "c"; > > Wow, when was that added? git blame says 2002: > > 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 +0000 387) /* Set default options based on progname */ > 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 +0000 388) if (strcmp(progname, "pg_backup") == 0) > 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 +0000 389) format = "c"; > > However, pggit log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not > show that line being added by the diff. I dunno what your pggit script does, but "git log" doesn't normally show the diff at all. "git show 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba", however, does show that line being added. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> However, pggit log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not >> show that line being added by the diff. > I dunno what your pggit script does, but "git log" doesn't normally > show the diff at all. "git show > 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba", however, does show that > line being added. It also shows it being removed, ie this is just -u diff format being unhelpful, as it so often is. The actual source of the code seems to be commit e8f69be054e9343b3c41d7e77cc142913ee55439. regards, tom lane
On Tue, Nov 19, 2013 at 11:26:34AM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > >> However, pggit log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not > >> show that line being added by the diff. > > > I dunno what your pggit script does, but "git log" doesn't normally > > show the diff at all. "git show > > 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba", however, does show that > > line being added. > > It also shows it being removed, ie this is just -u diff format being > unhelpful, as it so often is. > > The actual source of the code seems to be commit > e8f69be054e9343b3c41d7e77cc142913ee55439. Thanks. Even the commit message doesn't mention this gem. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +