Thread: security labels on databases are bad for dump & restore
Hi, pg_dump dumps security labels on databases. Which makes sense. The problem is that they're dumped including the database name. Which means that if you dump a database and restore it into a differently named one you'll either get a failure because the database does not exist, or worse you'll update the label of the wrong database. So I think we need CURRENT_DATABASE (or similar) support for security labels on databases. I won't have time to do anything about this anytime soon, but I think we should fix that at some point. Shall I put this on the todo? Or do we want to create an 'open items' page that's not major version specific? Greetings, Andres Freund
On Fri, Jul 10, 2015 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote: > pg_dump dumps security labels on databases. Which makes sense. The > problem is that they're dumped including the database name. > > Which means that if you dump a database and restore it into a > differently named one you'll either get a failure because the database > does not exist, or worse you'll update the label of the wrong database. > > So I think we need CURRENT_DATABASE (or similar) support for security > labels on databases. > > I won't have time to do anything about this anytime soon, but I think we > should fix that at some point. Shall I put this on the todo? Or do we > want to create an 'open items' page that's not major version specific? I think adding it to the TODO would be great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
All, >> I won't have time to do anything about this anytime soon, but I think we >> should fix that at some point. Shall I put this on the todo? Or do we >> want to create an 'open items' page that's not major version specific? > > I think adding it to the TODO would be great. I'd be willing to look/dive into this one further. -Adam -- Adam Brightwell - adam.brightwell@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
I'm sort of new to this so maybe I'm missing something but since the sepgsql SELinux userspace object manager was never integrated into postgresql (AFAIK KaiGais branch was never merged into the mainline) who uses these labels? What use are they? Ted On Tue, Jul 14, 2015 at 12:09 PM, Adam Brightwell <adam.brightwell@crunchydatasolutions.com> wrote: > All, > >>> I won't have time to do anything about this anytime soon, but I think we >>> should fix that at some point. Shall I put this on the todo? Or do we >>> want to create an 'open items' page that's not major version specific? >> >> I think adding it to the TODO would be great. > > I'd be willing to look/dive into this one further. > > -Adam > > -- > Adam Brightwell - adam.brightwell@crunchydatasolutions.com > Database Engineer - www.crunchydatasolutions.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: > I'm sort of new to this so maybe I'm missing something but since the > sepgsql SELinux userspace object manager was never integrated into > postgresql (AFAIK KaiGais branch was never merged into the mainline) > who uses these labels? What use are they? See contrib/sepgsql -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
That's exactly what I'm talking about like I said KaiGais branch was never merged into the mainline so I do not believe that it is used at all. Ted On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: >> I'm sort of new to this so maybe I'm missing something but since the >> sepgsql SELinux userspace object manager was never integrated into >> postgresql (AFAIK KaiGais branch was never merged into the mainline) >> who uses these labels? What use are they? > > See contrib/sepgsql > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
2015-07-15 2:39 GMT+09:00 Ted Toth <txtoth@gmail.com>: > That's exactly what I'm talking about like I said KaiGais branch was > never merged into the mainline so I do not believe that it is used at > all. > It depends on the definition of "integrated". The PostgreSQL core offers an infrastructure for label based security mechanism, not only selinux. Also, one extension module that is usually distributed with PosgreSQL bridges the world of database and the world of selinux (even though all the features I initially designed are not yet implemented). I like to say it is integrated. > On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: >>> I'm sort of new to this so maybe I'm missing something but since the >>> sepgsql SELinux userspace object manager was never integrated into >>> postgresql (AFAIK KaiGais branch was never merged into the mainline) >>> who uses these labels? What use are they? >> >> See contrib/sepgsql >> >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company -- KaiGai Kohei <kaigai@kaigai.gr.jp>
So if I label a table with an SELinux context and the type of my client connection does not have policy to be able to access the table type will an AVC be generated and the access denied? Ted On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > 2015-07-15 2:39 GMT+09:00 Ted Toth <txtoth@gmail.com>: >> That's exactly what I'm talking about like I said KaiGais branch was >> never merged into the mainline so I do not believe that it is used at >> all. >> > It depends on the definition of "integrated". > The PostgreSQL core offers an infrastructure for label based security > mechanism, not only selinux. Also, one extension module that is > usually distributed with PosgreSQL bridges the world of database and > the world of selinux (even though all the features I initially designed > are not yet implemented). I like to say it is integrated. > >> On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: >>>> I'm sort of new to this so maybe I'm missing something but since the >>>> sepgsql SELinux userspace object manager was never integrated into >>>> postgresql (AFAIK KaiGais branch was never merged into the mainline) >>>> who uses these labels? What use are they? >>> >>> See contrib/sepgsql >>> >>> -- >>> Robert Haas >>> EnterpriseDB: http://www.enterprisedb.com >>> The Enterprise PostgreSQL Company > > > > -- > KaiGai Kohei <kaigai@kaigai.gr.jp>
> So if I label a table with an SELinux context and the type of my > client connection does not have policy to be able to access the table > type will an AVC be generated and the access denied? > Of course, it depends on the policy of the system. If client connection come from none-SELinux system, use netlabelctl to configure default fallback security context. It gives getpeercon(3) the client label shall be applied when netlabel is not configured on the connection. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ted Toth > Sent: Wednesday, July 15, 2015 2:59 AM > To: Kohei KaiGai > Cc: Robert Haas; Adam Brightwell; Andres Freund; pgsql-hackers@postgresql.org; > Alvaro Herrera > Subject: Re: [HACKERS] security labels on databases are bad for dump & restore > > So if I label a table with an SELinux context and the type of my > client connection does not have policy to be able to access the table > type will an AVC be generated and the access denied? > > Ted > > On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > > 2015-07-15 2:39 GMT+09:00 Ted Toth <txtoth@gmail.com>: > >> That's exactly what I'm talking about like I said KaiGais branch was > >> never merged into the mainline so I do not believe that it is used at > >> all. > >> > > It depends on the definition of "integrated". > > The PostgreSQL core offers an infrastructure for label based security > > mechanism, not only selinux. Also, one extension module that is > > usually distributed with PosgreSQL bridges the world of database and > > the world of selinux (even though all the features I initially designed > > are not yet implemented). I like to say it is integrated. > > > >> On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>> On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: > >>>> I'm sort of new to this so maybe I'm missing something but since the > >>>> sepgsql SELinux userspace object manager was never integrated into > >>>> postgresql (AFAIK KaiGais branch was never merged into the mainline) > >>>> who uses these labels? What use are they? > >>> > >>> See contrib/sepgsql > >>> > >>> -- > >>> Robert Haas > >>> EnterpriseDB: http://www.enterprisedb.com > >>> The Enterprise PostgreSQL Company > > > > > > > > -- > > KaiGai Kohei <kaigai@kaigai.gr.jp> > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
That doesn't answer my question. I'm talking about a client and server running on the same system with SELinux MLS policy so that getpeercon will return the context of the client process unless it has explicitly sets the socket create context . So again will postgresql if the sepgsql module is loaded call a function in sepgsql to compute the access vector for the source (getpeercon label) contexts access to the target context (tables context set by SECURITY LABEL) and fail the operation generating an AVC if access is denied because there is no policy? On Tue, Jul 14, 2015 at 8:35 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> So if I label a table with an SELinux context and the type of my >> client connection does not have policy to be able to access the table >> type will an AVC be generated and the access denied? >> > Of course, it depends on the policy of the system. > > If client connection come from none-SELinux system, use netlabelctl > to configure default fallback security context. It gives getpeercon(3) > the client label shall be applied when netlabel is not configured on > the connection. > > Thanks, > -- > NEC Business Creation Division / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > > >> -----Original Message----- >> From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ted Toth >> Sent: Wednesday, July 15, 2015 2:59 AM >> To: Kohei KaiGai >> Cc: Robert Haas; Adam Brightwell; Andres Freund; pgsql-hackers@postgresql.org; >> Alvaro Herrera >> Subject: Re: [HACKERS] security labels on databases are bad for dump & restore >> >> So if I label a table with an SELinux context and the type of my >> client connection does not have policy to be able to access the table >> type will an AVC be generated and the access denied? >> >> Ted >> >> On Tue, Jul 14, 2015 at 12:53 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: >> > 2015-07-15 2:39 GMT+09:00 Ted Toth <txtoth@gmail.com>: >> >> That's exactly what I'm talking about like I said KaiGais branch was >> >> never merged into the mainline so I do not believe that it is used at >> >> all. >> >> >> > It depends on the definition of "integrated". >> > The PostgreSQL core offers an infrastructure for label based security >> > mechanism, not only selinux. Also, one extension module that is >> > usually distributed with PosgreSQL bridges the world of database and >> > the world of selinux (even though all the features I initially designed >> > are not yet implemented). I like to say it is integrated. >> > >> >> On Tue, Jul 14, 2015 at 12:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> On Tue, Jul 14, 2015 at 1:22 PM, Ted Toth <txtoth@gmail.com> wrote: >> >>>> I'm sort of new to this so maybe I'm missing something but since the >> >>>> sepgsql SELinux userspace object manager was never integrated into >> >>>> postgresql (AFAIK KaiGais branch was never merged into the mainline) >> >>>> who uses these labels? What use are they? >> >>> >> >>> See contrib/sepgsql >> >>> >> >>> -- >> >>> Robert Haas >> >>> EnterpriseDB: http://www.enterprisedb.com >> >>> The Enterprise PostgreSQL Company >> > >> > >> > >> > -- >> > KaiGai Kohei <kaigai@kaigai.gr.jp> >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers
> That doesn't answer my question. I'm talking about a client and server > running on the same system with SELinux MLS policy so that getpeercon > will return the context of the client process unless it has explicitly > sets the socket create context . So again will postgresql if the > sepgsql module is loaded call a function in sepgsql to compute the > access vector for the source (getpeercon label) contexts access to the > target context (tables context set by SECURITY LABEL) and fail the > operation generating an AVC if access is denied because there is no > policy? > Yes. You may see AVC denial/allowed message on PostgreSQL log, like: LOG: SELinux: allowed { create } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:object_r:sepgsql_table_t:s0tclass=db_table name="regtest_schema.regtest_table" scontext comes from getpeercon(3), tcontext comes from the configuration by SECURITY LABEL command. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015-07-14 13:09:26 -0400, Adam Brightwell wrote: > All, > > >> I won't have time to do anything about this anytime soon, but I think we > >> should fix that at some point. Shall I put this on the todo? Or do we > >> want to create an 'open items' page that's not major version specific? > > > > I think adding it to the TODO would be great. Done. It's rather telling that it took me a fair while to find a spot in the todo list where it fits... > I'd be willing to look/dive into this one further. Cool. One thing worth mentioning is that arguably the problem is caused by the fact that we're here emitting database level information in pg_dump, normally only done for dumpall. Greetings, Andres Freund
Andres Freund wrote: > One thing worth mentioning is that arguably the problem is caused by the > fact that we're here emitting database level information in pg_dump, > normally only done for dumpall. ... the reason for which is probably the lack of CURRENT_DATABASE as a database specifier. It might make sense to add the rest of database-level information to pg_dump output, when we get that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: > Andres Freund wrote: > > One thing worth mentioning is that arguably the problem is caused by the > > fact that we're here emitting database level information in pg_dump, > > normally only done for dumpall. > > ... the reason for which is probably the lack of CURRENT_DATABASE as a > database specifier. It might make sense to add the rest of > database-level information to pg_dump output, when we get that. I'm not sure. I mean, it's not that an odd idea to assign a label to a database and then restore data into it, and expect the explicitly assigned label to survive. Not sure if there actually is a good behaviour either way here :/
On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote: > On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: > > Andres Freund wrote: > > > One thing worth mentioning is that arguably the problem is caused by the > > > fact that we're here emitting database level information in pg_dump, > > > normally only done for dumpall. Consistency with existing practice would indeed have pg_dump ignore pg_shseclabel and have pg_dumpall reproduce its entries. > > ... the reason for which is probably the lack of CURRENT_DATABASE as a > > database specifier. It might make sense to add the rest of > > database-level information to pg_dump output, when we get that. > > I'm not sure. I mean, it's not that an odd idea to assign a label to a > database and then restore data into it, and expect the explicitly > assigned label to survive. Not sure if there actually is a good > behaviour either way here :/ In a greenfield, I would make "pg_dump --create" reproduce pertinent entries from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription. I would make non-creating pg_dump reproduce none of those. Moreover, I would enable --create by default. Restoring into a user-provided shell database is specialized compared to reproducing a database from scratch.
> Consistency with existing practice would indeed have pg_dump ignore > pg_shseclabel and have pg_dumpall reproduce its entries. I think that makes sense, but what about other DATABASE level info such as COMMENT? Should that also be ignored by pg_dump as well? I'm specifically thinking of the discussion from the following thread: http://www.postgresql.org/message-id/20150317172459.GM3636@alvh.no-ip.org If COMMENT is included then why not SECURITY LABEL or others? > In a greenfield, I would make "pg_dump --create" reproduce pertinent entries > from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription. I would > make non-creating pg_dump reproduce none of those. I think the bigger question is "Where is the line drawn between pg_dump and pg_dumpall?". At what point does one tool become the other? -Adam -- Adam Brightwell - adam.brightwell@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
On Mon, Jul 20, 2015 at 07:01:14PM -0400, Adam Brightwell wrote: > > Consistency with existing practice would indeed have pg_dump ignore > > pg_shseclabel and have pg_dumpall reproduce its entries. > > I think that makes sense, but what about other DATABASE level info > such as COMMENT? Should that also be ignored by pg_dump as well? I'm > specifically thinking of the discussion from the following thread: > > http://www.postgresql.org/message-id/20150317172459.GM3636@alvh.no-ip.org > > If COMMENT is included then why not SECURITY LABEL or others? In any given situation, we should indeed restore both pg_database comments and pg_database security labels, or we should restore neither. Restoring neither is most consistent with history, but several people like the idea of restoring both. I won't mind either conclusion. > > In a greenfield, I would make "pg_dump --create" reproduce pertinent entries > > from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription. I would > > make non-creating pg_dump reproduce none of those. > > I think the bigger question is "Where is the line drawn between > pg_dump and pg_dumpall?". At what point does one tool become the > other? That question may be too big for me.
Noah Misch wrote: > On Mon, Jul 20, 2015 at 07:01:14PM -0400, Adam Brightwell wrote: > > I think that makes sense, but what about other DATABASE level info > > such as COMMENT? Should that also be ignored by pg_dump as well? I'm > > specifically thinking of the discussion from the following thread: > > > > http://www.postgresql.org/message-id/20150317172459.GM3636@alvh.no-ip.org > > > > If COMMENT is included then why not SECURITY LABEL or others? > > In any given situation, we should indeed restore both pg_database comments and > pg_database security labels, or we should restore neither. Agreed. > > > In a greenfield, I would make "pg_dump --create" reproduce pertinent entries > > > from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription. I would > > > make non-creating pg_dump reproduce none of those. > > > > I think the bigger question is "Where is the line drawn between > > pg_dump and pg_dumpall?". At what point does one tool become the > > other? > > That question may be too big for me. I don't think there's any line near pg_dumpall. That tool seems to have grown out of desperation without much actual design. I think it makes more sense to plan around that's the best pg_dump behavior for the various use cases. I like Noah's proposal of having pg_dump --create reproduce all database-level state. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> I don't think there's any line near pg_dumpall. That tool seems to > have grown out of desperation without much actual design. I think it > makes more sense to plan around that's the best pg_dump behavior for the > various use cases. Ok. > I like Noah's proposal of having pg_dump --create reproduce all > database-level state. Should it be enabled by default? If so, then wouldn't it make more sense to call it --no-create and do the opposite? So, --no-create would exclude rather than include database-level information? Would enabling it by default cause issues with the current expected use of the tool by end users? How would this handle related global objects? It seems like this part could get a little tricky. Taking it one step further, would a --all option that dumps all databases make sense as well? Of course I know that's probably a considerable undertaking and certainly beyond the current scope. Though, I thought I'd throw it out there. Also, I think this would potentially conflict with what Fabrízio is doing with CURRENT_DATABASE on COMMENT, though, I think it might be a preferable solution. https://commitfest.postgresql.org/5/229/ -Adam -- Adam Brightwell - adam.brightwell@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
<div dir="ltr"><div class="gmail_extra">On Wed, Jul 22, 2015 at 4:42 PM, Adam Brightwell <<a href="mailto:adam.brightwell@crunchydatasolutions.com">adam.brightwell@crunchydatasolutions.com</a>>wrote:<br />> <br/>> [...]<br />><br />> Also, I think this would potentially conflict with what Fabrízio is<br />> doing withCURRENT_DATABASE on COMMENT, though, I think it might be a<br />> preferable solution.<br />><br />> <a href="https://commitfest.postgresql.org/5/229/">https://commitfest.postgresql.org/5/229/</a><br/>><br /><br /></div><divclass="gmail_extra">Unfortunately this code is a bit weird and will be better to move to the next commitfest(I have no time to improve it yet), so we can join efforts and implement all ideas and make the reviewers lifeeasier with a more consistency patch. <br /><br /></div><div class="gmail_extra">Seems reasonable?<br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra">Regards,<br /></div><div class="gmail_extra"><br />--<br />Fabríziode Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog: <a href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>
On Wed, Jul 22, 2015 at 03:42:58PM -0400, Adam Brightwell wrote: > > I like Noah's proposal of having pg_dump --create reproduce all > > database-level state. > > Should it be enabled by default? If so, then wouldn't it make more > sense to call it --no-create and do the opposite? So, --no-create > would exclude rather than include database-level information? Would > enabling it by default cause issues with the current expected use of > the tool by end users? While I'd favor optional --no-create if we were designing fresh, it's not worth breaking user scripts by changing that now. > How would this handle related global objects? It seems like this part > could get a little tricky. Like roles and tablespaces? No need to change their treatment. > Taking it one step further, would a --all option that dumps all > databases make sense as well? Of course I know that's probably a > considerable undertaking and certainly beyond the current scope. I agree it's outside the scope of fixing $subject. Thanks, nm
> While I'd favor optional --no-create if we were designing fresh, it's not > worth breaking user scripts by changing that now. Agreed. So, --create would not be enabled by default. >> How would this handle related global objects? It seems like this part >> could get a little tricky. > > Like roles and tablespaces? No need to change their treatment. Yes, those. Ok. -Adam -- Adam Brightwell - adam.brightwell@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
On Wed, Jul 22, 2015 at 3:42 PM, Adam Brightwell <adam.brightwell@crunchydatasolutions.com> wrote: >> I don't think there's any line near pg_dumpall. That tool seems to >> have grown out of desperation without much actual design. I think it >> makes more sense to plan around that's the best pg_dump behavior for the >> various use cases. > > Ok. > >> I like Noah's proposal of having pg_dump --create reproduce all >> database-level state. > > Should it be enabled by default? If so, then wouldn't it make more > sense to call it --no-create and do the opposite? So, --no-create > would exclude rather than include database-level information? Would > enabling it by default cause issues with the current expected use of > the tool by end users? This seems a bit hairy to me. If we want to transfer responsibility for dumping this stuff from pg_dumpall to pg_dump, I have no problem with that at all. But doing it only when --create is specified seems odd. Then, does pg_dumpall -g dump it or not? If it does, then we're sorta dumping it in two places when --create is used. If it doesn't, then when --create is not used we're doing it nowhere. I may be confused. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 23, 2015 at 12:14:14PM -0400, Robert Haas wrote: > On Wed, Jul 22, 2015 at 3:42 PM, Adam Brightwell <adam.brightwell@crunchydatasolutions.com> wrote: > >> I like Noah's proposal of having pg_dump --create reproduce all > >> database-level state. > > > > Should it be enabled by default? If so, then wouldn't it make more > > sense to call it --no-create and do the opposite? So, --no-create > > would exclude rather than include database-level information? Would > > enabling it by default cause issues with the current expected use of > > the tool by end users? > > This seems a bit hairy to me. If we want to transfer responsibility > for dumping this stuff from pg_dumpall to pg_dump, I have no problem > with that at all. But doing it only when --create is specified seems > odd. Then, does pg_dumpall -g dump it or not? The principle I had in mind was to dump ACLs, pg_db_role_setting entries, comments and security labels if and only if we emit a CREATE statement for the object they modify. That is already the rule for objects located inside databases. Since "pg_dumpall -g" does not emit CREATE DATABASE statements[1], it would not dump these attributes of databases. > If it does, then we're > sorta dumping it in two places when --create is used. If it doesn't, > then when --create is not used we're doing it nowhere. Yep. Plain "pg_dump" dumps the contents of a database without dumping the database itself. I don't like that as a default, but we're stuck with it. [1] "pg_dumpall -g --binary-upgrade" _does_ emit CREATE DATABASE statements, so _it_ would dump these attributes.
On 20 July 2015 at 01:18, Noah Misch <noah@leadboat.com> wrote: > On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote: >> On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: >> > Andres Freund wrote: >> > > One thing worth mentioning is that arguably the problem is caused by the >> > > fact that we're here emitting database level information in pg_dump, >> > > normally only done for dumpall. > > Consistency with existing practice would indeed have pg_dump ignore > pg_shseclabel and have pg_dumpall reproduce its entries. Existing practice is pretty broken though, and not necessarily a good guide. COMMENT ON DATABASE and SECURITY LABEL FOR DATABASE are dumped by pg_dump, but always refer to the database's name at the time it was dumped, so restoring it can break. GRANTs on databases are ignored and not dumped by pg_dump or by pg_dumpall --globals-only. The only way to dump them seems to be to use pg_dumpall, which nobody uses in the real world. I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then dumping them in pg_dump --create, and in pg_dump -Fc . In practice I see zero real use of pg_dumpall without --globals-only, and almost everyone does pg_dump -Fc . I'd like to see that method case actually preserve the whole state of the system and do the right thing sensibly. A pg_restore option to skip database-level settings could be useful, but I think by default they should be restored. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jul 26, 2015 at 11:43 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 20 July 2015 at 01:18, Noah Misch <noah@leadboat.com> wrote: >> On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote: >>> On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: >>> > Andres Freund wrote: >>> > > One thing worth mentioning is that arguably the problem is caused by the >>> > > fact that we're here emitting database level information in pg_dump, >>> > > normally only done for dumpall. >> >> Consistency with existing practice would indeed have pg_dump ignore >> pg_shseclabel and have pg_dumpall reproduce its entries. > > Existing practice is pretty broken though, and not necessarily a good guide. > > COMMENT ON DATABASE and SECURITY LABEL FOR DATABASE are dumped by > pg_dump, but always refer to the database's name at the time it was > dumped, so restoring it can break. > > GRANTs on databases are ignored and not dumped by pg_dump or by > pg_dumpall --globals-only. The only way to dump them seems to be to > use pg_dumpall, which nobody uses in the real world. > > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT > ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then > dumping them in pg_dump --create, and in pg_dump -Fc . > > In practice I see zero real use of pg_dumpall without --globals-only, > and almost everyone does pg_dump -Fc . I'd like to see that method > case actually preserve the whole state of the system and do the right > thing sensibly. > > A pg_restore option to skip database-level settings could be useful, > but I think by default they should be restored. Yes, I think we should make restoring the database's properties the job of pg_dump and remove it completely from pg_dumpall, unless we can find a case where that's really going to break things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Sun, Jul 26, 2015 at 11:43 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > > On 20 July 2015 at 01:18, Noah Misch <noah@leadboat.com> wrote: > >> On Wed, Jul 15, 2015 at 11:08:53AM +0200, Andres Freund wrote: > >>> On 2015-07-15 12:04:40 +0300, Alvaro Herrera wrote: > >>> > Andres Freund wrote: > >>> > > One thing worth mentioning is that arguably the problem is caused by the > >>> > > fact that we're here emitting database level information in pg_dump, > >>> > > normally only done for dumpall. > >> > >> Consistency with existing practice would indeed have pg_dump ignore > >> pg_shseclabel and have pg_dumpall reproduce its entries. > > > > Existing practice is pretty broken though, and not necessarily a good guide. > > > > COMMENT ON DATABASE and SECURITY LABEL FOR DATABASE are dumped by > > pg_dump, but always refer to the database's name at the time it was > > dumped, so restoring it can break. > > > > GRANTs on databases are ignored and not dumped by pg_dump or by > > pg_dumpall --globals-only. The only way to dump them seems to be to > > use pg_dumpall, which nobody uses in the real world. > > > > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT > > ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then > > dumping them in pg_dump --create, and in pg_dump -Fc . > > > > In practice I see zero real use of pg_dumpall without --globals-only, > > and almost everyone does pg_dump -Fc . I'd like to see that method > > case actually preserve the whole state of the system and do the right > > thing sensibly. > > > > A pg_restore option to skip database-level settings could be useful, > > but I think by default they should be restored. > > Yes, I think we should make restoring the database's properties the > job of pg_dump and remove it completely from pg_dumpall, unless we can > find a case where that's really going to break things. I believe that means, as discussed, that we'll need to support "CURRENT_DATABASE" or similar for all database properties, but that seems like a wholly good thing to do anyway, provided we can do so without causing problems. In other words, I agree. Thanks! Stephen
On 2015-07-28 14:58:26 -0400, Robert Haas wrote: > Yes, I think we should make restoring the database's properties the > job of pg_dump and remove it completely from pg_dumpall, unless we can > find a case where that's really going to break things. CREATE DATABASE blarg; SECURITY LABEL ON blarg IS 'noaccess'; ALTER DATABASE blarg SET default_tablespace = space_with_storage; pg_restore -> SECURITY LABEL ON blarg IS 'allow_access'; -> ALTER DATABASE blarg SET default_tablespace = space_without_storage; That's probably not sufficient reasons not to go that way, but I do think there's a bunch more issues like that. At the very least all these need to be emitted as ALTER DATABASE current_database ... et al. Otherwise it's impossible to rename databases, which definitely would not be ok. Andres
On Tue, Jul 28, 2015 at 3:03 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-07-28 14:58:26 -0400, Robert Haas wrote: >> Yes, I think we should make restoring the database's properties the >> job of pg_dump and remove it completely from pg_dumpall, unless we can >> find a case where that's really going to break things. > > CREATE DATABASE blarg; > SECURITY LABEL ON blarg IS 'noaccess'; > ALTER DATABASE blarg SET default_tablespace = space_with_storage; > pg_restore > -> SECURITY LABEL ON blarg IS 'allow_access'; > -> ALTER DATABASE blarg SET default_tablespace = space_without_storage; > > That's probably not sufficient reasons not to go that way, but I do > think there's a bunch more issues like that. Could you use some complete sentences to describe what the actual issue is? I can't make heads or tails of what you wrote there. > At the very least all these need to be emitted as ALTER DATABASE > current_database ... et al. Otherwise it's impossible to rename > databases, which definitely would not be ok. Yep, I think that's the plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-07-28 15:05:01 -0400, Robert Haas wrote: > On Tue, Jul 28, 2015 at 3:03 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2015-07-28 14:58:26 -0400, Robert Haas wrote: > >> Yes, I think we should make restoring the database's properties the > >> job of pg_dump and remove it completely from pg_dumpall, unless we can > >> find a case where that's really going to break things. > > > > CREATE DATABASE blarg; > > SECURITY LABEL ON blarg IS 'noaccess'; > > ALTER DATABASE blarg SET default_tablespace = space_with_storage; > > pg_restore > > -> SECURITY LABEL ON blarg IS 'allow_access'; > > -> ALTER DATABASE blarg SET default_tablespace = space_without_storage; > > > > That's probably not sufficient reasons not to go that way, but I do > > think there's a bunch more issues like that. > > Could you use some complete sentences to describe what the actual > issue is? I can't make heads or tails of what you wrote there. DBA creates a database and sets some properties (security labels, gucs, acls) on it. Then goes on to restore a backup. Unfortunately that backup might, or might not, overwrite the properties he configured depending on whether the restored database already contains them and from which version the backup originates. Greetings, Andres Freund
On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-07-28 15:05:01 -0400, Robert Haas wrote: >> On Tue, Jul 28, 2015 at 3:03 PM, Andres Freund <andres@anarazel.de> wrote: >> > On 2015-07-28 14:58:26 -0400, Robert Haas wrote: >> >> Yes, I think we should make restoring the database's properties the >> >> job of pg_dump and remove it completely from pg_dumpall, unless we can >> >> find a case where that's really going to break things. >> > >> > CREATE DATABASE blarg; >> > SECURITY LABEL ON blarg IS 'noaccess'; >> > ALTER DATABASE blarg SET default_tablespace = space_with_storage; >> > pg_restore >> > -> SECURITY LABEL ON blarg IS 'allow_access'; >> > -> ALTER DATABASE blarg SET default_tablespace = space_without_storage; >> > >> > That's probably not sufficient reasons not to go that way, but I do >> > think there's a bunch more issues like that. >> >> Could you use some complete sentences to describe what the actual >> issue is? I can't make heads or tails of what you wrote there. > > DBA creates a database and sets some properties (security labels, gucs, > acls) on it. Then goes on to restore a backup. Unfortunately that backup > might, or might not, overwrite the properties he configured depending on > whether the restored database already contains them and from which > version the backup originates. Well, I think that's just a potential incompatibility between 9.6 and previous versions, and a relatively minor one at that. We can't and don't guarantee that a dump taken using the 9.3 version of pg_dump will restore correctly on any server version except 9.3. It might work OK on a newer or older version, but then again it might not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-07-28 15:14:11 -0400, Robert Haas wrote: > On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > > DBA creates a database and sets some properties (security labels, gucs, > > acls) on it. Then goes on to restore a backup. Unfortunately that backup > > might, or might not, overwrite the properties he configured depending on > > whether the restored database already contains them and from which > > version the backup originates. > > Well, I think that's just a potential incompatibility between 9.6 and > previous versions, and a relatively minor one at that. We can't and > don't guarantee that a dump taken using the 9.3 version of pg_dump > will restore correctly on any server version except 9.3. It might > work OK on a newer or older version, but then again it might not. Even within a single major version it'll be a bit confusing that one time a restore yielded the desired result (previously set property survives) and the next restore it doesn't, because now the backup does contain the property.
On 07/28/2015 11:58 AM, Robert Haas wrote: > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT >> ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then >> dumping them in pg_dump --create, and in pg_dump -Fc . >> >> In practice I see zero real use of pg_dumpall without --globals-only, >> and almost everyone does pg_dump -Fc . I'd like to see that method >> case actually preserve the whole state of the system and do the right >> thing sensibly. >> >> A pg_restore option to skip database-level settings could be useful, >> but I think by default they should be restored. +++++1 Let's get rid of pg_dumpall -g. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Andres Freund (andres@anarazel.de) wrote: > On 2015-07-28 15:14:11 -0400, Robert Haas wrote: > > On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > > > DBA creates a database and sets some properties (security labels, gucs, > > > acls) on it. Then goes on to restore a backup. Unfortunately that backup > > > might, or might not, overwrite the properties he configured depending on > > > whether the restored database already contains them and from which > > > version the backup originates. > > > > Well, I think that's just a potential incompatibility between 9.6 and > > previous versions, and a relatively minor one at that. We can't and > > don't guarantee that a dump taken using the 9.3 version of pg_dump > > will restore correctly on any server version except 9.3. It might > > work OK on a newer or older version, but then again it might not. > > Even within a single major version it'll be a bit confusing that one > time a restore yielded the desired result (previously set property > survives) and the next restore it doesn't, because now the backup does > contain the property. I'm not sure that I agree with this at all- you might create one SSL certificate after you install PG and then you use one of the various utilities to restore a prior cluster and, blam, you get a different certificate because that's what was in the backup. I might see having an option to enable/disable restoring the database level properies which exist inside a backup as that may be useful flexibility, but I don't believe this concern should stop us from including the database properties in the database backup. Thanks! Stephen
Josh Berkus wrote: > On 07/28/2015 11:58 AM, Robert Haas wrote: > > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT > >> ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then > >> dumping them in pg_dump --create, and in pg_dump -Fc . > >> > >> In practice I see zero real use of pg_dumpall without --globals-only, > >> and almost everyone does pg_dump -Fc . I'd like to see that method > >> case actually preserve the whole state of the system and do the right > >> thing sensibly. > >> > >> A pg_restore option to skip database-level settings could be useful, > >> but I think by default they should be restored. > > +++++1 > > Let's get rid of pg_dumpall -g. Quite the opposite, I think --- let's get rid of pg_dumpall EXCEPT when invoked as pg_dumpall -g. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 28, 2015 at 3:16 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-07-28 15:14:11 -0400, Robert Haas wrote: >> On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >> > DBA creates a database and sets some properties (security labels, gucs, >> > acls) on it. Then goes on to restore a backup. Unfortunately that backup >> > might, or might not, overwrite the properties he configured depending on >> > whether the restored database already contains them and from which >> > version the backup originates. >> >> Well, I think that's just a potential incompatibility between 9.6 and >> previous versions, and a relatively minor one at that. We can't and >> don't guarantee that a dump taken using the 9.3 version of pg_dump >> will restore correctly on any server version except 9.3. It might >> work OK on a newer or older version, but then again it might not. > > Even within a single major version it'll be a bit confusing that one > time a restore yielded the desired result (previously set property > survives) and the next restore it doesn't, because now the backup does > contain the property. How would that happen? We're not gonna back-patch this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-07-28 15:27:51 -0400, Robert Haas wrote: > On Tue, Jul 28, 2015 at 3:16 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2015-07-28 15:14:11 -0400, Robert Haas wrote: > >> On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > >> > DBA creates a database and sets some properties (security labels, gucs, > >> > acls) on it. Then goes on to restore a backup. Unfortunately that backup > >> > might, or might not, overwrite the properties he configured depending on > >> > whether the restored database already contains them and from which > >> > version the backup originates. > >> > >> Well, I think that's just a potential incompatibility between 9.6 and > >> previous versions, and a relatively minor one at that. We can't and > >> don't guarantee that a dump taken using the 9.3 version of pg_dump > >> will restore correctly on any server version except 9.3. It might > >> work OK on a newer or older version, but then again it might not. > > > > Even within a single major version it'll be a bit confusing that one > > time a restore yielded the desired result (previously set property > > survives) and the next restore it doesn't, because now the backup does > > contain the property. > > How would that happen? We're not gonna back-patch this. Hm? Let me try again: If the admin does a ALTER DATABASE ... SET guc = ... *before* restoring a backup and the backup does contain a setting for the same guc, but with a different value it'll overwrite the previous explicit action by the DBA without any warning. If the backup does *not* contain that guc the previous action survives. That's confusing, because you're more likely to be in the 'the backup does not contain the guc' situation when testing where it thus will work.
On Tue, Jul 28, 2015 at 3:33 PM, Andres Freund <andres@anarazel.de> wrote: > On 2015-07-28 15:27:51 -0400, Robert Haas wrote: >> On Tue, Jul 28, 2015 at 3:16 PM, Andres Freund <andres@anarazel.de> wrote: >> > On 2015-07-28 15:14:11 -0400, Robert Haas wrote: >> >> On Tue, Jul 28, 2015 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >> >> > DBA creates a database and sets some properties (security labels, gucs, >> >> > acls) on it. Then goes on to restore a backup. Unfortunately that backup >> >> > might, or might not, overwrite the properties he configured depending on >> >> > whether the restored database already contains them and from which >> >> > version the backup originates. >> >> >> >> Well, I think that's just a potential incompatibility between 9.6 and >> >> previous versions, and a relatively minor one at that. We can't and >> >> don't guarantee that a dump taken using the 9.3 version of pg_dump >> >> will restore correctly on any server version except 9.3. It might >> >> work OK on a newer or older version, but then again it might not. >> > >> > Even within a single major version it'll be a bit confusing that one >> > time a restore yielded the desired result (previously set property >> > survives) and the next restore it doesn't, because now the backup does >> > contain the property. >> >> How would that happen? We're not gonna back-patch this. > > Hm? Let me try again: If the admin does a ALTER DATABASE ... SET guc = > ... *before* restoring a backup and the backup does contain a setting > for the same guc, but with a different value it'll overwrite the > previous explicit action by the DBA without any warning. If the backup > does *not* contain that guc the previous action survives. That's > confusing, because you're more likely to be in the 'the backup does not > contain the guc' situation when testing where it thus will work. True. But I don't think modifying a database before restoring into it is terribly supported. Even pg_dump --clean, which is supposed to do this sort of thing, doesn't seem to work terribly reliably. We could try to fix this by having a command like ALTER DATABASE ... RESET ALL that we issue before restoring the settings, but I'm afraid that will take us into all sorts of unreasonable scenarios that are better just labeled as "don't do that". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jul 28, 2015 at 03:36:13PM -0400, Robert Haas wrote: > On Tue, Jul 28, 2015 at 3:33 PM, Andres Freund <andres@anarazel.de> wrote: > > Hm? Let me try again: If the admin does a ALTER DATABASE ... SET guc = > > ... *before* restoring a backup and the backup does contain a setting > > for the same guc, but with a different value it'll overwrite the > > previous explicit action by the DBA without any warning. If the backup > > does *not* contain that guc the previous action survives. That's > > confusing, because you're more likely to be in the 'the backup does not > > contain the guc' situation when testing where it thus will work. > > True. But I don't think modifying a database before restoring into it > is terribly supported. Even pg_dump --clean, which is supposed to do > this sort of thing, doesn't seem to work terribly reliably. We could > try to fix this by having a command like ALTER DATABASE ... RESET ALL > that we issue before restoring the settings, but I'm afraid that will > take us into all sorts of unreasonable scenarios that are better just > labeled as "don't do that". Andres's example is a harbinger of the semantic morass ahead. Excepting database objects and the "public" schema object, pg_dump and pg_dumpall mutate only the objects they CREATE. They consistently restore object properties (owner, ACLs, security label, etc.) if and only if issuing a CREATE statement for the object. For example, restoring objects contained in a schema without restoring the schema itself changes none of those schema properties. pg_dump and pg_dumpall have mostly followed that rule for databases, too, but they depart from it for comment and security label. That was a mistake. We can't in general mutate an existing database to match, because we can't mutate the encoding, datcollate or datctype. Even discounting that problem, I value consistency with the rest of the dumpable object types. I suppose many folks using pg_dump without --create think of it as a dump of a database itself, not a dump of a database's contents. They would benefit from a dump that mutates the target database to match the properties of the source database. Helping those folks does not offset the aforementioned losses. Thanks, nm
On Wed, Jul 29, 2015 at 12:39 AM, Noah Misch <noah@leadboat.com> wrote: > On Tue, Jul 28, 2015 at 03:36:13PM -0400, Robert Haas wrote: >> On Tue, Jul 28, 2015 at 3:33 PM, Andres Freund <andres@anarazel.de> wrote: >> > Hm? Let me try again: If the admin does a ALTER DATABASE ... SET guc = >> > ... *before* restoring a backup and the backup does contain a setting >> > for the same guc, but with a different value it'll overwrite the >> > previous explicit action by the DBA without any warning. If the backup >> > does *not* contain that guc the previous action survives. That's >> > confusing, because you're more likely to be in the 'the backup does not >> > contain the guc' situation when testing where it thus will work. >> >> True. But I don't think modifying a database before restoring into it >> is terribly supported. Even pg_dump --clean, which is supposed to do >> this sort of thing, doesn't seem to work terribly reliably. We could >> try to fix this by having a command like ALTER DATABASE ... RESET ALL >> that we issue before restoring the settings, but I'm afraid that will >> take us into all sorts of unreasonable scenarios that are better just >> labeled as "don't do that". > > Andres's example is a harbinger of the semantic morass ahead. Excepting > database objects and the "public" schema object, pg_dump and pg_dumpall mutate > only the objects they CREATE. They consistently restore object properties > (owner, ACLs, security label, etc.) if and only if issuing a CREATE statement > for the object. For example, restoring objects contained in a schema without > restoring the schema itself changes none of those schema properties. pg_dump > and pg_dumpall have mostly followed that rule for databases, too, but they > depart from it for comment and security label. That was a mistake. We can't > in general mutate an existing database to match, because we can't mutate the > encoding, datcollate or datctype. Even discounting that problem, I value > consistency with the rest of the dumpable object types. What we've proven so far (if Craig's comments are to be believed) is that the oft-recommended formula of pg_dumpall -g plus pg_dump of each database doesn't completely work. That's absolutely gotta be fixed. I'm open to whatever ideas you or others may have about how to fix that, but it doesn't seem to me that changing the behavior of pg_dump only when -c is specified gets us any closer to that goal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 29, 2015 at 10:50:53AM -0400, Robert Haas wrote: > On Wed, Jul 29, 2015 at 12:39 AM, Noah Misch <noah@leadboat.com> wrote: > > On Tue, Jul 28, 2015 at 03:36:13PM -0400, Robert Haas wrote: > >> On Tue, Jul 28, 2015 at 3:33 PM, Andres Freund <andres@anarazel.de> wrote: > >> > Hm? Let me try again: If the admin does a ALTER DATABASE ... SET guc = > >> > ... *before* restoring a backup and the backup does contain a setting > >> > for the same guc, but with a different value it'll overwrite the > >> > previous explicit action by the DBA without any warning. If the backup > >> > does *not* contain that guc the previous action survives. That's > >> > confusing, because you're more likely to be in the 'the backup does not > >> > contain the guc' situation when testing where it thus will work. > >> > >> True. But I don't think modifying a database before restoring into it > >> is terribly supported. Even pg_dump --clean, which is supposed to do > >> this sort of thing, doesn't seem to work terribly reliably. We could > >> try to fix this by having a command like ALTER DATABASE ... RESET ALL > >> that we issue before restoring the settings, but I'm afraid that will > >> take us into all sorts of unreasonable scenarios that are better just > >> labeled as "don't do that". > > > > Andres's example is a harbinger of the semantic morass ahead. Excepting > > database objects and the "public" schema object, pg_dump and pg_dumpall mutate > > only the objects they CREATE. They consistently restore object properties > > (owner, ACLs, security label, etc.) if and only if issuing a CREATE statement > > for the object. For example, restoring objects contained in a schema without > > restoring the schema itself changes none of those schema properties. pg_dump > > and pg_dumpall have mostly followed that rule for databases, too, but they > > depart from it for comment and security label. That was a mistake. We can't > > in general mutate an existing database to match, because we can't mutate the > > encoding, datcollate or datctype. Even discounting that problem, I value > > consistency with the rest of the dumpable object types. > > What we've proven so far (if Craig's comments are to be believed) is > that the oft-recommended formula of pg_dumpall -g plus pg_dump of each > database doesn't completely work. That's absolutely gotta be fixed. What exact formula did you have in mind? It must not be merely 1. "pg_dumpall -g" 2. "pg_dump" (without --create) per database which _never_ works: it emits no CREATE DATABASE statements. Perhaps this? 1. "pg_dumpall -g" 2. Issue a handwritten CREATE DATABASE statement per database with correct encoding, lc_ctype and lc_collate parameters. All other database properties can be wrong; the dump will fix them. 3. "pg_dump" (without --create) per database That neglects numerous database properties today, but we could make it work. Given the problems I described upthread, it's an inferior formula that I recommend against propping up. I much prefer making this work completely: 1. "pg_dumpall -g" 2. "pg_dump --create" per database Another formula I wouldn't mind offering: 1. "pg_dumpall -g" 2. pg_dumpall --empty-databases 3. "pg_dump" (without --create) per database Code for an --empty-databases option already exists for "pg_dumpall -g --binary-upgrade". A patch turning that into a user-facing feature might be quite compact. I don't see much point given a complete "pg_dump --create", but I wouldn't object. Thanks, nm
Noah Misch wrote: > What exact formula did you have in mind? It must not be merely > > 1. "pg_dumpall -g" > 2. "pg_dump" (without --create) per database > > which _never_ works: it emits no CREATE DATABASE statements. Perhaps this? > > 1. "pg_dumpall -g" > 2. Issue a handwritten CREATE DATABASE statement per database with correct > encoding, lc_ctype and lc_collate parameters. All other database > properties can be wrong; the dump will fix them. > 3. "pg_dump" (without --create) per database > > That neglects numerous database properties today, but we could make it work. > Given the problems I described upthread, it's an inferior formula that I > recommend against propping up. Agreed, and IMO it's embarrasing that it's so complicated to get a fully working backup. > I much prefer making this work completely: > > 1. "pg_dumpall -g" > 2. "pg_dump --create" per database My full support for this proposal. > Another formula I wouldn't mind offering: > > 1. "pg_dumpall -g" > 2. pg_dumpall --empty-databases > 3. "pg_dump" (without --create) per database > > Code for an --empty-databases option already exists for "pg_dumpall -g > --binary-upgrade". A patch turning that into a user-facing feature might be > quite compact. I don't mind if this one is also made to work, but I don't care about this case all that much. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jul 30, 2015 at 2:49 AM, Noah Misch <noah@leadboat.com> wrote: > What exact formula did you have in mind? It must not be merely > > 1. "pg_dumpall -g" > 2. "pg_dump" (without --create) per database > > which _never_ works: it emits no CREATE DATABASE statements. Perhaps this? > > 1. "pg_dumpall -g" > 2. Issue a handwritten CREATE DATABASE statement per database with correct > encoding, lc_ctype and lc_collate parameters. All other database > properties can be wrong; the dump will fix them. > 3. "pg_dump" (without --create) per database > > That neglects numerous database properties today, but we could make it work. > Given the problems I described upthread, it's an inferior formula that I > recommend against propping up. I much prefer making this work completely: > > 1. "pg_dumpall -g" > 2. "pg_dump --create" per database Gah, OK, I see your point. But we better document this, because if you need a PhD in PostgreSQL-ology to take a backup, we're not in a good place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>> 1. "pg_dumpall -g" >> 2. "pg_dump --create" per database > > Gah, OK, I see your point. But we better document this, because if > you need a PhD in PostgreSQL-ology to take a backup, we're not in a > good place. Agreed. Though, honestly, I find this to be a cumbersome approach. I think it just makes things more confusing, even if it is well documented. Perhaps it might be necessary as a bridge to get to a better place. But my first question as an end user would be, 'why can't one tool do this?'. Also, by using 'pg_dumpall -g' aren't you potentially getting things that you don't want/need/care about? For instance, if database 'foo' is owned by 'user1' and database 'bar' is owned by 'user2' and neither have any knowledge/relation of/to the other, then when I dump 'foo', in this manner, wouldn't I also be including 'user2'? Said differently, a restore of a 'foo'-only dump would also include a 'bar' related role. That seems like a bad idea, IMHO. Maybe it can't be avoided, but I'd expect that only relevant information for the database being dumped would be included. -Adam -- Adam Brightwell - adam.brightwell@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
On Thu, Jul 30, 2015 at 10:37:33PM -0400, Adam Brightwell wrote: > On Thu, Jul 30, 2015 at 02:26:34PM -0400, Robert Haas wrote: > >> 1. "pg_dumpall -g" > >> 2. "pg_dump --create" per database > > > > Gah, OK, I see your point. But we better document this, because if > > you need a PhD in PostgreSQL-ology to take a backup, we're not in a > > good place. Definitely. > Agreed. Though, honestly, I find this to be a cumbersome approach. I > think it just makes things more confusing, even if it is well > documented. Perhaps it might be necessary as a bridge to get to a > better place. But my first question as an end user would be, 'why > can't one tool do this?'. "pg_dumpall" (without -g) is that one tool. It has excellent dump fidelity. It lacks the slicing and dump format options of pg_dump, which are important to many sites. > Also, by using 'pg_dumpall -g' aren't you > potentially getting things that you don't want/need/care about? For > instance, if database 'foo' is owned by 'user1' and database 'bar' is > owned by 'user2' and neither have any knowledge/relation of/to the > other, then when I dump 'foo', in this manner, wouldn't I also be > including 'user2'? Said differently, a restore of a 'foo'-only dump > would also include a 'bar' related role. That seems like a bad idea, > IMHO. Maybe it can't be avoided, but I'd expect that only relevant > information for the database being dumped would be included. Nothing in core PostgreSQL attempts to answer the high-level question "Is user1 relevant to database bar?" PostgreSQL has no one concept of a role's relevance to databases. Some reasonable heuristics come to mind, but nothing I'd be eager to memorialize in a core tool like pg_dumpall. Thanks, nm
On Tue, Jul 28, 2015 at 04:23:36PM -0300, Alvaro Herrera wrote: > Josh Berkus wrote: > > On 07/28/2015 11:58 AM, Robert Haas wrote: > > > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT > > >> ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then > > >> dumping them in pg_dump --create, and in pg_dump -Fc . > > >> > > >> In practice I see zero real use of pg_dumpall without --globals-only, > > >> and almost everyone does pg_dump -Fc . I'd like to see that method > > >> case actually preserve the whole state of the system and do the right > > >> thing sensibly. > > >> > > >> A pg_restore option to skip database-level settings could be useful, > > >> but I think by default they should be restored. > > > > +++++1 > > > > Let's get rid of pg_dumpall -g. > > Quite the opposite, I think --- let's get rid of pg_dumpall EXCEPT when > invoked as pg_dumpall -g. Is this a TODO? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Aug 31, 2015 at 05:46:08PM -0400, Bruce Momjian wrote: > On Tue, Jul 28, 2015 at 04:23:36PM -0300, Alvaro Herrera wrote: > > Josh Berkus wrote: > > > On 07/28/2015 11:58 AM, Robert Haas wrote: > > > > I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT > > > >> ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then > > > >> dumping them in pg_dump --create, and in pg_dump -Fc . > > > >> > > > >> In practice I see zero real use of pg_dumpall without --globals-only, > > > >> and almost everyone does pg_dump -Fc . I'd like to see that method > > > >> case actually preserve the whole state of the system and do the right > > > >> thing sensibly. > > > >> > > > >> A pg_restore option to skip database-level settings could be useful, > > > >> but I think by default they should be restored. > > > > > > +++++1 > > > > > > Let's get rid of pg_dumpall -g. > > > > Quite the opposite, I think --- let's get rid of pg_dumpall EXCEPT when > > invoked as pg_dumpall -g. > > Is this a TODO? Most ideas from this thread had been on TODO for 5+ years. I've just now linked the main existing item to this thread. Removing modes of pg_dumpall isn't on TODO, but I don't think it has enough support to belong there. Thanks, nm