Thread: [GSOC] questions about idea "rewrite pg_dump as library"
Hi all,

I'd like to introduce myself to the dev community. I am Shuai Fan, a student from Dalian University of Technology (DLUT for short), China, and I am interested in working with the PostgreSQL project in GSOC 2013. I'm interested in the idea "Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll & .dylib)".

These days I had a talk with Mr. Pavel Golub by email, the author of this post, and asked some questions about this idea. He advised me to post the question to this mailing list.

My question is: there are lots of functions in "pg_dump.c". If I rewrite this file as a library, I should split "pg_dump.c" into two or more files (Mr. Pavel Golub's advice). However, some functions do have return values, e.g.

    static DumpableObject *createBoundaryObjects(void);

I thought these functions must return their results through function arguments, by passing a pointer, when used as a library. But the question is: if I did so, the function prototypes would change, and Mr. Pavel Golub said "it's better to keep all functions with the same declaration", so that "we will only have one code base for both console application and library". I think he is right.

But then, how can I deal with this situation? From my point of view, I can't malloc a variable (e.g. a DumpableObject) in one library (e.g. pg_dumplib.so) and then return its pointer to another library (or executable program). Maybe it's not safe(?), or it has something to do with a "memory leak"(?). I'm not sure.

Do you have any suggestions?

Best wishes,
Shuai
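On the specific worry about returning malloc'd memory across a library boundary: this is generally safe as long as the memory is allocated and freed by the same runtime, and the usual convention is for a library to export a matching destructor alongside each constructor. A minimal sketch of that pattern, using hypothetical pg_dumplib names that are not taken from any actual patch:

    /* pg_dumplib.h -- hypothetical public header for a pg_dump library.
     * Pairing create/free functions keeps malloc() and free() inside the
     * same module, avoiding mismatched-allocator problems on platforms
     * (notably Windows) where each DLL may use its own heap. */

    typedef struct DumpableObject DumpableObject;   /* opaque to callers */

    /* Allocates and returns a new boundary-object list, or NULL on error. */
    extern DumpableObject *pg_dumplib_create_boundary_objects(void);

    /* Frees any object previously returned by this library. */
    extern void pg_dumplib_free_object(DumpableObject *obj);

A caller then never calls free() on library memory directly:

    DumpableObject *objs = pg_dumplib_create_boundary_objects();
    if (objs != NULL)
    {
        /* ... use objs ... */
        pg_dumplib_free_object(objs);
    }

This kind of wrapper also leaves internal prototypes such as createBoundaryObjects() unchanged, so the console application and the library can share one code base.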
On 4/10/13 10:54 AM, 帅 wrote:
> I'm interested in the idea "Rewrite (add) pg_dump and pg_restore
> utilities as libraries (.so, .dll & .dylib)".

The pg_dump code is a giant mess, and refactoring it as a library is perhaps not a project for a new hacker.

Independent of that, I think the first consideration in such a project would be: who else would be using that library? What are the use cases? And then come up with an interface around that, and then see about refactoring pg_dump.

I think the main use cases mentioned in connection with this idea are usually in the direction of finer-grained control over what gets dumped and how. But making pg_dump into a library would not necessarily address that.
Peter Eisentraut wrote:
> I think the main use cases mentioned in connection with this idea are
> usually in the direction of finer-grained control over what gets dumped
> and how. But making pg_dump into a library would not necessarily
> address that.

There's also the matter of embedding pg_dump into other programs. For example, calling the pg_dump executable from inside pgAdmin is a rather messy solution to the problem.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/10/2013 11:02 PM, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>
>> I think the main use cases mentioned in connection with this idea are
>> usually in the direction of finer-grained control over what gets dumped
>> and how. But making pg_dump into a library would not necessarily
>> address that.
>
> There's also the matter of embedding pg_dump into other programs. For
> example, calling the pg_dump executable from inside pgAdmin is a rather
> messy solution to the problem.

A natural solution to this seems to be moving most of pg_dump's functionality into the backend as functions, so that we have pg_dump_xxx() for everything we want to dump, plus a topological sort function for getting the objects in the right order.

The main thing left for the pg_dump library would be support for the various ways of formatting the dump results into text, tar and "dump" files.

--------------------
Hannu
Hannu Krosing wrote:
> On 04/10/2013 11:02 PM, Alvaro Herrera wrote:
>> There's also the matter of embedding pg_dump into other programs. For
>> example, calling the pg_dump executable from inside pgAdmin is a rather
>> messy solution to the problem.
>
> A natural solution to this seems to be moving most of pg_dump's
> functionality into the backend as functions, so that we have
> pg_dump_xxx() for everything we want to dump, plus a topological
> sort function for getting the objects in the right order.

This idea doesn't work because of back-patch considerations (i.e. we would not be able to create the functions in back branches, and so this new style of pg_dump would only work with future server versions). So pg_dump itself would have to retain the capability to dump stuff from old servers. This seems unlikely to fly at all, because we'd then effectively be maintaining pg_dump in two places, both the backend and the pg_dump source code.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Hannu Krosing wrote: >> Natural solution to this seems to move most of pg_dump functionality >> into backend as functions, so we have pg_dump_xxx() for everything >> we want to dump plus a topological sort function for getting the >> objects in right order. > This idea doesn't work because of back-patch considerations (i.e. we > would not be able to create the functions in back branches, and so this > new style of pg_dump would only work with future server versions). So > pg_dump itself would have to retain capability to dump stuff from old > servers. This seems unlikely to fly at all, because we'd be then > effectively maintaining pg_dump in two places, both backend and the > pg_dump source code. There are other issues too, in particular that most of the backend's code tends to work on SnapshotNow time whereas pg_dump would really prefer it was all done according to the transaction snapshot. We have got bugs of that ilk already in pg_dump, but we shouldn't introduce a bunch more. Doing this right would therefore mean that we'd have to write a lot of duplicative code in the backend, ie, it's not clear that we gain any synergy by pushing the functionality over. It might simplify cross-backend-version issues (at least for backend versions released after we'd rewritten all that code) but otherwise I'm afraid it'd just be pushing the problems somewhere else. In any case, "push it to the backend" offers no detectable help with the core design issue here, which is figuring out what functionality needs to be exposed with what API. regards, tom lane
Hello, 帅.

You wrote:

帅> My question is:
帅> There are lots of functions in "pg_dump.c". If I rewrite this
帅> file as a library, I should split "pg_dump.c" into two or more
帅> files (Mr. Pavel Golub's advice). However, some functions do have
帅> return values, e.g.
帅>
帅> static DumpableObject *createBoundaryObjects(void);
帅>
帅> I thought these functions must return their results through
帅> function arguments, by passing a pointer, when used as a library.
帅> But the question is: if I did so, the function prototypes would be
帅> changed. And Mr. Pavel Golub said "it's better to keep all
帅> functions with the same declaration", and so "we will only have one
帅> code base for both console application and library". I think he is right.
帅> But then, how can I deal with this situation? From my point of
帅> view, I can't malloc a variable (e.g. a DumpableObject) in one
帅> library (e.g. pg_dumplib.so) and then return its pointer to
帅> another library (or executable program). Maybe it's not safe(?), or
帅> it has something to do with a "memory leak"(?). I'm not sure.
帅>
帅> Do you have any suggestions?

From my point of view the new library should export only two functions:

1. The execution function:

   ExecStatusType PGdumpdbParams(const char * const *keywords,
                                 const char * const *values);

The return type may be something else, but ExecStatusType seems to be OK for this purpose: PGRES_TUPLES_OK for success, PGRES_FATAL_ERROR for failure. The parameters will remain the same as the usual command-line options for pg_dump. Thus we will have less work for existing applications, e.g. pgAdmin.

2. A logging or notice-processing function:

   typedef void (*PGdumplogProcessor) (char *relname, int done, char *message);

   PGdumplogProcessor PQsetNoticeProcessor(PGdumplogProcessor proc, void *arg);

The purpose of this function is to process the output of the dump. The first argument is a callback function which accepts information about the current relname (or operation, or stage); done indicates how much work is done (for progress bars etc.); message contains some extra information.

That's all! Only two functions. Indeed, we don't need all those low-level dump functions like createBoundaryObjects etc. There will be only one entry point to the whole logic. And if one wants only one table, function or schema, a combination of the correct parameters should be passed to PGdumpdbParams.

-- 
With best wishes, Pavel
mailto:pavel@gf.microolap.com
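For concreteness, here is a minimal sketch of how a caller such as pgAdmin might drive the two proposed entry points. ExecStatusType and the PGRES_* constants are real libpq types; everything else comes only from the proposal above, and the notice-setter is shown under an assumed pg_dump-specific name (PGdumpSetLogProcessor), since PQsetNoticeProcessor is already taken by libpq:

    /* Hypothetical caller of the proposed two-function library API. */
    #include <stdio.h>
    #include <libpq-fe.h>           /* ExecStatusType, PGRES_* */

    typedef void (*PGdumplogProcessor) (char *relname, int done, char *message);

    /* Declarations as proposed; the library itself does not exist yet. */
    extern ExecStatusType PGdumpdbParams(const char * const *keywords,
                                         const char * const *values);
    extern PGdumplogProcessor PGdumpSetLogProcessor(PGdumplogProcessor proc,
                                                    void *arg);

    static void
    report_progress(char *relname, int done, char *message)
    {
        fprintf(stderr, "dumping %s: %d%% done (%s)\n", relname, done, message);
    }

    int
    main(void)
    {
        /* keyword/value pairs mirror pg_dump's command-line options */
        const char * const keywords[] = {"dbname", "format", "file", "table", NULL};
        const char * const values[]   = {"mydb", "custom", "mydb.dump",
                                         "public.orders", NULL};

        PGdumpSetLogProcessor(report_progress, NULL);

        if (PGdumpdbParams(keywords, values) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "dump failed\n");
            return 1;
        }
        return 0;
    }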
Pavel Golub <pavel@microolap.com> writes:
> From my point of view the new library should export only two
> functions:
> 1. The execution function:
>    ExecStatusType PGdumpdbParams(const char * const *keywords,
>                                  const char * const *values);

No, this is exactly *wrong*. You might as well not bother to refactor, if the only API the library presents is exactly equivalent to what you could get with system("pg_dump ...").

I don't know what the right answer is, but this isn't it. Most people who are interested in this topic are interested because they want to get output that is different from anything pg_dump would produce on its own, for instance applying a more complex object-selection rule than anything pg_dump offers. Right now, the only way they can do that is to lobby to add new switch options to pg_dump. With a change like this, it'd still be the case that they can't get what they want except by adding new switch options to pg_dump. I don't see any advantage gained.

			regards, tom lane
Hello, Tom.

You wrote:

TL> Pavel Golub <pavel@microolap.com> writes:
>> From my point of view the new library should export only two
>> functions:
>> 1. The execution function:
>>    ExecStatusType PGdumpdbParams(const char * const *keywords,
>>                                  const char * const *values);
TL> No, this is exactly *wrong*. You might as well not bother to refactor,
TL> if the only API the library presents is exactly equivalent to what you
TL> could get with system("pg_dump ...").

Well, yes. You're absolutely right. But should this be a starting point?

TL> I don't know what the right answer is, but this isn't it. Most people
TL> who are interested in this topic are interested because they want to get
TL> output that is different from anything pg_dump would produce on its own,
TL> for instance applying a more complex object-selection rule than anything
TL> pg_dump offers. Right now, the only way they can do that is to lobby to
TL> add new switch options to pg_dump. With a change like this, it'd still
TL> be the case that they can't get what they want except by adding new
TL> switch options to pg_dump. I don't see any advantage gained.

-- 
With best wishes, Pavel
mailto:pavel@gf.microolap.com
On 04/11/2013 09:51 AM, Tom Lane wrote:
> Pavel Golub <pavel@microolap.com> writes:
>> From my point of view the new library should export only two
>> functions:
>> 1. The execution function:
>>    ExecStatusType PGdumpdbParams(const char * const *keywords,
>>                                  const char * const *values);
> No, this is exactly *wrong*. You might as well not bother to refactor,
> if the only API the library presents is exactly equivalent to what you
> could get with system("pg_dump ...").
>
> I don't know what the right answer is, but this isn't it. Most people
> who are interested in this topic are interested because they want to get
> output that is different from anything pg_dump would produce on its own,
> for instance applying a more complex object-selection rule than anything
> pg_dump offers. Right now, the only way they can do that is to lobby to
> add new switch options to pg_dump. With a change like this, it'd still
> be the case that they can't get what they want except by adding new
> switch options to pg_dump. I don't see any advantage gained.

Well, either they want that, or they want that output more accessibly and without all the baggage that pg_dump necessarily brings to the table. pg_dump does a lot of stuff that's basically designed for bulk operations, and often what people want is a way to get, say, the creation DDL for some object, without any locks other than the usual locks any transaction takes.

Last year I started writing a package to provide such functions, which I called RetailDDL, and it was well received at the conference where I talked about it, but I have not had time since then to work on it, as JSON development among other things has had a rather higher priority. But I think it's very well worth doing. I think in principle having database functions for the creation DDL of a database's own objects is a good idea. And yes, that would mean keeping knowledge of how to produce such output in two places: pg_dump is going to need to keep historical knowledge, for one thing. But I can live with that.

It could be interesting to have a library that would output database metadata in some machine-readable and manipulatable format such as JSON or XML. One thing that's annoying about the text output pg_dump produces is that it's not at all structured, so if you want, for example, to restore a table but to a table of a different name, or to a different schema, then you're reduced to mangling the SQL by hand editing or regular-expression matching. Something with the potential to ease that pain would be worth having.

cheers

andrew
* Andrew Dunstan (andrew@dunslane.net) wrote:
> On 04/11/2013 09:51 AM, Tom Lane wrote:
>> No, this is exactly *wrong*. You might as well not bother to refactor,
>> if the only API the library presents is exactly equivalent to what you
>> could get with system("pg_dump ...").

Agreed.

> Well, either they want that, or they want that output more
> accessibly and without all the baggage that pg_dump necessarily
> brings to the table. pg_dump does a lot of stuff that's basically
> designed for bulk operations, and often what people want is a way to
> get, say, the creation DDL for some object, without any locks other
> than the usual locks any transaction takes.

Yes, being able to get that from a simple database function would be very nice. I wonder if some of what's been done with the "event" triggers would inform us about what that API should look like.

> And yes, that would mean keeping knowledge of how to produce such
> output in two places: pg_dump is going to need to keep historical
> knowledge, for one thing. But I can live with that.

Agreed. If it doesn't live in two places, for some period, we'll never actually have it outside of pg_dump.

> It could be interesting to have a library that would output database
> metadata in some machine-readable and manipulatable format such as
> JSON or XML.

The DB function to produce the DDL command might have options to produce it in multiple formats..? Seems like it'd be nice to have, though perhaps not initially.

Thanks,

Stephen
<div dir="ltr">On Fri, Apr 12, 2013 at 1:00 AM, Stephen Frost <span dir="ltr"><<a href="mailto:sfrost@snowman.net" target="_blank">sfrost@snowman.net</a>></span>wrote:<br /><div class="gmail_extra"><div class="gmail_quote"><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">>Well, either they want that or they want that output more<br /><div class="im"> > accessibly,and without all the baggage that pg_dump necessarily<br /> > brings to the table. pg_dump does a lot of stuffthat's basically<br /> > designed for bulk operations, and often what people want is a way to<br /> > get, say,the creation DDL for some object, without any locks than<br /> > the usual locks any transaction takes.<br /><br /></div>Yes-being able to get that from a simple database function would be<br /> very nice. I wonder if some of what'sbeen done with the "event"<br /> triggers would inform us about what that API should look like.<br /></blockquote></div>Irecall discussions about reverse engineering of a parsed query tree in<br /></div><div class="gmail_extra">theevent trigger threads but nothing has been committed I think. Also, you<br /></div><div class="gmail_extra">need to consider that implementing such reverse engineering mechanism in<br />core might not be a goodthing for new features and maintenance, as it<br />would mean that it is necessary to change those APIs consistentlywith what<br /> is added on the parsing side.<br /></div><div class="gmail_extra">It could make more sense tohave such a set of functions created as a<br /></div><div class="gmail_extra">separate project.<br /><br /></div><div class="gmail_extra">My 2c.<br /></div><div class="gmail_extra">-- <br />Michael<br /></div></div>
>>> Well, either they want that, or they want that output more
>>> accessibly and without all the baggage that pg_dump necessarily
>>> brings to the table. pg_dump does a lot of stuff that's basically
>>> designed for bulk operations, and often what people want is a way to
>>> get, say, the creation DDL for some object, without any locks other
>>> than the usual locks any transaction takes.
>>
>> Yes, being able to get that from a simple database function would be
>> very nice. I wonder if some of what's been done with the "event"
>> triggers would inform us about what that API should look like.
>
> I recall discussions about reverse engineering of a parsed query tree in
> the event trigger threads, but nothing has been committed I think. Also, you
> need to consider that implementing such a reverse-engineering mechanism in
> core might not be a good thing for new features and maintenance, as it
> would mean that it is necessary to change those APIs consistently with what
> is added on the parsing side.
> It could make more sense to have such a set of functions created as a
> separate project.

This may or may not be related, but pgpool-II already does "reverse engineering" from a parsed query. It parses a query, generates a raw parse tree, rewrites it for certain purposes and generates a text query. If you are interested, you could take a look at the pgpool-II source code.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> This idea doesn't work because of back-patch considerations (i.e. we
>> would not be able to create the functions in back branches, and so this
>> new style of pg_dump would only work with future server versions). So

That is a policy question, not a technical one. We could either add the new functions in the backend binary itself, or provide them as an extension that pg_dump would know to install when needed, if we decided it's OK. My understanding is that we will need to change that policy anyway the day we have a page disk format change and pg_upgrade needs to flag the old cluster pages with the old page version number before being able to run, or something.

> There are other issues too, in particular that most of the backend's
> code tends to work on SnapshotNow time, whereas pg_dump would really
> prefer it was all done according to the transaction snapshot.

Would that be solved by having MVCC catalogs, or would the backend code you're talking about not be included in there? (Which would be surprising to me, as much as trumping the benefits of MVCC catalogs, but well.)

> In any case, "push it to the backend" offers no detectable help with the
> core design issue here, which is figuring out what functionality needs
> to be exposed with what API.

Andrew did begin to work on those parts with the "Retail DDL" project. We know of several "getddl" implementations, and you can also have a look at the pg_dump -Fs (split format) patch that didn't make it for 9.3, where some API work has been done. The need exists, and some thinking about the API did happen. Some more certainly needs to be done, granted.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Michael Paquier <michael.paquier@gmail.com> writes:
> I recall discussions about reverse engineering of a parsed query tree in
> the event trigger threads, but nothing has been committed I think.

Yes. The name used in there was "Normalized Command String".

> Also, you need to consider that implementing such a reverse-engineering
> mechanism in core might not be a good thing for new features and
> maintenance, as it would mean that it is necessary to change those APIs
> consistently with what is added on the parsing side.

The goal is to retain the same API, which is quite simple:

    function get_command_string(Node *parsetree) returns text

At the SQL level, the Node * is of datatype "internal" and you can't forge it; you need to be given it in some way. In the Event Trigger case we thought of a TG_PARSETREE magic variable, or maybe another function get_current_parsetree() that only works when called from an event trigger.

The other part of the API of course is how to represent the data, and as we're talking about a Normalized Command String, there's no choice but to issue a valid SQL command string that the server would know how to execute and which would have the same side effects. So of course a 9.3 and a 9.4 server equipped with that hypothetical function would behave differently when the syntax changed. And that's exactly why I think the best choice here is to have that code embedded and maintained in core.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
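In backend terms, such a function would presumably be an ordinary fmgr-callable C function taking an "internal" argument. A rough skeleton, assuming a hypothetical deparse_parsetree() helper for the actual deparsing work (the PG_* macros and cstring_to_text() are standard backend APIs):

    /* Hypothetical skeleton of the proposed get_command_string() function. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/nodes.h"
    #include "utils/builtins.h"

    extern char *deparse_parsetree(Node *parsetree);    /* hypothetical */

    PG_FUNCTION_INFO_V1(get_command_string);

    Datum
    get_command_string(PG_FUNCTION_ARGS)
    {
        /* At the SQL level the argument is of type "internal", so callers
         * cannot forge it; an event trigger would hand it to us. */
        Node   *parsetree = (Node *) PG_GETARG_POINTER(0);
        char   *command = deparse_parsetree(parsetree);

        PG_RETURN_TEXT_P(cstring_to_text(command));
    }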
On 04/11/2013 11:48 PM, Andrew Dunstan wrote:
> It could be interesting to have a library that would output database
> metadata in some machine-readable and manipulatable format such as
> JSON or XML. One thing that's annoying about the text output pg_dump
> produces is that it's not at all structured, so if you want, for
> example, to restore a table but to a table of a different name, or to
> a different schema, then you're reduced to mangling the SQL by hand
> editing or regular-expression matching. Something with the potential
> to ease that pain would be worth having.

Yes. This is really interesting.

The current code in pg_dump supports four ArchiveFormats, i.e. archCustom, archTar, archNull and archDirectory. These formats are implementations of the "pg_backup" interface. Maybe I could try to add two implementations, for XML and JSON.

It is worth mentioning that a month ago I wrote a program to parse an XML-format file into a CSV one using the library "libxerces-c". Although that program was just a hello-world, maybe I could benefit from it, because both use the XML format; all I would need to do is try another XML library.

I had a look at JSON on the wiki. The format is a little like XML: both of them are nested. And there are some libraries that could be used, e.g. libjson (or json.c, or other JSON libraries written in C) and libxml2 (or something else).

BTW, could this be an idea for GSOC? If so, I can have a try: add XML and JSON output formats for pg_dump.

Thank you all for your attention.

Best regards,
Shuai
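For reference, pg_dump's existing output formats are wired up as callbacks on the ArchiveHandle (see pg_backup_custom.c and pg_backup_directory.c). A very rough sketch of the shape a JSON format might take; the callback names follow pg_backup_archiver.h but the signatures here are simplified, and the JSON layout in the comment is invented purely for illustration:

    /* Hypothetical pg_backup_json.c skeleton, modeled on pg_backup_custom.c.
     * Each TOC entry could be emitted as a JSON object along the lines of
     *   {"type": "TABLE", "schema": "public", "name": "orders",
     *    "create": "CREATE TABLE ...", "drop": "DROP TABLE ..."}
     * so a restore tool could rename objects without regex-editing SQL text. */
    #include "pg_backup_archiver.h"

    static void
    _ArchiveEntry(ArchiveHandle *AH, TocEntry *te)
    {
        /* emit one JSON object describing te: type, schema, name, DDL */
    }

    static size_t
    _WriteData(ArchiveHandle *AH, const void *data, size_t dLen)
    {
        /* stream table data, e.g. as a JSON array of rows */
        return dLen;
    }

    static void
    _CloseArchive(ArchiveHandle *AH)
    {
        /* terminate the surrounding JSON document */
    }

    void
    InitArchiveFmt_Json(ArchiveHandle *AH)
    {
        AH->ArchiveEntryPtr = _ArchiveEntry;
        AH->WriteDataPtr    = _WriteData;
        AH->ClosePtr        = _CloseArchive;
        /* the remaining archiver callbacks would be filled in similarly */
    }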
On 04/11/2013 12:17 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Hannu Krosing wrote:
>>> A natural solution to this seems to be moving most of pg_dump's
>>> functionality into the backend as functions, so that we have
>>> pg_dump_xxx() for everything we want to dump, plus a topological
>>> sort function for getting the objects in the right order.
>> This idea doesn't work because of back-patch considerations (i.e. we
>> would not be able to create the functions in back branches, and so this
>> new style of pg_dump would only work with future server versions). So
>> pg_dump itself would have to retain the capability to dump stuff from
>> old servers. This seems unlikely to fly at all, because we'd then
>> effectively be maintaining pg_dump in two places, both the backend and
>> the pg_dump source code.
> There are other issues too, in particular that most of the backend's
> code tends to work on SnapshotNow time, whereas pg_dump would really
> prefer it was all done according to the transaction snapshot.

I was just thinking of moving the queries pg_dump currently uses into UDFs which do _not_ use the catalog cache, but will use the same SQL to query the catalogs as pg_dump currently does, using whatever snapshot mode is currently set.

pg_dump will still need to carry the same queries for older versions of PostgreSQL, but for new versions pg_dump can become catalog-agnostic. And I think that we can retire pg_dump support for older PostgreSQL versions the same way we drop support for older versions of PostgreSQL itself.

Hannu

> We have got bugs of that ilk already in pg_dump, but we shouldn't
> introduce a bunch more. Doing this right would therefore mean that we'd
> have to write a lot of duplicative code in the backend; i.e., it's not
> clear that we gain any synergy by pushing the functionality over. It
> might simplify cross-backend-version issues (at least for backend
> versions released after we'd rewritten all that code) but otherwise I'm
> afraid it'd just be pushing the problems somewhere else.
>
> In any case, "push it to the backend" offers no detectable help with the
> core design issue here, which is figuring out what functionality needs
> to be exposed with what API.

The main things I see would be:

* get_list_of_objects(object_type, pattern or namelist)
* get_sql_def_for_object(object_type, object_name)
* sort_by_dependency(list of [obj_type, obj_name])

From these you could easily construct most uses, especially if sort_by_dependency() were smart enough to break circular dependencies, like turning two tables with mutual FKs into table definitions without FKs plus separate constraints. Or we could always emit constraints separately, so that the ones depending on non-exported objects would be easy to leave out.

Maybe the dependency API analysis itself is something worth a GSOC effort?

Hannu
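To make the proposal concrete, a client could then assemble a custom dump with ordinary queries. A sketch using libpq, where the three backend functions (and their exact signatures) are hypothetical, taken only from the list above:

    /* Hypothetical client-side use of the proposed backend functions. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=mydb");
        PGresult *res;
        int       i;

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* list matching tables, order them by dependency, fetch their DDL */
        res = PQexec(conn,
            "SELECT get_sql_def_for_object(s.obj_type, s.obj_name) "
            "FROM sort_by_dependency("
            "       get_list_of_objects('table', 'myschema.%')) AS s");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            for (i = 0; i < PQntuples(res); i++)
                printf("%s;\n", PQgetvalue(res, i, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }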
On Fri, Apr 12, 2013 at 1:07 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> I was just thinking of moving the queries pg_dump currently uses into
> UDFs which do _not_ use the catalog cache, but will use the same SQL
> to query the catalogs as pg_dump currently does, using whatever
> snapshot mode is currently set.
>
> pg_dump will still need to carry the same queries for older versions
> of PostgreSQL, but for new versions pg_dump can become
> catalog-agnostic. And I think that we can retire pg_dump support for
> older PostgreSQL versions the same way we drop support for older
> versions of PostgreSQL itself.
>
> The main things I see would be:
>
> * get_list_of_objects(object_type, pattern or namelist)
> * get_sql_def_for_object(object_type, object_name)
> * sort_by_dependency(list of [obj_type, obj_name])
>
> From these you could easily construct most uses, especially if
> sort_by_dependency() were smart enough to break circular dependencies,
> like turning two tables with mutual FKs into table definitions without
> FKs plus separate constraints.
+1
This is an excellent idea. This would allow doing all kinds of crazy things outside of the scope of pg_dump.
Two years ago I was working on a system to version-control the schema inside the database.
I don't know if it's a good idea or not, but one thing which bugged me a lot was the lack of pg_get_[object type]def(oid) functions for all the different object types.
It also turned out to be quite complicated to do the pg_depend topological sort yourself. I managed eventually, but it ran too slowly because I had to pass the entire content of pg_depend to a plperl function I wrote.
With this in place I would be motivated enough to resume my old project, which is still online at https://github.com/gluefinance/pov if anyone is interested.
Is it really necessary to write all the missing pg_get_[object type]def(oid) functions in C? I think it would be quite easy to put them together using pure SQL; you wouldn't even need PL/pgSQL.
An old view I once wrote manages to produce working create and drop statements for most object types using SQL only.
It would also be nice to have functions that return the proper command to DROP an object. I need that in this project in order to do schema modifications where objects have to be dropped and recreated in a particular order so as not to break dependencies. Perhaps there are other use cases out there.