Thread: splitting *_desc routines

splitting *_desc routines

From
Alvaro Herrera
Date:
So Andres Freund floated a proposal to create an xlogdump sort of tool,
in core, and posted a patch.

One advantage of having xlogdump in core is that we can have buildfarm
test that part of the Xlog code, which currently sees no testing at all.
In fact we could use it as a testbed for XLog in general, say by having
it run during "make installcheck" or some such.

I looked at Andres' patch and the general idea is rather horrible: it
links all backend files into the output executable.  This is so that the
*_desc functions can be used from their respective object files, which
obviously have a lot of requirements for backend infrastructure.

However, after looking at the _desc routines themselves, they turn out
to be quite light on requirements: the only thing they really care about
is having a StringInfo around.  (There are also two of them that call
relpathperm(), but I think that's easily fixable.)

(There is also a bug of sorts in gin_desc, which calls elog(PANIC) when
finding a record it doesn't know about; but all other routines simply
call appendStringInfo("unkwown") instead, so I think the right fix here
is to change gin to do the same -- and this should be backpatched all the
way back.)

So it is my intention to split out the desc() functions out of their
current files, into their own file, allowing xlogdump to link against
that and not be forced to link the rest of the backend.

To solve the stringinfo problem, we would provide a separate
implementation of it, one that doesn't depend on palloc and elog.
Initially I would just put it in src/bin/xlogdump, but if people think
we could use it elsewhere, we could do that too.  Alternatively we could
use a shim layer on top of PQExpBuffer.  In fact, we could change the
desc() routines, so that #ifdef FRONTEND they use PQExpBuffer, and
StringInfo otherwise.

One further decision here is whether the various desc() routines should
be each in its own file, or have them all in a single file.  If we use a
single file (which is what I'm inclined to do) we'd need to rename some
static routines that have the same name, such as "out_target"; but this
seems a rather trivial exercise.

Thoughts?

--
Álvaro Herrera <alvherre@alvh.no-ip.org>



Re: splitting *_desc routines

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I looked at Andres' patch and the general idea is rather horrible: it
> links all backend files into the output executable.  This is so that the
> *_desc functions can be used from their respective object files, which
> obviously have a lot of requirements for backend infrastructure.

Check.

> So it is my intention to split out the desc() functions out of their
> current files, into their own file, allowing xlogdump to link against
> that and not be forced to link the rest of the backend.

Meh.  Can we compromise on one file per xlog rmgr?  I don't much care
for the "one big file" idea.

> To solve the stringinfo problem, we would provide a separate
> implementation of it, one that doesn't depend on palloc and elog.

Shim on top of PQExpBuffer seems like the way to go.

An alternative thing that might be worth considering before you go all
in on this is whether the xlogdump functionality shouldn't just be part
of the regular server executable, ie you'd call it with
postgres --xlogdump <arguments>

and the only extra code needed is two lines for another redirect in
main.c.  We've already done that for things like --describe-config.
This'd likely be a lot less work than getting the _desc routines to
be operable standalone ...
        regards, tom lane



Re: splitting *_desc routines

From
Andres Freund
Date:
On Wednesday, August 29, 2012 10:06:16 PM Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > I looked at Andres' patch and the general idea is rather horrible: it
> > links all backend files into the output executable.  This is so that the
> > *_desc functions can be used from their respective object files, which
> > obviously have a lot of requirements for backend infrastructure.
> 
> Check.
I said it was a preliminary hack though ;). Especially the way I assembled the 
object files... 
The xlogdump utility itself is equally crappy atm, it was just a demonstration 
which suited me enough for debugging... But it really doesn't need that much 
more.

> An alternative thing that might be worth considering before you go all
> in on this is whether the xlogdump functionality shouldn't just be part
> of the regular server executable, ie you'd call it with
> 
>     postgres --xlogdump <arguments>
> 
> and the only extra code needed is two lines for another redirect in
> main.c.  We've already done that for things like --describe-config.
> This'd likely be a lot less work than getting the _desc routines to
> be operable standalone ...
It definitely would be simpler. It doesn't seem nice to pile more and more 
utilities into the main postgres binary though.

Note the ugliness some the testing tools in src/backend go through just to 
link to a few files... Yuck.

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: splitting *_desc routines

From
Robert Haas
Date:
On Thu, Aug 30, 2012 at 12:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> An alternative thing that might be worth considering before you go all
>> in on this is whether the xlogdump functionality shouldn't just be part
>> of the regular server executable, ie you'd call it with
>>
>>       postgres --xlogdump <arguments>
>>
>> and the only extra code needed is two lines for another redirect in
>> main.c.  We've already done that for things like --describe-config.
>> This'd likely be a lot less work than getting the _desc routines to
>> be operable standalone ...
> It definitely would be simpler. It doesn't seem nice to pile more and more
> utilities into the main postgres binary though.

Agreed.  Another advantage of making this standalone is that other
people might then be able to write code that does interesting things
with it.  If we just add the functionality into core then nobody's any
better off than before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: splitting *_desc routines

From
Alvaro Herrera
Date:
Here's a small WIP patch that does the proposed splitting.  This is a
first step towards the objective of having a separately compilable
xlogdump -- more work is needed before that can be made to work fully.

Now, per previous discussion, I have split each rmgr's desc function
into its own file.  This is easiest, but it leaves us with several very
small files in some directories; for example we have

./src/backend/access/transam/clog_desc.c
./src/backend/access/transam/xact_desc.c
./src/backend/access/transam/xlog_desc.c
./src/backend/access/transam/mxact_desc.c

and also
./src/backend/commands/dbase_desc.c
./src/backend/commands/seq_desc.c
./src/backend/commands/tablespace_desc.c

Is people okay with that, or should we consider merging each subdir's
files into a single one? (say transam_desc.c and cmds_desc.c).

The other question is whether the function and struct declarations are
in the best possible locations considering that we will want the files
to be compilable without a backend environment.  I am using xlogdump as
a testbed to ensure that everything is kosher (it's not yet there for
other reasons -- I might end up using something other than
xlog_internal.h, for example).

Once we settle these issues, I will finish up the patch by adding nice
file header comments and the like.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: splitting *_desc routines

From
Simon Riggs
Date:
On 24 October 2012 21:44, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Here's a small WIP patch that does the proposed splitting.  This is a
> first step towards the objective of having a separately compilable
> xlogdump -- more work is needed before that can be made to work fully.
>
> Now, per previous discussion, I have split each rmgr's desc function
> into its own file.  This is easiest, but it leaves us with several very
> small files in some directories; for example we have
>
> ./src/backend/access/transam/clog_desc.c
> ./src/backend/access/transam/xact_desc.c
> ./src/backend/access/transam/xlog_desc.c
> ./src/backend/access/transam/mxact_desc.c
>
> and also
> ./src/backend/commands/dbase_desc.c
> ./src/backend/commands/seq_desc.c
> ./src/backend/commands/tablespace_desc.c
>
> Is people okay with that, or should we consider merging each subdir's
> files into a single one? (say transam_desc.c and cmds_desc.c).

One file per rmgr is the right level of modularity.

I'd put these in a separate directory to avoid annoyance. Transam is
already too large.

src/backend/access/rmgrdesc/xlog_desc.c
...
src/backend/access/rmgrdesc/seq_desc.c

No difference between commands and other stuff. Just one file per
rmgr, using the rmgr name as listed in rmgr.c


> The other question is whether the function and struct declarations are
> in the best possible locations considering that we will want the files
> to be compilable without a backend environment.  I am using xlogdump as
> a testbed to ensure that everything is kosher (it's not yet there for
> other reasons -- I might end up using something other than
> xlog_internal.h, for example).




-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: splitting *_desc routines

From
Robert Haas
Date:
On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'd put these in a separate directory to avoid annoyance. Transam is
> already too large.

+1.  A further advantage of that is that it means that that everything
in that new directory will be intended to end up as
all-separately-compilable, which should make it easier to remember
what the rules are, and easier to design an automated framework that
continuously verifies that it still works that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: splitting *_desc routines

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > I'd put these in a separate directory to avoid annoyance. Transam is
> > already too large.
>
> +1.  A further advantage of that is that it means that that everything
> in that new directory will be intended to end up as
> all-separately-compilable, which should make it easier to remember
> what the rules are, and easier to design an automated framework that
> continuously verifies that it still works that way.

So incorporating these ideas, the layout now looks like this:

./commands/seq_desc.c
./commands/dbase_desc.c
./commands/tablespace_desc.c
./catalog/storage_desc.c
./utils/cache/relmap_desc.c
./access/rmgrdesc/mxact_desc.c
./access/rmgrdesc/spgdesc.c
./access/rmgrdesc/xact_desc.c
./access/rmgrdesc/heapdesc.c
./access/rmgrdesc/tupdesc.c
./access/rmgrdesc/xlog_desc.c
./access/rmgrdesc/gistdesc.c
./access/rmgrdesc/clog_desc.c
./access/rmgrdesc/hashdesc.c
./access/rmgrdesc/gindesc.c
./access/rmgrdesc/nbtdesc.c
./storage/ipc/standby_desc.c

There are two files that are two subdirs down from the backend
directory:

./storage/ipc/standby_desc.c
./utils/cache/relmap_desc.c

We could keep them in there, or we could alternatively create more
"rmgrdesc" subdirs for them, so

./storage/rmgrdesc/standby_desc.c
./utils/rmgrdesc/relmap_desc.c

This approach appears more future-proof if we ever want to create desc
routines in other subdirs in storage/ and utils/, though it's a bit
annoying to have subdirs that will only hold a single file each (and a
very tiny file to boot).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: splitting *_desc routines

From
Alvaro Herrera
Date:
And here's the updated patch.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: splitting *_desc routines

From
Simon Riggs
Date:
On 23 November 2012 22:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> ./access/rmgrdesc/xact_desc.c
> ./access/rmgrdesc/heapdesc.c

Could we have either all underscores _ or none at all for the naming?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: splitting *_desc routines

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So incorporating these ideas, the layout now looks like this:

> ./commands/seq_desc.c
> ./commands/dbase_desc.c
> ./commands/tablespace_desc.c
> ./catalog/storage_desc.c
> ./utils/cache/relmap_desc.c
> ./access/rmgrdesc/mxact_desc.c
> ./access/rmgrdesc/spgdesc.c
> ./access/rmgrdesc/xact_desc.c
> ./access/rmgrdesc/heapdesc.c
> ./access/rmgrdesc/tupdesc.c
> ./access/rmgrdesc/xlog_desc.c
> ./access/rmgrdesc/gistdesc.c
> ./access/rmgrdesc/clog_desc.c
> ./access/rmgrdesc/hashdesc.c
> ./access/rmgrdesc/gindesc.c
> ./access/rmgrdesc/nbtdesc.c
> ./storage/ipc/standby_desc.c

FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
(which may as well be under access/ since that's where the xlog code is),
regardless of where the corresponding replay code is in the source tree.
I don't think splitting them up into half a dozen directories adds
anything except confusion.  If you want, the header comment for each
file could mention where the corresponding replay code lives.
        regards, tom lane



Re: splitting *_desc routines

From
Alvaro Herrera
Date:
Simon Riggs escribió:
> On 23 November 2012 22:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > ./access/rmgrdesc/xact_desc.c
> > ./access/rmgrdesc/heapdesc.c
>
> Could we have either all underscores _ or none at all for the naming?

Sure, removed them all.


Tom Lane escribió:

> FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
> (which may as well be under access/ since that's where the xlog code is),
> regardless of where the corresponding replay code is in the source tree.
> I don't think splitting them up into half a dozen directories adds
> anything except confusion.  If you want, the header comment for each
> file could mention where the corresponding replay code lives.

Done that way.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: splitting *_desc routines

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane escribi�:
>> FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
>> (which may as well be under access/ since that's where the xlog code is),
>> regardless of where the corresponding replay code is in the source tree.
>> I don't think splitting them up into half a dozen directories adds
>> anything except confusion.  If you want, the header comment for each
>> file could mention where the corresponding replay code lives.

> Done that way.

Looks pretty sane to me.  The only thing that seems worth discussing is
whether to put the smgr-related XLOG record declarations in storage.h
as you have here, or to invent a new, more private header for them.

There are XLOG record declarations cluttering a lot of other fairly
public headers; but given that we've invented files like heapam_xlog.h
of late, maybe we should start trying to push those declarations to
less-visible spots.  Or maybe it's not worth the trouble.
        regards, tom lane



Re: splitting *_desc routines

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane escribi�:
> >> FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
> >> (which may as well be under access/ since that's where the xlog code is),
> >> regardless of where the corresponding replay code is in the source tree.
> >> I don't think splitting them up into half a dozen directories adds
> >> anything except confusion.  If you want, the header comment for each
> >> file could mention where the corresponding replay code lives.
>
> > Done that way.
>
> Looks pretty sane to me.  The only thing that seems worth discussing is
> whether to put the smgr-related XLOG record declarations in storage.h
> as you have here, or to invent a new, more private header for them.
>
> There are XLOG record declarations cluttering a lot of other fairly
> public headers; but given that we've invented files like heapam_xlog.h
> of late, maybe we should start trying to push those declarations to
> less-visible spots.  Or maybe it's not worth the trouble.

I think it makes sense in the long term to separate things, even if we
don't go out of our ways in the current patch to clean all existing
uses.  It's fairly simple for the case at hand (attached).  With this
there are a couple of files that don't need storage.h anymore (only
storage_xlog.h).  Not a mind-boggling change, I admit.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment