Thread: splitting *_desc routines
So Andres Freund floated a proposal to create an xlogdump sort of tool, in core, and posted a patch. One advantage of having xlogdump in core is that we can have buildfarm test that part of the Xlog code, which currently sees no testing at all. In fact we could use it as a testbed for XLog in general, say by having it run during "make installcheck" or some such. I looked at Andres' patch and the general idea is rather horrible: it links all backend files into the output executable. This is so that the *_desc functions can be used from their respective object files, which obviously have a lot of requirements for backend infrastructure. However, after looking at the _desc routines themselves, they turn out to be quite light on requirements: the only thing they really care about is having a StringInfo around. (There are also two of them that call relpathperm(), but I think that's easily fixable.) (There is also a bug of sorts in gin_desc, which calls elog(PANIC) when finding a record it doesn't know about; but all other routines simply call appendStringInfo("unkwown") instead, so I think the right fix here is to change gin to do the same -- and this should be backpatched all the way back.) So it is my intention to split out the desc() functions out of their current files, into their own file, allowing xlogdump to link against that and not be forced to link the rest of the backend. To solve the stringinfo problem, we would provide a separate implementation of it, one that doesn't depend on palloc and elog. Initially I would just put it in src/bin/xlogdump, but if people think we could use it elsewhere, we could do that too. Alternatively we could use a shim layer on top of PQExpBuffer. In fact, we could change the desc() routines, so that #ifdef FRONTEND they use PQExpBuffer, and StringInfo otherwise. One further decision here is whether the various desc() routines should be each in its own file, or have them all in a single file. If we use a single file (which is what I'm inclined to do) we'd need to rename some static routines that have the same name, such as "out_target"; but this seems a rather trivial exercise. Thoughts? -- Álvaro Herrera <alvherre@alvh.no-ip.org>
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I looked at Andres' patch and the general idea is rather horrible: it > links all backend files into the output executable. This is so that the > *_desc functions can be used from their respective object files, which > obviously have a lot of requirements for backend infrastructure. Check. > So it is my intention to split out the desc() functions out of their > current files, into their own file, allowing xlogdump to link against > that and not be forced to link the rest of the backend. Meh. Can we compromise on one file per xlog rmgr? I don't much care for the "one big file" idea. > To solve the stringinfo problem, we would provide a separate > implementation of it, one that doesn't depend on palloc and elog. Shim on top of PQExpBuffer seems like the way to go. An alternative thing that might be worth considering before you go all in on this is whether the xlogdump functionality shouldn't just be part of the regular server executable, ie you'd call it with postgres --xlogdump <arguments> and the only extra code needed is two lines for another redirect in main.c. We've already done that for things like --describe-config. This'd likely be a lot less work than getting the _desc routines to be operable standalone ... regards, tom lane
On Wednesday, August 29, 2012 10:06:16 PM Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > I looked at Andres' patch and the general idea is rather horrible: it > > links all backend files into the output executable. This is so that the > > *_desc functions can be used from their respective object files, which > > obviously have a lot of requirements for backend infrastructure. > > Check. I said it was a preliminary hack though ;). Especially the way I assembled the object files... The xlogdump utility itself is equally crappy atm, it was just a demonstration which suited me enough for debugging... But it really doesn't need that much more. > An alternative thing that might be worth considering before you go all > in on this is whether the xlogdump functionality shouldn't just be part > of the regular server executable, ie you'd call it with > > postgres --xlogdump <arguments> > > and the only extra code needed is two lines for another redirect in > main.c. We've already done that for things like --describe-config. > This'd likely be a lot less work than getting the _desc routines to > be operable standalone ... It definitely would be simpler. It doesn't seem nice to pile more and more utilities into the main postgres binary though. Note the ugliness some the testing tools in src/backend go through just to link to a few files... Yuck. Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 30, 2012 at 12:14 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> An alternative thing that might be worth considering before you go all >> in on this is whether the xlogdump functionality shouldn't just be part >> of the regular server executable, ie you'd call it with >> >> postgres --xlogdump <arguments> >> >> and the only extra code needed is two lines for another redirect in >> main.c. We've already done that for things like --describe-config. >> This'd likely be a lot less work than getting the _desc routines to >> be operable standalone ... > It definitely would be simpler. It doesn't seem nice to pile more and more > utilities into the main postgres binary though. Agreed. Another advantage of making this standalone is that other people might then be able to write code that does interesting things with it. If we just add the functionality into core then nobody's any better off than before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Here's a small WIP patch that does the proposed splitting. This is a first step towards the objective of having a separately compilable xlogdump -- more work is needed before that can be made to work fully. Now, per previous discussion, I have split each rmgr's desc function into its own file. This is easiest, but it leaves us with several very small files in some directories; for example we have ./src/backend/access/transam/clog_desc.c ./src/backend/access/transam/xact_desc.c ./src/backend/access/transam/xlog_desc.c ./src/backend/access/transam/mxact_desc.c and also ./src/backend/commands/dbase_desc.c ./src/backend/commands/seq_desc.c ./src/backend/commands/tablespace_desc.c Is people okay with that, or should we consider merging each subdir's files into a single one? (say transam_desc.c and cmds_desc.c). The other question is whether the function and struct declarations are in the best possible locations considering that we will want the files to be compilable without a backend environment. I am using xlogdump as a testbed to ensure that everything is kosher (it's not yet there for other reasons -- I might end up using something other than xlog_internal.h, for example). Once we settle these issues, I will finish up the patch by adding nice file header comments and the like. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 24 October 2012 21:44, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Here's a small WIP patch that does the proposed splitting. This is a > first step towards the objective of having a separately compilable > xlogdump -- more work is needed before that can be made to work fully. > > Now, per previous discussion, I have split each rmgr's desc function > into its own file. This is easiest, but it leaves us with several very > small files in some directories; for example we have > > ./src/backend/access/transam/clog_desc.c > ./src/backend/access/transam/xact_desc.c > ./src/backend/access/transam/xlog_desc.c > ./src/backend/access/transam/mxact_desc.c > > and also > ./src/backend/commands/dbase_desc.c > ./src/backend/commands/seq_desc.c > ./src/backend/commands/tablespace_desc.c > > Is people okay with that, or should we consider merging each subdir's > files into a single one? (say transam_desc.c and cmds_desc.c). One file per rmgr is the right level of modularity. I'd put these in a separate directory to avoid annoyance. Transam is already too large. src/backend/access/rmgrdesc/xlog_desc.c ... src/backend/access/rmgrdesc/seq_desc.c No difference between commands and other stuff. Just one file per rmgr, using the rmgr name as listed in rmgr.c > The other question is whether the function and struct declarations are > in the best possible locations considering that we will want the files > to be compilable without a backend environment. I am using xlogdump as > a testbed to ensure that everything is kosher (it's not yet there for > other reasons -- I might end up using something other than > xlog_internal.h, for example). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I'd put these in a separate directory to avoid annoyance. Transam is > already too large. +1. A further advantage of that is that it means that that everything in that new directory will be intended to end up as all-separately-compilable, which should make it easier to remember what the rules are, and easier to design an automated framework that continuously verifies that it still works that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > I'd put these in a separate directory to avoid annoyance. Transam is > > already too large. > > +1. A further advantage of that is that it means that that everything > in that new directory will be intended to end up as > all-separately-compilable, which should make it easier to remember > what the rules are, and easier to design an automated framework that > continuously verifies that it still works that way. So incorporating these ideas, the layout now looks like this: ./commands/seq_desc.c ./commands/dbase_desc.c ./commands/tablespace_desc.c ./catalog/storage_desc.c ./utils/cache/relmap_desc.c ./access/rmgrdesc/mxact_desc.c ./access/rmgrdesc/spgdesc.c ./access/rmgrdesc/xact_desc.c ./access/rmgrdesc/heapdesc.c ./access/rmgrdesc/tupdesc.c ./access/rmgrdesc/xlog_desc.c ./access/rmgrdesc/gistdesc.c ./access/rmgrdesc/clog_desc.c ./access/rmgrdesc/hashdesc.c ./access/rmgrdesc/gindesc.c ./access/rmgrdesc/nbtdesc.c ./storage/ipc/standby_desc.c There are two files that are two subdirs down from the backend directory: ./storage/ipc/standby_desc.c ./utils/cache/relmap_desc.c We could keep them in there, or we could alternatively create more "rmgrdesc" subdirs for them, so ./storage/rmgrdesc/standby_desc.c ./utils/rmgrdesc/relmap_desc.c This approach appears more future-proof if we ever want to create desc routines in other subdirs in storage/ and utils/, though it's a bit annoying to have subdirs that will only hold a single file each (and a very tiny file to boot). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
And here's the updated patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 23 November 2012 22:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > ./access/rmgrdesc/xact_desc.c > ./access/rmgrdesc/heapdesc.c Could we have either all underscores _ or none at all for the naming? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > So incorporating these ideas, the layout now looks like this: > ./commands/seq_desc.c > ./commands/dbase_desc.c > ./commands/tablespace_desc.c > ./catalog/storage_desc.c > ./utils/cache/relmap_desc.c > ./access/rmgrdesc/mxact_desc.c > ./access/rmgrdesc/spgdesc.c > ./access/rmgrdesc/xact_desc.c > ./access/rmgrdesc/heapdesc.c > ./access/rmgrdesc/tupdesc.c > ./access/rmgrdesc/xlog_desc.c > ./access/rmgrdesc/gistdesc.c > ./access/rmgrdesc/clog_desc.c > ./access/rmgrdesc/hashdesc.c > ./access/rmgrdesc/gindesc.c > ./access/rmgrdesc/nbtdesc.c > ./storage/ipc/standby_desc.c FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory (which may as well be under access/ since that's where the xlog code is), regardless of where the corresponding replay code is in the source tree. I don't think splitting them up into half a dozen directories adds anything except confusion. If you want, the header comment for each file could mention where the corresponding replay code lives. regards, tom lane
Simon Riggs escribió: > On 23 November 2012 22:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > ./access/rmgrdesc/xact_desc.c > > ./access/rmgrdesc/heapdesc.c > > Could we have either all underscores _ or none at all for the naming? Sure, removed them all. Tom Lane escribió: > FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory > (which may as well be under access/ since that's where the xlog code is), > regardless of where the corresponding replay code is in the source tree. > I don't think splitting them up into half a dozen directories adds > anything except confusion. If you want, the header comment for each > file could mention where the corresponding replay code lives. Done that way. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane escribi�: >> FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory >> (which may as well be under access/ since that's where the xlog code is), >> regardless of where the corresponding replay code is in the source tree. >> I don't think splitting them up into half a dozen directories adds >> anything except confusion. If you want, the header comment for each >> file could mention where the corresponding replay code lives. > Done that way. Looks pretty sane to me. The only thing that seems worth discussing is whether to put the smgr-related XLOG record declarations in storage.h as you have here, or to invent a new, more private header for them. There are XLOG record declarations cluttering a lot of other fairly public headers; but given that we've invented files like heapam_xlog.h of late, maybe we should start trying to push those declarations to less-visible spots. Or maybe it's not worth the trouble. regards, tom lane
Tom Lane escribió: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Tom Lane escribi�: > >> FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory > >> (which may as well be under access/ since that's where the xlog code is), > >> regardless of where the corresponding replay code is in the source tree. > >> I don't think splitting them up into half a dozen directories adds > >> anything except confusion. If you want, the header comment for each > >> file could mention where the corresponding replay code lives. > > > Done that way. > > Looks pretty sane to me. The only thing that seems worth discussing is > whether to put the smgr-related XLOG record declarations in storage.h > as you have here, or to invent a new, more private header for them. > > There are XLOG record declarations cluttering a lot of other fairly > public headers; but given that we've invented files like heapam_xlog.h > of late, maybe we should start trying to push those declarations to > less-visible spots. Or maybe it's not worth the trouble. I think it makes sense in the long term to separate things, even if we don't go out of our ways in the current patch to clean all existing uses. It's fairly simple for the case at hand (attached). With this there are a couple of files that don't need storage.h anymore (only storage_xlog.h). Not a mind-boggling change, I admit. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services