Thread: [PATCH] XLogReader v2

[PATCH] XLogReader v2

From
Andres Freund
Date:
Hi,

Attached is v2 of the patch.

Changes are:
* more comments
* significantly cleaned/simpliefied coded
* crc validation
* addition of XLogReaderReadOne

Definitely needed are:
* better validation of records
* customizable error handling

The first is just work that needs to be done, nothing complicated. 
The second is a bit more complicated:
- We could have an bool had_error and a static char that contains the error 
message, the caller can handle that as wanted
- We could have a callback for error handling

I think I prefer the callback solution.


The second attached patch is a very, very preliminary xlog dumping utility 
which currently is more of a debugging facility (as evidenced by the fact that 
it needs and existing /tmp/xlog directory for writing out data) for the 
XLogReader. It reuses the builtin xlog dumping logic and thus has to link with 
backend code. I couldn't find a really sensible way to do this:

xlogdump: $(OBJS) $(shell find ../../backend ../../timezone -name 
objfiles.txt|xargs cat|tr -s " " "\012"|grep -v /main.o|sed 's/^/..\/..\/..       $(CC) $(CFLAGS) $^ $(LDFLAGS)
$(LDFLAGS_EX)$(LIBS) -o $@$(X)
 

Perhaps somebody has a better idea? I think having an xlogdump utility in 
core/contrib would be a good idea now that it can be done without a huge 
amount of code duplication. I plan to check Satoshi-san's version of xlogdump 
whether I can crib some of the commandline interface and some code from there.

Greetings,

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [PATCH] XLogReader v2

From
Satoshi Nagayasu
Date:
2012/07/19 19:29, Andres Freund wrote:
> Hi,
>
> Attached is v2 of the patch.
>
> Changes are:
> * more comments
> * significantly cleaned/simpliefied coded
> * crc validation
> * addition of XLogReaderReadOne
>
> Definitely needed are:
> * better validation of records
> * customizable error handling
>
> The first is just work that needs to be done, nothing complicated.
> The second is a bit more complicated:
> - We could have an bool had_error and a static char that contains the error
> message, the caller can handle that as wanted
> - We could have a callback for error handling
>
> I think I prefer the callback solution.
>
>
> The second attached patch is a very, very preliminary xlog dumping utility
> which currently is more of a debugging facility (as evidenced by the fact that
> it needs and existing /tmp/xlog directory for writing out data) for the
> XLogReader. It reuses the builtin xlog dumping logic and thus has to link with
> backend code. I couldn't find a really sensible way to do this:
>
> xlogdump: $(OBJS) $(shell find ../../backend ../../timezone -name
> objfiles.txt|xargs cat|tr -s " " "\012"|grep -v /main.o|sed 's/^/..\/..\/..
>          $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
>
> Perhaps somebody has a better idea? I think having an xlogdump utility in
> core/contrib would be a good idea now that it can be done without a huge
> amount of code duplication. I plan to check Satoshi-san's version of xlogdump
> whether I can crib some of the commandline interface and some code from there.

I agree with that we need more sophisticated way to share the code
between the backend and several utilities (including xlogdump),
but AFAIK, a contrib module must allow to be built *without* the core
source tree.

Any contrib module must be able to be built with only the header files
and the shared libraries when using PGXS. So, it could not assume
that it has the core source tree. (If we need to assume that, I think
xlogdump needs to be put into the core/bin directory.)

On the other hand, I have an issue to improve maintainancability of
the duplicated code at the xlogdump project.

Gather all the code which has been copied from the core.
https://github.com/snaga/xlogdump/issues/26

So, I agree with that we need another way to share the code
between the backend and the related utilities. Any good ideas?


I have one more concern for putting xlogdump into the core.

xlogdump is intended to deliver any new features and enhancements
to all the users who are using not only the latest major version,
but also older major versions maintained by the community, because
xlogdump must be a quite important tool when DBA needs it.

In fact, the latest xlogdump is now supporting 5 major versions,
from 8.3 to 9.2.
https://github.com/snaga/xlogdump/blob/master/README.xlogdump

But AFAIK, putting xlogdump into the core/contrib would mean that
a source tree of each major version could not have a large modification
after each release (or each code freeze, actually).

It would mean that the users using older major version could not take
advantage of new features and enhancements of the latest xlogdump,
but it's not what I wanted, actually.

Regards,

>
> Greetings,
>
> Andres
>
>
>
>


-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp




Re: [PATCH] XLogReader v2

From
Andres Freund
Date:
Hi,

On Thursday, July 19, 2012 07:18:08 PM Satoshi Nagayasu wrote:
> I agree with that we need more sophisticated way to share the code
> between the backend and several utilities (including xlogdump),
> but AFAIK, a contrib module must allow to be built *without* the core
> source tree.
I don't think thats reasonable. The amount of code duplication required to 
support that usecase is just not reasonable. Especially if you want to support 
pre 9.3 and 9.3+.

> So, I agree with that we need another way to share the code
> between the backend and the related utilities. Any good ideas?
Well, the primary patch in the above email was infrastructure to at least make 
reading xlog possible without much internal knowledge. But that will still 
leave the debugging routines... Which imo already is too much.

> It would mean that the users using older major version could not take
> advantage of new features and enhancements of the latest xlogdump,
> but it's not what I wanted, actually.
I personally don't see that as a big problem. If xlogdump really reuses the 
normal infrastructure of the server the amount of code you need to backport is 
*way* much smaller should it ever be actually needed.

> Any contrib module must be able to be built with only the header files
> and the shared libraries when using PGXS. So, it could not assume
> that it has the core source tree. (If we need to assume that, I think
> xlogdump needs to be put into the core/bin directory.)
We possibly could get away by defining an extra .a containing the necessary 
object files. Not nice, but...
Imo it *should* be in src/bin though.

Greetings,

Andres

-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: [PATCH] XLogReader v2

From
Robert Haas
Date:
On Mon, Jul 23, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, July 19, 2012 07:18:08 PM Satoshi Nagayasu wrote:
>> I agree with that we need more sophisticated way to share the code
>> between the backend and several utilities (including xlogdump),
>> but AFAIK, a contrib module must allow to be built *without* the core
>> source tree.
> I don't think thats reasonable. The amount of code duplication required to
> support that usecase is just not reasonable. Especially if you want to support
> pre 9.3 and 9.3+.

It seems like the direction this is going is that the xlog reading
stuff should be a library which is used by both the backend and 1 or
more xlog decoding tools.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] XLogReader v2

From
Andres Freund
Date:
On Monday, July 23, 2012 04:17:39 PM Robert Haas wrote:
> On Mon, Jul 23, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> > On Thursday, July 19, 2012 07:18:08 PM Satoshi Nagayasu wrote:
> >> I agree with that we need more sophisticated way to share the code
> >> between the backend and several utilities (including xlogdump),
> >> but AFAIK, a contrib module must allow to be built *without* the core
> >> source tree.
> > 
> > I don't think thats reasonable. The amount of code duplication required
> > to support that usecase is just not reasonable. Especially if you want
> > to support pre 9.3 and 9.3+.
> 
> It seems like the direction this is going is that the xlog reading
> stuff should be a library which is used by both the backend and 1 or
> more xlog decoding tools.
Thats fine for the xlogreader itself - it only uses stuff from headers. The 
problem is that the xlog debugging/printing infrastructure is pretty much 
guaranteed to include just about the whole backend...

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services


Re: [PATCH] XLogReader v2

From
Robert Haas
Date:
On Mon, Jul 23, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Monday, July 23, 2012 04:17:39 PM Robert Haas wrote:
>> On Mon, Jul 23, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> > On Thursday, July 19, 2012 07:18:08 PM Satoshi Nagayasu wrote:
>> >> I agree with that we need more sophisticated way to share the code
>> >> between the backend and several utilities (including xlogdump),
>> >> but AFAIK, a contrib module must allow to be built *without* the core
>> >> source tree.
>> >
>> > I don't think thats reasonable. The amount of code duplication required
>> > to support that usecase is just not reasonable. Especially if you want
>> > to support pre 9.3 and 9.3+.
>>
>> It seems like the direction this is going is that the xlog reading
>> stuff should be a library which is used by both the backend and 1 or
>> more xlog decoding tools.
> Thats fine for the xlogreader itself - it only uses stuff from headers. The
> problem is that the xlog debugging/printing infrastructure is pretty much
> guaranteed to include just about the whole backend...

Could that be fixed by moving the debugging routines into a separate
set of files, instead of having them lumped in with the code that
applies those xlog records?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] XLogReader v2

From
Andres Freund
Date:
On Monday, July 23, 2012 05:11:20 PM Robert Haas wrote:
> On Mon, Jul 23, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> > On Monday, July 23, 2012 04:17:39 PM Robert Haas wrote:
> >> On Mon, Jul 23, 2012 at 3:19 AM, Andres Freund <andres@2ndquadrant.com>
> > 
> > wrote:
> >> > On Thursday, July 19, 2012 07:18:08 PM Satoshi Nagayasu wrote:
> >> >> I agree with that we need more sophisticated way to share the code
> >> >> between the backend and several utilities (including xlogdump),
> >> >> but AFAIK, a contrib module must allow to be built *without* the core
> >> >> source tree.
> >> > 
> >> > I don't think thats reasonable. The amount of code duplication
> >> > required to support that usecase is just not reasonable. Especially
> >> > if you want to support pre 9.3 and 9.3+.
> >> 
> >> It seems like the direction this is going is that the xlog reading
> >> stuff should be a library which is used by both the backend and 1 or
> >> more xlog decoding tools.
> > 
> > Thats fine for the xlogreader itself - it only uses stuff from headers.
> > The problem is that the xlog debugging/printing infrastructure is pretty
> > much guaranteed to include just about the whole backend...
> 
> Could that be fixed by moving the debugging routines into a separate
> set of files, instead of having them lumped in with the code that
> applies those xlog records?
Its a major effort. Those function use elog(), stringinfo and lots of other 
stuff... I am hesitant to start working on that.
On the other hand - I think an in-core xlogdump would be great and sensible 
thing; but I can live with using my hacked up version that simply links to the 
backend...

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services


Re: [PATCH] XLogReader v2

From
Robert Haas
Date:
On Mon, Jul 23, 2012 at 12:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Could that be fixed by moving the debugging routines into a separate
>> set of files, instead of having them lumped in with the code that
>> applies those xlog records?
> Its a major effort. Those function use elog(), stringinfo and lots of other
> stuff... I am hesitant to start working on that.
> On the other hand - I think an in-core xlogdump would be great and sensible
> thing; but I can live with using my hacked up version that simply links to the
> backend...

The stringinfo thing has long been an annoyance to me.  libpq has
PQExpBuffer which is the exact same thing.  I don't like that we have
two implementations of that in two different code bases, and you have
to remember to spell it right depending on where you are.  I'm not
sure exactly what the best way to fix that is, but it sure is a pain
in the neck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] XLogReader v2

From
Satoshi Nagayasu
Date:
2012/07/24 1:15, Robert Haas wrote:
> On Mon, Jul 23, 2012 at 12:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> Could that be fixed by moving the debugging routines into a separate
>>> set of files, instead of having them lumped in with the code that
>>> applies those xlog records?
>> Its a major effort. Those function use elog(), stringinfo and lots of other
>> stuff... I am hesitant to start working on that.
>> On the other hand - I think an in-core xlogdump would be great and sensible
>> thing; but I can live with using my hacked up version that simply links to the
>> backend...
>
> The stringinfo thing has long been an annoyance to me.  libpq has
> PQExpBuffer which is the exact same thing.  I don't like that we have
> two implementations of that in two different code bases, and you have
> to remember to spell it right depending on where you are.  I'm not
> sure exactly what the best way to fix that is, but it sure is a pain
> in the neck.

Does it make sense to make some static library which can be
referred from both the backend and several client utilities,
including libpq? Or just a dynamic link be preferred?

Despite I do not have a clear idea right now, is it time to
start thinking of it?

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp




Re: [PATCH] XLogReader v2

From
Robert Haas
Date:
On Mon, Jul 23, 2012 at 1:03 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> The stringinfo thing has long been an annoyance to me.  libpq has
>> PQExpBuffer which is the exact same thing.  I don't like that we have
>> two implementations of that in two different code bases, and you have
>> to remember to spell it right depending on where you are.  I'm not
>> sure exactly what the best way to fix that is, but it sure is a pain
>> in the neck.
>
> Does it make sense to make some static library which can be
> referred from both the backend and several client utilities,
> including libpq? Or just a dynamic link be preferred?
>
> Despite I do not have a clear idea right now, is it time to
> start thinking of it?

IMHO, yes.  I'm not sure exactly what the right way to do it is, but I
think we need something along these lines.  We've got some pg_dump
code - in dumputils.c - that is also linked into other applications
such as psql, too, which is another pile of grottiness for which we
need a better solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] XLogReader v2

From
Alvaro Herrera
Date:
Excerpts from Andres Freund's message of jue jul 19 06:29:03 -0400 2012:
> Hi,
>
> Attached is v2 of the patch.

Hello,

I gave this code a quick read some days ago.  Here's the stuff I would
change:

* There are way too many #ifdef VERBOSE_DEBUG stuff for my taste.  It
might look better if you had macros such as elog_debug() that are defined
to empty if VERBOSE_DEBUG is not defined.  (The problem with such an
approach is that you have to get into the business of creating one macro
for each different param count, so elog_debug1(), elog_debug2() and so
on.  It also means you have to count the number of args in each call to
ensure you're calling the right one.)

* In the code beautification front, there are a number of cuddled braces
and improperly indented function declarations.

* I noticed that you have the IDENTIFICATION tag wrong in both .c and .h
files: evidently you renamed the files from readxlog.[ch] to xlogreader.

* There are a few elog(PANIC) calls.  I am not sure that's a very good
idea.  It seems to me that you should be using elog(FATAL) there instead
... or do you really want to make the whole server crash?  OTOH if we
want to make it a true client program, all those elog() calls need to
go.

* XLogReaderRead() seems a bit too long to me.  I would split it with
auxiliary functions -- say "read a header" and "read a record".  (I
mentioned this to Andres on IM and he says he tried that but couldn't
find any nice way to do it.  I may still try to do it.)

* xlogdump's Makefile trick to get all backend object files is ... ugly
(an understatement).  Really we need the *_desc() routines split so that
it can use only those functions, and have a client-side replacement for
StringInfo (discussed elsewhere) and some auxilliary functions such as
relpathbackend() so that it can compile like a normal client.

* why do we pass timeline_id to xlogdump?  I don't see that it's used
anywhere, but maybe I'm missing something?

This is not a full review.  After a new version with these fixes is
published (either by Andres or myself) some more review might find more
serious issues -- I didn't hunt for architectural problems in
XLogReader.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] XLogReader v2

From
Andres Freund
Date:
Hi Alvaro, hi all,

On Tuesday, September 04, 2012 09:33:54 PM Alvaro Herrera wrote:
> Excerpts from Andres Freund's message of jue jul 19 06:29:03 -0400 2012:
> > Hi,
> > 
> > Attached is v2 of the patch.
> 
> Hello,
> 
> I gave this code a quick read some days ago.  Here's the stuff I would
> change:
> 
> * There are way too many #ifdef VERBOSE_DEBUG stuff for my taste.  It
> might look better if you had macros such as elog_debug() that are defined
> to empty if VERBOSE_DEBUG is not defined.  (The problem with such an
> approach is that you have to get into the business of creating one macro
> for each different param count, so elog_debug1(), elog_debug2() and so
> on.  It also means you have to count the number of args in each call to
> ensure you're calling the right one.)
Hm. I am generally not very happy with the logging as is. I don't want to rely 
on elog() at all because that means the code suddently depends on just about 
the whole backend which sucks (see my god ulgy makefile hack for that...).

If we were to use that approach is there a platform that stops us from using 
vararg macros? I *think* it is C99...

I though about having a ->log(format, ...) callback, but that would mean loads 
of places add a unneccesary indirect function call :(

> * In the code beautification front, there are a number of cuddled braces
> and improperly indented function declarations.
I never seem to get those right. I really tried to make a pass over the whole 
file correcting them...

> * I noticed that you have the IDENTIFICATION tag wrong in both .c and .h
> files: evidently you renamed the files from readxlog.[ch] to xlogreader.
Yup. Readxlog was the name before someone (I think Simon) reminded me gently 
that it would be better placed in access/transam/ than replication/logical and 
that seemed to conform better to the local naming rules.

> * There are a few elog(PANIC) calls.  I am not sure that's a very good
> idea.  It seems to me that you should be using elog(FATAL) there instead
> ... or do you really want to make the whole server crash?  OTOH if we
> want to make it a true client program, all those elog() calls need to
> go.
The whole error handling needs to be changed. It really depends on the use-
case how failures should be handled. I am just not sure what the best way 
is...

Just a ->error(severity, message, format) callback?

> * XLogReaderRead() seems a bit too long to me.  I would split it with
> auxiliary functions -- say "read a header" and "read a record".  (I
> mentioned this to Andres on IM and he says he tried that but couldn't
> find any nice way to do it.  I may still try to do it.)
When I tried it the code got even more state-machinery with individual parts 
returning status codes and switch()es around that handling the control flow 
from that... Maybe I have stared at it too long to see the way forward.

> * xlogdump's Makefile trick to get all backend object files is ... ugly
> (an understatement).  Really we need the *_desc() routines split so that
> it can use only those functions, and have a client-side replacement for
> StringInfo (discussed elsewhere) and some auxilliary functions such as
> relpathbackend() so that it can compile like a normal client.
You seem to have a good grasp on that in the other thread...

> * why do we pass timeline_id to xlogdump?  I don't see that it's used
> anywhere, but maybe I'm missing something?
Its only unused because xlogdump as it submitted is just a POC hack... You 
need the timeline id to know which files to open. The only reason the parameter 
isn't parsed is that it is currently hardcoded in the callsites for 
XLogDumpXLogRead/write. At least there are FIXMEs arround it...

> This is not a full review.  After a new version with these fixes is
> published (either by Andres or myself) some more review might find more
> serious issues -- I didn't hunt for architectural problems in
> XLogReader.
Have you already started doing anything about it? I can redo a version but 
before we agree on the strategy for logging & error handling the only thing 
that would change is the cuddly braces and the IDENTIFCATION tags...

Thanks!

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] XLogReader v2

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On Tuesday, September 04, 2012 09:33:54 PM Alvaro Herrera wrote:
>> * There are way too many #ifdef VERBOSE_DEBUG stuff for my taste.  It
>> might look better if you had macros such as elog_debug() that are defined
>> to empty if VERBOSE_DEBUG is not defined.  (The problem with such an
>> approach is that you have to get into the business of creating one macro
>> for each different param count, so elog_debug1(), elog_debug2() and so
>> on.  It also means you have to count the number of args in each call to
>> ensure you're calling the right one.)

> Hm. I am generally not very happy with the logging as is. I don't want to rely 
> on elog() at all because that means the code suddently depends on just about 
> the whole backend which sucks (see my god ulgy makefile hack for that...).

elog/ereport are already basically macros.  Can't they be redefined for
use in a standalone program, with just minimal backing code?

> If we were to use that approach is there a platform that stops us from using 
> vararg macros? I *think* it is C99...

C90 is still the project standard, and this is a pretty lame reason to
want to change it.

>> * In the code beautification front, there are a number of cuddled braces
>> and improperly indented function declarations.

> I never seem to get those right. I really tried to make a pass over the whole
> file correcting them...

Install pgindent?
        regards, tom lane



Re: [PATCH] XLogReader v2

From
Andres Freund
Date:
On Sunday, September 09, 2012 08:40:38 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Tuesday, September 04, 2012 09:33:54 PM Alvaro Herrera wrote:
> >> * There are way too many #ifdef VERBOSE_DEBUG stuff for my taste.  It
> >> might look better if you had macros such as elog_debug() that are
> >> defined to empty if VERBOSE_DEBUG is not defined.  (The problem with
> >> such an approach is that you have to get into the business of creating
> >> one macro for each different param count, so elog_debug1(),
> >> elog_debug2() and so on.  It also means you have to count the number of
> >> args in each call to ensure you're calling the right one.)
> > 
> > Hm. I am generally not very happy with the logging as is. I don't want to
> > rely on elog() at all because that means the code suddently depends on
> > just about the whole backend which sucks (see my god ulgy makefile hack
> > for that...).
> 
> elog/ereport are already basically macros.  Can't they be redefined for
> use in a standalone program, with just minimal backing code?
True, its not too hard. I had a *very minimal* version that just forwarded to 
vfprintf before ditching that because I needed to link to *_desc anyway.

Its a bit ugly though if you want to use the same object file for backend and 
standalone code. It means everybody using XLogReader's logging output is tied 
to elog internals.

> > If we were to use that approach is there a platform that stops us from
> > using vararg macros? I *think* it is C99...
> 
> C90 is still the project standard, and this is a pretty lame reason to
> want to change it.
Well, for the most part its a debugging utility, nothing enabled during normal 
builds... But I don't think its an important issue, if it comes to that we can 
do it just the same as elog.h does it. I.e. using a parameterless macro.

> >> * In the code beautification front, there are a number of cuddled braces
> >> and improperly indented function declarations.
> > 
> > I never seem to get those right. I really tried to make a pass over the
> > whole file correcting them...
> 
> Install pgindent?
I have, but it so often generates too much noise in unrelated parts that I 
stopped bothering. Which is a bad excuse in this case because its a new 
file...

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services