Re: trying again to get incremental backup - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: trying again to get incremental backup
Msg-id: CA+TgmoZfyTs5JR18omKd+C50rQaXSRKRbDmbqYQ=m-nt1+HZKQ@mail.gmail.com
In response to: Re: trying again to get incremental backup (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers
On Wed, Oct 25, 2023 at 7:54 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert asked me to work on this quite some time ago, and most of this
> work was done last year.
>
> Here's my WIP for an incremental JSON parser. It works and passes all
> the usual json/b tests. It implements Algorithm 4.3 in the Dragon Book.
> The reason I haven't posted it before is that it's about 50% slower in
> pure parsing speed than the current recursive descent parser in my
> testing. I've tried various things to make it faster, but haven't made
> much impact. One of my colleagues is going to take a fresh look at it,
> but maybe someone on the list can see where we can save some cycles.
>
> If we can't make it faster, I guess we could use the RD parser for
> non-incremental cases and only use the non-RD parser for incremental,
> although that would be a bit sad. However, I don't think we can make the
> RD parser suitable for incremental parsing - there's too much state
> involved in the call stack.

Yeah, this is exactly why I didn't want to use JSON for the backup
manifest in the first place. Parsing such a manifest incrementally is
complicated. If we'd gone with my original design where the manifest
consisted of a bunch of lines each of which could be parsed separately,
we'd already have incremental parsing and wouldn't be faced with these
difficult trade-offs.

Unfortunately, I'm not in a good position either to figure out how to
make your prototype faster, or to evaluate how painful it is to keep
both in the source tree. It's probably worth considering how likely it
is that we'd be interested in incremental JSON parsing in other cases.
Maintaining two JSON parsers is probably not a lot of fun regardless,
but if each of them gets used for a bunch of things, that feels less bad
than if one of them gets used for a bunch of things and the other one
only ever gets used for backup manifests. Would we be interested in
JSON-format database dumps? Incrementally parsing JSON LOBs?
Either seems tenuous, but those are examples of the kind of thing that
could make us happy to have incremental JSON parsing as a general
facility.

If nobody's very excited by those kinds of use cases, then this just
boils down to whether we want to (a) accept that users with very large
numbers of relation files won't be able to use pg_verifybackup or
incremental backup, (b) accept that we're going to maintain a second
JSON parser just to enable that use case and with no other benefit, or
(c) undertake to change the manifest format to something that is
straightforward to parse incrementally. I think (a) is reasonable short
term, but at some point I think we should do better. I'm not really that
enthused about (c) because it means more work for me and possibly more
arguing, but if (b) is going to cause a lot of hassle then we might need
to consider it.

--
Robert Haas
EDB: http://www.enterprisedb.com