Re: logical changeset generation v6.2 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: logical changeset generation v6.2 |
Date | |
Msg-id | CA+TgmoatGr1nqgbqj8UT55m1jSS_uN8n=jp0m6iOduqJHg9LOg@mail.gmail.com Whole thread Raw |
In response to | Re: logical changeset generation v6.2 (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: logical changeset generation v6.2
Re: logical changeset generation v6.2 |
List | pgsql-hackers |
On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Well, I just think relying on specific symbol names in the .so file is >> kind of unfortunate. It means that, for example, you can't have >> multiple output plugins provided by a single .so. And in general I >> think it's something that we've tried to minimize. > > But that's not really different when you rely on _PG_init doing it's > thing, right? Sure, that's true. But in general I think magic symbol names aren't a particularly good design. >> But there's only so much information available here. Why not just >> have a format that logs it all? > > Because we do not know what "all" is? Also, how would we handle > replication sets and such that all of the existing replication solutions > have generically? I don't see how you can fail to know what "all" is. There's only a certain set of facts available. I mean you could log irrelevant crap like a random number that you just picked or the sum of all numeric values in the column, but nobody's likely to want that. What people are going to want is the operation performed (insert, update, or delete), all the values in the new tuple, the key values from the old tuple, the transaction ID, and maybe some meta-information about the transaction (such as the commit timestamp). What I'd probably do is emit the data in CSV format, with the first column of each line being a single character indicating what sort of row this is: H means a header row, defining the format of subsequent rows (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk; a new header row is emitted only when the column list changes); I, U, or D means an insert, update, or delete, with column 2 being the transaction ID, column 3 being the table name, and the remaining columns matching the last header row for emitted for that table, T means meta-information about a transaction, whatever we have (e.g. T,txn_id,commit_time). There's probably some further tweaking of that that could be done, and I might be overlooking some salient details, like maybe we want to indicate the column types as well as their names, but the range of things that someone can want to do here is not unlimited. The point, for me anyway, is that someone can write a crappy Perl script to apply changes from a file like this in a day. My contention is that there are a lot of people who will want to do just that, for one reason or another. The plugin interface has awesome power and flexibility, and really high-performance replication solutions will really benefit from that. But regular people don't want to write C code; they just want to write a crappy Perl script. And I think we can facilitate that without too much work. >> Oh, yuck. So that means you have to write an extra WAL record for >> EVERY heap insert, update, or delete to a catalog table? OUCH. > > Yes. We could integrate it into the main record without too many > problems, but it didn't seem like an important optimization and it would > have higher chances of slowing down wal_level < logical. Hmm. I don't know whether that's an important optimization or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: