Re: Postgres-R: tuple serialization - Mailing list pgsql-hackers

From: Decibel!
Subject: Re: Postgres-R: tuple serialization
Msg-id: D697DBAF-D495-4602-A1EF-E013DB804B0F@decibel.org
In response to: Postgres-R: tuple serialization (Markus Wanner <markus@bluegap.ch>)
Responses: Re: Postgres-R: tuple serialization (Markus Wanner <markus@bluegap.ch>)
List: pgsql-hackers
On Jul 22, 2008, at 3:04 AM, Markus Wanner wrote:
> Yesterday, I promised to outline the requirements of Postgres-R for
> tuple serialization, which we have been talking about before. There
> are basically three forms of serialized tuple changes, depending on
> whether they originate from an INSERT, UPDATE or DELETE. For updates
> and deletes, the serialized change carries the old pkey as well as
> the origin (a global transaction id) of the tuple (required for
> consistent serialization on remote nodes). For inserts and updates,
> all added or changed attributes need to be serialized as well.
>
>            pkey+origin    changes
>   INSERT        -            x
>   UPDATE        x            x
>   DELETE        x            -
>
> Note that the pkey attributes may never be null, so an isnull bit
> field can be skipped for those attributes. For the insert case, all
> attributes (including primary key attributes) are serialized.
> Updates require an additional bit field (well, I'm using chars ATM)
> to store which attributes have changed; only those should be
> transferred.
>
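Just to make sure I'm reading the layout right, here's a rough sketch
in C of what such a change record could look like, with a real bit
field in place of the per-attribute chars. All the names here are
hypothetical, purely for illustration -- I'm not claiming this matches
your actual code:

#include <stdint.h>
#include <stdbool.h>

typedef enum ChangeType
{
    CHANGE_INSERT,              /* changes only */
    CHANGE_UPDATE,              /* pkey+origin and changes */
    CHANGE_DELETE               /* pkey+origin only */
} ChangeType;

typedef struct TupleChange
{
    ChangeType  type;

    /* origin of the tuple: a global transaction id (UPDATE/DELETE) */
    uint32_t    origin_node;
    uint32_t    origin_xid;

    /*
     * Old primary key attribute values (UPDATE/DELETE). No isnull
     * bit field is needed, since pkey attributes may never be null.
     */
    int         npkeyatts;
    char      **pkey_values;

    /*
     * One bit per attribute, set if the attribute is transferred in
     * 'values' (INSERT/UPDATE).
     */
    int         natts;
    uint8_t    *changed_bits;
    char      **values;         /* only the transferred attributes */
} TupleChange;

/* is attribute attno (0-based) included in this change record? */
static inline bool
att_changed(const TupleChange *tc, int attno)
{
    return (tc->changed_bits[attno / 8] >> (attno % 8)) & 1;
}
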
> I'm tempted to unify that, so that inserts are serialized as the
> difference against the default values or NULL. That would make
> things easier for Postgres-R. However, how about other uses of such
> a fast tuple applicator? Does such a use case exist at all? I mean,
> for parallelizing COPY FROM STDIN, one certainly doesn't want to
> serialize all input tuples into that format before feeding multiple
> helper backends. Instead, I'd recommend letting the helper backends
> do the parsing and therefore parallelize that as well.
>
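If inserts were indeed serialized as a difference against the
defaults, the apply side could presumably share a single code path for
INSERT and UPDATE. Continuing the hypothetical sketch above:

/*
 * Hypothetical apply loop: with inserts serialized as a difference
 * against the column defaults (or NULL), INSERT and UPDATE share one
 * path -- an unset bit falls back to the default on insert, and
 * keeps the old value on update.
 */
static void
apply_changes(const TupleChange *tc, char **row, char **defaults)
{
    int         nextval = 0;    /* 'values' holds only changed atts */

    for (int attno = 0; attno < tc->natts; attno++)
    {
        if (att_changed(tc, attno))
            row[attno] = tc->values[nextval++]; /* transferred value */
        else if (tc->type == CHANGE_INSERT)
            row[attno] = defaults[attno];       /* default or NULL */
        /* on UPDATE, an untouched attribute keeps its old value */
    }
}
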
> For other features, like parallel pg_dump or even parallel query  
> execution, this tuple serialization code doesn't help much, IMO. So  
> I'm thinking that optimizing it for Postgres-R's internal use is  
> the best way to go.
>
> Comments? Opinions?

ISTM that both Londiste and Slony would be able to make use of these
improvements as well. A modular replication system should be able to
use a variety of methods for logging data changes and then applying
them on a subscriber, so long as some kind of common transport can be
agreed upon (such as text). So a change capture and apply mechanism
that doesn't depend on a lot of extra backend infrastructure would be
generally useful to any replication system.
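Purely as illustration (not a proposal for a concrete format), such a
text transport could be as simple as one record per line, e.g.:

  UPDATE public.accounts origin=node2/1234 pkey(id)='42' set(balance)='99.50'

which a Slony- or Londiste-style subscriber could apply without
knowing anything about the internals that produced it.
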
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828


