Postgres-R: tuple serialization - Mailing list pgsql-hackers

From Markus Wanner
Subject Postgres-R: tuple serialization
Date
Msg-id 48859484.70306@bluegap.ch
Responses Re: Postgres-R: tuple serialization  (Decibel! <decibel@decibel.org>)
List pgsql-hackers
Hi,

yesterday, I promised to outline the requirements of Postgres-R for 
tuple serialization, which we have talked about before. There are 
basically three ways to serialize tuple changes, depending on 
whether they originate from an INSERT, UPDATE or DELETE. For updates and 
deletes, Postgres-R saves the old primary key (pkey) as well as the origin 
(a global transaction id) of the tuple, which is required for consistent 
serialization on remote nodes. For inserts and updates, all added or 
changed attributes need to be serialized as well.
            pkey+origin    changes
  INSERT         -            x
  UPDATE         x            x
  DELETE         x            -
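To make the table concrete, here is a minimal C sketch of which sections each kind of change would carry. The names ChangeType, SECT_PKEY_ORIGIN, SECT_CHANGES and change_sections() are hypothetical, made up for illustration, and are not taken from the Postgres-R sources.

```c
#include <assert.h>

/* Hypothetical names: the three kinds of tuple change from the table. */
typedef enum
{
    CHANGE_INSERT,
    CHANGE_UPDATE,
    CHANGE_DELETE
} ChangeType;

/* Hypothetical bit flags for the sections a serialized change carries. */
#define SECT_PKEY_ORIGIN 0x01   /* old pkey + origin (global transaction id) */
#define SECT_CHANGES     0x02   /* added or changed attributes */

static unsigned
change_sections(ChangeType type)
{
    switch (type)
    {
        case CHANGE_INSERT:
            return SECT_CHANGES;
        case CHANGE_UPDATE:
            return SECT_PKEY_ORIGIN | SECT_CHANGES;
        case CHANGE_DELETE:
            return SECT_PKEY_ORIGIN;
    }
    assert(!"unknown change type");
    return 0;
}
```

Only UPDATE carries both sections, which is why it is the most involved case below.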

Note that the pkey attributes may never be null, so an isnull bit field 
can be skipped for those attributes. For the insert case, all attributes 
(including primary key attributes) are serialized. Updates require an 
additional bit field (well, I'm using chars ATM) to store which 
attributes have changed. Only those should be transferred.
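A rough sketch of the update case in C follows. It assumes a hypothetical toy tuple whose attributes are all int32, and a hypothetical byte layout: one char per attribute marking "changed" (mirroring the chars mentioned above), then, for each changed attribute, an isnull byte that is omitted for pkey attributes, followed by the value if it is not NULL. ToyTuple and serialize_update() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ATTS 16

/* Hypothetical toy tuple: every attribute is an int32 for simplicity. */
typedef struct
{
    int     natts;
    bool    is_pkey[MAX_ATTS];   /* pkey attributes are never NULL */
    bool    isnull[MAX_ATTS];
    int32_t values[MAX_ATTS];
} ToyTuple;

/*
 * Serialize the changed attributes of an UPDATE into buf (assumed layout,
 * see above). Returns the number of bytes written.
 */
static size_t
serialize_update(const ToyTuple *t, const char *changed, unsigned char *buf)
{
    size_t pos = 0;

    assert(t->natts <= MAX_ATTS);

    /* change map: one char per attribute */
    for (int i = 0; i < t->natts; i++)
        buf[pos++] = changed[i] ? 1 : 0;

    for (int i = 0; i < t->natts; i++)
    {
        if (!changed[i])
            continue;               /* unchanged attributes are not transferred */
        if (!t->is_pkey[i])
        {
            buf[pos++] = t->isnull[i] ? 1 : 0;
            if (t->isnull[i])
                continue;           /* no value bytes for a NULL */
        }
        /* pkey attributes skip the isnull byte and go straight to the value */
        memcpy(buf + pos, &t->values[i], sizeof(int32_t));
        pos += sizeof(int32_t);
    }
    return pos;
}
```

For a three-attribute tuple where attribute 1 is set to NULL and attribute 2 to a new value, this writes 3 bytes of change map, 1 isnull byte for attribute 1, and 1 isnull byte plus 4 value bytes for attribute 2: 9 bytes in total.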

I'm tempted to unify that, so that inserts are serialized as the 
difference against the default values or NULL. That would make things 
easier for Postgres-R. However, what about other uses of such a fast 
tuple applicator? Does such a use case exist at all? I mean, for 
parallelizing COPY FROM STDIN, one certainly doesn't want to serialize 
all input tuples into that format before feeding multiple helper 
backends. Instead, I'd recommend letting the helper backends do the 
parsing and thereby parallelize that step as well.
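The unification idea, serializing an insert as the difference against the column defaults (or NULL), could look roughly like the following C sketch. It builds the same kind of change map an update uses, so that inserts and updates can share one code path. ToyTuple and insert_change_map() are hypothetical names, and the sketch treats defaults as constants, which real column defaults need not be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ATTS 16

/* Hypothetical toy tuple: every attribute is an int32 for simplicity. */
typedef struct
{
    int     natts;
    bool    isnull[MAX_ATTS];
    int32_t values[MAX_ATTS];
} ToyTuple;

/*
 * Build the change map for an INSERT by comparing the new tuple against a
 * tuple of (constant) column defaults, NULL where no default exists. An
 * attribute counts as changed if its null-ness or value differs from the
 * default. Returns the number of attributes that must be transferred.
 */
static int
insert_change_map(const ToyTuple *newtup, const ToyTuple *defaults, char *changed)
{
    int nchanged = 0;

    assert(newtup->natts == defaults->natts && newtup->natts <= MAX_ATTS);

    for (int i = 0; i < newtup->natts; i++)
    {
        bool differs;

        if (newtup->isnull[i] || defaults->isnull[i])
            differs = newtup->isnull[i] != defaults->isnull[i];
        else
            differs = newtup->values[i] != defaults->values[i];

        changed[i] = differs ? 1 : 0;
        nchanged += differs;
    }
    return nchanged;
}
```

A pkey attribute with no default is NULL in the defaults tuple but never NULL in the new tuple, so it is always marked as changed and always transferred, which keeps the "all pkey attributes are serialized" property of the insert case.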

For other features, like parallel pg_dump or even parallel query 
execution, this tuple serialization code doesn't help much, IMO. So I'm 
thinking that optimizing it for Postgres-R's internal use is the best 
way to go.

Comments? Opinions?

Regards

Markus

