Getting rid of cmin and cmax - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Getting rid of cmin and cmax
Date
Msg-id 450FF1D6.5070500@enterprisedb.com
Whole thread Raw
Responses Re: Getting rid of cmin and cmax  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
We currently use 4 int32s to store xmin, xmax, cmin, cmax and xvac on 
every heap tuple. That's a lot of overhead, especially on tables with 
narrow rows. Reduction in header size would give us considerable space 
and I/O savings.

I'm thinking of removing cmin and cmax, and keeping that information in 
backend-private memory instead. cmin and cmax are only interesting to 
the inserting/deleting transaction, so using precious tuple header space 
for that is a waste. This has been discussed before, and Manfred Koizar 
even had a patch for 7.4 but didn't submit it because it was incomplete 
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg00172.php). 
BTW: Manfred, do you still have the patch? It'd be interesting to look 
at, even if it's not finished.

This reduces the tuple header size by 4 bytes, or 8 bytes if we can 
later get rid of xvac as well.

There's some interesting properties we can exploit in implementing the 
backend-private storage:

1. in small OLTP transactions that touch few rows, the cmin/cmax 
information will easily fit in memory.

2. if current commandid == 1, every tuple with xmin == current xid is 
not visible. Similarly, every tuple with xmax = current xid is visible. 
So we don't need to store anything for the first command in a transaction.

3. we can forget modifications by command X as soon as there's no live 
snapshots with curcid <= X.

4. we don't need to record the information, if there's no live 
snapshots, and we know that the current command is not going to read 
existing rows.

These optimizations take care of bulk inserts nicely. In particular, 
pg_restore and similar applications wouldn't need to keep track of 
inserted records.

By choosing a clever data structure, we might be able to get away 
without spilling to disk. For example, a hash table works great for 
small transactions. If a transaction does bulk modifications, a bitmap 
per relation could be used. And to optimize for the cases when the 
information is not actually used, we can just collect the information to 
an array, and turn it into a more lookup-friendly data structure the 
first time it's needed.

Even if we do have to spill to disk on large transactions that do 
massive updates and selects, it would still be a win for most databases. 
And it penalizes the transactions that do massive updates, instead of 
every transaction in the system as we do now.

Comments?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Dunstan
Date:
Subject: Re: An Idea for OID conflicts
Next
From: Gevik Babakhani
Date:
Subject: Re: [PATCHES] Patch for UUID datatype (beta)