Re: Reducing tuple overhead - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reducing tuple overhead
Date
Msg-id CANP8+jKV09FwSwjYtcAhonFKZDP7LO_BgWTSh2=-jWTQywT76Q@mail.gmail.com
In response to Reducing tuple overhead  (Andres Freund <andres@anarazel.de>)
Responses Re: Reducing tuple overhead  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 23 April 2015 at 17:24, Andres Freund <andres@anarazel.de> wrote:
Split into a new thread, the other one is already growing fast
enough. This discussion started at
http://archives.postgresql.org/message-id/55391469.5010506%40iki.fi

On April 23, 2015 6:48:57 PM GMT+03:00, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>Stop right there. You need to reserve enough space on the page to store
>an xmax for *every* tuple on the page. Because if you don't, what are
>you going to do when every tuple on the page is deleted by a different
>transaction?
>
>Even if you store the xmax somewhere else than the page header, you
>need to reserve the same amount of space for them, so it doesn't help
>at all.

Depends on how you do it and what you optimize for (disk space, runtime,
code complexity...).  You can, e.g., apply a trick to xmin/xmax similar
to the one used for cmin/cmax, except that the data structure needs to
be persistent.

In fact, we already have a combocid-like structure for xids that's
persistent: multixacts. We could just have one xid saved that's either
an xmin or an xmax (indicated by bits) or a multixact.  When a tuple
whose xmin is still required is updated/deleted, we could replace the
former xmin with a multixact; otherwise we just change the flag to say
that it's now an xmax without an xmin.  To check visibility when the xid
is a multixact, we'd just have to look up the relevant members for the
actual xmin and xmax.
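
A minimal sketch of the single-slot idea described above, written in Python purely for illustration. Everything here is hypothetical: the names `TupleHeader`, `MultiXactTable`, `delete_tuple` and `resolve`, the three `SLOT_IS_*` flag values, and the in-memory dictionary standing in for persistent multixact storage are assumptions of this sketch, not PostgreSQL structures or APIs.

```python
from dataclasses import dataclass

# Hypothetical flag values for this sketch; not PostgreSQL's infomask bits.
SLOT_IS_XMIN = "xmin"
SLOT_IS_XMAX = "xmax"
SLOT_IS_MULTI = "multi"


@dataclass
class TupleHeader:
    slot: int  # the single stored value: an xid or a multixact id
    kind: str  # which of the three meanings `slot` currently has


class MultiXactTable:
    """In-memory stand-in for a persistent id -> (xmin, xmax) store."""

    def __init__(self):
        self._members = {}
        self._next_id = 1

    def create(self, xmin, xmax):
        mxid = self._next_id
        self._next_id += 1
        self._members[mxid] = (xmin, xmax)
        return mxid

    def lookup(self, mxid):
        return self._members[mxid]


def delete_tuple(tup, deleter_xid, xmin_still_required, multis):
    """Record an update/delete in the tuple's single xid slot."""
    if tup.kind != SLOT_IS_XMIN:
        raise ValueError("this sketch only handles the first update/delete")
    if xmin_still_required:
        # Both xids matter: move them into a multixact-like entry.
        tup.slot = multis.create(tup.slot, deleter_xid)
        tup.kind = SLOT_IS_MULTI
    else:
        # xmin is no longer needed: overwrite it and flip the flag.
        tup.slot = deleter_xid
        tup.kind = SLOT_IS_XMAX


def resolve(tup, multis):
    """Return (xmin, xmax); None means the value is not stored."""
    if tup.kind == SLOT_IS_XMIN:
        return (tup.slot, None)
    if tup.kind == SLOT_IS_XMAX:
        return (None, tup.slot)
    return multis.lookup(tup.slot)
```

The visibility check pays for an extra lookup only in the multixact case; the two common cases (never deleted, or deleted after xmin stopped mattering) read the slot directly.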

To avoid excessive overhead when a tuple is repeatedly updated within one
session, we could store some of the data in the combocid entry that we
need anyway in that case.

Whether that's feasible complexity-wise is debatable, but it's certainly
possible.


I do wonder what, in realistic cases, is actually the bigger contributor
to the overhead: the tuple header or the padding we liberally add in
many cases...
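
To make the header-versus-padding question concrete, here is a rough back-of-the-envelope calculation in Python. It assumes a 23-byte fixed tuple header, 8-byte maximum alignment, no null bitmap, and only fixed-width columns; the helper names `align_up` and `tuple_size` are made up for this sketch, and the 4-byte line pointer is not counted.

```python
MAXALIGN = 8       # assumed maximum alignment (typical on 64-bit platforms)
HEADER_BYTES = 23  # assumed fixed tuple header size, before any null bitmap


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def tuple_size(columns):
    """Space taken on the page by one tuple.

    columns is a list of (size, alignment) pairs for fixed-width,
    non-null columns, in table order.
    """
    offset = align_up(HEADER_BYTES, MAXALIGN)  # user data starts aligned
    for size, alignment in columns:
        offset = align_up(offset, alignment) + size
    return align_up(offset, MAXALIGN)  # the whole tuple is aligned too


BOOL = (1, 1)
BIGINT = (8, 8)

interleaved = tuple_size([BOOL, BIGINT, BOOL, BIGINT])
grouped = tuple_size([BIGINT, BIGINT, BOOL, BOOL])
print(interleaved, grouped)
```

Under these assumptions the interleaved layout takes 56 bytes and the grouped one 48, for the same 18 bytes of user data. In the interleaved case padding accounts for 15 bytes, which is in the same range as the 23-byte header itself.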

It's hard to see how to save space there without reference to a specific use case. I see different solutions depending upon whether we assume a low number of transactions or a high number of transactions.

A case we could optimize for is insert-mostly tables. But in that case, even if you get rid of the xmax, you still have to freeze the tuples later.

I would have thought a better optimization would be to use the xmax for the xid epoch by default, so that such rows would never need freezing. Then at least we are using the xmax for something useful in a larger insert-mostly database rather than just leaving it at zero.
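
A small Python sketch of why an epoch stored alongside xmin would remove the need for freezing. The exact layout is an assumption here (epoch in the high 32 bits, xid in the low 32 bits), the function names are invented for illustration, and special xids are ignored for simplicity.

```python
XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1


def precedes_32(a, b):
    """Wraparound-aware comparison of two 32-bit xids.

    The difference is taken modulo 2^32 and read as a signed value, so
    the answer is only meaningful while a and b are less than 2^31 apart.
    """
    diff = (a - b) & XID_MASK
    return diff >= (1 << (XID_BITS - 1))  # negative as a signed 32-bit int


def full_xid(epoch, xid):
    """Combine an epoch and a 32-bit xid into one 64-bit value."""
    return (epoch << XID_BITS) | xid


def precedes_full(a, b):
    """64-bit xids never wrap in practice, so plain comparison suffices."""
    return a < b


old_xmin = 100
current = 100 + 2**31 + 5  # more than 2^31 transactions later, same epoch

# Without freezing, the 32-bit comparison gets this wrong:
print(precedes_32(old_xmin, current))
# With the epoch kept in the otherwise unused xmax slot, it stays correct:
print(precedes_full(full_xid(0, old_xmin), full_xid(0, current)))
```

The first comparison reports that the old row does not precede the current transaction, which is exactly the situation freezing exists to prevent; the second stays correct no matter how far apart the two xids are.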

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
