Hi Tom,
(Please correct me where I'm wrong)
Is it possible to reduce the performance impact of dead tuples esp when the
index is used? Right now performance goes down gradually till we vacuum
(something like a 1/x curve).
My limited understanding of current behaviour is the search for a valid
row's tuple goes from older tuples to newer ones via forward links (based
on some old docs[1]).
How about searching from newer tuples to older tuples instead, using
backward links?
My assumption is newer tuples are more likely to be wanted than older ones
- and so the number of tuples to search through will be less this way.
**If index update is ok.
If a tuple is inserted, the index record is updated to point to inserted
tuple, and the inserted tuple is made to point to a previous tuple.
e.g.
Index-> old tuple->older tuple->oldest tuple
Index-> New tuple->old tuple->older tuple->oldest tuple
**if index update not desirable
Index points to first tuple (valid or not).
If a tuple is inserted, the first tuple is updated to point to inserted
tuple, and the inserted tuple is made to point to a previous tuple.
e.g.
Index-> first tuple->old tuple->older tuple->oldest tuple
Index-> first tuple-> New tuple->old tuple->older tuple->oldest tuple
If this is done performance might not deterioriate as much when using index
scans right? I'm not sure if a backward links would help for sequential
scans, which are usually best done forward.
Regards,
Link.
[1] http://developer.postgresql.org/pdf/transactions.pdf
Tuple headers contain:
xmin: transaction ID of inserting transaction
xmax: transaction ID of replacing/ deleting transaction (initially NULL)
forward link: link to newer version of same logical row, if any
Basic idea: tuple is visible if xmin is valid and xmax is not. "Valid"
means
"either committed or the current transaction".
If we plan to update rather than delete, we first add new version of row
to table, then set xmax and forward link in old tuple. Forward link will
be needed by concurrent updaters (but not by readers).
At 10:53 AM 5/1/02 -0400, Tom Lane wrote:
>estimates. [ thinks... ] Actually I think we might just be
>double-counting if we did. The dead tuples surely should not count as
>part of the number of returned rows. We already do account for the
>I/O effort to read them (because I/O is estimated based on the total