Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAFiTN-u6siWffXo=tEa62pVCw9S67pRTDQbD=UZ+WqTO_ftmNQ@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Fri, Jul 19, 2019 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 11, 2019 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jul 11, 2019 at 12:38 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > I don't like the fact that undoaccess.c has a new global,
> > > undo_compression_info.  I haven't read the code thoroughly, but do we
> > > really need that?  I think it's never modified (so it could just be
> > > declared const),
> >
> > Actually, this will get modified otherwise across undo record
> > insertion how we will know what was the values of the common fields in
> > the first record of the page.  Another option could be that every time
> > we insert the record, read the value from the first complete undo
> > record on the page but that will be costly because for every new
> > insertion we need to read the first undo record of the page.
> >
>
> This information won't be shared across transactions, so can't we keep
> it in top transaction's state?   It seems to me that will be better
> than to maintain it as a global state.

I think this idea is good for the DO time but during REDO time it will
not work as we will not have the transaction state.  Having said that
the current idea of keeping in the global variable will also not work
during REDO time because the WAL from the different transaction can be
interleaved.  There are few ideas to handle this issue

1.  At DO time keep in TopTransactionState as you suggested and during
recovery time read from the first complete record on the page.
2.  Just to keep the code uniform always read from the first complete
record of the page.

After putting more though I am more inclined towards idea-2.  Because
we are anyway inserting our current record into that page basically we
have read the buffer and also holds the exclusive lock on the buffer.
So reading a few extra bytes from the buffer will not hurt us IMHO.

If someone has a better solution please suggest.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Contribution to Perldoc for TestLib module in Postgres
Next
From: Michael Paquier
Date:
Subject: Re: using explicit_bzero