Re: Frequent Update Project: Design Overview of HOT Updates - Mailing list pgsql-hackers

From Zeugswetter Andreas ADI SD
Subject Re: Frequent Update Project: Design Overview of HOT Updates
Date
Msg-id E1539E0ED7043848906A8FF995BDA579017C0928@m0143.s-mxs.net
In response to Re: Frequent Update Project: Design Overview of HOT Updates  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Frequent Update Project: Design Overview of HOT Updates
List pgsql-hackers
> > 1. It doubles the IO (original page + hot page), if the new row
> >    would have fit into the original page.
>
> That's an awfully big IF there. Even if you use a fillfactor
> of 50% in which case you're paying a 100% performance penalty

I don't see where the 50% comes from. That is only needed if you update
all rows on the page, and within a timeframe that does not allow reuse
of other outdated tuples.
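To make the point concrete, here is a minimal model (my own sketch, not
PostgreSQL code; the function and parameter names are invented): a page
needs 100% slack (fillfactor 50) only in the worst case where every row
is updated within one window during which dead versions cannot yet be
reused; if updates are spread across windows, a much smaller reserve
suffices.

```python
# Hypothetical page model: `capacity` tuple slots, `live` rows, and
# in-page updates that reuse dead versions once no snapshot needs them.
def updates_fit_in_page(live, capacity, updates_per_window):
    """Return True if each window's updates fit in the page's free space.

    Within one window (e.g. one long-running snapshot), dead versions
    cannot be reused, so each update needs a fresh slot; between windows
    the dead versions are reclaimed and their slots become free again.
    """
    free = capacity - live
    return updates_per_window <= free

# Updating every row within one window needs 100% slack (fillfactor 50):
assert updates_fit_in_page(live=50, capacity=100, updates_per_window=50)
# Spread over several windows, a small reserve is enough:
assert updates_fit_in_page(live=90, capacity=100, updates_per_window=10)
assert not updates_fit_in_page(live=90, capacity=100, updates_per_window=20)
```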
> > 4. although at first it might seem so I see no advantage for vacuum
> > with overflow
>
> The main problem with vacuum now is that it must scan the
> entire table (and the entire index) even if only a few
> records are garbage. If we isolate the garbage in a separate
> area then vacuum doesn't have to scan unrelated tuples.
>
> I'm not sure this really solves that problem because there
> are still DELETEs to consider but it does remove one factor
> that exacerbates it unnecessarily.

Yeah, so you still need to vacuum the large table regularly.

> I think the vision is that the overflow table would never be
> very large because it can be vacuumed very aggressively. It
> has only tuples that are busy and will need vacuuming as soon
> as a transaction ends. Unlike the main table which is mostly
> tuples that don't need vacuuming.

Ok, but then you have to provide an extra vacuum that does only that
(and it randomly touches heap pages, and only does partial work there).

> > 5. the size reduction of heap is imho moot because you trade it for
> >    a growing overflow (size reduction only comes from reusing dead
> >    tuples and not adding index tuples --> SITC)
>
> I think you're comparing the wrong thing.

I mean, unless you individually vacuum the overflow more frequently.

> Size isn't a problem in itself, size is a problem because it causes
> extra i/o.

Yes, and I state that at every possible occasion :-) On-disk size is a
problem, really.

> So a heap that's double the necessary size takes twice as long as
> necessary to scan. The fact that the overflow tables are taking up
> space isn't interesting if they don't have to be scanned.

The overflow does have to be read for each seq scan. And it was stated
that it would be accessed with random access (following the tuple chain).
But maybe we can read the overflow the same as if it were an additional
segment file?

> Hitting the overflow tables should be quite rare, it only
> comes into play when looking at concurrently updated tuples.
> It certainly happens but most tuples in the table will be
> committed and not being concurrently updated by anyone else.

The first update moves the row to overflow; only the second following
update might be able to pull it back. So on average you would have at
least 66% of all rows updated since the last vacuum in the overflow.
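The 66% figure can be sketched as follows (again my own model under
stated assumptions, not actual HOT code; `location_after` is invented):
the first update puts the new version in the overflow while the old
heap slot still holds the prior version; the second update still lands
in the overflow because that slot is dead but not yet reclaimed; only
the third update can reuse it, giving a cycle where two out of every
three updates leave the live version in the overflow.

```python
# Hypothetical model of the tuple-chain argument.
def location_after(n_updates):
    """Where the live version sits after n updates since the last vacuum.

    Cycle of three: update 1 -> overflow (heap slot holds old version),
    update 2 -> overflow (slot dead but not yet reclaimed),
    update 3 -> heap (slot reused), then the cycle repeats.
    """
    if n_updates == 0:
        return "heap"
    return "heap" if n_updates % 3 == 0 else "overflow"

# Over rows updated 1..N times, at least two thirds sit in the overflow:
N = 9
in_overflow = sum(location_after(k) == "overflow" for k in range(1, N + 1))
assert in_overflow / N >= 2 / 3
```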

The problem with needing very frequent vacuums is that you might not be
able to do any useful work because of long-running transactions.

Andreas

