Re: zheap: a new storage format for PostgreSQL - Mailing list pgsql-hackers

From Robert Haas
Subject Re: zheap: a new storage format for PostgreSQL
Msg-id CA+TgmoZza33GTmbdEgGgMfvSVOJ9Uav8ZMFt7f1bX-55=m2Sgw@mail.gmail.com
In response to Re: zheap: a new storage format for PostgreSQL  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: zheap: a new storage format for PostgreSQL  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-hackers
On Fri, Mar 2, 2018 at 5:35 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I would propose "zero-bloat heap" disambiguation of zheap.  Seems like fair
> enough explanation for me without need to rename :)

It will be possible to bloat a zheap table in certain usage patterns.
For example, if you bulk-load the table with a ton of data, commit the
transaction, delete every other row, and then never insert any more
rows ever again, the table is bloated: it's twice as large as it
really needs to be, and we have no provision for shrinking it.  In
general, I think it's very hard to keep bulk deletes from leaving
bloat in the table, and to the extent that it *is* possible, we're not
doing it.  One could imagine, for example, an index-organized table
that automatically combines adjacent pages when they're empty enough,
and that also relocates data to physically lower-numbered pages
whenever possible.  Such a storage engine might automatically shrink
the on-disk footprint after a large delete, but we have no plans to go
in that direction.
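
To put rough numbers on the bulk-delete example, here is a toy model,
not zheap code.  It assumes a made-up capacity of 100 rows per page,
that rows never move once placed, and that nothing shrinks the file;
the names pages_in_use and pages_needed are invented for illustration.

```python
import math

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def pages_in_use(live_row_ids, rows_per_page=ROWS_PER_PAGE):
    """Pages the table still spans when rows never move.

    Row i is assumed to live on page i // rows_per_page, and the file
    is assumed never to be shrunk, so the footprint is set by the
    highest-numbered page that still holds a live row.
    """
    if not live_row_ids:
        return 0
    return max(live_row_ids) // rows_per_page + 1


def pages_needed(live_row_count, rows_per_page=ROWS_PER_PAGE):
    """Minimum pages required if the live rows were packed densely."""
    return math.ceil(live_row_count / rows_per_page)


# Bulk-load 10,000 rows, then delete every other row.
loaded = list(range(10_000))
survivors = [row_id for row_id in loaded if row_id % 2 == 0]

footprint = pages_in_use(survivors)
minimum = pages_needed(len(survivors))
bloat_ratio = footprint / minimum

print(footprint, minimum, bloat_ratio)
```

Every page keeps half of its rows, so no page becomes empty and the
footprint stays at the original size while only half of it is needed.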

Rather, our assumption is that the bloat most people care about comes
from updates.  By performing updates in-place as often as possible, we
hope to avoid bloating both the heap (because we're not adding new row
versions to it which then have to be removed) and the indexes (because
if we don't add new row versions at some other TID, then we don't need
to add index pointers to that new TID either, or remove the old index
pointers to the old TID).  Without delete-marking, we can basically
optimize the case that is currently handled via HOT updates: no
indexed columns have changed.  However, the in-place update has the
major advantage that it still works even when the page is completely
full, provided that the row does not expand.  As Amit's results show,
that can hugely reduce bloat and increase performance in the face of
long-running concurrent transactions.  With delete-marking, we can
also optimize the case where indexed columns have been changed.  We
don't know exactly how well this will work yet because the code isn't
written and therefore can't be benchmarked, but I am hopeful that
in-place updates will be a big win here too.
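
As a rough sketch of the difference for the no-indexed-column-changed
case, the toy model below counts what repeated updates of one row
leave behind while a long-running transaction blocks cleanup.  It is
not how either heap is implemented: the function simulate, the
strategy names, the single index, and the slot accounting are all
assumptions made for illustration, and the row is assumed not to grow.

```python
def simulate(strategy, updates, free_slots_on_page):
    """Count leftovers from repeated updates of a single row.

    Returns (extra_heap_versions, new_index_entries), assuming one
    index on the table, no indexed column changing, and no cleanup
    being possible while the updates run.
    """
    extra_versions = 0
    index_entries = 0
    free = free_slots_on_page

    for _ in range(updates):
        if strategy == "in_place":
            # The row is overwritten where it sits.  It does not grow,
            # so no free space is needed and the TID stays the same.
            continue

        if strategy != "new_version":
            raise ValueError(f"unknown strategy: {strategy}")

        # Each update writes a new row version into the heap.
        extra_versions += 1
        if free > 0:
            # HOT-style: the new version fits on the same page, so no
            # new index pointer is needed.
            free -= 1
        else:
            # Page is full: the new version lands at a new TID on
            # another page and the index needs a pointer to it.
            index_entries += 1

    return extra_versions, index_entries


# Ten updates of one row on a page with room for three more versions.
print(simulate("new_version", 10, free_slots_on_page=3))
# The same ten updates on a completely full page, done in place.
print(simulate("in_place", 10, free_slots_on_page=0))
```

In this model the new-version strategy stops benefiting from the
same-page shortcut as soon as the page fills, whereas the in-place
strategy is unaffected by how full the page is.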

So, I would not describe a zheap table as zero-bloat, but it should
involve a lot less bloat than our standard heap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

