Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE
Date
Msg-id CA+TgmoYPpG5_hDRGyO_PB--Mwbsr2WMjeDSN-xh_d8bjiBTcBw@mail.gmail.com
In response to Re: Commit 86dc90056 - Rework planning and execution of UPDATE and DELETE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Apr 19, 2021 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That doco is explaining the users-eye view of it.  Places addressed
> to datatype developers, such as the CREATE TYPE reference page, see
> it a bit differently.  CREATE TYPE for instance points out that
>
>     All storage values other than plain imply that the functions of the
>     data type can handle values that have been toasted, as described in ...

Interesting. It seems to me that SET STORAGE PLAIN is really trying to
be two different things. Either you want to inhibit
compression and external storage for performance reasons, or your data
type can't support either one. Maybe we should separate those
concepts, since there's no mode right now that says "don't ever
compress, and externalize only if there's absolutely no other way,"
and there's no way to disable compression and externalization without
also killing off short headers. :-(

> The notion that short header doesn't cost anything seems extremely Intel-centric to me.

I don't think so. It's true that Intel is very forgiving about
unaligned accesses compared to some other architectures, but I think
if you have a terabyte of data, you want it to fit into as few disk
pages as possible pretty much no matter what architecture you're
using. The dominant costs are going to be the I/O costs, not the CPU
costs of dealing with unaligned bytes. In fact, even if you have a
gigabyte of data, I bet it's *still* better to use a more compact
on-disk representation. At that size, the dominant cost is going to be
pumping the data through the L3 CPU cache, which is still - I think -
going to be quite a lot more important than the CPU costs of dealing
with unaligned bytes. The CPU bus is an I/O bottleneck not unlike the
disk itself: it runs at a higher rate of speed, but one that is still
way slower than the CPU. Now if you have a megabyte of data, or better yet a
kilobyte of data, then I think optimizing for CPU efficiency may well
be the right thing to do. I don't know how much 4-byte varlena headers
really save there, but if I were designing a storage representation
for very small data sets, I'd definitely be thinking about how I could
waste space to shave cycles.
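
To put rough numbers on the space side of that trade-off, here is a
back-of-the-envelope sketch in Python. It is a simplified model, not
PostgreSQL's actual tuple-forming code: it assumes the default 8192-byte
page, a 24-byte page header, a 4-byte line pointer per tuple, a heap
tuple header padded to 24 bytes, and 8-byte MAXALIGN, and it ignores
null bitmaps, fillfactor, and TOAST entirely. The function names and the
four-column example row are hypothetical.

```python
# Simplified model of how varlena header size and alignment affect
# the number of tuples that fit on one heap page. All constants are
# assumptions for a typical 64-bit build with default settings.
PAGE_SIZE = 8192      # default block size
PAGE_HEADER = 24      # page header bytes
LINE_POINTER = 4      # one item pointer per tuple
TUPLE_HEADER = 24     # 23-byte heap tuple header padded to 8 bytes
MAXALIGN = 8          # each tuple is padded to this boundary


def align(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def tuple_size(payload_lengths, header_bytes, alignment):
    """Bytes used by one tuple whose columns are all variable-length."""
    offset = TUPLE_HEADER
    for n in payload_lengths:
        offset = align(offset, alignment)  # padding before the value
        offset += header_bytes + n         # length header plus payload
    return align(offset, MAXALIGN)


def tuples_per_page(payload_lengths, header_bytes, alignment):
    """How many such tuples fit on one page under this model."""
    per_tuple = tuple_size(payload_lengths, header_bytes, alignment)
    return (PAGE_SIZE - PAGE_HEADER) // (per_tuple + LINE_POINTER)


row = [5, 5, 5, 5]  # hypothetical row: four 5-byte strings

# 1-byte headers need no alignment; 4-byte headers are int-aligned.
short = tuples_per_page(row, header_bytes=1, alignment=1)
long_ = tuples_per_page(row, header_bytes=4, alignment=4)
print(short, long_)
```

Under these assumptions the short-header layout fits noticeably more
rows per page for narrow values; the gap shrinks as the payloads get
longer, since the header and padding become a smaller share of each
value.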

-- 
Robert Haas
EDB: http://www.enterprisedb.com