Dimitrios Apostolou <jimis@gmx.net> writes:
> Regarding the attached patch (rebased and edited commit message), it
> basically replaces seek(up to 1MB forward) with read(). The 1MB number
> comes a bit out of the top of my head. But tweaking it between 128KB and
> 1MB wouldn't really change anything, given that the block size is now
> 128KB: The read() will always be chosen against the seek(). Do you know
> of a real-world case with block sizes >128KB?
Yeah, with the recent changes I'd expect table data to pretty much
always consist of blocks around 128K, unless the table is smaller
than that of course.
I experimented with this patch locally and came away not too
impressed; it seems the results may be highly platform-dependent.
In the interests of having a common benchmark case that's easy to
replicate, here's precisely what I did:
Use non-assert build of current HEAD (4bea91f21 at the moment).
$ createdb bench
$ time pgbench -i -s 10000 bench
real 14m40.474s
user 1m26.717s
sys 0m5.045s
$ psql bench
...
bench=# create table zedtable(f1 int);
CREATE TABLE
bench=# insert into zedtable values(42);
INSERT 0 1
bench=# \q
$ time pg_dump -Fc --compress=none bench | cat >bench10000.dump
real 7m48.969s
user 0m36.334s
sys 1m35.209s
(At this -s value, the database occupies about 147G and the dump
file about 95G. It's important the dump file not fit in RAM.)
$ time pg_restore -f /dev/null -t zedtable bench10000.dump
real 1m12.646s
user 0m0.355s
sys 0m5.083s
This compares rather favorably to "cat":
$ time cat bench10000.dump >/dev/null
real 3m6.627s
user 0m0.167s
sys 0m30.999s
I then applied your patch and repeated the restore run:
$ time pg_restore -f /dev/null -t zedtable bench10000.dump
real 2m39.138s
user 0m0.386s
sys 0m28.493s
So for me, the proposed patch actually makes it 2X slower.
Watching it with "iostat 1", I'm seeing about 40MB/s disk read
rate with HEAD, and 500MB/s with the patch; "cat" also shows
read rate around 500MB/s. So yeah, we can saturate the disk
interface by doing all reads and no seeks, but that doesn't
net out faster.
I did this on a few-years-old Dell Precision 5820 workstation.
The specs for it are a bit vague about the disk subsystem:
Storage Drive Controllers
Integrated Intel AHCI SATA chipset controller (8x 6.0Gb/s), SW RAID 0,1,5,10
Storage Drive
2.5 1.92TB SATA AG Enterprise Solid State Drive
and hdparm isn't enormously helpful either:
ATA device, with non-removable media
Model Number: SSDSC2KB019T8R
Serial Number: PHYF1291017A1P9DGN
Firmware Revision: XCV1DL69
Media Serial Num:
Media Manufacturer:
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x006d)
Supported: 10 9 8 7 6 5
Likely used: 10
I'm running RHEL 8.10, file system is xfs.
So I find this a bit discouraging. It's not clear why you're seeing a
win and I'm not, and it's even less clear whether there'd be enough
of a win across enough platforms to make it worth trying to engineer
a solution that helps many more people than it hurts.
regards, tom lane