Thread: Loading the latest N rows into the cache seems way too fast.

Loading the latest N rows into the cache seems way too fast.

From
Ron Johnson
Date:
PG 9.6.24 and PG 14.15, if it matters.
(Yes, 9.6 is really EOL.  I don't control that.)

(I could use pg_prewarm, but the table is much bigger than RAM, and last_block value only has the newest record if data has never been deleted.  The oldest records regularly get deleted, and then the table is vacuumed; thus, new records can be anywhere in the table.)

Thus, roll my own cache-loading statement.

The bigint "id" column in "mytbl" is populated from a sequence, and so is monotonically increasing: the newest records will have the biggest id values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB.  I expected this "prewarm" statement to take much longer than 1/2 second.  Am I still in the dark ages of computer speed, or is this statement not doing what I hope it's doing?

$ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id DESC LIMIT 200000 ; END \$\$;"
DO

real    0m0.457s
user    0m0.005s
sys     0m0.004s

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Loading the latest N rows into the cache seems way too fast.

From
Christoph Moench-Tegeder
Date:
## Ron Johnson (ronljohnsonjr@gmail.com):

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?
> 
> $ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
> DESC LIMIT 200000 ; END \$\$;"

You can check what that statement does - e.g. in pg_stat_statements,
or (on an idle database, so the effects aren't lost in the noise) in
pg_stat_database or pg_statio_user_tables.
Between what the storage components of the last decade (e.g. those
SATA SSDs which are already being replaced in the market by NVME)
can deliver (>400MB/s, often marketed as ">500 MB/s" but on SATA that's
optimistic) and the fact that there are most likely some blocks
in the database' buffer and/or the OS buffer, the observed throughput
is not neccessarily unrealistic. With modern "server" hardware, getting
throughput in the "gigabytes per second" range is considered normal and
expected.

Regards,
Christoph

-- 
Spare Space



Re: Loading the latest N rows into the cache seems way too fast.

From
Tom Lane
Date:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> The bigint "id" column in "mytbl" is populated from a sequence, and so is
> monotonically increasing: the newest records will have the biggest id
> values.
> The table also has a bytea column that averages about 100KB.

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

            regards, tom lane



Re: Loading the latest N rows into the cache seems way too fast.

From
Ron Johnson
Date:
On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> The bigint "id" column in "mytbl" is populated from a sequence, and so is
> monotonically increasing: the newest records will have the biggest id
> values.
> The table also has a bytea column that averages about 100KB.

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the bytea column's name?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Loading the latest N rows into the cache seems way too fast.

From
Tom Lane
Date:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)

> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such.  (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)

            regards, tom lane



Re: Loading the latest N rows into the cache seems way too fast.

From
"David G. Johnston"
Date:
On Mon, Feb 17, 2025 at 2:41 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> The bigint "id" column in "mytbl" is populated from a sequence, and so is
> monotonically increasing: the newest records will have the biggest id
> values.
> The table also has a bytea column that averages about 100KB.

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the bytea column's name?


It's more about the system optimizing away data retrieval because you've indicated you don't care about the contents due to using PERFORM.  All it needs is a pointer to represent the future data, not the data itself.  And PERFORM will never resolve that pointer by itself - so as Tom said your query would need to force pointer resolution by computing on the data.

David J.

Re: Loading the latest N rows into the cache seems way too fast.

From
Ron Johnson
Date:
On Mon, Feb 17, 2025 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)

> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such.  (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)

That's definitely taking a LOT longer...

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!