Thread: Query regarding pg_prewarm extension

Query regarding pg_prewarm extension

From
Ayush Vatsa
Date:

Hi PostgreSQL Community,

I have a question regarding the use of the pg_prewarm() function [1]
in the pg_prewarm extension. The function requires a relation name
and a range of pages (e.g., pages 10 to 50) to be warmed by shifting
them from disk to the buffer cache.

How can I decide which range of pages to prewarm?
I assume that it is related to hot pages in the relation,
but how can I identify which pages are likely to be hot
before they are even in the buffer cache?
Additionally, since tuples within a page can move to
different pages over time (due to operations like VACUUM FULL or REINDEX),
how should I handle this when selecting the pages to prewarm?

Any insights would be greatly appreciated.

Regards,
Ayush Vatsa


[1] https://www.postgresql.org/docs/current/pgprewarm.html#PGPREWARM-FUNCS


Re: Query regarding pg_prewarm extension

From
Jeremy Schneider
Date:
On Fri, 13 Dec 2024 16:16:16 +0530
Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:

> How can I decide which range of pages to prewarm?
> I assume that it is related to hot pages in the relation,
> but how can I identify which pages are likely to be hot
> before they are even in the buffer cache?
> Additionally, since tuples within a page can move to
> different pages over time (due to operations like VACUUM FULL or
> REINDEX), how should I handle this when selecting the pages to
> prewarm?

For my part, I've only used the block offsets when I wanted to fire off
several jobs in parallel, attempting to prewarm a relation faster. I've
never tried to track the location of specific rows for purposes of
prewarming.

You might try the "autoprewarm" feature. After adding pg_prewarm to
your shared_preload_libraries, it will automatically keep track of the
contents of the buffer cache and after a restart it will automatically
prewarm the buffer cache with the blocks that were there before.

https://www.enterprisedb.com/blog/autoprewarm-new-functionality-pgprewarm

Alternatively you could just prewarm a few of your most important hot
tables and indexes with a script after restarts.

For most smaller databases, slightly slower performance for a short
period after startup isn't a problem - while reading blocks from disk
for the first time. After the first read, blocks that are frequently
accessed will remain in the cache. The Postgres cache management
algorithm works well in general.

This is my two cents, anyway

-Jeremy

-- 
http://about.me/jeremy_schneider




Re: Query regarding pg_prewarm extension

From
David Rowley
Date:
On Sun, 29 Dec 2024 at 14:00, Bruce Momjian <bruce@momjian.us> wrote:
> It feels like we should document what the block range is used for, so
> attached is a doc patch to do that.

-   means prewarm through the last block in the relation).  The return value
-   is the number of blocks prewarmed.
+   means prewarm through the last block in the relation).  The block
+   range allows a single relation to be loaded in parallel using multiple
+   concurent function calls.  The return value is the number of blocks
+   prewarmed.

hmm, do we really need to highlight one specific usage for the range
like this? I think mentioning this could just confuse readers as it
makes it sound like using a range is going to magically run something
in parallel. I was confused to what you were talking about until I
read what Jeremy had written in his email.

Another equally legitimate use case would be if the user only wanted
to prewarm a subset of the relation... Actually, I'd imagine that's
probably more common than someone trying to speed this up by kicking
off multiple queries each with their own range. I imagine there's less
need to use the range to speed this up now that we have read steams
and likely there will be even less need when AIO is in.

I think the current wording is ok as it is. But if I'm outvoted,
"concurent" needs another 'r'.

David



Re: Query regarding pg_prewarm extension

From
Ayush Vatsa
Date:
> hmm, do we really need to highlight one specific usage for the range
> like this? I think mentioning this could just confuse readers as it
> makes it sound like using a range is going to magically run something
> in parallel.

I believe highlighting that particular use case would indeed be helpful,
but not directly through a change in the documentation. Instead, it
could be better conveyed through an example.

When I initially read the documentation, I found it unclear how someone
would practically use the range feature. For instance, how would a user
determine the specific range of pages they need in the buffer cache?
Since PostgreSQL doesn’t store data in a fixed order and the order can
change over time due to operations like vacuum or updates, this could
be confusing. Hence, an example illustrating the use case would be
valuable.

> Another equally legitimate use case would be if the user only wanted
> to prewarm a subset of the relation... Actually, I'd imagine that's
> probably more common than someone trying to speed this up by kicking
> off multiple queries each with their own range.

To me, using a range of pages to prewarm a relation doesn’t seem like a
common use case. For example, if a user calls prewarm(100, 200),
how would they decide those specific numbers? While it’s possible to
inspect the contents of those pages, as Jeremy noted, users typically
don’t track the location of specific rows for prewarming purposes.

Regards,
Ayush Vatsa
AWS Aurora

Re: Query regarding pg_prewarm extension

From
Bruce Momjian
Date:
On Mon, Dec 30, 2024 at 10:54:21PM +0530, Ayush Vatsa wrote:
> When I initially read the documentation, I found it unclear how someone
> would practically use the range feature. For instance, how would a user
> determine the specific range of pages they need in the buffer cache?
> Since PostgreSQL doesn’t store data in a fixed order and the order can
> change over time due to operations like vacuum or updates, this could
> be confusing. Hence, an example illustrating the use case would be
> valuable.
> 
> > Another equally legitimate use case would be if the user only wanted
> > to prewarm a subset of the relation... Actually, I'd imagine that's
> > probably more common than someone trying to speed this up by kicking
> > off multiple queries each with their own range.
> 
> To me, using a range of pages to prewarm a relation doesn’t seem like a
> common use case. For example, if a user calls prewarm(100, 200),
> how would they decide those specific numbers? While it’s possible to
> inspect the contents of those pages, as Jeremy noted, users typically
> don’t track the location of specific rows for prewarming purposes.

Yeah, the ranage-of-blocks case is rare, and I hoped to explain it in
the docs, but it seems like that isn't helping, so I retract my patch.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.





Re: Query regarding pg_prewarm extension

From
David Rowley
Date:
On Tue, 31 Dec 2024 at 06:24, Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:
> To me, using a range of pages to prewarm a relation doesn’t seem like a
> common use case. For example, if a user calls prewarm(100, 200),
> how would they decide those specific numbers? While it’s possible to
> inspect the contents of those pages, as Jeremy noted, users typically
> don’t track the location of specific rows for prewarming purposes.

It's probably rarely useful for that exact reason, however, for
insert-only tables (where only recently inserted data is queried),
which pages recently inserted rows are in is more predictable. I can
imagine that someone might want to load some recent percentage of the
table using these parameters.

David