Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row - Mailing list pgsql-hackers
From | Hamid Akhtar |
---|---|
Subject | Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row |
Date | |
Msg-id | CANugjhtG0fygzMfuCCeyqK7dZfvWPRSSzBgSbQfbyNNWMDN9LA@mail.gmail.com Whole thread Raw |
In response to | Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row
Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row |
List | pgsql-hackers |
On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 6/27/22 9:31 AM, Hamid Akhtar wrote:
>
>
> Hello Hackers,
>
> While working on one of my blogs on the B-Tree indexes, I needed to look at a range of B-Tree page statistics. So the goto solution was to use pageinspect. However, reviewing stats for multiple pages meant issuing multiple queries.
+1 to improve the API.
> I felt that there's an opportunity for improvement in the extension by extending the API to output the statistics for multiple pages with a single query.
>
> That attached patch is based on the master branch. It makes the following changes to the pageinspect contrib module:
> - Updates bt_page_stats_internal function to accept 3 arguments instead of 2.
> - The function now uses SRF macros to return a set rather than a single row. The function call now requires specifying column names.
>
> The extension version is bumped to 1.11 (PAGEINSPECT_V1_11).
> To maintain backward compatibility, for versions below 1.11, the multi-call mechanism is ended to keep the old behavior consistent.
>
> Regression test cases for the module are updated as well as part of this change. Here is a subset of queries that are added to the btree.sql test case file for pageinspect.
>
> ----
> CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
> CREATE INDEX test2_col1_idx ON test2(col1);
> SELECT * FROM bt_page_stats('test2_col1_idx', 1, 2);
>
> For example, this could be written as:
>
> select * from
> generate_series(1, 2) blkno ,
> bt_page_stats('test2_col1_idx',blkno::int);
>
> Or, if one wants to inspect to whole relation, something like:
>
> select * from
> generate_series(1, pg_relation_size('test2_col1_idx'::regclass::text) / 8192 - 1) blkno ,
> bt_page_stats('test2_col1_idx',blkno::int);
Good one. But not all may know the alternatives.
+1
Do we have any
difference in the execution times for the above query vs the new
function introduced in the v1 patch? If there's not much difference, I
would suggest adding an SQL function around the generate_series
approach in the pageinspect extension for better and easier usability.
Based on some basic SQL execution time comparison of the two approaches, I see that the API change, on average, is around 40% faster than the SQL.
CREATE TABLE test2 AS (SELECT generate_series(1, 5000000) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);
EXPLAIN ANALYZE
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);
EXPLAIN ANALYZE
SELECT * FROM GENERATE_SERIES(1, 5000) blkno, bt_page_stats('test2_col1_idx',blkno::int);
CREATE INDEX test2_col1_idx ON test2(col1);
EXPLAIN ANALYZE
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);
EXPLAIN ANALYZE
SELECT * FROM GENERATE_SERIES(1, 5000) blkno, bt_page_stats('test2_col1_idx',blkno::int);
For me, the API change returns back the data in around 74ms whereas the SQL returns it in 102ms. So considering this and as you mentioned, the alternative may not be that obvious to everyone, it is a fair improvement.
Regards,
Bharath Rupireddy.
pgsql-hackers by date: