Re: WAL usage calculation patch - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: WAL usage calculation patch
Date
Msg-id 20200401142954.GA64485@nol
In response to Re: WAL usage calculation patch  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WAL usage calculation patch  (Amit Kapila <amit.kapila16@gmail.com>)
Re: WAL usage calculation patch  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
Hi,

I'm replying here to all reviews that have been sent, thanks a lot!

On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> > previously mentionned changes.
> >
> 
> Few other comments:
> v9-0003-Add-infrastructure-to-track-WAL-usage
> 1.
>  static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
> -
> +static void WalUsageAdd(WalUsage *dst, WalUsage *add);
> 
> Looks like a spurious line removal


Fixed.
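
For reference, the new counters deliberately mirror the existing BufferUsage
machinery; roughly (simplified sketch, field names as used in this series):

typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records produced */
    long        wal_num_fpw;    /* # of WAL full page writes produced */
    uint64      wal_bytes;      /* size of WAL records produced */
} WalUsage;

static void
WalUsageAdd(WalUsage *dst, WalUsage *add)
{
    dst->wal_bytes += add->wal_bytes;
    dst->wal_records += add->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw;
}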


> 2.
> + /* Report a full page imsage constructed for the WAL record */
> + *num_fpw += 1;
> 
> Typo. /imsage/image


Ah sorry, I thought I had fixed it previously; fixed now.


> 3. Doing some testing with and without parallelism to ensure the WAL usage
> data is correct would be great, and if possible, could you share the results?


I just saw that Dilip did some testing, but just in case here are some
additional results.

- vacuum, after a truncate, loading 1M rows and an "UPDATE t1 SET id = id":

=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
         query          | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
 vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
 vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
(2 rows)

- create index, overriding t1's parallel_workers, using the 1M rows just
  vacuumed:

=# alter table t1 set (parallel_workers = 2);
ALTER TABLE

=# create index t1_parallel_2 on t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 0);
ALTER TABLE

=# create index t1_parallel_0 on t1(id);
CREATE INDEX

=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
                query                 | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
 create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
 create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
(2 rows)

It all looks good to me.


> v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> 4.
> +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
> +SELECT query, calls, rows,
> +wal_bytes > 0 as wal_bytes_generated,
> +wal_records > 0 as wal_records_generated,
> +wal_records = rows as wal_records_as_rows
> +FROM pg_stat_statements ORDER BY query COLLATE "C";
> +                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
> +------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
> + DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
> + DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
> + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
> + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
> + SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
> + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
> + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
> + SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
> + SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
> + UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
> + UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
> +(11 rows)
> +
> 
> I am not sure if the above tests make much sense, as they are just
> testing whether WAL is generated for these commands.  I understand it
> is not easy to make these tests reliable, but in that case, we can
> think of some simple tests.  It seems to me that the difficulty is due
> to full_page_writes as that depends on the checkpoint.  Can we make
> full_page_writes = off for these tests and check some simple
> Insert/Update/Delete cases?  Alternatively, if you can present the
> reason why such tests are unstable or tricky to write, then we can
> simply get rid of them, because I don't see tests for BufferUsage.
> Let's not write tests for the sake of writing them unless they can
> detect bugs in the future or meaningfully cover the new code added.


I don't think we can hope for a stable amount of WAL bytes generated, so
testing for a positive number looks sensible to me.  Testing that each query
writing a single line generates a single WAL record also looks sensible, so I
kept that.  I realized that Kirill reused an existing set of queries that was
previously added to validate the multi-query command behavior, so there's no
need to repeat all of them.  I just kept one of each kind (insert, update,
delete, select) to make sure that we do record WAL activity there, but I don't
think much more can really be done.  I still think this is better than
nothing, but if you disagree feel free to drop those tests.


> 5.
> -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> -               query               | calls | rows
> ------------------------------------+-------+------
> - SELECT $1::TEXT                   |     1 |    1
> - SELECT PLUS_ONE($1)               |     2 |    2
> - SELECT PLUS_TWO($1)               |     2 |    2
> - SELECT pg_stat_statements_reset() |     1 |    1
> +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
> +               query               | calls | rows | wal_bytes | wal_records
> +-----------------------------------+-------+------+-----------+-------------
> + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
>  (4 rows)
> 
> Again, I am not sure if these modifications make much sense?


Those are queries that were previously executed.  As they are read-only
queries that are pretty much guaranteed not to cause any WAL activity, I don't
see how it hurts to also check that this is indeed what pg_stat_statements
records, just to be safe.  Once again, feel free to drop the extra wal_*
columns from the output if you disagree.


> 6.
>  static void pgss_shmem_startup(void);
> @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
>      int query_location, int query_len,
>      double total_time, uint64 rows,
>      const BufferUsage *bufusage,
> +    const WalUsage* walusage,
>      pgssJumbleState *jstate);
> 
> The alignment for walusage doesn't seem to be correct. Running
> pgindent will fix this.


Indeed, fixed.

> 7.
> + values[i++] = Int64GetDatumFast(tmp.wal_records);
> + values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
> 
> Why are they different?  I think we should use the same *GetDatum API
> (probably Int64GetDatumFast) for these.


Oops, that's a mistake from when I was working on the wal_bytes output, fixed.
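
For the record, both counters now go through the same API, i.e. something like:

    values[i++] = Int64GetDatumFast(tmp.wal_records);
    values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);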

> > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> >
>
> One more comment related to this patch.
> +
> + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
> +
> + /* Convert to numeric. */
> + wal_bytes = DirectFunctionCall3(numeric_in,
> + CStringGetDatum(buf),
> + ObjectIdGetDatum(0),
> + Int32GetDatum(-1));
> +
> + values[i++] = wal_bytes;
>
> I see that other places that display uint64 values use BIGINT datatype
> in SQL, so why can't we do the same here?  See the usage of queryid in
> pg_stat_statements or internal_pages, *_pages exposed via
> pgstatindex.c.


That's because it's harmless to report a signed number for a hash (at least
compared to the overhead of making it unsigned), while reporting a negative
amount of WAL bytes generated if the value goes beyond the bigint limit is
certainly not wanted.  See the usage of pg_lsn_mi in pg_lsn.c for instance.
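
To illustrate the concern outside of the patch, here's a tiny standalone
example (plain C, not postgres code) showing what would happen if a wal_bytes
value above 2^63 - 1 were exposed through a signed 64-bit bigint:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
    uint64_t    wal_bytes = UINT64_C(1) << 63;  /* 2^63, just past the bigint limit */

    /* The real, unsigned value... */
    printf("uint64: %" PRIu64 "\n", wal_bytes);
    /*
     * ...but squeezed into a signed 64-bit integer it shows up negative on
     * the usual two's-complement platforms.
     */
    printf("int64:  %" PRId64 "\n", (int64_t) wal_bytes);
    return 0;
}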

On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
>
> I have reviewed 0003 and 0004; I have a few comments.
> v9-0003-Add-infrastructure-to-track-WAL-usage
>
> 1.
>   /* Points to buffer usage area in DSM */
>   BufferUsage *buffer_usage;
> + /* Points to WAL usage area in DSM */
> + WalUsage   *wal_usage;
>
> Better to give one blank line between the previous statement/variable
> declaration and the next comment line.


Fixed.


> 2.
> @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> IndexBulkDeleteResult **stats,
>   WaitForParallelWorkersToFinish(lps->pcxt);
>
>   for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> - InstrAccumParallelQuery(&lps->buffer_usage[i]);
> + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
>   }
>
> The existing comment above this loop just mentions the buffer usage,
> not the wal usage, so I guess we need to change that.


Ah indeed, I thought I caught all the comments but missed this one.  Fixed.
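
For context, the accumulation path in instrument.c now handles both counters;
roughly (sketch of the updated function):

void
InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
    BufferUsageAdd(&pgBufferUsage, bufusage);
    WalUsageAdd(&pgWalUsage, walusage);
}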


> v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
>
> 3.
> + if (usage->wal_num_fpw > 0)
> + appendStringInfo(es->str, " full page records=%ld",
> +    usage->wal_num_fpw);
> + if (usage->wal_bytes > 0)
> + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> +    usage->wal_bytes);
>
> Shall we change to 'full page writes' or 'full page image' instead of
> full page records?


Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.
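
So the text-format output chunk now reads along these lines (sketch of the
fixed hunk):

    if (usage->wal_num_fpw > 0)
        appendStringInfo(es->str, " full page writes=%ld",
                         usage->wal_num_fpw);
    if (usage->wal_bytes > 0)
        appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
                         usage->wal_bytes);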


> Apart from this, I have done some testing to see the wal_usage with
> parallel vacuum, and the results look fine.
>
> postgres[104248]=# CREATE TABLE test (a int, b int);
> CREATE TABLE
> postgres[104248]=# INSERT INTO test SELECT i, i FROM
> GENERATE_SERIES(1,2000000) as i;
> INSERT 0 2000000
> postgres[104248]=# CREATE INDEX idx1 on test(a);
> CREATE INDEX
> postgres[104248]=# VACUUM (PARALLEL 1) test;
> VACUUM
> postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
> from pg_stat_statements where query like 'VACUUM%';
>           query           | wal_bytes | wal_records | wal_num_fpw
> --------------------------+-----------+-------------+-------------
>  VACUUM (PARALLEL 1) test |  72814331 |        8857 |        8855
>
>
>
> postgres[106479]=# CREATE TABLE test (a int, b int);
> CREATE TABLE
> postgres[106479]=# INSERT INTO test SELECT i, i FROM
> GENERATE_SERIES(1,2000000) as i;
> INSERT 0 2000000
> postgres[106479]=# CREATE INDEX idx1 on test(a);
> CREATE INDEX
> postgres[106479]=# VACUUM (PARALLEL 0) test;
> VACUUM
> postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
> from pg_stat_statements where query like 'VACUUM%';
>           query           | wal_bytes | wal_records | wal_num_fpw
> --------------------------+-----------+-------------+-------------
>  VACUUM (PARALLEL 0) test |  72814331 |        8857 |        8855


Thanks!  I did some similar testing, also with sequential and parallel index
creation, and got similar results.


> By tomorrow, I will try to finish reviewing 0005 and 0006.

Thanks!

