Re: WAL usage calculation patch - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: WAL usage calculation patch
Date
Msg-id 20200402083035.GB64485@nol
Whole thread Raw
In response to Re: WAL usage calculation patch  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WAL usage calculation patch
List pgsql-hackers
On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
> On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > > 3. Doing some testing with and without parallelism to ensure WAL usage
> > > data is correct would be great and if possible, share the results?
> >
> >
> > I just saw that Dilip did some testing, but just in case here is some
> > additional one
> >
> > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id"
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
> >          query          | calls | wal_bytes | wal_records | wal_num_fpw
> > ------------------------+-------+-----------+-------------+-------------
> >  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
> >  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> > (2 rows)
> >
> > - create index, overload t1's parallel_workers, using the 1M line just
> >   vacuumed:
> >
> > =# alter table t1 set (parallel_workers = 2);
> > ALTER TABLE
> >
> > =# create index t1_parallel_2 on t1(id);
> > CREATE INDEX
> >
> > =# alter table t1 set (parallel_workers = 0);
> > ALTER TABLE
> >
> > =# create index t1_parallel_0 on t1(id);
> > CREATE INDEX
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create
index%';
> >                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> > --------------------------------------+-------+-----------+-------------+-------------
> >  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
> >  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> > (2 rows)
> >
> > It all looks good to me.
> >
> 
> Here the wal_num_fpw and wal_bytes are different between parallel and
> non-parallel versions.  Is it due to checkpoint or something else?  We
> can probably rule out checkpoint by increasing checkpoint_timeout and
> other checkpoint related parameters.

I think this is because I did a checkpoint after the VACUUM tests, so the 1st
CREATE INDEX (with parallelism) induced some FPW on the catalog blocks.  I
didn't try to investigate more since:

On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
>
> Also, I forgot to mention that let's not base this on buffer usage
> patch for create index
> (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
> per recent discussion I am not sure about its usefulness.  I think we
> can proceed with this patch without
> v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.


Which is done in attached v11.


> > > 5.
> > > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> > > -               query               | calls | rows
> > > ------------------------------------+-------+------
> > > - SELECT $1::TEXT                   |     1 |    1
> > > - SELECT PLUS_ONE($1)               |     2 |    2
> > > - SELECT PLUS_TWO($1)               |     2 |    2
> > > - SELECT pg_stat_statements_reset() |     1 |    1
> > > +SELECT query, calls, rows, wal_bytes, wal_records FROM
> > > pg_stat_statements ORDER BY query COLLATE "C";
> > > +               query               | calls | rows | wal_bytes | wal_records
> > > +-----------------------------------+-------+------+-----------+-------------
> > > + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> > > + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> > > + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> > > + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
> > >  (4 rows)
> > >
> > > Again, I am not sure if these modifications make much sense?
> >
> >
> > Those are queries that were previously executed.  As those are read-only query,
> > that are pretty much guaranteed to not cause any WAL activity, I don't see how
> > it hurts to test at the same time that that's we indeed record with
> > pg_stat_statements, just to be safe.
> >
> 
> On a similar theory, one could have checked bufferusage stats as well.
> The statements are using some expressions so don't see any value in
> check all usage data for such statements.


Dropped.


> Right now, that particular patch is not getting applied (probably due
> to recent commit 17e0328224).  Can you rebase it?


Done.


> > > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
> > >
> > > 3.
> > > + if (usage->wal_num_fpw > 0)
> > > + appendStringInfo(es->str, " full page records=%ld",
> > > +    usage->wal_num_fpw);
> > > + if (usage->wal_bytes > 0)
> > > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> > > +    usage->wal_bytes);
> > >
> > > Shall we change to 'full page writes' or 'full page image' instead of
> > > full page records?
> >
> >
> > Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.
> >
> 
> I don't see this change in the patch.


Yes, as Dilip reported I fixuped the wrong commit, sorry about that.  This
version should now be ok.


On Thu, Apr 02, 2020 at 12:04:32PM +0530, Dilip Kumar wrote:
> On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > By tomorrow, I will try to finish reviewing 0005 and 0006.
>
> I have reviewed these patches and I have a few cosmetic comments.
> v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
>
> 1.
> + uint64 wal_bytes; /* total amount of wal bytes written */
> + int64 wal_records; /* # of wal records written */
> + int64 wal_num_fpw; /* # of full page wal records written */
>
>
> /s/# of full page wal records written / /* # of WAL full page image produced */


Done, I also consistently s/wal/WAL/.

>
> 2.
>  static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
>   ProcessUtilityContext context, ParamListInfo params,
>   QueryEnvironment *queryEnv,
> - DestReceiver *dest, QueryCompletion *qc);
> + DestReceiver *dest, QueryCompletion * qc);
>
> Useless hunk.


Oops, leftover of a pgindent as QueryCompletion isn't in the typedefs yet.  I
thought I discarded all the useless hunks but missed this one.  Thanks, fixed.


>
> 3.
>
> v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum
>
> @@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
>   {
>   ExplainPropertyInteger("WAL records", NULL,
>      usage->wal_records, es);
> - ExplainPropertyInteger("WAL full page records", NULL,
> + ExplainPropertyInteger("WAL full page writes", NULL,
>      usage->wal_num_fpw, es);
> Just noticed that in 0004 you have first added "WAL full page
> records", which is later corrected to "WAL full page writes" in 0006.
> I think we better keep this proper in 0004 itself and avoid this hunk
> in 0006, otherwise, it creates confusion while reviewing.


Oh, I didn't realized that I fixuped the wrong commit.  Fixed.


I also adapted the documentation that mentioned full page records instead of
full page images, and integrated Justin's comment:

> In 0003:
> +       /* Provide WAL update data to the instrumentation */
> Remove "data" ??

so changed to "Report WAL traffic to the instrumentation."

I didn't change the (auto)vacuum output yet (except fixing the s/full page
records/full page writes/ that I previously missed), as it's not clear what the
consensus is yet.  I'll take care of that as soon as we reach to a consensus.

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: EINTR while resizing dsm segment.
Next
From: Alvaro Herrera
Date:
Subject: Re: Add A Glossary