Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
Msg-id CALj2ACVy1S6pVOekoDcP6=9GZH55r6P0VCtnAX7gLSrGusHMWQ@mail.gmail.com
In response to Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Wed, Apr 21, 2021 at 8:02 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > Thanks! I think we could avoid extra processing costs for cases like
> > VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The
> > avoided costs can be lock acquire, relation open, vacuum/analyze,
> > relation close, starting new xact command, command counter increment
> > in case of analyze etc. This can be done with a simple patch like the
> > attached. When explicit columns are specified along with relations
> > i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra
> > complex processing to optimize the cases when c1 = c2.
> >
> > Note that the TRUNCATE command currently skips processing repeated
> > relations (see ExecuteTruncate). For example, in TRUNCATE foo, foo;
> > and TRUNCATE foo, ONLY foo, foo; only the first instance of relation
> > foo is considered for processing and the other instances (along with
> > any options specified on them) are ignored.
> >
> > Thoughts?
>
> Although I don't strongly oppose checking that, the check in TRUNCATE
> is natural and required. The relation list is used afterwards anyway,
> and we cannot truncate the same relation twice or more, since a
> relation under "use" cannot be truncated (truncation is one form of
> use). In short, TRUNCATE does not run the check just for the check's
> own sake.

Thanks for the point. Yes, if we don't skip repeated instances, we get
the below error:
postgres=# truncate t1, t1;
ERROR:  cannot TRUNCATE "t1" because it is being used by active
queries in this session
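
For reference, the existing skip in ExecuteTruncate()
(src/backend/commands/tablecmds.c) is roughly of the following shape
(paraphrased, not the exact upstream code):

/* inside ExecuteTruncate(), stmt is the TruncateStmt */
List       *relids = NIL;
ListCell   *cell;

foreach(cell, stmt->relations)
{
    RangeVar   *rv = lfirst(cell);
    Oid         myrelid;

    myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock, 0,
                                       RangeVarCallbackForTruncate, NULL);

    /* don't throw an error for "TRUNCATE foo, foo" */
    if (list_member_oid(relids, myrelid))
        continue;

    relids = lappend_oid(relids, myrelid);

    /* ... open the relation and add it to the list to truncate ... */
}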

> On the other hand, the patch creates a relation list just for this
> purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYZE
> works well with duplicates in target relations.

Yeah, the relids list is only used to skip the duplicates. I feel
that's okay, given that the extra processing it adds (searching for the
relid in the list) is negligible compared to the processing we avoid by
skipping duplicates; see [1].
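
To be concrete, the skip is a minimal change along the following lines
(a sketch of the idea, not the attached patch verbatim; seen_relids is
just an illustrative local name):

/* in vacuum(), src/backend/commands/vacuum.c (sketch) */
List       *seen_relids = NIL;
ListCell   *lc;

foreach(lc, relations)
{
    VacuumRelation *vrel = lfirst_node(VacuumRelation, lc);

    /* only the simple case: no per-relation column list specified */
    if (vrel->va_cols == NIL && OidIsValid(vrel->oid))
    {
        /* skip duplicates such as VACUUM/ANALYZE foo, foo; */
        if (list_member_oid(seen_relids, vrel->oid))
            continue;
        seen_relids = lappend_oid(seen_relids, vrel->oid);
    }

    /* ... existing per-relation vacuum_rel()/analyze_rel() work ... */
}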

Although VACUUM/ANALYZE works fine with duplicate relations and raises
no error (unlike TRUNCATE), is there any benefit to running
back-to-back VACUUM/ANALYZE on the same relation within a single
command? I assume there is none. My only point was that even if
somebody specifies duplicate relations, we could avoid some processing
effort; see [1] for the gain. For ANALYZE, we can also avoid the extra
StartTransactionCommand, CommitTransactionCommand and
CommandCounterIncrement calls.
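
To show what a duplicate entry costs for ANALYZE, the per-relation path
in vacuum() (src/backend/commands/vacuum.c) is roughly of this shape
(paraphrased, details elided), and all of it is repeated for every
duplicate:

foreach(cur, relations)
{
    VacuumRelation *vrel = lfirst_node(VacuumRelation, cur);

    if (params->options & VACOPT_ANALYZE)
    {
        if (use_own_xacts)
        {
            StartTransactionCommand();
            /* functions in indexes may want a snapshot set */
            PushActiveSnapshot(GetTransactionSnapshot());
        }

        analyze_rel(vrel->oid, vrel->relation, params, vrel->va_cols,
                    in_outer_xact, vac_strategy);

        if (use_own_xacts)
        {
            PopActiveSnapshot();
            CommitTransactionCommand();
        }
        else
            CommandCounterIncrement();
    }
}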

I know the use cases the patch tries to optimize are unrealistic and of
little value (maybe written by someone like me). Since we generally
don't optimize for rare and unrecommended scenarios, I'm okay if we
drop this patch. But I would like to mention the gain we get with the
patch; see [1].

[1] Tested on my dev system with the default postgresql.conf; t1 has
10 million rows:
HEAD:
postgres=# analyze t1;
Time: 363.580 ms
postgres=# analyze t1;
Time: 384.760 ms

postgres=# analyze t1, t1;
Time: 687.976 ms
postgres=# analyze t1, t1;
Time: 664.420 ms

postgres=# analyze t1, t1, t1;
Time: 1010.855 ms (00:01.011)
postgres=# analyze t1, t1, t1;
Time: 1119.970 ms (00:01.120)

postgres=# analyze t1, t1, t1, t1;
Time: 1350.345 ms (00:01.350)
postgres=# analyze t1, t1, t1, t1;
Time: 1316.738 ms (00:01.317)

postgres=# analyze t1, t1, t1, t1, t1;
Time: 1651.780 ms (00:01.652)
postgres=# analyze t1, t1, t1, t1, t1, t1;
Time: 1983.163 ms (00:01.983)

PATCHed:
postgres=# analyze t1;
Time: 356.709 ms
postgres=# analyze t1;
Time: 360.780 ms

postgres=# analyze t1, t1;
Time: 377.193 ms
postgres=# analyze t1, t1;
Time: 370.636 ms

postgres=# analyze t1, t1, t1;
Time: 364.271 ms
postgres=# analyze t1, t1, t1;
Time: 349.988 ms

postgres=# analyze t1, t1, t1, t1;
Time: 362.567 ms
postgres=# analyze t1, t1, t1, t1;
Time: 383.292 ms

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


