Thread: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
From
Bharath Rupireddy
Date:
Hi, I'm reading the code for vacuum/analyze and it looks like currently we call vacuum_rel/analyze_rel for each relation specified. Which means that if a relation is specified more than once, then we simply vacuum/analyze it that many times. Do we gain any advantage by vacuuming/analyzing a relation back-to-back within a single command? I strongly feel no. I'm thinking we could do a simple optimization here, by transforming following VACUUM/ANALYZE commands to: 1) VACUUM t1, t2, t1, t2, t1; transform to --> VACUUM t1, t2; 2) VACUUM ANALYZE t1(a1), t2(a2), t1(b1), t2(b2), t1(c1); transform to --> VACUUM ANALYZE t1(a1, b1, c1), t2(a2, b2) 3) ANALYZE t1, t2, t1, t2, t1; transform to --> ANALYZE t1, t2; 4) ANALYZE t1(a1), t2(a2), t1(b1), t2(b2), t1(c1); transform to --> ANALYZE t1(a1, b1, c1), t2(a2, b2) Above use cases may look pretty much unsound and we could think of disallowing with an error for the use cases (1) and 3(), but the use cases (2) and (4) are quite possible in customer scenarios(??). Please feel free to add any other use cases you may think of. The main advantage of the above said optimization is that the commands can become a bit faster because we will avoid extra processing. I would like to hear opinions on this. I'm not sure if this optimization was already given a thought and not done because of some specific reasons. If so, it will be great if someone can point me to those discussions. Or it could be that I'm badly missing in my understanding of current vacuum/analyze code, feel free to correct me. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > I'm reading the code for vacuum/analyze and it looks like currently we > call vacuum_rel/analyze_rel for each relation specified. Which means > that if a relation is specified more than once, then we simply > vacuum/analyze it that many times. Do we gain any advantage by > vacuuming/analyzing a relation back-to-back within a single command? I > strongly feel no. I'm thinking we could do a simple optimization here, This really is not something to expend cycles and code complexity on. If the user wrote the same table more than once, that's their choice. regards, tom lane
Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
From
Bharath Rupireddy
Date:
On Sat, Apr 10, 2021 at 8:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > > I'm reading the code for vacuum/analyze and it looks like currently we > > call vacuum_rel/analyze_rel for each relation specified. Which means > > that if a relation is specified more than once, then we simply > > vacuum/analyze it that many times. Do we gain any advantage by > > vacuuming/analyzing a relation back-to-back within a single command? I > > strongly feel no. I'm thinking we could do a simple optimization here, > > This really is not something to expend cycles and code complexity on. > If the user wrote the same table more than once, that's their choice. Thanks! I think we could avoid extra processing costs for cases like VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The avoided costs can be lock acquire, relation open, vacuum/analyze, relation close, starting new xact command, command counter increment in case of analyze etc. This can be done with a simple patch like the attached. When explicit columns are specified along with relations i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra complex processing to optimize the cases when c1 = c2. Note that the TRUNCATE command currently skips processing repeated relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken into consideration for processing and other relation instances (options specified if any) are ignored. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
From
Kyotaro Horiguchi
Date:
At Wed, 21 Apr 2021 07:34:40 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Sat, Apr 10, 2021 at 8:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > > > I'm reading the code for vacuum/analyze and it looks like currently we > > > call vacuum_rel/analyze_rel for each relation specified. Which means > > > that if a relation is specified more than once, then we simply > > > vacuum/analyze it that many times. Do we gain any advantage by > > > vacuuming/analyzing a relation back-to-back within a single command? I > > > strongly feel no. I'm thinking we could do a simple optimization here, > > > > This really is not something to expend cycles and code complexity on. > > If the user wrote the same table more than once, that's their choice. > > Thanks! I think we could avoid extra processing costs for cases like > VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The > avoided costs can be lock acquire, relation open, vacuum/analyze, > relation close, starting new xact command, command counter increment > in case of analyze etc. This can be done with a simple patch like the > attached. When explicit columns are specified along with relations > i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra > complex processing to optimize the cases when c1 = c2. > > Note that the TRUNCATE command currently skips processing repeated > relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and > TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken > into consideration for processing and other relation instances > (options specified if any) are ignored. > > Thoughts? Although I don't strongly oppose to check that, the check of truncate is natural and required. The relation list is anyway used afterwards, and we cannot truncate the same relation twice or more since a relation under "use" cannot be truncated. (Truncation is one form of use). In short, TRUNCATE runs no checking just for the check's own sake. On the other hand the patch creates a relation list just for this purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYE works well with duplicates in target relations. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?
From
Bharath Rupireddy
Date:
On Wed, Apr 21, 2021 at 8:02 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Thanks! I think we could avoid extra processing costs for cases like > > VACUUM/ANALYZE foo, foo; when no explicit columns are specified. The > > avoided costs can be lock acquire, relation open, vacuum/analyze, > > relation close, starting new xact command, command counter increment > > in case of analyze etc. This can be done with a simple patch like the > > attached. When explicit columns are specified along with relations > > i.e. VACUUM/ANALYZE foo(c1), foo(c2); we don't want to do the extra > > complex processing to optimize the cases when c1 = c2. > > > > Note that the TRUNCATE command currently skips processing repeated > > relations (see ExecuteTruncate). For example, TRUNCATE foo, foo; and > > TRUNCATE foo, ONLY foo, foo; first instance of relation foo is taken > > into consideration for processing and other relation instances > > (options specified if any) are ignored. > > > > Thoughts? > > Although I don't strongly oppose to check that, the check of truncate > is natural and required. The relation list is anyway used afterwards, > and we cannot truncate the same relation twice or more since a > relation under "use" cannot be truncated. (Truncation is one form of > use). In short, TRUNCATE runs no checking just for the check's own > sake. Thanks for the point. Yes, if we don't skip repeated instances we do get below error: postgres=# truncate t1, t1; ERROR: cannot TRUNCATE "t1" because it is being used by active queries in this session > On the other hand the patch creates a relation list just for this > purpose, which is not needed to run VACUUM/ANALYZE, and VACUUM/ANALYE > works well with duplicates in target relations. Yeah, the relids list is only used to skip the duplicates. I feel that's okay given the negligible extra processing (searching for the relids in the list) we add with it versus the extra processing we avoid with skipping duplicates, see [1]. Although VACUUM/ANALYZE works well with duplicate relations without any error (unlike TRUNCATE), is there any benefit if we run back-to-back VACUUM/ANALYZE within a single command? I assume that there's no benefit. My only point was that even if somebody specifies duplicate relations, we could avoid some processing effort see [1] for the gain. For ANALYZE, we can avoid doing extra StartTransactionCommand, CommitTransactionCommand and CommandCounterIncrement as well. I know the use cases that I'm trying to optimize with the patch are worthless and unrealistic (may be written by someone like me). Since we generally don't optimize for rare and unrecommended scenarios, I'm okay if we drop this patch. But I would like to mention [1] the gain we get with the patch. [1] tested on my dev system, with default postgresql.conf, t1 is having 10mn rows: HEAD: postgres=# analyze t1; Time: 363.580 ms postgres=# analyze t1; Time: 384.760 ms postgres=# analyze t1, t1; Time: 687.976 ms postgres=# analyze t1, t1; Time: 664.420 ms postgres=# analyze t1, t1, t1; Time: 1010.855 ms (00:01.011) postgres=# analyze t1, t1, t1; Time: 1119.970 ms (00:01.120) postgres=# analyze t1, t1, t1, t1; Time: 1350.345 ms (00:01.350) postgres=# analyze t1, t1, t1, t1; Time: 1316.738 ms (00:01.317) postgres=# analyze t1, t1, t1, t1, t1; Time: 1651.780 ms (00:01.652) postgres=# analyze t1, t1, t1, t1, t1, t1; Time: 1983.163 ms (00:01.983) PATCHed: postgres=# analyze t1; Time: 356.709 ms postgres=# analyze t1; Time: 360.780 ms postgres=# analyze t1, t1; Time: 377.193 ms postgres=# analyze t1, t1; Time: 370.636 ms postgres=# analyze t1, t1, t1; Time: 364.271 ms postgres=# analyze t1, t1, t1; Time: 349.988 ms postgres=# analyze t1, t1, t1, t1; Time: 362.567 ms postgres=# analyze t1, t1, t1, t1; Time: 383.292 ms With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com