Thread: bug fix and documentation improvement about vacuumdb
Hi there,
I have 1 trivial fix, 1 bug fix, and 1 suggestion about vacuumdb.
First, I noticed that the help message of `vacuumdb` is a bit incorrect.
`vacuumdb -?` displays the following message
```
...
-n, --schema=PATTERN vacuum tables in the specified schema(s) only
-N, --exclude-schema=PATTERN do not vacuum tables in the specified schema(s)
...
```
PATTERN should be changed to SCHEMA because -n and -N options don't support
pattern matching for schema names. The attached patch 0001 fixes this.
Second, when we use multiple -N options, vacuumdb runs incorrectly as shown below.
```
$ psql
=# CREATE SCHEMA s1;
=# CREATE SCHEMA s2;
=# CREATE SCHEMA s3;
=# CREATE TABLE s1.t(i int);
=# CREATE TABLE s2.t(i int);
=# CREATE TABLE s3.t(i int);
=# ALTER SYSTEM SET log_statement TO 'all';
=# SELECT pg_reload_conf();
=# \q
$ vacuumdb -N s1 -N s2
```
We expect that tables in schemas s1 and s2 should not be vacuumed, while the
others should be. However, logfile says like this.
```
LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;
LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;
...
LOG: statement: VACUUM (SKIP_DATABASE_STATS) s2.t;
LOG: statement: VACUUM (SKIP_DATABASE_STATS) s1.t;
LOG: statement: VACUUM (ONLY_DATABASE_STATS);
```
Even specified by -N, s1.t and s2.t are vacuumed, and also the others are vacuumed
twice. The attached patch 0002 fixes this.
Third, for the description of the -N option, I wonder if "vacuum all tables except
in the specified schema(s)" might be clearer. The current one says nothing about
tables not in the specified schema.
Thoughts?
Masaki Kuwamura
I have 1 trivial fix, 1 bug fix, and 1 suggestion about vacuumdb.
First, I noticed that the help message of `vacuumdb` is a bit incorrect.
`vacuumdb -?` displays the following message
```
...
-n, --schema=PATTERN vacuum tables in the specified schema(s) only
-N, --exclude-schema=PATTERN do not vacuum tables in the specified schema(s)
...
```
PATTERN should be changed to SCHEMA because -n and -N options don't support
pattern matching for schema names. The attached patch 0001 fixes this.
Second, when we use multiple -N options, vacuumdb runs incorrectly as shown below.
```
$ psql
=# CREATE SCHEMA s1;
=# CREATE SCHEMA s2;
=# CREATE SCHEMA s3;
=# CREATE TABLE s1.t(i int);
=# CREATE TABLE s2.t(i int);
=# CREATE TABLE s3.t(i int);
=# ALTER SYSTEM SET log_statement TO 'all';
=# SELECT pg_reload_conf();
=# \q
$ vacuumdb -N s1 -N s2
```
We expect that tables in schemas s1 and s2 should not be vacuumed, while the
others should be. However, logfile says like this.
```
LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;
LOG: statement: VACUUM (SKIP_DATABASE_STATS) pg_catalog.pg_proc;
...
LOG: statement: VACUUM (SKIP_DATABASE_STATS) s2.t;
LOG: statement: VACUUM (SKIP_DATABASE_STATS) s1.t;
LOG: statement: VACUUM (ONLY_DATABASE_STATS);
```
Even specified by -N, s1.t and s2.t are vacuumed, and also the others are vacuumed
twice. The attached patch 0002 fixes this.
Third, for the description of the -N option, I wonder if "vacuum all tables except
in the specified schema(s)" might be clearer. The current one says nothing about
tables not in the specified schema.
Thoughts?
Masaki Kuwamura
Attachment
> On 14 Sep 2023, at 13:21, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > PATTERN should be changed to SCHEMA because -n and -N options don't support > pattern matching for schema names. The attached patch 0001 fixes this. True, there is no pattern matching performed. I wonder if it's worth lifting the pattern matching from pg_dump into common code such that tools like this can use it? > Second, when we use multiple -N options, vacuumdb runs incorrectly as shown below. > ... > Even specified by -N, s1.t and s2.t are vacuumed, and also the others are vacuumed > twice. The attached patch 0002 fixes this. I can reproduce that, a single -N works but adding multiple -N's makes none of them excluded. The current coding does this: if (objfilter & OBJFILTER_SCHEMA_EXCLUDE) appendPQExpBufferStr(&catalog_query, "OPERATOR(pg_catalog.!=) "); If the join is instead made to exclude the oids in listed_objects with a left join and a clause on object_oid being null I can make the current query work without adding a second clause. I don't have strong feelings wrt if we should add a NOT IN () or fix this JOIN, but we shouldn't have a faulty join together with the fix. With your patch the existing join is left in place, let's fix that. > Third, for the description of the -N option, I wonder if "vacuum all tables except > in the specified schema(s)" might be clearer. The current one says nothing about > tables not in the specified schema. Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who would expect anything else than vacuuming everything but the excluded schema when specifying -N. What else could "vacuumdb -N foo" be interpreted to do that can be confusing? -- Daniel Gustafsson
On Thu, Sep 14, 2023 at 02:06:51PM +0200, Daniel Gustafsson wrote: >> On 14 Sep 2023, at 13:21, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > >> PATTERN should be changed to SCHEMA because -n and -N options don't support >> pattern matching for schema names. The attached patch 0001 fixes this. > > True, there is no pattern matching performed. I wonder if it's worth lifting > the pattern matching from pg_dump into common code such that tools like this > can use it? I agree that this should be changed to SCHEMA. It might be tough to add pattern matching with the current catalog query, and I don't know whether there is demand for such a feature, but I wouldn't discourage someone from trying. >> Second, when we use multiple -N options, vacuumdb runs incorrectly as shown below. >> ... > >> Even specified by -N, s1.t and s2.t are vacuumed, and also the others are vacuumed >> twice. The attached patch 0002 fixes this. > > I can reproduce that, a single -N works but adding multiple -N's makes none of > them excluded. The current coding does this: > > if (objfilter & OBJFILTER_SCHEMA_EXCLUDE) > appendPQExpBufferStr(&catalog_query, "OPERATOR(pg_catalog.!=) "); > > If the join is instead made to exclude the oids in listed_objects with a left > join and a clause on object_oid being null I can make the current query work > without adding a second clause. I don't have strong feelings wrt if we should > add a NOT IN () or fix this JOIN, but we shouldn't have a faulty join together > with the fix. With your patch the existing join is left in place, let's fix that. Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch together to demonstrate. We should probably add some tests... >> Third, for the description of the -N option, I wonder if "vacuum all tables except >> in the specified schema(s)" might be clearer. The current one says nothing about >> tables not in the specified schema. > > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who > would expect anything else than vacuuming everything but the excluded schema > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do > that can be confusing? I agree with Daniel on this one. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
At Thu, 14 Sep 2023 07:57:57 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in > On Thu, Sep 14, 2023 at 02:06:51PM +0200, Daniel Gustafsson wrote: > > I can reproduce that, a single -N works but adding multiple -N's makes none of > > them excluded. The current coding does this: > > > > if (objfilter & OBJFILTER_SCHEMA_EXCLUDE) > > appendPQExpBufferStr(&catalog_query, "OPERATOR(pg_catalog.!=) "); > > > > If the join is instead made to exclude the oids in listed_objects with a left > > join and a clause on object_oid being null I can make the current query work > > without adding a second clause. I don't have strong feelings wrt if we should > > add a NOT IN () or fix this JOIN, but we shouldn't have a faulty join together > > with the fix. With your patch the existing join is left in place, let's fix that. > > Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch > together to demonstrate. We should probably add some tests... It seems to work fine. However, if we're aiming for consistent spacing, the "IS NULL" (two spaces in between) might be an concern. > >> Third, for the description of the -N option, I wonder if "vacuum all tables except > >> in the specified schema(s)" might be clearer. The current one says nothing about > >> tables not in the specified schema. > > > > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who > > would expect anything else than vacuuming everything but the excluded schema > > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do > > that can be confusing? > > I agree with Daniel on this one. +1. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
> On 15 Sep 2023, at 04:39, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > At Thu, 14 Sep 2023 07:57:57 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in >> Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch >> together to demonstrate. Looks good from a quick skim. >> We should probably add some tests... Agreed. > It seems to work fine. However, if we're aiming for consistent > spacing, the "IS NULL" (two spaces in between) might be an concern. I don't think that's a problem. I would rather have readable C code and two spaces in the generated SQL than contorting the C code to produce less whitespace in a query few will read in its generated form. -- Daniel Gustafsson
On Fri, Sep 15, 2023 at 10:13:10AM +0200, Daniel Gustafsson wrote: >> On 15 Sep 2023, at 04:39, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: >> It seems to work fine. However, if we're aiming for consistent >> spacing, the "IS NULL" (two spaces in between) might be an concern. > > I don't think that's a problem. I would rather have readable C code and two > spaces in the generated SQL than contorting the C code to produce less > whitespace in a query few will read in its generated form. I think we could pretty easily avoid the extra space and keep the C code relatively readable. These sorts of things bug me, too (see 2af3336). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Thank you for all your reviews!
>>> PATTERN should be changed to SCHEMA because -n and -N options don't support
>>> pattern matching for schema names. The attached patch 0001 fixes this.
>>
>> True, there is no pattern matching performed. I wonder if it's worth lifting
>> the pattern matching from pg_dump into common code such that tools like this
>> can use it?
>
> I agree that this should be changed to SCHEMA. It might be tough to add
> pattern matching with the current catalog query, and I don't know whether
> there is demand for such a feature, but I wouldn't discourage someone from
> trying.
I think that supporting pattern matching is quite nice.
But it will be not only tough but also a breaking change, I wonder.
So I guess this change should be commited either way.
>>> Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch
>>> together to demonstrate.
>
> Looks good from a quick skim.
I do agree with this updates. Thank you!
>> We should probably add some tests...
>
> Agreed.
The attached patch includes new tests for this bug.
Also, I fixed the current test for -N option seems to be incorrect.
>>> It seems to work fine. However, if we're aiming for consistent
>>> spacing, the "IS NULL" (two spaces in between) might be an concern.
>>
>> I don't think that's a problem. I would rather have readable C code and two
>> spaces in the generated SQL than contorting the C code to produce less
>> whitespace in a query few will read in its generated form.
>
> I think we could pretty easily avoid the extra space and keep the C code
> relatively readable. These sorts of things bug me, too (see 2af3336).
Though I don't think it affects readability, I'm neutral about this.
>> >> Third, for the description of the -N option, I wonder if "vacuum all tables except
>> >> in the specified schema(s)" might be clearer. The current one says nothing about
>> >> tables not in the specified schema.
>> >
>> > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who
>> > would expect anything else than vacuuming everything but the excluded schema
>> > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do
>> > that can be confusing?
>>
>> I agree with Daniel on this one.
>
> +1.
That make sense. I retract my suggestion.
>>> PATTERN should be changed to SCHEMA because -n and -N options don't support
>>> pattern matching for schema names. The attached patch 0001 fixes this.
>>
>> True, there is no pattern matching performed. I wonder if it's worth lifting
>> the pattern matching from pg_dump into common code such that tools like this
>> can use it?
>
> I agree that this should be changed to SCHEMA. It might be tough to add
> pattern matching with the current catalog query, and I don't know whether
> there is demand for such a feature, but I wouldn't discourage someone from
> trying.
I think that supporting pattern matching is quite nice.
But it will be not only tough but also a breaking change, I wonder.
So I guess this change should be commited either way.
>>> Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch
>>> together to demonstrate.
>
> Looks good from a quick skim.
I do agree with this updates. Thank you!
>> We should probably add some tests...
>
> Agreed.
The attached patch includes new tests for this bug.
Also, I fixed the current test for -N option seems to be incorrect.
>>> It seems to work fine. However, if we're aiming for consistent
>>> spacing, the "IS NULL" (two spaces in between) might be an concern.
>>
>> I don't think that's a problem. I would rather have readable C code and two
>> spaces in the generated SQL than contorting the C code to produce less
>> whitespace in a query few will read in its generated form.
>
> I think we could pretty easily avoid the extra space and keep the C code
> relatively readable. These sorts of things bug me, too (see 2af3336).
Though I don't think it affects readability, I'm neutral about this.
>> >> Third, for the description of the -N option, I wonder if "vacuum all tables except
>> >> in the specified schema(s)" might be clearer. The current one says nothing about
>> >> tables not in the specified schema.
>> >
>> > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who
>> > would expect anything else than vacuuming everything but the excluded schema
>> > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do
>> > that can be confusing?
>>
>> I agree with Daniel on this one.
>
> +1.
That make sense. I retract my suggestion.
Attachment
> On 20 Sep 2023, at 11:46, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > I think that supporting pattern matching is quite nice. > But it will be not only tough but also a breaking change, I wonder. > So I guess this change should be commited either way. I agree. Supporting pattern matching should, if anyone is interested in trying, be done separately in its own thread, no need to move the goalposts here. Sorry if I made it sound like so upthread. > The attached patch includes new tests for this bug. > Also, I fixed the current test for -N option seems to be incorrect. When sending an update, please include the previous patch as well with your new tests as a 0002 patch in a patchset. The CFBot can only apply and build/test patches when the entire patchset is attached to the email. The below testresults indicate that the patch failed the new tests (which in a way is good since without the fix the tests *should* fail), since the fix patch was not applied: http://cfbot.cputube.org/masaki-kuwamura.html -- Daniel Gustafsson
I agree. Supporting pattern matching should, if anyone is interested in
trying, be done separately in its own thread, no need to move the goalposts
here. Sorry if I made it sound like so upthread.
I got it.
When sending an update, please include the previous patch as well with your new
tests as a 0002 patch in a patchset. The CFBot can only apply and build/test
patches when the entire patchset is attached to the email. The below
testresults indicate that the patch failed the new tests (which in a way is
good since without the fix the tests *should* fail), since the fix patch was
not applied:
http://cfbot.cputube.org/masaki-kuwamura.html
I'm sorry, I didn't know that. I attached both the test and fix patch to this mail.
(The fix patch is clearly Nathan-san's though)
If I'm still in a wrong way, please let me know.
Masaki Kuwamura
Attachment
> On 21 Sep 2023, at 03:53, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > When sending an update, please include the previous patch as well with your new > tests as a 0002 patch in a patchset. The CFBot can only apply and build/test > patches when the entire patchset is attached to the email. The below > testresults indicate that the patch failed the new tests (which in a way is > good since without the fix the tests *should* fail), since the fix patch was > not applied: > > http://cfbot.cputube.org/masaki-kuwamura.html <http://cfbot.cputube.org/masaki-kuwamura.html> > I'm sorry, I didn't know that. I attached both the test and fix patch to this mail. No worries at all. If you look at the page now you will see all green checkmarks indicating that the patch was tested in CI. So now we know that your tests fail without the fix and work with the fix applied, so all is well. -- Daniel Gustafsson
No worries at all. If you look at the page now you will see all green
checkmarks indicating that the patch was tested in CI. So now we know that
your tests fail without the fix and work with the fix applied, so all is well.
Thank you for your kind words!
And it seems to me that all concerns are resolved.
I'll change the patch status to Ready for Committer.
If you have or find any flaw, let me know that.
I'll change the patch status to Ready for Committer.
If you have or find any flaw, let me know that.
Best Regards,
Masaki Kuwamura
> On 22 Sep 2023, at 11:08, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > > No worries at all. If you look at the page now you will see all green > checkmarks indicating that the patch was tested in CI. So now we know that > your tests fail without the fix and work with the fix applied, so all is well. > > Thank you for your kind words! > > And it seems to me that all concerns are resolved. > I'll change the patch status to Ready for Committer. > If you have or find any flaw, let me know that. I had a look at this and tweaked the testcase a bit to make the diff smaller, as well as removed the (in some cases) superfluous space in the generated SQL query mentioned upthread. The attached two patches is what I propose we commit to fix this, with a backpatch to v16 where it was introduced. -- Daniel Gustafsson
Attachment
On Fri, Sep 22, 2023 at 02:58:20PM +0200, Daniel Gustafsson wrote: > I had a look at this and tweaked the testcase a bit to make the diff smaller, > as well as removed the (in some cases) superfluous space in the generated SQL > query mentioned upthread. The attached two patches is what I propose we commit > to fix this, with a backpatch to v16 where it was introduced. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
LGTM too!
>> a bit to make the diff smaller,
I couldn't think from that perspective. Thanks for your update, Daniel-san.
Masaki Kuwamura
> On 24 Sep 2023, at 10:22, Kuwamura Masaki <kuwamura@db.is.i.nagoya-u.ac.jp> wrote: > > LGTM too! I've applied this down to v16 now, thanks for the submission! -- Daniel Gustafsson
I've applied this down to v16 now, thanks for the submission!
Thanks for pushing!
Masaki Kuwamura