Thread: Add important info about ANALYZE after create Functional Index

Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

26 October 2020, 22:08:14

Hi all,

As you all already know Postgres supports functions in index expressions (marked as immutable ofc) and for this special index the ANALYZE command creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently (using the new REINDEX .. CONCURRENTLY or the old manner creating new one and then swapping) we need to run ANALYZE to update statistics but we don't mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after rebuilding concurrently a functional index and they followed the recommendation we have into our documentation [1] about how to rebuild it concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure where. I'm thinking about:

- doc/src/sgml/ref/create_index.sgml

- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1] https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Re: Add important info about ANALYZE after create Functional Index

From

"David G. Johnston"

Date:

26 October 2020, 22:46:10

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:

Hi all,

As you all already know Postgres supports functions in index expressions (marked as immutable ofc) and for this special index the ANALYZE command creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently (using the new REINDEX .. CONCURRENTLY or the old manner creating new one and then swapping) we need to run ANALYZE to update statistics but we don't mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after rebuilding concurrently a functional index and they followed the recommendation we have into our documentation [1] about how to rebuild it concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1] https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

It would seem preferable to call the lack of auto-analyzing after these operations a bug and back-patch a fix that injects an analyze side-effect just before their completion. It doesn't have to be smart either, analyzing things even if the created (or newly validated) index doesn't have statistics of its own isn't a problem in my book.

David J.

Re: Add important info about ANALYZE after create Functional Index

From

Nikolay Samokhvalov

Date:

27 October 2020, 01:29:07

On Mon, Oct 26, 2020 at 3:46 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these operations a bug and back-patch a fix that injects an analyze side-effect just before their completion. It doesn't have to be smart either, analyzing things even if the created (or newly validated) index doesn't have statistics of its own isn't a problem in my book.

+1 to consider it as a major problem of CREATE INDEX [CONCURRENTLY] for indexes on expressions, it's very easy to forget what I've observed many times.

Although, this triggers a question – should ANALYZE be automated in, say, pg_restore as well?

And another question: how ANALYZE needs to be run? If it's under the user's control, there is an option to use vacuumdb --analyze and benefit from using -j to parallelize the work (and, in some cases, benefit from using --analyze-in-stages). If we had ANALYZE as a part of building indexes on expressions, should it be parallelized to the same extent as index creation (controlled by max_parallel_maintenance_workers)?

Thanks,

Nik

Re: Add important info about ANALYZE after create Functional Index

From

"David G. Johnston"

Date:

27 October 2020, 02:03:54

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

Although, this triggers a question – should ANALYZE be automated in, say, pg_restore as well?

Independent concern.

And another question: how ANALYZE needs to be run? If it's under the user's control, there is an option to use vacuumdb --analyze and benefit from using -j to parallelize the work (and, in some cases, benefit from using --analyze-in-stages). If we had ANALYZE as a part of building indexes on expressions, should it be parallelized to the same extent as index creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is what to specify for “table_and_columns”.

David J.

Re: Add important info about ANALYZE after create Functional Index

From

Nikolay Samokhvalov

Date:

27 October 2020, 04:44:42

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Although, this triggers a question – should ANALYZE be automated in, say, pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack statistics and willing to automate its collection. If the approach is automated in one case, it should be automated in the others, for consistency.

And another question: how ANALYZE needs to be run? If it's under the user's control, there is an option to use vacuumdb --analyze and benefit from using -j to parallelize the work (and, in some cases, benefit from using --analyze-in-stages). If we had ANALYZE as a part of building indexes on expressions, should it be parallelized to the same extent as index creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is what to specify for “table_and_columns”.

I'm not sure I follow.

Thanks,

Nik

Re: Add important info about ANALYZE after create Functional Index

From

Nikolay Samokhvalov

Date:

27 October 2020, 07:12:00

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not sure where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for 12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

I still think that automating is the right thing to do but of course, it's a much bigger topic that a quick fix dor the docs.

Attachment

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

27 October 2020, 14:06:22

On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> It would seem preferable to call the lack of auto-analyzing after these operations a bug and back-patch a fix that injects an analyze side-effect just before their completion. It doesn't have to be smart either, analyzing things even if the created (or newly validated) index doesn't have statistics of its own isn't a problem in my book.
>

When we create a new table or index they will not have statistics until an ANALYZE happens. This is the default behaviour and I think is not a big problem here, but we need to add some note on docs about the need of statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on REINDEX because running it will lose the statistics. Have a look the example below:

fabrizio=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

fabrizio=# CREATE TABLE t(f1 BIGSERIAL PRIMARY KEY, f2 TEXT) WITH (autovacuum_enabled = false);
CREATE TABLE
fabrizio=# INSERT INTO t(f2) SELECT repeat(chr(65+(random()*26)::int), (random()*300)::int) FROM generate_series(1, 10000);
INSERT 0 10000
fabrizio=# CREATE INDEX t_idx2 ON t(lower(f2));
CREATE INDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_idx2'::regclass;
count
-------
0
(1 row)

fabrizio=# ANALYZE t;
ANALYZE
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_idx2'::regclass;
count
-------
1
(1 row)

fabrizio=# REINDEX INDEX t_idx2;
REINDEX
fabrizio=# REINDEX INDEX t_pkey;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_idx2'::regclass;
count
-------
1
(1 row)

^^^^^^^^

-- A regular REINDEX don't lose the statistics.

fabrizio=# REINDEX INDEX CONCURRENTLY t_idx2;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid = 't_idx2'::regclass;
count
-------
0
(1 row)

^^^^^^^^

-- But the REINDEX CONCURRENTLY loses.

So IMHO here is the place we should rework a bit to execute ANALYZE as a last step.

Regards,

Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

27 October 2020, 14:11:40

On Tue, Oct 27, 2020 at 4:12 AM Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>>
>> Would be nice if add some information about it into our docs but not sure where. I'm thinking about:
>> - doc/src/sgml/ref/create_index.sgml
>> - doc/src/sgml/maintenance.sgml (routine-reindex)
>
>
> Attaching the patches for the docs, one for 11 and older, and another for 12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).
>

Actually the REINDEX CONCURRENTLY suffers with the lack of ANALYZE. See my previous message on this thread.

So just adding the note on the ANALYZE docs is enough.

> I still think that automating is the right thing to do but of course, it's a much bigger topic that a quick fix dor the docs.

So what we need to do is see how to fix REINDEX CONCURRENTLY.

Regards,

Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Re: Add important info about ANALYZE after create Functional Index

From

Michael Paquier

Date:

28 October 2020, 05:14:55

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:
> When we create a new table or index they will not have statistics until an
> ANALYZE happens. This is the default behaviour and I think is not a big
> problem here, but we need to add some note on docs about the need of
> statistics for indexes on expressions.
>
> But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
> REINDEX because running it will lose the statistics. Have a look the
> example below:
>
> [...]
>
> So IMHO here is the place we should rework a bit to execute ANALYZE as a
> last step.

I agree that this is not user-friendly, and I suspect that we will
need to do something within index_concurrently_swap() to fill in the
stats of the new index from the data of the old one (not looked at
that in details yet).
--
Michael

Attachment

signature.asc

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

28 October 2020, 12:35:21

On Wed, Oct 28, 2020 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:
> > When we create a new table or index they will not have statistics until an
> > ANALYZE happens. This is the default behaviour and I think is not a big
> > problem here, but we need to add some note on docs about the need of
> > statistics for indexes on expressions.
> >
> > But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
> > REINDEX because running it will lose the statistics. Have a look the
> > example below:
> >
> > [...]
> >
> > So IMHO here is the place we should rework a bit to execute ANALYZE as a
> > last step.
>
> I agree that this is not user-friendly, and I suspect that we will
> need to do something within index_concurrently_swap() to fill in the
> stats of the new index from the data of the old one (not looked at
> that in details yet).
>

We already do a similar thing for PgStats [1] so maybe we should also copy pg_statistics from old to new index during the swap.

But I'm not sure if it's totally safe anyway and would be better to create a new phase to issue ANALYZE if necessary (exists statistics for old index).

Regards,

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693

--

Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 18:52:11

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:
>On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <
>david.g.johnston@gmail.com> wrote:
>>
>> It would seem preferable to call the lack of auto-analyzing after these
>operations a bug and back-patch a fix that injects an analyze side-effect
>just before their completion.  It doesn't have to be smart either,
>analyzing things even if the created (or newly validated) index doesn't
>have statistics of its own isn't a problem in my book.
>>
>
>When we create a new table or index they will not have statistics until an
>ANALYZE happens. This is the default behaviour and I think is not a big
>problem here, but we need to add some note on docs about the need of
>statistics for indexes on expressions.
>

I think the problem is we notice when a table has not been analyzed yet
(and trigger an analyze), but we won't notice that for an index. So if
the table does not change very often, it may take ages before we build
stats for the index - not great.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 18:55:38

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:
>On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
>fabriziomello@gmail.com> wrote:
>
>> Hi all,
>>
>> As you all already know Postgres supports functions in index expressions
>> (marked as immutable ofc) and for this special index the ANALYZE command
>> creates some statistics (new pg_statistic entry) about it.
>>
>> The problem is just after creating a new index or rebuilding concurrently
>> (using the new REINDEX .. CONCURRENTLY or the old manner creating new one
>> and then swapping) we need to run ANALYZE to update statistics but we don't
>> mention it in any part of our documentation.
>>
>> Last weekend Gitlab went down because the lack of an ANALYZE after
>> rebuilding concurrently a functional index and they followed the
>> recommendation we have into our documentation [1] about how to rebuild it
>> concurrently, but we don't warn users about the ANALYZE after.
>>
>> Would be nice if add some information about it into our docs but not sure
>> where. I'm thinking about:
>> - doc/src/sgml/ref/create_index.sgml
>> - doc/src/sgml/maintenance.sgml (routine-reindex)
>>
>> Thoughts?
>>
>> [1]
>> https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499
>>
>
>It would seem preferable to call the lack of auto-analyzing after these
>operations a bug and back-patch a fix that injects an analyze side-effect
>just before their completion.  It doesn't have to be smart either,
>analyzing things even if the created (or newly validated) index doesn't
>have statistics of its own isn't a problem in my book.
>

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

"David G. Johnston"

Date:

28 October 2020, 19:00:54

On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

My guess is that it wouldn't be too difficult to write a patch that could be safely back-patched and it's worth doing so even if ultimately the decision is not to. But then again the patch writer isn't going to be me.

Given how simple the manual workaround is not having it be manual seems like it would be safe and straight-forward to implement.

David J.

Re: Add important info about ANALYZE after create Functional Index

From

Tom Lane

Date:

28 October 2020, 19:05:39

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:
>> It would seem preferable to call the lack of auto-analyzing after these
>> operations a bug and back-patch a fix that injects an analyze side-effect
>> just before their completion.  It doesn't have to be smart either,
>> analyzing things even if the created (or newly validated) index doesn't
>> have statistics of its own isn't a problem in my book.

> I agree the lack of stats may be quite annoying and cause issues, but my
> guess is the chances of backpatching such change are about 0.000001%. We
> have a usable 'workaround' for this - manual analyze.

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you.  I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work.  As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

I don't see anything in the CREATE STATISTICS man page pointing
that out, either.  But there's probably room for "Notes" entries
about it in both places.

            regards, tom lane

Re: Add important info about ANALYZE after create Functional Index

From

"David G. Johnston"

Date:

28 October 2020, 19:07:52

On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Although, this triggers a question – should ANALYZE be automated in, say, pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack statistics and willing to automate its collection. If the approach is automated in one case, it should be automated in the others, for consistency.

I don't see a need to force consistency between something that will affect, at most, one table, and something that will affect an entire database or cluster. The other material difference is that the previous state of a restore is "nothing" while in the create/reindex cases we are going from live, populated, state to another.

I do observe that while the create/reindex analyze would run automatically during the restore on object creation there would be no data present so it would be close to a no-op in practice.

And another question: how ANALYZE needs to be run? If it's under the user's control, there is an option to use vacuumdb --analyze and benefit from using -j to parallelize the work (and, in some cases, benefit from using --analyze-in-stages). If we had ANALYZE as a part of building indexes on expressions, should it be parallelized to the same extent as index creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is what to specify for “table_and_columns”.

I'm not sure I follow.

Describe how parallelism within the session that is auto-analyzing is supposed to work. vaccuumdb opens up multiple connections which shouldn't happen here.

I suppose having the auto-analyze run three times with different targets would work but I'm doubting that is a win. I may just be underestimating how long an analyze on an extremely large table with high statistics takes.

David J.

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 19:12:01

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:
>On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> I agree the lack of stats may be quite annoying and cause issues, but my
>> guess is the chances of backpatching such change are about 0.000001%. We
>> have a usable 'workaround' for this - manual analyze.
>>
>
>My guess is that it wouldn't be too difficult to write a patch that could
>be safely back-patched and it's worth doing so even if ultimately the
>decision is not to.  But then again the patch writer isn't going to be me.
>
>Given how simple the manual workaround is not having it be manual seems
>like it would be safe and straight-forward to implement.
>

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 19:13:48

On Wed, Oct 28, 2020 at 03:05:39PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:
>>> It would seem preferable to call the lack of auto-analyzing after these
>>> operations a bug and back-patch a fix that injects an analyze side-effect
>>> just before their completion.  It doesn't have to be smart either,
>>> analyzing things even if the created (or newly validated) index doesn't
>>> have statistics of its own isn't a problem in my book.
>
>> I agree the lack of stats may be quite annoying and cause issues, but my
>> guess is the chances of backpatching such change are about 0.000001%. We
>> have a usable 'workaround' for this - manual analyze.
>
>This doesn't seem clearly different from any other situation where
>auto-analyze doesn't react fast enough to suit you.  I would not
>call it a bug, at least not without a wholesale redefinition of
>how auto-analyze is supposed to work.  As a close analogy, we
>don't make any effort to force an immediate auto-analyze after
>CREATE STATISTICS.
>

True.

>I don't see anything in the CREATE STATISTICS man page pointing
>that out, either.  But there's probably room for "Notes" entries
>about it in both places.
>

I agree. I'll add it to my TODO list for the next CF.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

"David G. Johnston"

Date:

28 October 2020, 19:14:27

On Wed, Oct 28, 2020 at 12:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you.

I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work.

The definition of auto-analyze is just fine; the issue is with the user unfriendly position that the only times analyze is ever run is when it is run manually or heuristically in a separate process. I agree that this isn't a bug in the traditional sense - the current behavior is intentional - but it is a POLA violation.

The fundamental question here is do we want to change our policy in this regard and make our system more user-friendly? If so, let's do so for v14 in honor of the problem the lack of documentation and POLA violation has recently caused.

Then, as a separate concern, should we admit the oversight and back-patch our policy change or just move forward and add documentation to older versions?

As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

At least we have been consistent...

David J.

Re: Add important info about ANALYZE after create Functional Index

From

Tom Lane

Date:

28 October 2020, 19:18:52

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:
>> Given how simple the manual workaround is not having it be manual seems
>> like it would be safe and straight-forward to implement.

> Maybe, but I wouldn't be surprised if it was actually a bit trickier in
> practice, particularly for the CONCURRENTLY case. But I haven't tried.

> Anyway, I think there's an agreement it'd be valuable to do this after
> CREATE INDEX in the future, so if someone wants to implement it that'd
> be great. We can consider backpatching only once we have an actual patch
> anyway.

Just to be clear, I'm entirely *not* on board with that.  IMV it's
intentional that we do not force auto-analyze activity after CREATE
INDEX or CREATE STATISTICS.  If we change that, people will want a
way to opt out of it, and then your "simple" patch isn't so simple
anymore.  (Not that it was simple anyway.  What if the CREATE is
inside a transaction block, for instance?  There's no use in
kicking autovacuum before commit.)

            regards, tom lane

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 19:35:46

On Wed, Oct 28, 2020 at 03:18:52PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:
>>> Given how simple the manual workaround is not having it be manual seems
>>> like it would be safe and straight-forward to implement.
>
>> Maybe, but I wouldn't be surprised if it was actually a bit trickier in
>> practice, particularly for the CONCURRENTLY case. But I haven't tried.
>
>> Anyway, I think there's an agreement it'd be valuable to do this after
>> CREATE INDEX in the future, so if someone wants to implement it that'd
>> be great. We can consider backpatching only once we have an actual patch
>> anyway.
>
>Just to be clear, I'm entirely *not* on board with that.  IMV it's
>intentional that we do not force auto-analyze activity after CREATE
>INDEX or CREATE STATISTICS.  If we change that, people will want a
>way to opt out of it, and then your "simple" patch isn't so simple
>anymore.

True. Some users may have reasons to not want the analyze, I guess.

>  (Not that it was simple anyway.  What if the CREATE is
>inside a transaction block, for instance?  There's no use in
>kicking autovacuum before commit.)
>

I don't think anyone proposed to do this through autovacuum. There was a
reference to auto-analyze but I think that was meant as 'run analyze
automatically.' Which would work in transactions just fine, I think.

But I agree it'd likely be a more complicated patch than it might seem
at first glance.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

28 October 2020, 20:43:08

On Wed, Oct 28, 2020 at 4:35 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I don't think anyone proposed to do this through autovacuum. There was a
> reference to auto-analyze but I think that was meant as 'run analyze
> automatically.' Which would work in transactions just fine, I think.
>

Maybe I was not very clear at the beginning so will try to clarify my thoughts:

1) We should add notes on our docs about the need to issue ANALYZE after creating indexes using expressions and create extended statistics. Nikolay sent a patch upthread and we can work on it and back patch.

2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a regular REINDEX for indexes using expressions and to me it's a bug. Michael pointed out upthread that maybe we should rework a bit index_concurrently_swap() to copy statistics from old index to new one.

> But I agree it'd likely be a more complicated patch than it might seem

> at first glance.
>

If we think about a way to kick AutoAnalyze for sure it will be a more complicated task but IMHO for now we can do it simply just by copying statistics like I mentioned above.

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Add important info about ANALYZE after create Functional Index

From

Tomas Vondra

Date:

28 October 2020, 23:02:11

On Wed, Oct 28, 2020 at 05:43:08PM -0300, Fabrízio de Royes Mello wrote:
>On Wed, Oct 28, 2020 at 4:35 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>>
>> I don't think anyone proposed to do this through autovacuum. There was a
>> reference to auto-analyze but I think that was meant as 'run analyze
>> automatically.' Which would work in transactions just fine, I think.
>>
>
>Maybe I was not very clear at the beginning so will try to clarify my
>thoughts:
>
>1) We should add notes on our docs about the need to issue ANALYZE after
>creating indexes using expressions and create extended statistics. Nikolay
>sent a patch upthread and we can work on it and back patch.
>

+1

>2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a
>regular REINDEX for indexes using expressions and to me it's a bug. Michael
>pointed out upthread that maybe we should rework a bit
>index_concurrently_swap() to copy statistics from old index to new one.
>

Yeah. Not sure it counts as a bug, but I see what you mean - it's
definitely an unexpected/undesirable difference in behavior between
plain REINDEX and concurrent one.

>
>> But I agree it'd likely be a more complicated patch than it might seem
>> at first glance.
>>
>
>If we think about a way to kick AutoAnalyze for sure it will be a more
>complicated task but IMHO for now we can do it simply just by copying
>statistics like I mentioned above.
>

I very much doubt just we can rely on autoanalyze here. For one, it'll
have issues with transactions, as Tom already pointed out elsewhere in
this thread. So if you do a reindex after a bulk load in a transaction,
followed by some report queries, autoanalyze is not going to help.

But it has another issue - there may not be any free autovacuum workers,
so it'd have to wait for unknown amount of time. In fact, it'd have to
wait for the autovacuum worker to actually do the analyze, otherwise we
could still have unpredictable behavior for queries immediately after
the REINDEX, even outside transactions. That's not good, so this would
have to do an actual analyze I think.

But as Tom pointed out, the automatic analyze may be against wishes of
some users, and there are other similar cases that don't trigger analyze
(CREATE STATISTICS). So not sure about this.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Add important info about ANALYZE after create Functional Index

From

Michael Paquier

Date:

29 October 2020, 01:59:52

On Thu, Oct 29, 2020 at 12:02:11AM +0100, Tomas Vondra wrote:
> On Wed, Oct 28, 2020 at 05:43:08PM -0300, Fabrízio de Royes Mello wrote:
>> 2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a
>> regular REINDEX for indexes using expressions and to me it's a bug. Michael
>> pointed out upthread that maybe we should rework a bit
>> index_concurrently_swap() to copy statistics from old index to new one.
>
> Yeah. Not sure it counts as a bug, but I see what you mean - it's
> definitely an unexpected/undesirable difference in behavior between
> plain REINDEX and concurrent one.

REINDEX CONCURRENTLY is by design wanted to provide an experience
transparent to the user similar to what a plain REINDEX would do, at
least that's the idea behind it, so..  This qualifies as a bug to me,
in spirit.
--
Michael

Attachment

signature.asc

Re: Add important info about ANALYZE after create Functional Index

From

Michael Paquier

Date:

30 October 2020, 06:22:52

On Thu, Oct 29, 2020 at 10:59:52AM +0900, Michael Paquier wrote:
> REINDEX CONCURRENTLY is by design wanted to provide an experience
> transparent to the user similar to what a plain REINDEX would do, at
> least that's the idea behind it, so..  This qualifies as a bug to me,
> in spirit.

And in spirit, it is possible to address this issue with the patch
attached which copies the set of stats from the old to the new index.
For a non-concurrent REINDEX, this does not happen because we keep the
same base relation, while we replace completely the relation with a
concurrent operation.  We have a RemoveStatistics() in heap.c, but I
did not really see the point to invent a copy flavor for this
particular case.  Perhaps others have an opinion on that?
--
Michael

Attachment

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

31 October 2020, 22:56:33

On Fri, Oct 30, 2020 at 3:22 AM Michael Paquier <michael@paquier.xyz> wrote:
>

> And in spirit, it is possible to address this issue with the patch
> attached which copies the set of stats from the old to the new index.

Did some tests and everything went ok... some comments below!

> For a non-concurrent REINDEX, this does not happen because we keep the

> same base relation, while we replace completely the relation with a

> concurrent operation.

Exactly!

> We have a RemoveStatistics() in heap.c, but I

> did not really see the point to invent a copy flavor for this
> particular case. Perhaps others have an opinion on that?
>

Even if we won't use it now, IMHO it is more legible to separate this responsibility into its own CopyStatistics function as attached.

Regards,

--

Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Attachment

reindex-stats-v2.patch

Re: Add important info about ANALYZE after create Functional Index

From

Michael Paquier

Date:

01 November 2020, 00:23:44

On Sat, Oct 31, 2020 at 07:56:33PM -0300, Fabrízio de Royes Mello wrote:
> Even if we won't use it now, IMHO it is more legible to separate this
> responsibility into its own CopyStatistics function as attached.

By doing so, there is no need to include pg_statistic.h in index.c.
Except that, the logic looks fine at quick glance.  In the long-term,
I also think that it would make sense to move both routnes out of
heap.c into a separate pg_statistic.c.  That's material for a separate
patch of course.
--
Michael

Attachment

signature.asc

Re: Add important info about ANALYZE after create Functional Index

From

Michael Paquier

Date:

01 November 2020, 12:29:02

On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:
> By doing so, there is no need to include pg_statistic.h in index.c.
> Except that, the logic looks fine at quick glance.  In the long-term,
> I also think that it would make sense to move both routnes out of
> heap.c into a separate pg_statistic.c.  That's material for a separate
> patch of course.

I have looked again at that, and applied it after some tweaks.
Particularly, I did not really like the use of "old" and "new" for the
copy from the old to a new relation in the new function, so I have
replaced that by "from" and "to".
--
Michael

Attachment

signature.asc

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

01 November 2020, 16:53:23

On Sun, 1 Nov 2020 at 09:29 Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:
> By doing so, there is no need to include pg_statistic.h in index.c.
> Except that, the logic looks fine at quick glance. In the long-term,
> I also think that it would make sense to move both routnes out of
> heap.c into a separate pg_statistic.c. That's material for a separate
> patch of course.

I have looked again at that, and applied it after some tweaks.
Particularly, I did not really like the use of "old" and "new" for the
copy from the old to a new relation in the new function, so I have
replaced that by "from" and "to".

Awesome thanks!!

Regards,

Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Add important info about ANALYZE after create Functional Index

From

Bruce Momjian

Date:

09 November 2020, 23:27:20

On Tue, Oct 27, 2020 at 12:12:00AM -0700, Nikolay Samokhvalov wrote:
> On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
> fabriziomello@gmail.com> wrote:
> 
>     Would be nice if add some information about it into our docs but not sure
>     where. I'm thinking about:
>     - doc/src/sgml/ref/create_index.sgml
>     - doc/src/sgml/maintenance.sgml (routine-reindex)
> 
> 
> Attaching the patches for the docs, one for 11 and older, and another for 12+
> (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).
> 
> I still think that automating is the right thing to do but of course, it's a
> much bigger topic that a quick fix dor the docs.

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer.  Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes".  Patch attached, which I plan
to apply to all supported branches.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Attachment

expindex.diff

Re: Add important info about ANALYZE after create Functional Index

From

Fabrízio de Royes Mello

Date:

09 November 2020, 23:35:46

On Mon, 9 Nov 2020 at 20:27 Bruce Momjian <bruce@momjian.us> wrote:

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer. Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes". Patch attached, which I plan
to apply to all supported branches.

Did a quick review and totally agree... thanks a lot Bruce to help us don’t miss it.

Regards,

Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Add important info about ANALYZE after create Functional Index

From

Bruce Momjian

Date:

12 November 2020, 20:01:10

On Mon, Nov  9, 2020 at 08:35:46PM -0300, Fabrízio de Royes Mello wrote:
> 
> 
> On Mon, 9 Nov 2020 at 20:27 Bruce Momjian <bruce@momjian.us> wrote:
> 
> 
>     I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
>     updated to mention the need to run ANALYZE or wait for autovacuum before
>     expression indexes can be fully used by the optimizer.  Instead of
>     putting this mention in the maintenance section, I thought the CREATE
>     INDEX page make more sense, since it is more of a usability issue,
>     rather than "why use expression indexes".  Patch attached, which I plan
>     to apply to all supported branches.
>        
> 
> 
> Did a quick review and totally agree... thanks a lot Bruce to help us don’t
> miss it.

Patch applied to all branches.  Thanks for the review.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Add important info about ANALYZE after create Functional Index

From

Alvaro Herrera

Date:

16 November 2020, 14:59:03

On 2020-Nov-12, Bruce Momjian wrote:

>     For new expression indexes, it is necessary to run <link
>     linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
>     the <link linkend="autovacuum">autovacuum daemon</link> to analyze
> -   the table to generate statistics about new expression indexes.
> +   the table to generate statistics for these indexes.

Looks good to me.

Re: Add important info about ANALYZE after create Functional Index

From

Bruce Momjian

Date:

16 November 2020, 15:26:37

On Mon, Nov 16, 2020 at 11:59:03AM -0300, Álvaro Herrera wrote:
> On 2020-Nov-12, Bruce Momjian wrote:
> 
> >     For new expression indexes, it is necessary to run <link
> >     linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
> >     the <link linkend="autovacuum">autovacuum daemon</link> to analyze
> > -   the table to generate statistics about new expression indexes.
> > +   the table to generate statistics for these indexes.
> 
> Looks good to me.

Applied to all branches.  Thanks for the review.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee