Thread: Query plan not updated after dropped index
Hello!
We just had a major issue on our databases: after an index was replaced, a user defined function didn't change its query plan to use the new index. At least this is our theory, since the function in question became much slower and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree index on a bigint column)
2. drop old index
3. rename new index to old index name
4. analyze table
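In SQL terms, the swap can be sketched like this (the table and index names here are made up; as mentioned further down, the real index was created with CONCURRENTLY):

```sql
-- Sketch of the index swap, with hypothetical names
CREATE INDEX CONCURRENTLY mytable_bigint_column_idx_new
    ON mytable (bigint_column);

DROP INDEX mytable_bigint_column_idx;

ALTER INDEX mytable_bigint_column_idx_new
    RENAME TO mytable_bigint_column_idx;

ANALYZE mytable;
```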
After these steps our functions normally update their plans and use the new index just fine. However, this time the function (only one function uses this particular index) seemed to take forever to complete. This is a 40GB table, so querying for something not indexed would take a long time. Therefore my suspicion is that the function didn't start to use the new index.
Adding to the strangeness is that if I ran the function manually it was fast; only when called from our application through pg_bouncer was it slow. I should also say that the function is only used on the three read slaves of our database.
Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()
These things did not help.
Instead, what helped in the end was to replace the function with one containing an extra useless WHERE clause (in the hope that it would force a new plan to be created).
So, the function only has a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;
And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;
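For context, the whole function is roughly of this shape (the function, table, and column names below are placeholders, not the real schema):

```sql
-- Hypothetical plpgsql wrapper around the SELECT above
CREATE OR REPLACE FUNCTION get_rows(x bigint)
RETURNS SETOF mytable AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM mytable
    WHERE bigint_column = x AND 1=1  -- the "useless" clause that forced a new plan
    LIMIT 100 OFFSET 0;
END;
$$ LANGUAGE plpgsql;
```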
Obviously we are now worried why this happened and how we can avoid it in the future. We run Postgres 9.3 on CentOS 6.
Thanks!
Victor
On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist <vb@viblo.se> wrote:
> 1. create new index (a simple btree index on a bigint column)
> 2. drop old index
> 3. rename new index to old index name

Why did you do this!?
The end goal is to get rid of index bloat. If there is a better way to handle this, I'm all ears!
/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>> 3. rename new index to old index name
>
> Why did you do this!?
On 2/18/16, Victor Blomqvist <vb@viblo.se> wrote:
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> uses this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didn't start to use the new
> index.

My guess is that the backends somehow cached the generic plan [1] and didn't recalculate it.

> Adding to the strangeness is that if I ran the function manually it was
> fast,

That is because _backends_ (processes) cache plans, not the DBMS as a whole (i.e. cached plans are not shared). So you connected to the DB (creating a new backend process) and ran the function; that backend could not find a cached plan and created one for itself. The plan for _your_ connection therefore includes the new index.

> only when called from our application through pg_bouncer it was slow.
> [...]
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.

Since pgbouncer reuses connections, the backend processes still have the cached plan. If you had tried pg_terminate_backend() instead, it could have helped.
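That suggestion could be sketched like this; terminating the pooled backends forces pgbouncer to open fresh ones with no cached plans. The application_name filter is a placeholder for however the pooled connections are identified:

```sql
-- Sketch: end all backends belonging to the application's pooled connections.
-- 'myapp' is a hypothetical application_name, not from the thread.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'myapp'
  AND pid <> pg_backend_pid();  -- don't kill our own session
```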
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X AND 1=1
> LIMIT 100 OFFSET 0;

Yes, that is a new query text for PG, and therefore it requires a new plan, because that query is not in the cache.

> Obviously we are now worried why this happened

Also my guess is that you did CREATE INDEX CONCURRENTLY, and there are several cases [2] in which such an index cannot be used (an "invalid" state, or waiting for unfinished transactions). If the old index is dropped while the new one is not yet usable, a plan generated at that moment includes neither index; and if such a plan is cached, it is reused later, still without the index.

> and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.

Firstly, you can drop the old index not immediately but a little later, depending on the duration of your queries. Also, after creating the new index (and possibly waiting a little), you can drop the old index in a _transaction_ and check with EXPLAIN whether the new index is used by a query that should use it. If it is, do COMMIT; if not, do ROLLBACK and keep using the old index.

> Thanks!
> Victor

[1] http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2] http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

--
Best regards,
Vitaly Burovoy
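The drop-in-a-transaction check could look roughly like this (all object names here are placeholders):

```sql
-- Sketch: drop the old index inside a transaction, check the plan, then decide.
-- Note: DROP INDEX takes an exclusive lock on the table until COMMIT/ROLLBACK.
BEGIN;
DROP INDEX mytable_bigint_column_idx_old;   -- hypothetical old index name

EXPLAIN
SELECT * FROM mytable
WHERE bigint_column = 42
LIMIT 100 OFFSET 0;
-- If the plan shows an Index Scan using the new index: COMMIT;
-- If it falls back to a Seq Scan: the new index is not usable yet.
ROLLBACK;  -- keep the old index for now
```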
Victor Blomqvist <vb@viblo.se> writes:
> We just had a major issue on our databases, after an index was replaced a
> user defined function didn't change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple days ago:

http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com

However, that theory requires that the index not have been immediately usable, which implies that it initially had some broken HOT chains, which really should not have happened if you were simply replacing one index with an identical one. (The pre-existing index should've been enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column index with a multi-column one?

			regards, tom lane
On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Victor Blomqvist <vb@viblo.se> writes:
>> We just had a major issue on our databases, after an index was replaced a
>> user defined function didn't change its query plan to use the new index.
>
> I'm suspicious that this is some variant of the problem discussed a couple
> days ago:
> http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com
>
> However, that theory requires that the index not have been immediately
> usable, which implies that it initially had some broken HOT chains,
> which really should not have happened if you were simply replacing one
> index with an identical one. (The pre-existing index should've been
> enough to ensure HOT chain consistency for its columns.)
>
> Perhaps you were doing something "cute" like replacing a single-column
> index with a multi-column one?
No, the new index looked exactly like the old one. The index was created with CONCURRENTLY, and we waited until it returned.
I could use the index just fine when running a query in a separate connection, so I am not sure it would help to drop the old index in a transaction and run a query to verify that the new index is usable. How can I know that the new index is usable from already open connections?
Another thing to note about this index is that it sometimes has an unexpected zero page inside, resulting in this error: ERROR: index "user_pictures_picture_dhash_idx" contains unexpected zero page at block 123780. But it always happens sporadically on our read slaves, and rerunning the query that caused it doesn't reproduce the error. I asked about this problem before, here:
http://www.postgresql.org/message-id/flat/CAL870DVXR9fHkyEJ5sMydK4pJUPL5kWabUhSPbTQeK03gdDcqA@mail.gmail.com#CAL870DVXR9fHkyEJ5sMydK4pJUPL5kWabUhSPbTQeK03gdDcqA@mail.gmail.com