Thread: Query plan not updated after dropped index
Hello!
We just had a major issue on our databases: after an index was replaced, a user defined function didn't change its query plan to use the new index. At least this is our theory, since the function in question became much slower and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree index on a bigint column)
2. drop old index
3. rename new index to old index name
4. analyze table
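In SQL terms, the swap can be sketched like this (the table and index names here are made up; as mentioned further down, the real index was created with CONCURRENTLY):

```sql
-- Sketch of the index swap, with hypothetical names
CREATE INDEX CONCURRENTLY mytable_bigint_column_idx_new
    ON mytable (bigint_column);

DROP INDEX mytable_bigint_column_idx;

ALTER INDEX mytable_bigint_column_idx_new
    RENAME TO mytable_bigint_column_idx;

ANALYZE mytable;
```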
After these steps our functions normally update their plans and use the new index just fine. However, this time the function (only one function uses this particular index) seemed to take forever to complete. This is a 40GB table, so querying for something not indexed would take a long time. Therefore my suspicion is that the function didn't start to use the new index.
Adding to the strangeness is that if I ran the function manually it was fast; only when called from our application through pg_bouncer was it slow. I should also say that the function is only used on the three read slaves of our database.
Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()
These things did not help.
Instead, what helped in the end was to replace the function with one containing an extra useless WHERE clause (in the hope that it would force a new plan to be created).
So, the function only has a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;
And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;
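For context, the whole function is roughly of this shape (the function, table, and column names below are placeholders, not the real schema):

```sql
-- Hypothetical plpgsql wrapper around the SELECT above
CREATE OR REPLACE FUNCTION get_rows(x bigint)
RETURNS SETOF mytable AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM mytable
    WHERE bigint_column = x AND 1=1  -- the "useless" clause that forced a new plan
    LIMIT 100 OFFSET 0;
END;
$$ LANGUAGE plpgsql;
```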
Obviously we are now worried why this happened and how we can avoid it in the future. We run Postgres 9.3 on CentOS 6.
Thanks!
Victor
On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist <vb@viblo.se> wrote:
> 1. create new index (a simple btree index on a bigint column)
> 2. drop old index
> 3. rename new index to old index name

Why did you do this!?
The end goal is to get rid of index bloat. If there is a better way to handle this, I'm all ears!
/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>> 3. rename new index to old index name
>
> Why did you do this!?
On 2/18/16, Victor Blomqvist <vb@viblo.se> wrote:
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> uses this particular index) seemed to take forever to complete. This is a
> 40GB table so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didn't start to use the new
> index.

My guess is that the backends somehow cached the generic plan [1] and didn't recalculate it.

> Adding to the strangeness is that if I ran the function manually it was
> fast,

That is because _backends_ (processes) cache plans, not the DBMS as a whole (i.e. cached plans are not shared). So you connected to the DB (creating a new backend process) and ran the function; that backend could not find a cached plan and created one for itself. The plan for _your_ connection therefore includes the new index.

> only when called from our application through pg_bouncer it was slow.
> [...]
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.

Since pgbouncer reuses connections, the backend processes still have the cached plan. If you had tried pg_terminate_backend() instead, it could have helped.
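That suggestion could be sketched like this; terminating the pooled backends forces pgbouncer to open fresh ones with no cached plans. The application_name filter is a placeholder for however the pooled connections are identified:

```sql
-- Sketch: end all backends belonging to the application's pooled connections.
-- 'myapp' is a hypothetical application_name, not from the thread.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'myapp'
  AND pid <> pg_backend_pid();  -- don't kill our own session
```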
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X AND 1=1
> LIMIT 100 OFFSET 0;

Yes, that is a new query text for PG, and therefore it requires a new plan, because that query is not in the cache.

> Obviously we are now worried why this happened

Also my guess is that you did CREATE INDEX CONCURRENTLY, and there are several cases [2] in which such an index cannot be used (an "invalid" state, or waiting for unfinished transactions). If the old index is dropped while the new one is not yet usable, a plan generated at that moment includes neither index; and if such a plan is cached, it is reused later, still without the index.

> and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.

Firstly, you can drop the old index not immediately but a little later, depending on the duration of your queries. Also, after creating the new index (and possibly waiting a little), you can drop the old index in a _transaction_ and check with EXPLAIN whether the new index is used by a query that should use it. If it is, do COMMIT; if not, do ROLLBACK and keep using the old index.

> Thanks!
> Victor

[1] http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2] http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

--
Best regards,
Vitaly Burovoy
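The drop-in-a-transaction check could look roughly like this (all object names here are placeholders):

```sql
-- Sketch: drop the old index inside a transaction, check the plan, then decide.
-- Note: DROP INDEX takes an exclusive lock on the table until COMMIT/ROLLBACK.
BEGIN;
DROP INDEX mytable_bigint_column_idx_old;   -- hypothetical old index name

EXPLAIN
SELECT * FROM mytable
WHERE bigint_column = 42
LIMIT 100 OFFSET 0;
-- If the plan shows an Index Scan using the new index: COMMIT;
-- If it falls back to a Seq Scan: the new index is not usable yet.
ROLLBACK;  -- keep the old index for now
```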
Victor Blomqvist <vb@viblo.se> writes:
> We just had a major issue on our databases, after an index was replaced a
> user defined function didn't change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple days ago:

http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com

However, that theory requires that the index not have been immediately usable, which implies that it initially had some broken HOT chains, which really should not have happened if you were simply replacing one index with an identical one. (The pre-existing index should've been enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column index with a multi-column one?

			regards, tom lane
On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Victor Blomqvist <vb@viblo.se> writes:
>> We just had a major issue on our databases, after an index was replaced a
>> user defined function didn't change its query plan to use the new index.
>
> I'm suspicious that this is some variant of the problem discussed a couple
> days ago:
> http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=DBHf4L6rkfjtEJ_w@mail.gmail.com
>
> However, that theory requires that the index not have been immediately
> usable, which implies that it initially had some broken HOT chains,
> which really should not have happened if you were simply replacing one
> index with an identical one. (The pre-existing index should've been
> enough to ensure HOT chain consistency for its columns.)
>
> Perhaps you were doing something "cute" like replacing a single-column
> index with a multi-column one?
No, the new index looked exactly like the old one. The index was created with CONCURRENTLY, and we waited until it returned.
I could use the index just fine when running a query in a separate connection, so I am not sure it would help to drop the old index in a transaction and run a query to verify that the new index is usable. How can I know that the new index is usable from already open connections?
Another thing to note about this index is that it sometimes has an unexpected zero page inside, resulting in this error: ERROR: index "user_pictures_picture_dhash_idx" contains unexpected zero page at block 123780. But it always happens sporadically on our read slaves, and rerunning the query that caused it doesn't reproduce the error. I asked about this problem before, here:
http://www.postgresql.org/message-id/flat/CAL870DVXR9fHkyEJ5sMydK4pJUPL5kWabUhSPbTQeK03gdDcqA@mail.gmail.com#CAL870DVXR9fHkyEJ5sMydK4pJUPL5kWabUhSPbTQeK03gdDcqA@mail.gmail.com