Thread: indexes no longer used after shutdown during reindexing
Hello all, I have a database that was shut down, cleanly, during an 'reindex table' command. When the database came back up, queries against that table started doing sequential scans instead of using the indexes as they had been up until that point. We tried: 1) vacuuming the table (vacuum tblName) 2) reindexing the table (reindex table tblName) 3) dropping and recreating the indexes but none of those actions helped. We ended up recreating the table by renaming the table and doing a create table as select * from oldTable and readding the indexes. This worked. This problem presented itself as an application timing out. It took several people, several hours to track this down and solve it. Several months ago I had two other tables also stopped using their indexes. Those times however I don't know if a database shutdown caused the problem. Has anyone had this problem? If so, what specifically is the cause? Is shutting down a database during a table rebuild or vacuum an absolute no-no? Any and all help or insight would be appreciated, Matt
On Wed, Jan 11, 2012 at 10:42 AM, Matt Dew <mattd@consistentstate.com> wrote: > Hello all, > I have a database that was shut down, cleanly, during an 'reindex table' > command. When the database came back up, queries against that table > started doing sequential scans instead of using the indexes as they had been > up until that point. > > We tried: > 1) vacuuming the table (vacuum tblName) > 2) reindexing the table (reindex table tblName) > 3) dropping and recreating the indexes > > but none of those actions helped. We ended up recreating the table by > renaming the table and doing a create table as select * from oldTable and > readding the indexes. This worked. > > This problem presented itself as an application timing out. It took several > people, several hours to track this down and solve it. > > Several months ago I had two other tables also stopped using their indexes. > Those times however I don't know if a database shutdown caused the problem. > > Has anyone had this problem? If so, what specifically is the cause? Is > shutting down a database during a table rebuild or vacuum an absolute no-no? > > Any and all help or insight would be appreciated, > Matt You likely had an invalid index, I've seen that crop up when doing a create index concurrently. Just a guess. What did or does \d of the table and its indexes show? Look for invalid in the output.
On 01/11/2012 11:07 AM, Scott Marlowe wrote: > On Wed, Jan 11, 2012 at 10:42 AM, Matt Dew<mattd@consistentstate.com> wrote: >> Hello all, >> I have a database that was shut down, cleanly, during an 'reindex table' >> command. When the database came back up, queries against that table >> started doing sequential scans instead of using the indexes as they had been >> up until that point. >> >> We tried: >> 1) vacuuming the table (vacuum tblName) >> 2) reindexing the table (reindex table tblName) >> 3) dropping and recreating the indexes >> >> but none of those actions helped. We ended up recreating the table by >> renaming the table and doing a create table as select * from oldTable and >> readding the indexes. This worked. >> >> This problem presented itself as an application timing out. It took several >> people, several hours to track this down and solve it. >> >> Several months ago I had two other tables also stopped using their indexes. >> Those times however I don't know if a database shutdown caused the problem. >> >> Has anyone had this problem? If so, what specifically is the cause? Is >> shutting down a database during a table rebuild or vacuum an absolute no-no? >> >> Any and all help or insight would be appreciated, >> Matt > You likely had an invalid index, I've seen that crop up when doing a > create index concurrently. Just a guess. What did or does \d of the > table and its indexes show? Look for invalid in the output. Hi Scott, The output of \d looked normal. Nothing weird or different than before.
Matt Dew <mattd@consistentstate.com> writes: > I have a database that was shut down, cleanly, during an 'reindex > table' command. When the database came back up, queries against that > table started doing sequential scans instead of using the indexes as > they had been up until that point. What exactly is your definition of a "clean shutdown"? At the very least you'd have had to abort the session running the reindex. Also, what PG version is this, and what are the index definitions? regards, tom lane
On 01/11/2012 04:29 PM, Tom Lane wrote: > Matt Dew<mattd@consistentstate.com> writes: >> I have a database that was shut down, cleanly, during an 'reindex >> table' command. When the database came back up, queries against that >> table started doing sequential scans instead of using the indexes as >> they had been up until that point. > What exactly is your definition of a "clean shutdown"? At the very > least you'd have had to abort the session running the reindex. Also, > what PG version is this, and what are the index definitions? Is a reboot command considered a clean shutdown? It's a redhat box which called /etc/init.d/postgresql stop, which does: pg_ctl stop -D '$PGDATA' -s -m fast We're using v8.3.9 "idx1" UNIQUE, btree (id) "idx_2" btree (homeaddress) "idx_3" btree (f3) "idx_4" btree (lower(firstname::text) varchar_pattern_ops) "idx_5" btree (lower(lastname::text) varchar_pattern_ops) "idx_6" btree (lower(lastname::text) varchar_pattern_ops, lower(firstname::text) varchar_pattern_ops, id, f5) "idx_7" btree (s2id) "idx_8" btree (sid, lower(memberusername::text) varchar_pattern_ops, lower(email::text) varchar_pattern_ops, birthdate) "idx_9" btree (id, f5) WHERE f5 = false I'm in a rabbit hole. I dug in more and learned that that problem may have existed before the shutdown. I believe the root problem is still the same though; having to recreate the table to get it to use indexes. thanks for any help, Matt > regards, tom lane
Matt Dew <mattd@consistentstate.com> writes: > On 01/11/2012 04:29 PM, Tom Lane wrote: >> What exactly is your definition of a "clean shutdown"? > Is a reboot command considered a clean shutdown? It's a redhat box > which called /etc/init.d/postgresql stop, which does: pg_ctl stop -D > '$PGDATA' -s -m fast Well, a fast-mode stop would abort the reindex operation, but that should certainly have left the catalog entries in the same state as before, so there's no obvious reason here why the indexes would've stopped being used. > We're using v8.3.9 That's a tad old. Please consult http://www.postgresql.org/docs/8.3/static/release.html for reasons why an update might be a good idea. I don't recall any 8.3.x bugs that might be related to this, but I haven't trawled the commit logs to see what I've forgotten, either. > I'm in a rabbit hole. I dug in more and learned that that problem may > have existed before the shutdown. I believe the root problem is still > the same though; having to recreate the table to get it to use indexes. Hmm. If that's the case then we don't have to explain how an aborted reindex operation could have affected the usability of the old indexes, so I'm inclined to believe that it didn't. Which seems to mean that you have a garden variety "why won't the planner use my index" issue, not something unusual. If you no longer have the original table then it may be impossible to investigate further; but if you can recreate the state where it's not using the index(es), please see http://wiki.postgresql.org/wiki/Slow_Query_Questions and pursue the issue on pgsql-performance. regards, tom lane
On 01/12/2012 01:21 PM, Tom Lane wrote: > Matt Dew<mattd@consistentstate.com> writes: >> On 01/11/2012 04:29 PM, Tom Lane wrote: >>> What exactly is your definition of a "clean shutdown"? >> Is a reboot command considered a clean shutdown? It's a redhat box >> which called /etc/init.d/postgresql stop, which does: pg_ctl stop -D >> '$PGDATA' -s -m fast > Well, a fast-mode stop would abort the reindex operation, but that > should certainly have left the catalog entries in the same state as > before, so there's no obvious reason here why the indexes would've > stopped being used. > >> We're using v8.3.9 > That's a tad old. Please consult > http://www.postgresql.org/docs/8.3/static/release.html > for reasons why an update might be a good idea. I don't recall any > 8.3.x bugs that might be related to this, but I haven't trawled the > commit logs to see what I've forgotten, either. > >> I'm in a rabbit hole. I dug in more and learned that that problem may >> have existed before the shutdown. I believe the root problem is still >> the same though; having to recreate the table to get it to use indexes. > Hmm. If that's the case then we don't have to explain how an aborted > reindex operation could have affected the usability of the old indexes, > so I'm inclined to believe that it didn't. Which seems to mean that you > have a garden variety "why won't the planner use my index" issue, not > something unusual. If you no longer have the original table then it may > be impossible to investigate further; but if you can recreate the state > where it's not using the index(es), please see > http://wiki.postgresql.org/wiki/Slow_Query_Questions > and pursue the issue on pgsql-performance. > Thanks Tom. I have the original database stored away for investigation. This was a serious problem so we're investigatng how to prevent this in the future. It's strange because even though it looks like this problem did happen before the reboot, it was once in a while. After the reboot it was everytime and the application completely stopped working. Plus after the reboot even a simple query against the table: select * from tbl where id=1; was/is doing sequential scans. It's a smallish table, just under 5 million rows. Thanks for the link. I'm using that. It's on a netapp if that matters. (Not my choice.)
On 01/12/2012 01:21 PM, Tom Lane wrote: > Matt Dew<mattd@consistentstate.com> writes: >> On 01/11/2012 04:29 PM, Tom Lane wrote: >>> What exactly is your definition of a "clean shutdown"? >> Is a reboot command considered a clean shutdown? It's a redhat box >> which called /etc/init.d/postgresql stop, which does: pg_ctl stop -D >> '$PGDATA' -s -m fast > Well, a fast-mode stop would abort the reindex operation, but that > should certainly have left the catalog entries in the same state as > before, so there's no obvious reason here why the indexes would've > stopped being used. > >> We're using v8.3.9 > That's a tad old. Please consult > http://www.postgresql.org/docs/8.3/static/release.html > for reasons why an update might be a good idea. I don't recall any > 8.3.x bugs that might be related to this, but I haven't trawled the > commit logs to see what I've forgotten, either. > >> I'm in a rabbit hole. I dug in more and learned that that problem may >> have existed before the shutdown. I believe the root problem is still >> the same though; having to recreate the table to get it to use indexes. > Hmm. If that's the case then we don't have to explain how an aborted > reindex operation could have affected the usability of the old indexes, > so I'm inclined to believe that it didn't. Which seems to mean that you > have a garden variety "why won't the planner use my index" issue, not > something unusual. If you no longer have the original table then it may > be impossible to investigate further; but if you can recreate the state > where it's not using the index(es), please see > http://wiki.postgresql.org/wiki/Slow_Query_Questions > and pursue the issue on pgsql-performance. > > regards, tom lane Thanks Tom. An interesting sidenote we realized. the nice system shutdown script /etc/init.d/postgres doesn't actually wait for the db to be down, it just waits for pg_ctl to return. I'm guessing it's not good when the box shuts down before postgres is. Matt
Matt Dew <mattd@consistentstate.com> writes: > An interesting sidenote we realized. the nice system shutdown script > /etc/init.d/postgres doesn't actually wait for the db to be down, it > just waits for pg_ctl to return. By default, "pg_ctl stop" does wait for the server to shut down ... regards, tom lane
On 13.1.2012 22:20, Tom Lane wrote: > Matt Dew <mattd@consistentstate.com> writes: >> An interesting sidenote we realized. the nice system shutdown script >> /etc/init.d/postgres doesn't actually wait for the db to be down, it >> just waits for pg_ctl to return. > > By default, "pg_ctl stop" does wait for the server to shut down ... Not really. It waits for up to 60 seconds and if the shutdown was not successful (as there was a connected client), it prints a message to the log pg_ctl: server does not shut down HINT: The "-m fast" option immediately disconnects sessions rather than and returns 1. If you really need to wait for shutdown, you need to add "-w" to the command line, use "-m fast" or "-m immediate". But even ignoring the return value should not cause corruption IMHO. Tomas
On 01/13/2012 02:49 PM, Tomas Vondra wrote: > On 13.1.2012 22:20, Tom Lane wrote: >> Matt Dew<mattd@consistentstate.com> writes: >>> An interesting sidenote we realized. the nice system shutdown script >>> /etc/init.d/postgres doesn't actually wait for the db to be down, it >>> just waits for pg_ctl to return. >> >> By default, "pg_ctl stop" does wait for the server to shut down ... > > Not really. It waits for up to 60 seconds and if the shutdown was not > successful (as there was a connected client), it prints a message to the log > > pg_ctl: server does not shut down > HINT: The "-m fast" option immediately disconnects sessions > rather than > > and returns 1. > > If you really need to wait for shutdown, you need to add "-w" to the > command line, use "-m fast" or "-m immediate". > > But even ignoring the return value should not cause corruption IMHO. Thanks Tom and Tomas, I remember -w now, but I'd long forgotten about it. If the pg_ctl returns a 1 but the machine physically powers off, there is a chance for corruption though right? Postgres is trying to write stuff to disk and clean up and BAM power goes out. ? There is a chance for corruption though if the machine physically powers off after the pg_ctl return
On 20.1.2012 19:47, Matt Dew wrote: > On 01/13/2012 02:49 PM, Tomas Vondra wrote: >> On 13.1.2012 22:20, Tom Lane wrote: >>> Matt Dew<mattd@consistentstate.com> writes: >>>> An interesting sidenote we realized. the nice system shutdown script >>>> /etc/init.d/postgres doesn't actually wait for the db to be down, it >>>> just waits for pg_ctl to return. >>> >>> By default, "pg_ctl stop" does wait for the server to shut down ... >> >> Not really. It waits for up to 60 seconds and if the shutdown was not >> successful (as there was a connected client), it prints a message to >> the log >> >> pg_ctl: server does not shut down >> HINT: The "-m fast" option immediately disconnects sessions >> rather than >> >> and returns 1. >> >> If you really need to wait for shutdown, you need to add "-w" to the >> command line, use "-m fast" or "-m immediate". >> >> But even ignoring the return value should not cause corruption IMHO. > > Thanks Tom and Tomas, > I remember -w now, but I'd long forgotten about it. > > If the pg_ctl returns a 1 but the machine physically powers off, there > is a chance for corruption though right? Postgres is trying to write > stuff to disk and clean up and BAM power goes out. ? > > There is a chance for corruption though if the machine physically powers > off after the pg_ctl return There are various types of corruption. If you power off the system before the database properly shuts down, the data files will be corrupted. But this should be fixed on the next database startup - the database should find out it was not switched properly and perform a recovery (replay the WAL logs). So yes, it would be corrupted but fixed on the next startup. And thus should not cause issues like the one you describe. Tomas