Thread: Reindex does not finish 8.2.6
This is 8.2.6 in Fedora 8 and the same database with which there were performance problems when migrating to 8.3. The migration was not done. The thread was this: http://archives.postgresql.org/pgsql-general/2008-01/msg00508.php The database performance slowly degrades and after two weeks i issue a reindex on the database and the performance gets back to normal. I have been doing this for ages and i don't remember in which version this degradation behavior appeared, perhaps 8.0 I'm not sure. Every two or three months the reindex gets slow even for small tables then i reboot the machine and it gets back to normal reindex performance. This happened in more than one version of Fedora and Postgresql and in two different production machines with very different configurations. Now what is happening is that reindex does not finish even with a small 6,500 rows table and after a reboot. In top there is no CPU or memory usage by postmaster and vmstat shows no disk activity. I recently had a crash during a bulk insert when i updated to the 2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to recover and everything was working. I made the previous kernel, 2.6.23.15-137.fc8, the default in grub.conf and rebooted. How do I know if the database is corrupted? Since there is no explain for reindex or vacuum i don't know what is happening. How to debug? Regards, Clodoaldo Pinto Neto
On Thu, 13 Mar 2008, Clodoaldo wrote: > I recently had a crash during a bulk insert when i updated to the > 2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to > recover and everything was working. I made the previous kernel, > 2.6.23.15-137.fc8, the default in grub.conf and rebooted. Have you considered running a Linux distribution that's a bit more stable and production oriented than Fedora 8 on this system? I trust the Fedora kernel to keep data intact about as much as I trust MySQL, and 2.6.24 is waaay more bleeding edge than I'd consider using right now for anything beyond a disposable test environment. I'm not running anything later than 2.6.20 right now and even that I barely trust after six months of pounding on it. I run CentOS on devel machines that can't justify a paid license and where the production system is RedHat Enterprise. I was just forced to use Fedora 8 the other day and was very dissapointed that Fedora remains as buggy as ever. It's the 6th Linux/BSD variant installed on this hardware, the first where I had to pass kernel options just to boot the install CD, and the first where X didn't work after install (the mouse is lost). Solaris 10 gave me less problems on this machine. I know this is kind of off-topic for the question you're asking, but you wouldn't even have to ask it if the underlying platform wasn't problematic. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Clodoaldo escribió: > Now what is happening is that reindex does not finish even with a > small 6,500 rows table and after a reboot. In top there is no CPU or > memory usage by postmaster and vmstat shows no disk activity. Hmm, are you vacuuming the system catalogs appropriately? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > > Now what is happening is that reindex does not finish even with a > > small 6,500 rows table and after a reboot. In top there is no CPU or > > memory usage by postmaster and vmstat shows no disk activity. > > > Hmm, are you vacuuming the system catalogs appropriately? If a simple vacuum in instead of a vacuum full is appropriate then yes the db is vacuumed every three hours after each bulk insert/delete. Regards, Clodoaldo Pinto Neto
Clodoaldo escribió: > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > Clodoaldo escribió: > > > > > > > Now what is happening is that reindex does not finish even with a > > > small 6,500 rows table and after a reboot. In top there is no CPU or > > > memory usage by postmaster and vmstat shows no disk activity. > > > > Hmm, are you vacuuming the system catalogs appropriately? > > If a simple vacuum in instead of a vacuum full is appropriate then yes > the db is vacuumed every three hours after each bulk insert/delete. As superuser? Take a look at whether relations are skipped during vacuum. Try vacuuming pg_class, pg_index, pg_attribute manually and see if that makes the problem go away. If it doesn't, then my guess is that it's time to see what the hanging process is doing -- try an strace on it, or attaching it with gdb and getting a backtrace. (I hope your binaries have debug symbols). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
2008/3/13, Greg Smith <gsmith@gregsmith.com>: > On Thu, 13 Mar 2008, Clodoaldo wrote: > > > I recently had a crash during a bulk insert when i updated to the > > 2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to > > recover and everything was working. I made the previous kernel, > > 2.6.23.15-137.fc8, the default in grub.conf and rebooted. > > > Have you considered running a Linux distribution that's a bit more stable > and production oriented than Fedora 8 on this system? Yes, but because I'm so used to Fedora i still didn't change. > I run CentOS on devel machines that can't justify a paid license and where > the production system is RedHat Enterprise. I guess CentOS is very similar to Fedora. Next OS install I will use it. Regards, Clodoaldo Pinto Neto
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > > Clodoaldo escribió: > > > > > > > > > > Now what is happening is that reindex does not finish even with a > > > > small 6,500 rows table and after a reboot. In top there is no CPU or > > > > memory usage by postmaster and vmstat shows no disk activity. > > > > > > Hmm, are you vacuuming the system catalogs appropriately? > > > > If a simple vacuum in instead of a vacuum full is appropriate then yes > > the db is vacuumed every three hours after each bulk insert/delete. > > > As superuser? Take a look at whether relations are skipped during > vacuum. Now I vacuumed with superuser. > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > makes the problem go away. It does not go away. > If it doesn't, then my guess is that it's > time to see what the hanging process is doing -- try an strace on it, or > attaching it with gdb and getting a backtrace. (I hope your binaries > have debug symbols). Will try it. Regards, Clodoaldo Pinto Neto
On Thu, Mar 13, 2008 at 5:49 PM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote: > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > Clodoaldo escribió: > > > > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > > > Clodoaldo escribió: > > > > > > > > > > > > > Now what is happening is that reindex does not finish even with a > > > > > small 6,500 rows table and after a reboot. In top there is no CPU or > > > > > memory usage by postmaster and vmstat shows no disk activity. > > > > > > > > Hmm, are you vacuuming the system catalogs appropriately? > > > > > > If a simple vacuum in instead of a vacuum full is appropriate then yes > > > the db is vacuumed every three hours after each bulk insert/delete. > > > > > > As superuser? Take a look at whether relations are skipped during > > vacuum. > > Now I vacuumed with superuser. > > > > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > > makes the problem go away. > > It does not go away. Could this be a problem with a bloated table that needs a vacuum full? I'd try that.
On Fri, Mar 14, 2008 at 5:19 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote: > > > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > > makes the problem go away. > > It does not go away. > Can it be a case where some other open transaction is holding a lock on the table ? Note that REINDEX would block even some other transaction is inserting/deleting/updating the table. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
2008/3/14, Scott Marlowe <scott.marlowe@gmail.com>: > On Thu, Mar 13, 2008 at 5:49 PM, Clodoaldo > <clodoaldo.pinto.neto@gmail.com> wrote: > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > > Clodoaldo escribió: > > > > > > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > > > > Clodoaldo escribió: > > > > > > > > > > > > > > > > Now what is happening is that reindex does not finish even with a > > > > > > small 6,500 rows table and after a reboot. In top there is no CPU or > > > > > > memory usage by postmaster and vmstat shows no disk activity. > > > > > > > > > > Hmm, are you vacuuming the system catalogs appropriately? > > > > > > > > If a simple vacuum in instead of a vacuum full is appropriate then yes > > > > the db is vacuumed every three hours after each bulk insert/delete. > > > > > > > > > As superuser? Take a look at whether relations are skipped during > > > vacuum. > > > > Now I vacuumed with superuser. > > > > > > > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > > > makes the problem go away. > > > > It does not go away. > > > Could this be a problem with a bloated table that needs a vacuum full? > I'd try that. Vacuum full also hangs. Regards, Clodoaldo Pinto Neto
2008/3/14, Pavan Deolasee <pavan.deolasee@gmail.com>: > On Fri, Mar 14, 2008 at 5:19 AM, Clodoaldo > > <clodoaldo.pinto.neto@gmail.com> wrote: > > > > > > > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > > > makes the problem go away. > > > > It does not go away. > > > > > Can it be a case where some other open transaction is holding a lock > on the table ? Note that REINDEX would block even some other transaction > is inserting/deleting/updating the table. Postgresql was restarted twice, but yes, it is as if the crash left some kind of permanent lock somewhere. Regards, Clodoaldo Pinto Neto
Clodoaldo escribió: > Postgresql was restarted twice, but yes, it is as if the crash left > some kind of permanent lock somewhere. A prepared transaction perhaps? SELECT * FROM pg_prepared_xacts; A quick look into pg_locks should tell you if it's blocking. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > > Postgresql was restarted twice, but yes, it is as if the crash left > > some kind of permanent lock somewhere. > > > A prepared transaction perhaps? SELECT * FROM pg_prepared_xacts; > > A quick look into pg_locks should tell you if it's blocking. pg_prepared_xacts is empty and pg_locks has 288 rows: # select locktype, mode, count(*) as total from pg_locks group by locktype, mode; locktype | mode | total ---------------+------------------+------- transactionid | ExclusiveLock | 30 relation | RowExclusiveLock | 2 relation | AccessShareLock | 256 (3 rows) Regards, Clodoaldo Pinto Neto
Clodoaldo escribió: > 2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>: > > A quick look into pg_locks should tell you if it's blocking. > > pg_prepared_xacts is empty and pg_locks has 288 rows: > > # select locktype, mode, count(*) as total > from pg_locks group by locktype, mode; > locktype | mode | total > ---------------+------------------+------- > transactionid | ExclusiveLock | 30 > relation | RowExclusiveLock | 2 > relation | AccessShareLock | 256 > (3 rows) How many of them have granted=false? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > 2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>: > > > > > A quick look into pg_locks should tell you if it's blocking. > > > > pg_prepared_xacts is empty and pg_locks has 288 rows: > > > > # select locktype, mode, count(*) as total > > from pg_locks group by locktype, mode; > > locktype | mode | total > > ---------------+------------------+------- > > transactionid | ExclusiveLock | 30 > > relation | RowExclusiveLock | 2 > > relation | AccessShareLock | 256 > > (3 rows) > > > How many of them have granted=false? => select locktype, mode, granted, count(*) as total from pg_locks group by locktype, mode, granted; locktype | mode | granted | total ---------------+-----------------+---------+------- relation | AccessShareLock | t | 112 transactionid | ExclusiveLock | t | 17 (2 rows) Regards, Clodoaldo Pinto Neto
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>: > > > Clodoaldo escribió: > > > > > > > > > > Now what is happening is that reindex does not finish even with a > > > > small 6,500 rows table and after a reboot. In top there is no CPU or > > > > memory usage by postmaster and vmstat shows no disk activity. > > > > > > Hmm, are you vacuuming the system catalogs appropriately? > > > > If a simple vacuum in instead of a vacuum full is appropriate then yes > > the db is vacuumed every three hours after each bulk insert/delete. > > > As superuser? Take a look at whether relations are skipped during > vacuum. > > Try vacuuming pg_class, pg_index, pg_attribute manually and see if that > makes the problem go away. If it doesn't, then my guess is that it's > time to see what the hanging process is doing -- try an strace on it, or > attaching it with gdb and getting a backtrace. (I hope your binaries > have debug symbols). This is strace started before the reindex table command. postmaster process: # strace -f -p 2263 Process 2263 attached - interrupt to quit select(6, [3 4 5], NULL, NULL, {35, 749000}) = 1 (in [5], left {23, 734000}) rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 accept(5, {sa_family=AF_FILE, path=@}, [2]) = 8 getsockname(8, {sa_family=AF_FILE, path="/tmp/.s.PGSQL.5432"}, [21]) = 0 clone(Process 16014 attached (waiting for parent) Process 16014 resumed (parent 2263 ready) child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaaac72f0) = 16014 [pid 2263] close(8) = 0 [pid 2263] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 [pid 2263] select(6, [3 4 5], NULL, NULL, {60, 0} <unfinished ...> [pid 16014] close(3) = 0 [pid 16014] close(4) = 0 [pid 16014] close(5) = 0 [pid 16014] close(6) = 0 [pid 16014] setsid() = 16014 [pid 16014] rt_sigaction(SIGTERM, {0x581700, [], SA_RESTORER|SA_RESTART, 0x3e97630f30}, {0x55b5b0, [], SA_RESTORER|SA_RESTART, 0x3e97630f30}, 8) = 0 [pid 16014] rt_sigaction(SIGQUIT, {0x581700, [], SA_RESTORER|SA_RESTART, 0x3e97630f30}, {0x55b5b0, [], SA_RESTORER|SA_RESTART, 0x3e97630f30}, 8) = 0 [pid 16014] rt_sigaction(SIGALRM, {0x581700, [], SA_RESTORER, 0x3e97630f30}, {SIG_IGN}, 8) = 0 [pid 16014] rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV ALRM TERM CONT SYS RTMIN RT_1], NULL, 8) = 0 [pid 16014] write(2, "\0\0_\0\216>\0\0tLOG: 00000: connection"..., 104) = 104 [pid 16014] setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0 [pid 16014] recvfrom(8, "\0\0\0\20\4\322\26.\0\0>\2054\344\367W", 8192, 0, NULL, NULL) = 16 [pid 16014] kill(16005, SIGINT) = 0 [pid 16014] kill(4294951291, SIGINT) = 0 [pid 16014] exit_group(0) = ? Process 16014 detached <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 16014 wait4(-1, 0x7fff77f82844, WNOHANG, NULL) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call) rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 select(6, [3 4 5], NULL, NULL, {60, 0} <unfinished ...> Process 2263 detached Regards, Clodoaldo Pinto Neto
Clodoaldo escribió: > The database performance slowly degrades and after two weeks i issue a > reindex on the database and the performance gets back to normal. I > have been doing this for ages and i don't remember in which version > this degradation behavior appeared, perhaps 8.0 I'm not sure. Was this issue solved? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
2008/3/26, Alvaro Herrera <alvherre@commandprompt.com>: > Clodoaldo escribió: > > > > The database performance slowly degrades and after two weeks i issue a > > reindex on the database and the performance gets back to normal. I > > have been doing this for ages and i don't remember in which version > > this degradation behavior appeared, perhaps 8.0 I'm not sure. > > > Was this issue solved? About the reindex, cluster and vacuum full hanging, i had drop the db and reload the backup and now it is back to normal. I could not debug the hanging problem. It was very likely caused by the kernel update. I never had a crash with this database before. It is rock solid, it just gets slower. About the slow degradation of performance nothing has changed. I'm using 8.2.7 and still have to reindex every two weeks or so. Regards, Clodoaldo Pinto Neto