Thread: Reindex does not finish 8.2.6

Reindex does not finish 8.2.6

From
Clodoaldo
Date:
This is 8.2.6 in Fedora 8 and the same database with which there were
performance problems when migrating to 8.3. The migration was not
done. The thread was this:

http://archives.postgresql.org/pgsql-general/2008-01/msg00508.php

The database performance slowly degrades and after two weeks i issue a
reindex on the database and the performance gets back to normal. I
have been doing this for ages and i don't remember in which version
this degradation behavior appeared, perhaps 8.0 I'm not sure.

Every two or three months the reindex gets slow even for small tables
then i reboot the machine and it gets back to normal reindex
performance. This happened in more than one version of Fedora and
Postgresql and in two different production machines with very
different configurations.

Now what is happening is that reindex does not finish even with a
small 6,500 rows table and after a reboot. In top there is no CPU or
memory usage by postmaster and vmstat shows no disk activity.

I recently had a crash during a bulk insert when i updated to the
2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to
recover and everything was working. I made the previous kernel,
2.6.23.15-137.fc8, the default in grub.conf and rebooted.

How do I know if the database is corrupted? Since there is no explain
for reindex or vacuum i don't know what is happening. How to debug?

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Greg Smith
Date:
On Thu, 13 Mar 2008, Clodoaldo wrote:

> I recently had a crash during a bulk insert when i updated to the
> 2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to
> recover and everything was working. I made the previous kernel,
> 2.6.23.15-137.fc8, the default in grub.conf and rebooted.

Have you considered running a Linux distribution that's a bit more stable
and production oriented than Fedora 8 on this system?  I trust the Fedora
kernel to keep data intact about as much as I trust MySQL, and 2.6.24 is
waaay more bleeding edge than I'd consider using right now for anything
beyond a disposable test environment.  I'm not running anything later than
2.6.20 right now and even that I barely trust after six months of pounding
on it.

I run CentOS on devel machines that can't justify a paid license and where
the production system is RedHat Enterprise.  I was just forced to use
Fedora 8 the other day and was very dissapointed that Fedora remains as
buggy as ever.  It's the 6th Linux/BSD variant installed on this hardware,
the first where I had to pass kernel options just to boot the install CD,
and the first where X didn't work after install (the mouse is lost).
Solaris 10 gave me less problems on this machine.

I know this is kind of off-topic for the question you're asking, but you
wouldn't even have to ask it if the underlying platform wasn't
problematic.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Reindex does not finish 8.2.6

From
Alvaro Herrera
Date:
Clodoaldo escribió:

> Now what is happening is that reindex does not finish even with a
> small 6,500 rows table and after a reboot. In top there is no CPU or
> memory usage by postmaster and vmstat shows no disk activity.

Hmm, are you vacuuming the system catalogs appropriately?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
>
>  > Now what is happening is that reindex does not finish even with a
>  > small 6,500 rows table and after a reboot. In top there is no CPU or
>  > memory usage by postmaster and vmstat shows no disk activity.
>
>
> Hmm, are you vacuuming the system catalogs appropriately?

If a simple vacuum in instead of a vacuum full is appropriate then yes
the db is vacuumed every three hours after each bulk insert/delete.

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Alvaro Herrera
Date:
Clodoaldo escribió:
> 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
> > Clodoaldo escribió:
> >
> >
> >  > Now what is happening is that reindex does not finish even with a
> >  > small 6,500 rows table and after a reboot. In top there is no CPU or
> >  > memory usage by postmaster and vmstat shows no disk activity.
> >
> > Hmm, are you vacuuming the system catalogs appropriately?
>
> If a simple vacuum in instead of a vacuum full is appropriate then yes
> the db is vacuumed every three hours after each bulk insert/delete.

As superuser?  Take a look at whether relations are skipped during
vacuum.

Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
makes the problem go away.  If it doesn't, then my guess is that it's
time to see what the hanging process is doing -- try an strace on it, or
attaching it with gdb and getting a backtrace.  (I hope your binaries
have debug symbols).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/13, Greg Smith <gsmith@gregsmith.com>:
> On Thu, 13 Mar 2008, Clodoaldo wrote:
>
>  > I recently had a crash during a bulk insert when i updated to the
>  > 2.6.24.3-12.fc8 kernel. I rebooted and Postgresql did its thing to
>  > recover and everything was working. I made the previous kernel,
>  > 2.6.23.15-137.fc8, the default in grub.conf and rebooted.
>
>
> Have you considered running a Linux distribution that's a bit more stable
>  and production oriented than Fedora 8 on this system?

Yes, but because I'm so used to Fedora i still didn't change.

>  I run CentOS on devel machines that can't justify a paid license and where
>  the production system is RedHat Enterprise.

I guess CentOS is very similar to Fedora. Next OS install I will use it.

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
> > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  > > Clodoaldo escribió:
>  > >
>  > >
>  > >  > Now what is happening is that reindex does not finish even with a
>  > >  > small 6,500 rows table and after a reboot. In top there is no CPU or
>  > >  > memory usage by postmaster and vmstat shows no disk activity.
>  > >
>  > > Hmm, are you vacuuming the system catalogs appropriately?
>  >
>  > If a simple vacuum in instead of a vacuum full is appropriate then yes
>  > the db is vacuumed every three hours after each bulk insert/delete.
>
>
> As superuser?  Take a look at whether relations are skipped during
>  vacuum.

Now I vacuumed with superuser.

>  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  makes the problem go away.

It does not go away.

>  If it doesn't, then my guess is that it's
>  time to see what the hanging process is doing -- try an strace on it, or
>  attaching it with gdb and getting a backtrace.  (I hope your binaries
>  have debug symbols).

Will try it.

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
"Scott Marlowe"
Date:
On Thu, Mar 13, 2008 at 5:49 PM, Clodoaldo
<clodoaldo.pinto.neto@gmail.com> wrote:
> 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  > Clodoaldo escribió:
>  >
>  > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  >  > > Clodoaldo escribió:
>  >  > >
>  >  > >
>  >  > >  > Now what is happening is that reindex does not finish even with a
>  >  > >  > small 6,500 rows table and after a reboot. In top there is no CPU or
>  >  > >  > memory usage by postmaster and vmstat shows no disk activity.
>  >  > >
>  >  > > Hmm, are you vacuuming the system catalogs appropriately?
>  >  >
>  >  > If a simple vacuum in instead of a vacuum full is appropriate then yes
>  >  > the db is vacuumed every three hours after each bulk insert/delete.
>  >
>  >
>  > As superuser?  Take a look at whether relations are skipped during
>  >  vacuum.
>
>  Now I vacuumed with superuser.
>
>
>  >  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  >  makes the problem go away.
>
>  It does not go away.

Could this be a problem with a bloated table that needs a vacuum full?
 I'd try that.

Re: Reindex does not finish 8.2.6

From
"Pavan Deolasee"
Date:
On Fri, Mar 14, 2008 at 5:19 AM, Clodoaldo
<clodoaldo.pinto.neto@gmail.com> wrote:

>
>  >  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  >  makes the problem go away.
>
>  It does not go away.
>

Can it be a case where some other open transaction is holding a lock
on the table ? Note that REINDEX would block even some other transaction
is inserting/deleting/updating the table.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/14, Scott Marlowe <scott.marlowe@gmail.com>:
> On Thu, Mar 13, 2008 at 5:49 PM, Clodoaldo
>  <clodoaldo.pinto.neto@gmail.com> wrote:
>  > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  >  > Clodoaldo escribió:
>  >  >
>  >  > > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  >  >  > > Clodoaldo escribió:
>  >  >  > >
>  >  >  > >
>  >  >  > >  > Now what is happening is that reindex does not finish even with a
>  >  >  > >  > small 6,500 rows table and after a reboot. In top there is no CPU or
>  >  >  > >  > memory usage by postmaster and vmstat shows no disk activity.
>  >  >  > >
>  >  >  > > Hmm, are you vacuuming the system catalogs appropriately?
>  >  >  >
>  >  >  > If a simple vacuum in instead of a vacuum full is appropriate then yes
>  >  >  > the db is vacuumed every three hours after each bulk insert/delete.
>  >  >
>  >  >
>  >  > As superuser?  Take a look at whether relations are skipped during
>  >  >  vacuum.
>  >
>  >  Now I vacuumed with superuser.
>  >
>  >
>  >  >  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  >  >  makes the problem go away.
>  >
>  >  It does not go away.
>
>
> Could this be a problem with a bloated table that needs a vacuum full?
>   I'd try that.

Vacuum full also hangs.

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/14, Pavan Deolasee <pavan.deolasee@gmail.com>:
> On Fri, Mar 14, 2008 at 5:19 AM, Clodoaldo
>
> <clodoaldo.pinto.neto@gmail.com> wrote:
>
>  >
>
> >  >  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  >  >  makes the problem go away.
>  >
>  >  It does not go away.
>  >
>
>
> Can it be a case where some other open transaction is holding a lock
>  on the table ? Note that REINDEX would block even some other transaction
>  is inserting/deleting/updating the table.

Postgresql was restarted twice, but yes, it is as if the crash left
some kind of permanent lock somewhere.

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Alvaro Herrera
Date:
Clodoaldo escribió:

> Postgresql was restarted twice, but yes, it is as if the crash left
> some kind of permanent lock somewhere.

A prepared transaction perhaps?  SELECT * FROM pg_prepared_xacts;

A quick look into pg_locks should tell you if it's blocking.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
>
>  > Postgresql was restarted twice, but yes, it is as if the crash left
>  > some kind of permanent lock somewhere.
>
>
> A prepared transaction perhaps?  SELECT * FROM pg_prepared_xacts;
>
>  A quick look into pg_locks should tell you if it's blocking.

pg_prepared_xacts is empty and pg_locks has 288 rows:

# select locktype, mode, count(*) as total
from pg_locks group by locktype, mode;
   locktype    |       mode       | total
---------------+------------------+-------
 transactionid | ExclusiveLock    |    30
 relation      | RowExclusiveLock |     2
 relation      | AccessShareLock  |   256
(3 rows)

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Alvaro Herrera
Date:
Clodoaldo escribió:
> 2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>:

> >  A quick look into pg_locks should tell you if it's blocking.
>
> pg_prepared_xacts is empty and pg_locks has 288 rows:
>
> # select locktype, mode, count(*) as total
> from pg_locks group by locktype, mode;
>    locktype    |       mode       | total
> ---------------+------------------+-------
>  transactionid | ExclusiveLock    |    30
>  relation      | RowExclusiveLock |     2
>  relation      | AccessShareLock  |   256
> (3 rows)

How many of them have granted=false?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
> > 2008/3/14, Alvaro Herrera <alvherre@commandprompt.com>:
>
>
> > >  A quick look into pg_locks should tell you if it's blocking.
>  >
>  > pg_prepared_xacts is empty and pg_locks has 288 rows:
>  >
>  > # select locktype, mode, count(*) as total
>  > from pg_locks group by locktype, mode;
>  >    locktype    |       mode       | total
>  > ---------------+------------------+-------
>  >  transactionid | ExclusiveLock    |    30
>  >  relation      | RowExclusiveLock |     2
>  >  relation      | AccessShareLock  |   256
>  > (3 rows)
>
>
> How many of them have granted=false?

=> select locktype, mode, granted, count(*) as total
from pg_locks group by locktype, mode, granted;
   locktype    |      mode       | granted | total
---------------+-----------------+---------+-------
 relation      | AccessShareLock | t       |   112
 transactionid | ExclusiveLock   | t       |    17
(2 rows)

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
> > 2008/3/13, Alvaro Herrera <alvherre@commandprompt.com>:
>  > > Clodoaldo escribió:
>  > >
>  > >
>  > >  > Now what is happening is that reindex does not finish even with a
>  > >  > small 6,500 rows table and after a reboot. In top there is no CPU or
>  > >  > memory usage by postmaster and vmstat shows no disk activity.
>  > >
>  > > Hmm, are you vacuuming the system catalogs appropriately?
>  >
>  > If a simple vacuum in instead of a vacuum full is appropriate then yes
>  > the db is vacuumed every three hours after each bulk insert/delete.
>
>
> As superuser?  Take a look at whether relations are skipped during
>  vacuum.
>
>  Try vacuuming pg_class, pg_index, pg_attribute manually and see if that
>  makes the problem go away.  If it doesn't, then my guess is that it's
>  time to see what the hanging process is doing -- try an strace on it, or
>  attaching it with gdb and getting a backtrace.  (I hope your binaries
>  have debug symbols).

This is strace started before the reindex table command.

postmaster process:
# strace -f -p 2263
Process 2263 attached - interrupt to quit
select(6, [3 4 5], NULL, NULL, {35, 749000}) = 1 (in [5], left {23, 734000})
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS
RTMIN RT_1], NULL, 8) = 0
accept(5, {sa_family=AF_FILE, path=@}, [2]) = 8
getsockname(8, {sa_family=AF_FILE, path="/tmp/.s.PGSQL.5432"}, [21]) = 0
clone(Process 16014 attached (waiting for parent)
Process 16014 resumed (parent 2263 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2aaaaaac72f0) = 16014
[pid  2263] close(8)                    = 0
[pid  2263] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  2263] select(6, [3 4 5], NULL, NULL, {60, 0} <unfinished ...>
[pid 16014] close(3)                    = 0
[pid 16014] close(4)                    = 0
[pid 16014] close(5)                    = 0
[pid 16014] close(6)                    = 0
[pid 16014] setsid()                    = 16014
[pid 16014] rt_sigaction(SIGTERM, {0x581700, [],
SA_RESTORER|SA_RESTART, 0x3e97630f30}, {0x55b5b0, [],
SA_RESTORER|SA_RESTART, 0x3e97630f30}, 8) = 0
[pid 16014] rt_sigaction(SIGQUIT, {0x581700, [],
SA_RESTORER|SA_RESTART, 0x3e97630f30}, {0x55b5b0, [],
SA_RESTORER|SA_RESTART, 0x3e97630f30}, 8) = 0
[pid 16014] rt_sigaction(SIGALRM, {0x581700, [], SA_RESTORER,
0x3e97630f30}, {SIG_IGN}, 8) = 0
[pid 16014] rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE
SEGV ALRM TERM CONT SYS RTMIN RT_1], NULL, 8) = 0
[pid 16014] write(2, "\0\0_\0\216>\0\0tLOG:  00000: connection"..., 104) = 104
[pid 16014] setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60,
0}}, NULL) = 0
[pid 16014] recvfrom(8, "\0\0\0\20\4\322\26.\0\0>\2054\344\367W",
8192, 0, NULL, NULL) = 16
[pid 16014] kill(16005, SIGINT)         = 0
[pid 16014] kill(4294951291, SIGINT)    = 0
[pid 16014] exit_group(0)               = ?
Process 16014 detached
<... select resumed> )                  = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS
RTMIN RT_1], NULL, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 16014
wait4(-1, 0x7fff77f82844, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigreturn(0x2)                       = -1 EINTR (Interrupted system call)
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS
RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(6, [3 4 5], NULL, NULL, {60, 0} <unfinished ...>
Process 2263 detached

Regards, Clodoaldo Pinto Neto

Re: Reindex does not finish 8.2.6

From
Alvaro Herrera
Date:
Clodoaldo escribió:

> The database performance slowly degrades and after two weeks i issue a
> reindex on the database and the performance gets back to normal. I
> have been doing this for ages and i don't remember in which version
> this degradation behavior appeared, perhaps 8.0 I'm not sure.

Was this issue solved?


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Reindex does not finish 8.2.6

From
Clodoaldo
Date:
2008/3/26, Alvaro Herrera <alvherre@commandprompt.com>:
> Clodoaldo escribió:
>
>
>  > The database performance slowly degrades and after two weeks i issue a
>  > reindex on the database and the performance gets back to normal. I
>  > have been doing this for ages and i don't remember in which version
>  > this degradation behavior appeared, perhaps 8.0 I'm not sure.
>
>
> Was this issue solved?

About the reindex, cluster and vacuum full hanging,  i had drop the db
and reload the backup and now it is back to normal. I could not debug
the hanging problem. It was very likely caused by the kernel update. I
never had a crash with this database before. It is rock solid, it just
gets slower.

About the slow degradation of performance nothing has changed. I'm
using 8.2.7 and still have to reindex every two weeks or so.

Regards, Clodoaldo Pinto Neto