Thread: Join slow on "large" tables

Join slow on "large" tables

From
Josué Maldonado
Date:
Hello list,

The server is a dual Xeon 2.4 with 2GB RAM; PostgreSQL is running on this partition:
/dev/sda9              29G  8.9G   20G  31% /home2
/dev/sda9 on /home2 type jfs (rw)

Version()
PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2
20020903 (Red Hat Linux 8.0 3.2-7)

I have a view that joins two tables, inventory details (pkardex) and
inventory document headers (pmdoc). This view usually runs pretty slow,
as indicated in the explain analyze below. pkardex has 1943465 rows and is
approximately 659MB; pmdoc has 1183520 rows and is approximately 314MB.
The view definition is:

SELECT pkd_pk AS kpk,
       (pkd_stamp)::date AS kfecha,
       pkd_docto AS kdocto,
       ((((pdc_custid)::text || ' '::text) || (pdc_custname)::text))::character varying(50) AS kclpv,
       pkd_saldo AS ksaldo,
       pkd_es AS kes,
       CASE WHEN (pkd_es = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada,
       CASE WHEN (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS ksalida,
       pkd_pcode AS kprocode,
       pkd_price AS kvalor,
       pdc_tipdoc AS ktipdoc
FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));


Shared memory is:
/root: cat /proc/sys/kernel/shmmax
1073741824

and postgresql.conf has these settings:
tcpip_socket = true
sort_mem = 8190         # min 64, size in KB
vacuum_mem = 262144             # min 1024, size in KB
checkpoint_segments = 10
max_connections = 256
shared_buffers = 32000
effective_cache_size = 160000   # typically 8KB each
random_page_cost = 2            # units are one sequ

The explain analyze is:
dbmund=# explain analyze select * from vkardex where kprocode='1017';
 Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual time=18.223..47983.157 rows=4553 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
         Index Cond: ((pkd_pcode)::text = '1017'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=1.659..1.661 rows=1 loops=5049)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 47988.067 ms
(6 rows)

Can anyone help me with how to properly tune PostgreSQL to gain some
speed on such queries? Some people have mentioned that a RAM increase is
necessary, about 8GB or more, to have PostgreSQL run smoothly. Any
comment or suggestion is welcome; I really appreciate any help.

Regards,


--
Sincerely,
Josué Maldonado.
"Give me six lines written by the hand of the most honest man in the
world, and I will find in them grounds to have him hanged."
--Cardinal Richelieu (French cardinal and statesman, 1585-1642)

Re: Join slow on "large" tables

From
Josh Berkus
Date:
Josue'

>     ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
> rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)

Looks to me like there's a problem with index bloat on pkd_pcode_idx.   Try
REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.
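
For example, using the object names from your plan (a sketch; adjust as needed):

    REINDEX INDEX pkd_pcode_idx;
    -- and if that alone doesn't help:
    VACUUM FULL ANALYZE pkardex;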

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Join slow on "large" tables

From
"Scott Marlowe"
Date:
On Mon, 2004-06-07 at 16:19, Josué Maldonado wrote:
> Hello list,
>
> Server is dual Xeon 2.4, 2GBRAM, Postgresql is running on partition:
> /dev/sda9              29G  8.9G   20G  31% /home2
> /dev/sda9 on /home2 type jfs (rw)
>
> Version()
> PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2
> 20020903 (Red Hat Linux 8.0 3.2-7)
>
> I have a view to join two tables inventory details (pkardex) and
> inventory documents header (pmdoc) this view usually runs pretty slow as
> indicated in the explain analyze, pkardex is 1943465 rows and its size
> aprox 659MB, pmdoc is 1183520 rows and its size is aprox 314MB. The view
> definition is:
>
> SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto,
>      ((((pdc_custid)::text || ' '::text) ||
> (pdc_custname)::text))::character
>      varying(50) AS kclpv, pkd_saldo AS ksaldo, pkd_es AS kes, CASE WHEN
> (pkd_es
>      = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada,
> CASE WHEN
>      (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS
> ksalida,
>      pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc
> FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));
>
>
> Shared memory is:
> /root: cat /proc/sys/kernel/shmmax
> 1073741824
>
> and postgresql.conf have this settings:
> tcpip_socket = true
> sort_mem = 8190         # min 64, size in KB
> vacuum_mem = 262144             # min 1024, size in KB
> checkpoint_segments = 10
> max_connections = 256
> shared_buffers = 32000
> effective_cache_size = 160000   # typically 8KB each
> random_page_cost = 2            # units are one sequ
>
> The explain analyze is:
> dbmund=# explain analyze select * from vkardex where kprocode='1017';
>   Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual
> time=18.223..47983.157 rows=4553 loops=1)
>     ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
> rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
>           Index Cond: ((pkd_pcode)::text = '1017'::text)
>     ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1
> width=50) (actual time=1.659..1.661 rows=1 loops=5049)
>           Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 47988.067 ms
> (6 rows)

OK, you have to ask yourself a question here: do I have enough memory
to let both postgresql and the kernel cache this data, or enough
memory for only one?  Then you pick one and try it out.  But there are
some issues here.  PostgreSQL's shared buffers are not, and should not
generally be thought of as, "cache".  A cache's job is to hold the whole
working set, or as much as possible, ready for access.  A buffer's job
is to hold all the data we're tossing around right this second.  Once
we're done with the data, the buffers can and do just drop whatever was
in them.  PostgreSQL does not have caching, in the classical sense.
That may or may not change.

The kernel, on the other hand, has both cache and buffer.  Ever notice
that a Linux top shows the cache usually being much bigger than the
buffers?  My 512 Meg home box right now has 252968k for cache, and
43276k for buffers.

Now, you're tossing around enough data to actually maybe have a use for
a huge set of buffers, but this means you'll need to starve your cache
to get enough buffers.  Which means that if one process does this kind
of join, drops connection, and two seconds later, another process
connects and does nearly the same thing, it's likely to have to read it
all from the hard drives again, as it's not in the postgresql buffer,
and not in the kernel cache.

Starting a separate connection, doing a simple select * from table1;
select * from table2, dropping the result sets returned, and staying
connected seems to be enough to get 7.4 to hold onto the data.
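
Something like this, in a session you leave open (a sketch, using the table names from this thread):

    -- run once from a connection that stays open
    SELECT * FROM pkardex;
    SELECT * FROM pmdoc;
    -- discard the results; the point is just to pull the pages through the buffers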

PostgreSQL's current buffer management algorithm is dirt simple.  The
algorithms behind the kernel's cache are quite good.  So you can quickly
reach a point where PostgreSQL is chasing its tail where the kernel would
have done OK.

Your numbers show that you are tossing 659M and 314M against each other,
but I don't know if you're harvesting the whole set at once, or just a
couple of rows of each.  Does indexing help, or is this always gonna be a
big seq scan of 90% of both tables?

If you are getting the whole thing all the time, and want postgresql to
buffer the whole thing (I recommend against it, although a very few
circumstances seem to support it) you need to have 973M of buffer.  At 8K
per buffer that works out to 124544 buffers, or we'll just call it 130000.
A number this high means you will be giving more than 50% of the RAM to
postgreSQL.  At that point, it seems you might as well go for broke and
grab most of it, ~200000 or so.

If you're not always mushing the two things against each other, and
you've got other datasets to interact with, index it.

Oh, in your reply you might want to include an explain analyze of the
query, and maybe an output of top while the query is running.


Re: Join slow on "large" tables

From
Josué Maldonado
Date:
Hi Josh and thanks for your response,

On 07/06/2004 at 4:31 PM, Josh Berkus wrote:

> Josue'
>
>
>>    ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
>>rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
>
>
> Looks to me like there's a problem with index bloat on pkd_pcode_idx.   Try
> REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.
>

I recreated the index (drop then create) and did the VACUUM FULL on
pkardex, and the behavior seems to be the same:

dbmund=# explain analyze select * from vkardex where kprocode='1013';
 Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
         Index Cond: ((pkd_pcode)::text = '1013'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 16033.807 ms
(6 rows)

At the time the query was running, top returned:

5:11pm  up  1:28,  3 users,  load average: 0.19, 0.97, 1.41
69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU2 states:  0.1% user,  0.4% system,  0.0% nice, 98.4% idle
CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
Mem:  2069596K av, 1477784K used,  591812K free,       0K shrd,    2336K buff
Swap: 2096440K av,    9028K used, 2087412K free                 1388372K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1225 postgres  17   0  257M 257M  255M S     0.6 12.7   7:14 postmaster
  1978 postgres  11   0  1044 1044   860 R     0.2  0.0   0:00 top
     1 root       9   0   472  444   428 S     0.0  0.0   0:04 init
     2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd

and free returned:
/root: free
              total       used       free     shared    buffers     cached
Mem:       2069596    1477832     591764          0       2320    1388372
-/+ buffers/cache:      87140    1982456
Swap:      2096440       9028    2087412

I'm not a Linux guru, but that looks like a memory leak to me.


--
Sincerely,

Josué Maldonado.
"Words of encouragement after censure are like the sun after a
downpour."

Re: Join slow on "large" tables

From
Josh Berkus
Date:
Josue'

> dbmund=# explain analyze select * from vkardex where kprocode='1013';
>   Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual
> time=846.318..16030.633 rows=3145 loops=1)
>     ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27
> rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
>           Index Cond: ((pkd_pcode)::text = '1013'::text)
>     ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1
> width=50) (actual time=4.482..4.484 rows=1 loops=3544)
>           Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 16033.807 ms

Huh?  It is not at all the same.  Your index scan is down to 87 ms from
27,000!  And the total query is down to 16 seconds from 47 seconds.  Don't
you consider that an improvement?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Join slow on "large" tables

From
Josué Maldonado
Date:
Josh,

El 07/06/2004 5:21 PM, Josh Berkus en su mensaje escribio:
>
> Huh?   It is not at all the same.   Your index scan is down to 87ms from
> 27,000!   And the total query is down to 16seconds from 47 seconds.   Don't
> you consider that an improvement?

Yes, there was an improvement with respect to the previous query, but 16
seconds is still too slow for that query. And it usually takes more than
10 seconds even with small result sets.

Thanks,


--
Sincerely,
Josué Maldonado.

"Culture is capable of deciphering the enigmas in which life envelops us."

Re: Join slow on "large" tables

From
Tom Lane
Date:
Josué Maldonado <josue@lamundial.hn> writes:
> Recreated the index (drop then create) and did the vacuum full pkardex
> and the behavior seems to be the same:

Well, there was a pretty huge improvement in the pkardex scan time,
whether you noticed it or not: 39520.406 to 87.393 msec.  This
definitely suggests that you've been lax about vacuuming this table.

I'm wondering whether pmdoc might not be overdue for vacuuming as
well.
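
A quick way to check (a sketch):

    VACUUM VERBOSE ANALYZE pmdoc;

and see how many dead row versions it reports.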

            regards, tom lane

Re: Join slow on "large" tables

From
Josué Maldonado
Date:
Hello Scott,

On 07/06/2004 at 4:47 PM, Scott Marlowe wrote:

> OK, you have to ask yourself a question here.  Do I have enough memory
> to let both postgresql and the kernel to cache this data, or enough
> memory for only one.  Then, you pick one and try it out.  But there's
> some issues here.  PostgreSQL's shared buffer are not, and should not
> generally be thought of as "cache".  A cache's job it to hold the whole
> working set, or as much as possible, ready for access.  A buffer's job
> is to hold all the data we're tossing around right this second.  Once
> we're done with the data, the buffers can and do just drop whatever was
> in them.  PostgreSQL does not have caching, in the classical sense.
> that may or may not change.
>
> The kernel, on the other hand, has both cache and buffer.  Ever notice
> that a Linux top shows the cache usually being much bigger than the
> buffers?  My 512 Meg home box right now has 252968k for cache, and
> 43276k for buffers.

I noticed the buffers are lower than the cache, at least as top shows it;
I don't know if I'm reading it wrong:

   8:28am  up  1:00,  2 users,  load average: 0.40, 0.97, 0.75
65 processes: 64 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU2 states:  0.0% user,  0.1% system,  0.0% nice, 99.4% idle
CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
Mem:  2069596K av, 1882228K used,  187368K free,       0K shrd,   32924K buff
Swap: 2096440K av,       0K used, 2096440K free                 1757220K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1508 root      13   0  1040 1040   856 R     0.1  0.0   0:00 top
     1 root       8   0   476  476   432 S     0.0  0.0   0:04 init



> Now, you're tossing around enough data to actually maybe have a use for
> a huge set of buffers, but this means you'll need to starve your cache
> to get enough buffers.  Which means that if one process does this kind
> of join, drops connection, and two seconds later, another process
> connects and does nearly the same thing, it's likely to have to read it
> all from the hard drives again, as it's not in the postgresql buffer,
> and not in the kernel cache.
>
> Starting a seperate connection, doing a simple select * from table1;
> sekect * from table 2, dropping the result set returned, and staying
> connected seems to be enough to get 7.4 to hold onto the data.
>
> PostgreSQL's current buffer management algo is dirt simple.  The ones in
> the kernel's cache are quite good.  So you can quickly reach a point
> where PostgreSQL is chasing it's tail where the kernel would have done
> OK.
>
> Your numbers show that you are tossing 659M and 314M against each other,
> but I don't know if you're harvesting the whole set at once, or just a
> couple row of each.  Indexing help, or is this always gonna be a big seq
> scan of 90% of both tables?

Generally only a small set is queried; the biggest record set expected is
about 24,000 rows and does not exceed 10MB in size. Explain analyze shows
the planner is using the index as expected, but performance is still poor.

> If you are getting the whole thing all the time, and want postgresql to
> buffer the whole thing (I recommend against it, although a very few
> circumstances seem to support it) you need to have 973M of buffer.  That
> would be 124544 or we'll just call it 130000.  This high of a number
> means you will be getting more than 50% of the RAM for postgreSQL.  At
> that point, it seems you might as well go for broke and grab most of it,
> ~200000 or so.
>
> If you're not always mushing the two things against each other, and
> you've got other datasets to interact with, index it.
>
> Oh, in your reply you might to include an explain analyze of the query,
> and maybe an output of top while the query is running.
>

dbmund=# explain analyze select * from vkardex where kprocode='1013';
 Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
         Index Cond: ((pkd_pcode)::text = '1013'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 16033.807 ms
(6 rows)

At the time the query was running, top returned:
5:11pm  up  1:28,  3 users,  load average: 0.19, 0.97, 1.41
69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped
CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
CPU2 states:  0.1% user,  0.4% system,  0.0% nice, 98.4% idle
CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
Mem:  2069596K av, 1477784K used,  591812K free,       0K shrd,    2336K buff
Swap: 2096440K av,    9028K used, 2087412K free                 1388372K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1225 postgres  17   0  257M 257M  255M S     0.6 12.7   7:14 postmaster
  1978 postgres  11   0  1044 1044   860 R     0.2  0.0   0:00 top
     1 root       9   0   472  444   428 S     0.0  0.0   0:04 init
     2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd

and free returned:
/root: free
              total       used       free     shared    buffers     cached
Mem:       2069596    1477832     591764          0       2320    1388372
-/+ buffers/cache:      87140    1982456
Swap:      2096440       9028    2087412


Thanks,


--
Sincerely,
Josué Maldonado.
"True pleasure lies in the search, more than in the explanation." --
Isaac Asimov

reindex and copy - deadlock?

From
Litao Wu
Date:
Hi,

We often run into a problem where REINDEX cannot finish in our
production database. It typically completes within 30 minutes.
Sometimes, however, when there is another "COPY" process, the REINDEX
will not finish. Monitoring the CPU time the REINDEX uses shows that it
does not increase at all. That looks like a deadlock, but the following
query shows that only the REINDEX process (23127) is granted its locks,
while the COPY process (3149) is not.

The last time we had this problem, we killed the REINDEX process, but
the COPY process still did not proceed. We had to bounce the database
server.

As you know, when reindex is running, nobody can
access the table.
Can someone kindly help?

Thanks,



Here is lock info from database:

        replace        | database | transaction |  pid  |        mode         | granted
-----------------------+----------+-------------+-------+---------------------+---------
 email                 |    17613 |             |  3149 | RowExclusiveLock    | f
 email_cre_dom_idx     |    17613 |             | 23127 | ExclusiveLock       | t
 email_cid_cre_idx     |    17613 |             | 23127 | ShareLock           | t
 email_cid_cre_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
 email                 |    17613 |             | 23127 | ShareLock           | t
 email                 |    17613 |             | 23127 | AccessExclusiveLock | t
 email_cid_cre_dom_idx |    17613 |             | 23127 | ShareLock           | t
 email_cid_cre_dom_idx |    17613 |             | 23127 | AccessExclusiveLock | t
 email_did_cre_idx     |    17613 |             | 23127 | ShareLock           | t
 email_did_cre_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
 email_cre_dom_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
(11 rows)
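
(A listing like the above can be pulled from pg_locks with something along these lines -- a sketch using the standard catalogs:)

    SELECT c.relname, l.database, l.transaction, l.pid, l.mode, l.granted
    FROM pg_locks l LEFT JOIN pg_class c ON c.oid = l.relation;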


Here are the processes of 3149 and 23127 from OS:

postgres  3149  1.3  6.4 154104 134444 ?     S    Jun03  92:04 postgres: postgres db1 xx.xx.xx.xx COPY waiting

postgres 23127  3.2  9.3 228224 194512 ?     S    03:35  15:03 postgres: postgres db1 [local] REINDEX

Here are queries from database:
23127 | REINDEX table email

 3149 | COPY email (...) FROM stdin








Re: Join slow on "large" tables

From
"Scott Marlowe"
Date:
On Tue, 2004-06-08 at 08:36, Josué Maldonado wrote:
> Hello Scott,

SNIP...

> > Your numbers show that you are tossing 659M and 314M against each other,
> > but I don't know if you're harvesting the whole set at once, or just a
> > couple row of each.  Indexing help, or is this always gonna be a big seq
> > scan of 90% of both tables?
>
> Generally only a small set is queried, the bigest record set expected is
> about 24,000 rows and does not exced the 10MB size, explain analyze
> shows the planner is using the index as expected but performance still poor.

If that is the case, then shared_buffers should likely only be 1000 to
10000.  Anything over 10000 is usually a bad idea, unless you've proven
it to be faster than <10000.
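
For example, in postgresql.conf (a sketch; 10000 buffers is roughly 80 MB at 8 KB per buffer):

    shared_buffers = 10000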

> dbmund=# explain analyze select * from vkardex where kprocode='1013';
>   Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual
> time=846.318..16030.633 rows=3145 loops=1)
>     ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27
> rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
>           Index Cond: ((pkd_pcode)::text = '1013'::text)
>     ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1
> width=50) (actual time=4.482..4.484 rows=1 loops=3544)
>           Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 16033.807 ms
> (6 rows)

Well, it looks like your predicted versus actual rows are a bit off, and
in the final bit, the planner thinks that it is going to be merging 403
rows but is in fact merging 3145 rows.  Try

set enable_nestloop = off;
and run the explain analyze again and see if that's faster.  If so, try
upping your target stats on kprocode (see "\h alter table" in psql for
the syntax), rerun analyze, and try the query with set enable_nestloop =
on to see if the planner makes the right choice.
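
Something along these lines (a sketch; kprocode in the view maps to pkd_pcode on pkardex, and 100 is just an example target):

    SET enable_nestloop = off;
    EXPLAIN ANALYZE SELECT * FROM vkardex WHERE kprocode = '1013';

    -- if that plan is faster, raise the stats target and re-test:
    ALTER TABLE pkardex ALTER COLUMN pkd_pcode SET STATISTICS 100;
    ANALYZE pkardex;
    SET enable_nestloop = on;
    EXPLAIN ANALYZE SELECT * FROM vkardex WHERE kprocode = '1013';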




Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> We often experience with the problem that reindex
> cannot be finished in our production database.
> It's typically done with 30 minutes. However,
> sometimes, when there is another "COPY" process,
> reindex will not finish. By monitoring the CPU
> time reindex takes, it does not increase at all.
> That seems a deadlock.

There is no deadlock visible in your report: the reindex process is not
waiting for a lock, according to either ps or pg_locks.  You sure it's
not just slow?  I'd expect reindex to be largely I/O bound, so the lack
of CPU activity doesn't prove much.

If you think it's actually stuck waiting for something, try attaching to
the REINDEX backend process with gdb to get a stack trace.  That would
at least give some idea what it's waiting for.

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Thank you, Tom!

We vacuum and reindex every night, and the reindex typically takes 30
minutes. Today it has been running since 3AM and had still not finished
at 8:30AM.

The email table and its index sizes are:
tablename indexname             size_kb reltuples
email                           1292696 8.07905e+06
email     email_cre_dom_idx     323112
email     email_cid_cre_dom_idx 357952
email     email_did_cre_idx     205712
email     email_cid_cre_idx     205560

I agree with you that a deadlock is unlikely, judging from the database
and OS reports.

We bounced the server, since it is a production database and nobody
could access the email table because of this.

I will use gdb next time. What's the right way to get the info as the
postgres owner?
    gdb
    attach pid

Thanks again!


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > We often experience with the problem that reindex
> > cannot be finished in our production database.
> > It's typically done with 30 minutes. However,
> > sometimes, when there is another "COPY" process,
> > reindex will not finish. By monitoring the CPU
> > time reindex takes, it does not increase at all.
> > That seems a deadlock.
>
> There is no deadlock visible in your report: the
> reindex process is not
> waiting for a lock, according to either ps or
> pg_locks.  You sure it's
> not just slow?  I'd expect reindex to be largely I/O
> bound, so the lack
> of CPU activity doesn't prove much.
>
> If you think it's actually stuck waiting for
> something, try attaching to
> the REINDEX backend process with gdb to get a stack
> trace.  That would
> at least give some idea what it's waiting for.
>
>             regards, tom lane






Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> I will use gdb next time. What's this right way to
> get info as postgres owner?

    $ gdb /path/to/postgres
    gdb> attach PID-of-backend-process
    gdb> bt
    gdb> quit

You might try this for practice on any idle backend; it shouldn't affect
the state of the backend, except for freezing it while you issue the
commands.

If "bt" gives you just a list of numbers and no symbolic information,
then it won't be much help; you'll need to rebuild the backend with
debugging information so that we can make some sense of the trace.

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Hi Tom,

Here is gdb info.

This has happened in our production database three times this week.
It's totally unacceptable. I have written a drop/create script to avoid
REINDEX; however, drop/create seems to take more time than REINDEX for
the whole database.

Your help is greatly appreciated!

Version:
PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96


===========================================
su - postgres
ps -auxww|grep REINDEX
postgres 18903  1.2  8.1 195388 169512 ?     S    04:49   1:54 postgres:postgres db1 [local] REINDEX
postgres 13329  0.0  0.0  1768  620 pts/0    S    07:23   0:00 grep REINDEX
gdb /bin/postgres
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General
Public License, and you are
welcome to change it and/or distribute copies of it
under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show
warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no
debugging symbols
found)...
(gdb) attach 18903
Attaching to program: /bin/postgres, process 18903
Reading symbols from /usr/lib/libz.so.1...(no
debugging symbols
found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...(no
debugging symbols
found)...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...(no
debugging symbols
found)...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...(no
debugging symbols
found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no
debugging symbols
found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging
symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...(no
debugging symbols
found)...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...(no
debugging symbols
found)...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no
debugging symbols
found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no
debugging symbols
found)...done.
Loaded symbols for /lib/libnss_files.so.2
0x420e8bb2 in semop () from /lib/i686/libc.so.6
(gdb) bt
#0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
#1  0x080ffa28 in PGSemaphoreLock ()
#2  0x08116432 in LWLockAcquire ()
#3  0x0810f572 in LockBuffer ()
#4  0x0807dea3 in _bt_getbuf ()
#5  0x080813ec in _bt_leafbuild ()
#6  0x080816a6 in _bt_leafbuild ()
#7  0x08081b8b in _bt_leafbuild ()
#8  0x080813cc in _bt_leafbuild ()
#9  0x0807e1d0 in btbuild ()
#10 0x081631c3 in OidFunctionCall3 ()
#11 0x080920a7 in index_build ()
#12 0x08092593 in reindex_index ()
#13 0x08092473 in IndexBuildHeapScan ()
#14 0x0809275d in reindex_relation ()
#15 0x080b9164 in ReindexTable ()
#16 0x08118ece in pg_exec_query_string ()
#17 0x08119fe5 in PostgresMain ()
#18 0x0810214c in ClosePostmasterPorts ()
#19 0x08101a9e in ClosePostmasterPorts ()
#20 0x08100ca1 in PostmasterMain ()
#21 0x08100862 in PostmasterMain ()
#22 0x080deed7 in main ()
#23 0x42017589 in __libc_start_main () from
/lib/i686/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)?
(y or n) y
Detaching from program: /bin/postgres, process 18903

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > I will use gdb next time. What's this right way to
> > get info as postgres owner?
>
>     $ gdb /path/to/postgres
>     gdb> attach PID-of-backend-process
>     gdb> bt
>     gdb> quit
>
> You might try this for practice on any idle backend;
> it shouldn't affect
> the state of the backend, except for freezing it
> while you issue the
> commands.
>
> If "bt" gives you just a list of numbers and no
> symbolic information,
> then it won't be much help; you'll need to rebuild
> the backend with
> debugging information so that we can make some sense
> of the trace.
>
>             regards, tom lane






Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> (gdb) bt
> #0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
> #1  0x080ffa28 in PGSemaphoreLock ()
> #2  0x08116432 in LWLockAcquire ()
> #3  0x0810f572 in LockBuffer ()
> #4  0x0807dea3 in _bt_getbuf ()
> #5  0x080813ec in _bt_leafbuild ()
> #6  0x080816a6 in _bt_leafbuild ()
> #7  0x08081b8b in _bt_leafbuild ()
> #8  0x080813cc in _bt_leafbuild ()
> #9  0x0807e1d0 in btbuild ()
> #10 0x081631c3 in OidFunctionCall3 ()
> #11 0x080920a7 in index_build ()
> #12 0x08092593 in reindex_index ()

Hmm.  I don't think I believe this backtrace.  It's obviously wrong at
lines 5-7 - _bt_leafbuild doesn't call itself nor call _bt_getbuf.
It's possible that you don't have any local symbols in this executable
and what we're seeing is the nearest global symbol, so let's ignore
that; but if we take lines 0-4 at face value, what it says is that the
REINDEX is stuck waiting for buffer lock on a buffer for a new empty
page it has just added to the new index.  This is flatly impossible.
There is no other process that could possibly be interested in that
buffer, or for that matter even be able to name it (since the new index
has a new relfilenode value that isn't even visible to any other process
yet).  I thought for a little bit that a background CHECKPOINT might be
trying to write out the new buffer, but that theory holds no water
either, because at this point in the _bt_getbuf sequence, the buffer is
not marked dirty (I just verified this by stepping through it in 7.4.2).

I can think of lots of reasons why the REINDEX might block at the
previous step of the sequence, namely acquiring a fresh buffer ... but
once it's got the buffer there is surely no reason to block.

What I'm inclined to think is that the backtrace isn't right at all.
Would it be possible for you to install a backend built with
--enable-debug and get a more reliable backtrace?
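
(For reference, a minimal sketch of such a build from source; the prefix is just an example:)

    ./configure --enable-debug --prefix=/usr/local/pgsql
    make
    make install

then restart the postmaster with the new binaries.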

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
We have another production database, which is similar to this one.
It has never had the REINDEX blocking problem.
One difference between these two databases is that the one having the
REINDEX problem is using the NTFS file system.

Could that be the root of the problem?

Thanks,

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > (gdb) bt
> > #0  0x420e8bb2 in semop () from
> /lib/i686/libc.so.6
> > #1  0x080ffa28 in PGSemaphoreLock ()
> > #2  0x08116432 in LWLockAcquire ()
> > #3  0x0810f572 in LockBuffer ()
> > #4  0x0807dea3 in _bt_getbuf ()
> > #5  0x080813ec in _bt_leafbuild ()
> > #6  0x080816a6 in _bt_leafbuild ()
> > #7  0x08081b8b in _bt_leafbuild ()
> > #8  0x080813cc in _bt_leafbuild ()
> > #9  0x0807e1d0 in btbuild ()
> > #10 0x081631c3 in OidFunctionCall3 ()
> > #11 0x080920a7 in index_build ()
> > #12 0x08092593 in reindex_index ()
>
> Hmm.  I don't think I believe this backtrace.  It's
> obviously wrong at
> lines 5-7 - _bt_leafbuild doesn't call itself nor
> call _bt_getbuf.
> It's possible that you don't have any local symbols
> in this executable
> and what we're seeing is the nearest global symbol,
> so let's ignore
> that; but if we take lines 0-4 at face value, what
> it says is that the
> REINDEX is stuck waiting for buffer lock on a buffer
> for a new empty
> page it has just added to the new index.  This is
> flatly impossible.
> There is no other process that could possibly be
> interested in that
> buffer, or for that matter even be able to name it
> (since the new index
> has a new relfilenode value that isn't even visible
> to any other process
> yet).  I thought for a little bit that a background
> CHECKPOINT might be
> trying to write out the new buffer, but that theory
> holds no water
> either, because at this point in the _bt_getbuf
> sequence, the buffer is
> not marked dirty (I just verified this by stepping
> through it in 7.4.2).
>
> I can think of lots of reasons why the REINDEX might
> block at the
> previous step of the sequence, namely acquiring a
> fresh buffer ... but
> once it's got the buffer there is surely no reason
> to block.
>
> What I'm inclined to think is that the backtrace
> isn't right at all.
> Would it be possible for you to install a backend
> built with
> --enable-debug and get a more reliable backtrace?
>
>             regards, tom lane






Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> One difference between these two databases
> is the one having REINDEX problem is using
> NTFS file system.

Oh?  That's interesting.

> Is it possible the root of problem?

I would not expect it to show this particular symptom --- if the
backtrace is accurate.  But there are nearby places that might have
FS-dependent behavior.  Can you do anything about my request for
a stack trace from a debug-enabled build?

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Hi,

I have changed "reindex table my_table" to:
psql ...
  -c "drop index my_index; create index my_index;"

We still experience the same "hang" problem.

I was told that this time the running statement was
"create index my_index;" before the PG server was bounced.

When I log in to the database, I find that my_index is still there.

I do not know what caused this to happen, and I am also confused. If
"create index my_index" was killed by "-9", then my_index should not be
present in the database, because it had been dropped before being
recreated.

On the other hand, if "drop index my_index;" was killed, then how can a
DROP INDEX (which is DDL, right?) be blocked? There must be other
process(es) holding an exclusive lock on my_index, which is not the
case according to pg_locks.

Tom, we are in the process of installing
the backend with --enable-debug.

Thanks,


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > One difference between these two databases
> > is the one having REINDEX problem is using
> > NTFS file system.
>
> Oh?  That's interesting.
>
> > Is it possible the root of problem?
>
> I would not expect it to show this particular
> symptom --- if the
> backtrace is accurate.  But there are nearby places
> that might have
> FS-dependent behavior.  Can you do anything about my
> request for
> a stack trace from a debug-enabled build?
>
>             regards, tom lane





Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> I have changed "reindex table my_table" to:
> psql ...
>   -c "drop index my_index; create index my_index;"

> I do not know what caused this happen, and I
> am also confused. If create index my_index is killed
> by "-9", then my_index should not present in the
> database because it has been dropped before creating.

I believe that the above executes the two commands in a single
transaction.  So if you kill it midway through the CREATE, everything
rolls back and the index is still there.
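
If you want the DROP to commit independently of the CREATE, run them as two separate commands (a sketch; the database, table, and column names here are made up):

    psql mydb -c "DROP INDEX my_index;"
    psql mydb -c "CREATE INDEX my_index ON my_table (my_column);"

That way a killed CREATE leaves you with no index rather than the old one.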

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Hi All,

It happened again.
This time it hangs when we drop/create index.

Here is the gdb info from the postgres built with --enable-debug.

Thank you for your help!

postgres 24533 24327  2 Jun28 ?        00:39:11 postgres: postgres xxx xxx.xxx.x.xxx COPY waiting
postgres 23508 24327  0 03:23 ?        00:00:00 postgres: postgres xxx xxx.xxx.x.xx SELECT waiting
root     23662 22727  0 03:24 ?        00:00:00 /xxx/bin/psql -t -A -q xxx -U postgres -c set sort_mem=131072; DROP INDEX xxx_mod_ac_did_cre_idx; CREATE INDEX xxx_mod_ac_did_cre_idx ON xxx_module_action USING btree (domain_id, created);
postgres 23663 24327  2 03:24 ?        00:04:40 postgres: postgres xxx [local] CREATE INDEX
postgres 24252 24327  0 03:26 ?        00:00:00 postgres: postgres xxx xxx.xxx.x.xx SELECT waiting

bash-2.05a$ gdb /xxx/bin/postgres
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General
Public License, and you
are
welcome to change it and/or distribute copies of it
under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show
warranty" for
details.
This GDB was configured as "i386-redhat-linux"...
(gdb) attach 23663
Attaching to program: /xxx/bin.Linux/postgres, process
23663
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x420e8bb2 in semop () from /lib/i686/libc.so.6
(gdb) bt
#0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
#1  0x080ff954 in PGSemaphoreLock (sema=0x4a2d83e8, interruptOK=0 '\0') at pg_sema.c:434
#2  0x0811635e in LWLockAcquire (lockid=21335, mode=LW_EXCLUSIVE) at lwlock.c:312
#3  0x0810f49e in LockBuffer (buffer=10657, mode=2) at bufmgr.c:1848
#4  0x0807dea3 in _bt_getbuf (rel=0x40141b10, blkno=4294967295, access=2) at nbtpage.c:337
#5  0x080813d8 in _bt_blnewpage (index=0x40141b10, buf=0xbfffe724, page=0xbfffe728, flags=1) at nbtsort.c:188
#6  0x08081692 in _bt_buildadd (index=0x40141b10, state=0x4e0b3e30, bti=0x4fe20cb8) at nbtsort.c:373
#7  0x08081b77 in _bt_load (index=0x40141b10, btspool=0x82bf7b8, btspool2=0x0) at nbtsort.c:638
#8  0x080813b8 in _bt_leafbuild (btspool=0x82bf7b8, btspool2=0x0) at nbtsort.c:171
#9  0x0807e1d0 in btbuild (fcinfo=0xbfffe820) at nbtree.c:165
#10 0x081630d7 in OidFunctionCall3 (functionId=338, arg1=1075019120, arg2=1075059472, arg3=137095072) at fmgr.c:1275
#11 0x08092093 in index_build (heapRelation=0x40137d70, indexRelation=0x40141b10, indexInfo=0x82be7a0) at index.c:1447
#12 0x080913d7 in index_create (heapRelationId=17618, indexRelationName=0x82b9648 "xxx_mod_ac_did_cre_idx", indexInfo=0x82be7a0, accessMethodObjectId=403, classObjectId=0x82be578, primary=0 '\0', isconstraint=0 '\0', allow_system_table_mods=0 '\0') at index.c:765
#13 0x080b88ae in DefineIndex (heapRelation=0x82b9698, indexRelationName=0x82b9648 "xxx_mod_ac_did_cre_idx", accessMethodName=0x82b96c0 "btree", attributeList=0x82b9718, unique=0 '\0', primary=0 '\0', isconstraint=0 '\0', predicate=0x0, rangetable=0x0) at indexcmds.c:211
#14 0x0811b250 in ProcessUtility (parsetree=0x82b9788, dest=Remote, completionTag=0xbfffea80 "") at utility.c:620
#15 0x08118df6 in pg_exec_query_string (query_string=0x82b91e0, dest=Remote, parse_context=0x82ade58) at postgres.c:789
#16 0x08119f0d in PostgresMain (argc=4, argv=0xbfffecb0, username=0x8240679 "postgres") at postgres.c:2013
#17 0x08102078 in DoBackend (port=0x8240548) at postmaster.c:2302
#18 0x081019ca in BackendStartup (port=0x8240548) at postmaster.c:1924
#19 0x08100bcd in ServerLoop () at postmaster.c:1009
#20 0x0810078e in PostmasterMain (argc=1, argv=0x8227468) at postmaster.c:788
#21 0x080dee2b in main (argc=1, argv=0xbffff644) at main.c:210
#22 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)?
(y or n) y
Detaching from program: /xxx/bin.Linux/postgres,
process 23663



--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > One difference between these two databases
> > is the one having REINDEX problem is using
> > NTFS file system.
>
> Oh?  That's interesting.
>
> > Is it possible the root of problem?
>
> I would not expect it to show this particular
> symptom --- if the
> backtrace is accurate.  But there are nearby places
> that might have
> FS-dependent behavior.  Can you do anything about my
> request for
> a stack trace from a debug-enabled build?
>
>             regards, tom lane





Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> It happened again.
> This time it hangs when we drop/create index.
> Here is gdb info with --enable-debug postgres.

Well, that pretty much removes all doubt: something has left the buffer
context lock (cntx_lock) set on a buffer that certainly ought to be free.

The problem here is that REINDEX (or CREATE INDEX in this case) is the
victim, not the perpetrator, so we still don't know exactly what's
causing the error.  We need to go backwards in time, so to speak, to
identify the code that's leaving the buffer locked when it shouldn't.
I don't offhand have a good idea about how to do that.  Is there another
process that is also getting stuck when REINDEX does (if so please get
a backtrace from it too)?

BTW, what Postgres version are you using again?  The line numbers in
your trace don't square with any current version of bufmgr.c ...

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Hi Tom,

Our PG version is 7.3.2.

The COPY process is always there. Besides the COPY process, there are
many SELECT processes waiting as well. (That is understandable with
REINDEX, but how come SELECTs wait during drop/create index? From the
Postgres docs:
Note: Another approach to dealing with a corrupted
user-table index is just to drop and recreate it. This
may in fact be preferable if you would like to
maintain some semblance of normal operation on the
table meanwhile. REINDEX acquires exclusive lock on
the table, while CREATE INDEX only locks out writes
not reads of the table.
)

Each time this happened, it might hang on a different index.

But one thing is sure: the reindex or create index is granted its lock
while the others wait. If reindex/create index is not the perpetrator,
how can PG grant it the lock but not the others, like COPY?

Forgive me for not providing the full table and index names, IP
addresses, etc., for security reasons.

Here is a copy of my first post from June 8:
Hi,

We often experience with the problem that reindex
cannot be finished in our production database.
It's typically done with 30 minutes. However,
sometimes, when there is another "COPY" process,
reindex will not finish. By monitoring the CPU
time reindex takes, it does not increase at all.
That seems a deadlock. But the following query shows
only reindex process (23127)is granted lock while
COPY process (3149) is not.

Last time when we have this problem and kill
reindex process and COPY process does not work.
We had to bounce the database server.

As you know, when reindex is running, nobody can
access the table.
Can someone kindly help?

Thanks,



Here is lock info from database:

        replace        | database | transaction |  pid
 |        mode         | granted
-----------------------+----------+-------------+-------+---------------------+---------
 email                 |    17613 |             |
3149 | RowExclusiveLock    | f
 email_cre_dom_idx     |    17613 |             |
23127 | ExclusiveLock       | t
 email_cid_cre_idx     |    17613 |             |
23127 | ShareLock           | t
 email_cid_cre_idx     |    17613 |             |
23127 | AccessExclusiveLock | t
 email                 |    17613 |             |
23127 | ShareLock           | t
 email                 |    17613 |             |
23127 | AccessExclusiveLock | t
 email_cid_cre_dom_idx |    17613 |             |
23127 | ShareLock           | t
 email_cid_cre_dom_idx |    17613 |             |
23127 | AccessExclusiveLock | t
 email_did_cre_idx     |    17613 |             |
23127 | ShareLock           | t
 email_did_cre_idx     |    17613 |             |
23127 | AccessExclusiveLock | t
 email_cre_dom_idx     |    17613 |             |
23127 | AccessExclusiveLock | t
(11 rows)


Here are the processes of 3149 and 23127 from OS:

postgres  3149  1.3  6.4 154104 134444 ?     S
Jun03  92:04 postgres: postgres db1 xx.xx.xx.xx COPY
waiting

postgres 23127  3.2  9.3 228224 194512 ?     S
03:35  15:03 postgres: postgres db1 [local] REINDEX

Here are queries from database:
23127 | REINDEX table email

 3149 | COPY email (...) FROM stdin



--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > It happened again.
> > This time it hangs when we drop/create index.
> > Here is gdb info with --enable-debug postgres.
>
> Well, that pretty much removes all doubt: something
> has left the buffer
> context lock (cntx_lock) set on a buffer that
> certainly ought to be free.
>
> The problem here is that REINDEX (or CREATE INDEX in
> this case) is the
> victim, not the perpetrator, so we still don't know
> exactly what's
> causing the error.  We need to go backwards in time,
> so to speak, to
> identify the code that's leaving the buffer locked
> when it shouldn't.
> I don't offhand have a good idea about how to do
> that.  Is there another
> process that is also getting stuck when REINDEX does
> (if so please get
> a backtrace from it too)?
>
> BTW, what Postgres version are you using again?  The
> line numbers in
> your trace don't square with any current version of
> bufmgr.c ...
>
>             regards, tom lane






Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> Our PG version is 7.3.2.

Hmm.  On general principles you should be using 7.3.6, but I do not see
anything in the 7.3.* change logs that looks very likely to cure this.

> The copy process is always there. Besides copy
> process, there are many select processes wait also
> (it is understandable only when reindex,
> but how come selects wait when drop/create index?

DROP INDEX would lock out selects (it has no other way to be sure no
select is trying to *use* the index).  Once you're past that, selects
would work, but if you try something like
    begin; drop index; create index; commit;
then the drop's lock will be held till commit.

I'm not sure about whether COPY is related.  In your original post, the
COPY was waiting to acquire RowExclusiveLock on the table, so it hadn't
actually done anything yet and really couldn't be holding a buffer lock
AFAICS.

> But one thing is sure:
> reindex or create index is granted lock while
> others wait. If reindex/create index is not
> the perpetrator, how can PG grants it lock
> but not others, like COPY?

The point is that it's waiting for a lower-level lock (namely a buffer
LWLock).  There's no deadlock detection for LWLocks, because they're not
supposed to be used in ways that could cause a deadlock.

Assuming for the moment that indeed this is a deadlock, you could learn
something the next time it happens with some manual investigation.
You'll need to keep using the debug-enabled build.  When you next get a
lockup, proceed as follows:

1. Attach to the REINDEX or CREATE INDEX process and find out which
LWLock number it is blocked on.  (This is the lockid argument of
LWLockAcquire, 21335 in your trace of today.)

2. For *each* live backend process (including the REINDEX itself),
attach with gdb and look at the held-locks status of lwlock.c.
This would go something like

    gdb> p num_held_lwlocks
if greater than zero:
    gdb> x/10d held_lwlocks
(replace "10" by the value of num_held_lwlocks)

If you find a backend that is holding the lock number that REINDEX
wants, print out its call stack with "bt", and look in pg_locks to see
what lockmanager locks it is holding or waiting for.  If you do not find
one, then the deadlock theory is disproved, and we're back to square
one.

            regards, tom lane

Re: reindex and copy - deadlock?

From
Litao Wu
Date:
Thanks!

OK, we will do this exercise next time.

Since there are multiple databases and there are 170 postgres processes
this morning, 60 of them accessing the problem database, and 57 of
those 60 non-idle:

do we only need to gdb those 57 processes, or do we need to gdb 60, or
all 170?

Thanks again!

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Litao Wu <litaowu@yahoo.com> writes:
> > Our PG version is 7.3.2.
>
> Hmm.  On general principles you should be using
> 7.3.6, but I do not see
> anything in the 7.3.* change logs that looks very
> likely to cure this.
>
> > The copy process is always there. Besides copy
> > process, there are many select processes wait also
> > (it is understandable only when reindex,
> > but how come selects wait when drop/create index?
>
> DROP INDEX would lock out selects (it has no other
> way to be sure no
> select is trying to *use* the index).  Once you're
> past that, selects
> would work, but if you try something like
>     begin; drop index; create index; commit;
> then the drop's lock will be held till commit.
>
> I'm not sure about whether COPY is related.  In your
> original post, the
> COPY was waiting to acquire RowExclusiveLock on the
> table, so it hadn't
> actually done anything yet and really couldn't be
> holding a buffer lock
> AFAICS.
>
> > But one thing is sure:
> > reindex or create index is granted lock while
> > others wait. If reindex/create index is not
> > the perpetrator, how can PG grants it lock
> > but not others, like COPY?
>
> The point is that it's waiting for a lower-level
> lock (namely a buffer
> LWLock).  There's no deadlock detection for LWLocks,
> because they're not
> supposed to be used in ways that could cause a
> deadlock.
>
> Assuming for the moment that indeed this is a
> deadlock, you could learn
> something the next time it happens with some manual
> investigation.
> You'll need to keep using the debug-enabled build.
> When you next get a
> lockup, proceed as follows:
>
> 1. Attach to the REINDEX or CREATE INDEX process and
> find out which
> LWLock number it is blocked on.  (This is the lockid
> argument of
> LWLockAcquire, 21335 in your trace of today.)
>
> 2. For *each* live backend process (including the
> REINDEX itself),
> attach with gdb and look at the held-locks status of
> lwlock.c.
> This would go something like
>
>     gdb> p num_held_lwlocks
> if greater than zero:
>     gdb> x/10d held_lwlocks
> (replace "10" by the value of num_held_lwlocks)
>
> If you find a backend that is holding the lock
> number that REINDEX
> wants, print out its call stack with "bt", and look
> in pg_locks to see
> what lockmanager locks it is holding or waiting for.
>  If you do not find
> one, then the deadlock theory is disproved, and
> we're back to square
> one.
>
>             regards, tom lane





Re: reindex and copy - deadlock?

From
Tom Lane
Date:
Litao Wu <litaowu@yahoo.com> writes:
> Since there are multiple databases and
> there are 170 postgres processes this morning,
> 60 of them are access the problem database,
> and 57 of 60 are non-idle.

> We only need to gdb those 57 processes, or
> we need gdb 60 or 170?

Potentially the deadlock could be anywhere :-(.  You should definitely
not assume it must be one of the processes connected to the problem
database, because the buffer pool is cluster-wide.

Might be worth setting up a shell script to help.
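
Something like this could walk all the backends (a rough sketch; the binary path and the ps invocation are assumptions, it needs the debug-enabled binary, and the "10" should be adjusted to the reported num_held_lwlocks):

    # dump LWLock state and a backtrace from every postgres-owned process
    printf 'p num_held_lwlocks\nx/10d held_lwlocks\nbt\n' > /tmp/lwlocks.gdb
    for pid in $(ps -u postgres -o pid=); do
        echo "=== process $pid ==="
        gdb -batch -x /tmp/lwlocks.gdb /usr/local/pgsql/bin/postgres "$pid"
    done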

            regards, tom lane

network address query

From
Litao Wu
Date:
Hi,

I have this query:
explain
SELECT *
FROM ip_tracking T, ip_map C
WHERE
  T.source_ip::inet >>= C.net;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3894833367750.16 rows=51709297065144 width=111)
   Join Filter: ("outer".source_ip >>= ("inner".net)::inet)
   ->  Seq Scan on ip_tracking t  (cost=0.00..825050.68 rows=31093368 width=34)
   ->  Seq Scan on ip_map c  (cost=0.00..83686.66 rows=3326066 width=77)
(4 rows)

ip_tracking (
  pk_col int,
  source_ip inet,
  .. the rest...
)
There is one index
  ip_tracking_ip_idx btree (source_ip)

ip_map (
net cidr,
... the rest...)
Indexes: map_net_idx hash (net)

If I change ">>=" to "=", the query plan is:

                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..10798882243.63 rows=31093368 width=111)
   ->  Seq Scan on ip_map c  (cost=0.00..83686.66 rows=3326066 width=77)
   ->  Index Scan using ip_tracking_ip_idx on ip_tracking t  (cost=0.00..3236.72 rows=800 width=34)
         Index Cond: (t.source_ip = ("outer".net)::inet)
(4 rows)

This is my first time dealing with the network address types.

Is it possible to make a query like the above use an index with the
">>=" operator?

Thanks,




