Thread: Join slow on "large" tables
Hello list,

Server is a dual Xeon 2.4 with 2GB RAM. PostgreSQL is running on this partition:

/dev/sda9              29G  8.9G   20G  31% /home2
/dev/sda9 on /home2 type jfs (rw)

version():
PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)

I have a view that joins two tables, inventory details (pkardex) and inventory document headers (pmdoc). This view usually runs pretty slowly, as the explain analyze below shows. pkardex has 1943465 rows and is approx. 659MB in size; pmdoc has 1183520 rows and is approx. 314MB. The view definition is:

SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto,
       ((((pdc_custid)::text || ' '::text) || (pdc_custname)::text))::character varying(50) AS kclpv,
       pkd_saldo AS ksaldo, pkd_es AS kes,
       CASE WHEN (pkd_es = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada,
       CASE WHEN (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS ksalida,
       pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc
FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));

Shared memory is:

/root: cat /proc/sys/kernel/shmmax
1073741824

and postgresql.conf has these settings:

tcpip_socket = true
sort_mem = 8190                 # min 64, size in KB
vacuum_mem = 262144             # min 1024, size in KB
checkpoint_segments = 10
max_connections = 256
shared_buffers = 32000
effective_cache_size = 160000   # typically 8KB each
random_page_cost = 2            # units are one sequ

The explain analyze is:

dbmund=# explain analyze select * from vkardex where kprocode='1017';
 Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual time=18.223..47983.157 rows=4553 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52 rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
         Index Cond: ((pkd_pcode)::text = '1017'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=1.659..1.661 rows=1 loops=5049)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 47988.067 ms
(6 rows)

Can anyone help me tune PostgreSQL properly to gain some speed on queries like this? Some people have mentioned that a RAM increase is necessary, to about 8GB or more, to make PostgreSQL run smoothly. Any comment or suggestion is welcome; I really appreciate any help.

Regards,

--
Sincerely,
Josué Maldonado.

"Give me six lines written by the hand of the most honest man in the world, and I will find in them grounds to have him hanged." --Cardinal Richelieu (French cardinal and politician, 1585-1642)
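As a side note on the figures above: row counts and on-disk sizes like these can be approximated from the planner statistics in pg_class. A minimal sketch, not part of the original post, assuming the default 8 KB block size and reasonably fresh statistics:

    -- Approximate row counts and sizes from planner statistics
    -- (assumes 8 KB blocks and a recent VACUUM/ANALYZE).
    SELECT relname,
           reltuples::bigint    AS approx_rows,
           relpages * 8 / 1024  AS approx_mb
    FROM pg_class
    WHERE relname IN ('pkardex', 'pmdoc');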
Josue',

> ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
>     rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)

Looks to me like there's a problem with index bloat on pkd_pcode_idx. Try REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
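Spelled out, the maintenance Josh suggests would look roughly like the following. This is a sketch using the names from the post; the trailing ANALYZE is an addition to refresh planner statistics, not something Josh asked for:

    -- Rebuild the possibly bloated index first.
    REINDEX INDEX pkd_pcode_idx;

    -- If that does not help, compact the table itself
    -- (note: takes an exclusive lock on pkardex while it runs).
    VACUUM FULL pkardex;

    -- Refresh planner statistics afterwards (my addition).
    ANALYZE pkardex;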
On Mon, 2004-06-07 at 16:19, Josué Maldonado wrote:
> Hello list,
> [original post snipped -- see above]

OK, you have to ask yourself a question here: do I have enough memory to let both PostgreSQL and the kernel cache this data, or enough memory for only one? Then you pick one and try it out. But there are some issues here. PostgreSQL's shared buffers are not, and should not generally be thought of as, "cache". A cache's job is to hold the whole working set, or as much of it as possible, ready for access. A buffer's job is to hold the data we're tossing around right this second. Once we're done with the data, the buffers can and do just drop whatever was in them. PostgreSQL does not have caching in the classical sense; that may or may not change.

The kernel, on the other hand, has both cache and buffers. Ever notice that Linux top shows the cache usually being much bigger than the buffers? My 512 MB home box right now has 252968k for cache and 43276k for buffers.

Now, you're tossing around enough data to actually maybe have a use for a huge set of buffers, but this means you'll need to starve your cache to get enough buffers. Which means that if one process does this kind of join, drops the connection, and two seconds later another process connects and does nearly the same thing, it's likely to have to read it all from the hard drives again, as it's not in the PostgreSQL buffers and not in the kernel cache.
Starting a separate connection, doing a simple "select * from table1; select * from table2", dropping the result sets returned, and staying connected seems to be enough to get 7.4 to hold onto the data.

PostgreSQL's current buffer management algorithm is dirt simple. The ones in the kernel's cache are quite good. So you can quickly reach a point where PostgreSQL is chasing its tail where the kernel would have done OK.

Your numbers show that you are tossing 659M and 314M against each other, but I don't know if you're harvesting the whole set at once or just a couple of rows of each. Does indexing help, or is this always gonna be a big seq scan of 90% of both tables?

If you are getting the whole thing all the time, and want PostgreSQL to buffer the whole thing (I recommend against it, although a very few circumstances seem to support it), you need 973M of buffers. That would be 124544, or we'll just call it 130000. This high a number means you will be giving more than 50% of the RAM to PostgreSQL. At that point, it seems you might as well go for broke and grab most of it, ~200000 or so.

If you're not always mushing the two things against each other, and you've got other datasets to interact with, index it.

Oh, in your reply you might want to include an explain analyze of the query, and maybe an output of top while the query is running.
Hi Josh, and thanks for your response,

On 06/07/2004 4:31 PM, Josh Berkus wrote in his message:
> Josue'
>
>> ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
>>     rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
>
> Looks to me like there's a problem with index bloat on pkd_pcode_idx. Try
> REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex.

Recreated the index (drop then create) and did the vacuum full on pkardex, and the behavior seems to be the same:

dbmund=# explain analyze select * from vkardex where kprocode='1013';
 Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
         Index Cond: ((pkd_pcode)::text = '1013'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 16033.807 ms
(6 rows)

At the time the query was running, top returned:

  5:11pm  up  1:28,  3 users,  load average: 0.19, 0.97, 1.41
 69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped
 CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU2 states:  0.1% user,  0.4% system,  0.0% nice,  98.4% idle
 CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 Mem:  2069596K av, 1477784K used,  591812K free,  0K shrd,  2336K buff
 Swap: 2096440K av,    9028K used, 2087412K free            1388372K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1225 postgres  17   0  257M 257M  255M S     0.6 12.7   7:14 postmaster
  1978 postgres  11   0  1044 1044   860 R     0.2  0.0   0:00 top
     1 root       9   0   472  444   428 S     0.0  0.0   0:04 init
     2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd

and free returned:

/root: free
             total       used       free     shared    buffers     cached
Mem:       2069596    1477832     591764          0       2320    1388372
-/+ buffers/cache:      87140    1982456
Swap:      2096440       9028    2087412

I'm not a Linux guru; to me it looks like a memory leak.

--
Sincerely,
Josué Maldonado.

"Words of encouragement after criticism are like the sun after a downpour."
Josue',

> dbmund=# explain analyze select * from vkardex where kprocode='1013';
>  Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
>    ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
>          Index Cond: ((pkd_pcode)::text = '1013'::text)
>    ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
>          Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>  Total runtime: 16033.807 ms

Huh? It is not at all the same. Your index scan is down to 87ms from about 39,500! And the total query is down to 16 seconds from 47 seconds. Don't you consider that an improvement?

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Josh,

On 06/07/2004 5:21 PM, Josh Berkus wrote in his message:
>
> Huh? It is not at all the same. Your index scan is down to 87ms from
> about 39,500! And the total query is down to 16 seconds from 47 seconds.
> Don't you consider that an improvement?

Yes, there was an improvement with respect to the previous query, but 16 seconds is still too slow for that query. And the query usually takes more than 10 seconds even with small data sets returned.

Thanks,

--
Sincerely,
Josué Maldonado.

"Culture is capable of deciphering the enigmas in which life wraps us."
Josué Maldonado <josue@lamundial.hn> writes:
> Recreated the index (drop then create) and did the vacuum full on pkardex,
> and the behavior seems to be the same:

Well, there was a pretty huge improvement in the pkardex scan time, whether you noticed it or not: 39520.406 to 87.393 msec. This definitely suggests that you've been lax about vacuuming this table. I'm wondering whether pmdoc might not be overdue for vacuuming as well.

			regards, tom lane
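A sketch of the routine maintenance Tom is hinting at: regular VACUUM ANALYZE keeps dead tuples from piling up in the first place, though on 7.4 an already-bloated index still needs REINDEX or VACUUM FULL to shrink. Table names are those from the thread:

    -- Run regularly (e.g. nightly, or via contrib/pg_autovacuum on 7.4)
    -- so dead tuples are reclaimed before tables and indexes bloat.
    VACUUM ANALYZE pkardex;
    VACUUM ANALYZE pmdoc;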
Hello Scott,

On 06/07/2004 4:47 PM, Scott Marlowe wrote in his message:
> [snip]
>
> The kernel, on the other hand, has both cache and buffers. Ever notice
> that Linux top shows the cache usually being much bigger than the
> buffers? My 512 MB home box right now has 252968k for cache and
> 43276k for buffers.

I noticed the buffers are low compared to the cache, at least as top shows it; I don't know if I'm reading it wrong:

  8:28am  up  1:00,  2 users,  load average: 0.40, 0.97, 0.75
 65 processes: 64 sleeping, 1 running, 0 zombie, 0 stopped
 CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU2 states:  0.0% user,  0.1% system,  0.0% nice,  99.4% idle
 CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 Mem:  2069596K av, 1882228K used,  187368K free,  0K shrd,  32924K buff
 Swap: 2096440K av,       0K used, 2096440K free            1757220K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1508 root      13   0  1040 1040   856 R     0.1  0.0   0:00 top
     1 root       8   0   476  476   432 S     0.0  0.0   0:04 init

> [snip]
>
> Your numbers show that you are tossing 659M and 314M against each other,
> but I don't know if you're harvesting the whole set at once or just a
> couple of rows of each. Does indexing help, or is this always gonna be a
> big seq scan of 90% of both tables?

Generally only a small set is queried; the biggest record set expected is about 24,000 rows and does not exceed 10MB in size. Explain analyze shows the planner is using the index as expected, but performance is still poor.

> If you are getting the whole thing all the time, and want PostgreSQL to
> buffer the whole thing (I recommend against it, although a very few
> circumstances seem to support it), you need 973M of buffers. That
> would be 124544, or we'll just call it 130000. This high a number
> means you will be giving more than 50% of the RAM to PostgreSQL. At
> that point, it seems you might as well go for broke and grab most of it,
> ~200000 or so.
> If you're not always mushing the two things against each other, and
> you've got other datasets to interact with, index it.
>
> Oh, in your reply you might want to include an explain analyze of the
> query, and maybe an output of top while the query is running.

dbmund=# explain analyze select * from vkardex where kprocode='1013';
 Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
   ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
         Index Cond: ((pkd_pcode)::text = '1013'::text)
   ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
         Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
 Total runtime: 16033.807 ms
(6 rows)

At the time the query was running, top returned:

  5:11pm  up  1:28,  3 users,  load average: 0.19, 0.97, 1.41
 69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped
 CPU0 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU1 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 CPU2 states:  0.1% user,  0.4% system,  0.0% nice,  98.4% idle
 CPU3 states:  0.0% user,  0.0% system,  0.0% nice, 100.0% idle
 Mem:  2069596K av, 1477784K used,  591812K free,  0K shrd,  2336K buff
 Swap: 2096440K av,    9028K used, 2087412K free            1388372K cached

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  1225 postgres  17   0  257M 257M  255M S     0.6 12.7   7:14 postmaster
  1978 postgres  11   0  1044 1044   860 R     0.2  0.0   0:00 top
     1 root       9   0   472  444   428 S     0.0  0.0   0:04 init
     2 root       8   0     0    0     0 SW    0.0  0.0   0:00 keventd

and free returned:

/root: free
             total       used       free     shared    buffers     cached
Mem:       2069596    1477832     591764          0       2320    1388372
-/+ buffers/cache:      87140    1982456
Swap:      2096440       9028    2087412

Thanks,

--
Sincerely,
Josué Maldonado.

"True pleasure lies in the search, more than in the explanation." -- Isaac Asimov
Hi,

We often run into a problem where REINDEX cannot finish in our production database. It typically completes within 30 minutes. Sometimes, however, when there is another "COPY" process, the REINDEX will not finish. Monitoring the CPU time the REINDEX takes, it does not increase at all. That looks like a deadlock. But the following query shows that only the REINDEX process (23127) is granted locks, while the COPY process (3149) is not. The last time we had this problem we killed the REINDEX process, but the COPY process still did not work; we had to bounce the database server.

As you know, when REINDEX is running, nobody can access the table. Can someone kindly help?

Thanks,

Here is the lock info from the database:

       relname         | database | transaction |  pid  |        mode         | granted
-----------------------+----------+-------------+-------+---------------------+---------
 email                 |    17613 |             |  3149 | RowExclusiveLock    | f
 email_cre_dom_idx     |    17613 |             | 23127 | ExclusiveLock       | t
 email_cid_cre_idx     |    17613 |             | 23127 | ShareLock           | t
 email_cid_cre_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
 email                 |    17613 |             | 23127 | ShareLock           | t
 email                 |    17613 |             | 23127 | AccessExclusiveLock | t
 email_cid_cre_dom_idx |    17613 |             | 23127 | ShareLock           | t
 email_cid_cre_dom_idx |    17613 |             | 23127 | AccessExclusiveLock | t
 email_did_cre_idx     |    17613 |             | 23127 | ShareLock           | t
 email_did_cre_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
 email_cre_dom_idx     |    17613 |             | 23127 | AccessExclusiveLock | t
(11 rows)

Here are processes 3149 and 23127 from the OS:

postgres  3149  1.3  6.4 154104 134444 ?  S  Jun03  92:04 postgres: postgres db1 xx.xx.xx.xx COPY waiting
postgres 23127  3.2  9.3 228224 194512 ?  S  03:35   15:03 postgres: postgres db1 [local] REINDEX

Here are the queries from the database:

 23127 | REINDEX table email
  3149 | COPY email (...) FROM stdin
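The lock listing above was presumably produced by a query along these lines. This is a reconstruction rather than the poster's actual query; the join to pg_class for readable names and the 7.3-era pg_locks columns are assumptions:

    -- Show heavyweight locks with readable relation names
    -- (pg_locks columns as of PostgreSQL 7.3).
    SELECT c.relname, l.database, l.transaction, l.pid, l.mode, l.granted
    FROM pg_locks l
    LEFT JOIN pg_class c ON c.oid = l.relation
    ORDER BY l.pid;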
On Tue, 2004-06-08 at 08:36, Josué Maldonado wrote:
> Hello Scott,

SNIP...

> > Your numbers show that you are tossing 659M and 314M against each other,
> > but I don't know if you're harvesting the whole set at once or just a
> > couple of rows of each. Does indexing help, or is this always gonna be a
> > big seq scan of 90% of both tables?
>
> Generally only a small set is queried; the biggest record set expected is
> about 24,000 rows and does not exceed 10MB in size. Explain analyze shows
> the planner is using the index as expected, but performance is still poor.

If that is the case, then shared_buffers should likely be only 1000 to 10000. Anything over 10000 is usually a bad idea, unless you've proven it to be faster than <10000.

> dbmund=# explain analyze select * from vkardex where kprocode='1013';
>  Nested Loop  (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1)
>    ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1)
>          Index Cond: ((pkd_pcode)::text = '1013'::text)
>    ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544)
>          Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>  Total runtime: 16033.807 ms
> (6 rows)

Well, it looks like your predicted versus actual rows are a bit off, and in the final step the planner thinks it is going to be merging 403 rows but is in fact merging 3145. Try "set enable_nestloop = off;" and run the explain analyze again to see if that's faster. If so, try upping your target stats on kprocode (see "\h alter table" in psql for the syntax), rerun analyze, and try the query with "set enable_nestloop = on" to see if the planner makes the right choice.
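For clarity, the experiment Scott describes might look like this. It is a sketch: the statistics target of 100 is an arbitrary illustrative value, and pkd_pcode is the base-table column behind the view's kprocode:

    -- 1. See whether the planner's nested-loop choice is the problem.
    SET enable_nestloop = off;
    EXPLAIN ANALYZE SELECT * FROM vkardex WHERE kprocode = '1013';
    SET enable_nestloop = on;

    -- 2. If the plan above was faster, give the planner better statistics
    --    on the filtered column and re-test with default planner settings.
    ALTER TABLE pkardex ALTER COLUMN pkd_pcode SET STATISTICS 100;
    ANALYZE pkardex;
    EXPLAIN ANALYZE SELECT * FROM vkardex WHERE kprocode = '1013';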
Litao Wu <litaowu@yahoo.com> writes:
> We often run into a problem where REINDEX cannot finish in our
> production database. It typically completes within 30 minutes.
> Sometimes, however, when there is another "COPY" process, the REINDEX
> will not finish. Monitoring the CPU time the REINDEX takes, it does
> not increase at all. That looks like a deadlock.

There is no deadlock visible in your report: the reindex process is not waiting for a lock, according to either ps or pg_locks. You sure it's not just slow? I'd expect reindex to be largely I/O bound, so the lack of CPU activity doesn't prove much.

If you think it's actually stuck waiting for something, try attaching to the REINDEX backend process with gdb to get a stack trace. That would at least give some idea what it's waiting for.

			regards, tom lane
Thank you, Tom!

We vacuum and reindex every night, and the reindex typically took 30 minutes. Today it had been running since 3AM and still had not finished by 8:30AM.

The email table and its index sizes are:

 tablename | indexname             | size_kb | reltuples
-----------+-----------------------+---------+-------------
 email     |                       | 1292696 | 8.07905e+06
 email     | email_cre_dom_idx     |  323112 |
 email     | email_cid_cre_dom_idx |  357952 |
 email     | email_did_cre_idx     |  205712 |
 email     | email_cid_cre_idx     |  205560 |

I agree with you that a deadlock is unlikely based on the database and OS reports. We have bounced the server, since it is a production database and nobody could access the email table because of this.

I will use gdb next time. What's the right way to get this info as the postgres owner -- "gdb attach pid"?

Thanks again!

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> I will use gdb next time. What's the right way to get this info as the
> postgres owner?

$ gdb /path/to/postgres
gdb> attach PID-of-backend-process
gdb> bt
gdb> quit

You might try this for practice on any idle backend; it shouldn't affect the state of the backend, except for freezing it while you issue the commands.

If "bt" gives you just a list of numbers and no symbolic information, then it won't be much help; you'll need to rebuild the backend with debugging information so that we can make some sense of the trace.

			regards, tom lane
Hi Tom,

Here is the gdb info. This has happened in our production database 3 times this week, and it's totally unacceptable. I have written a drop/create script to avoid REINDEX; however, drop/create seems to take more time than REINDEX for the whole database.

Your help is greatly appreciated!

Version: PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96

===========================================
su - postgres
ps -auxww | grep REINDEX
postgres 18903  1.2  8.1 195388 169512 ?      S    04:49   1:54 postgres: postgres db1 [local] REINDEX
postgres 13329  0.0  0.0   1768    620 pts/0  S    07:23   0:00 grep REINDEX

gdb /bin/postgres
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)...
(gdb) attach 18903
Attaching to program: /bin/postgres, process 18903
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0x420e8bb2 in semop () from /lib/i686/libc.so.6
(gdb) bt
#0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
#1  0x080ffa28 in PGSemaphoreLock ()
#2  0x08116432 in LWLockAcquire ()
#3  0x0810f572 in LockBuffer ()
#4  0x0807dea3 in _bt_getbuf ()
#5  0x080813ec in _bt_leafbuild ()
#6  0x080816a6 in _bt_leafbuild ()
#7  0x08081b8b in _bt_leafbuild ()
#8  0x080813cc in _bt_leafbuild ()
#9  0x0807e1d0 in btbuild ()
#10 0x081631c3 in OidFunctionCall3 ()
#11 0x080920a7 in index_build ()
#12 0x08092593 in reindex_index ()
#13 0x08092473 in IndexBuildHeapScan ()
#14 0x0809275d in reindex_relation ()
#15 0x080b9164 in ReindexTable ()
#16 0x08118ece in pg_exec_query_string ()
#17 0x08119fe5 in PostgresMain ()
#18 0x0810214c in ClosePostmasterPorts ()
#19 0x08101a9e in ClosePostmasterPorts ()
#20 0x08100ca1 in PostmasterMain ()
#21 0x08100862 in PostmasterMain ()
#22 0x080deed7 in main ()
#23 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)?
(y or n) y
Detaching from program: /bin/postgres, process 18903

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> (gdb) bt
> #0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
> #1  0x080ffa28 in PGSemaphoreLock ()
> #2  0x08116432 in LWLockAcquire ()
> #3  0x0810f572 in LockBuffer ()
> #4  0x0807dea3 in _bt_getbuf ()
> #5  0x080813ec in _bt_leafbuild ()
> #6  0x080816a6 in _bt_leafbuild ()
> #7  0x08081b8b in _bt_leafbuild ()
> #8  0x080813cc in _bt_leafbuild ()
> #9  0x0807e1d0 in btbuild ()
> #10 0x081631c3 in OidFunctionCall3 ()
> #11 0x080920a7 in index_build ()
> #12 0x08092593 in reindex_index ()

Hmm. I don't think I believe this backtrace. It's obviously wrong at lines 5-7: _bt_leafbuild doesn't call itself nor call _bt_getbuf. It's possible that you don't have any local symbols in this executable and what we're seeing is the nearest global symbol, so let's ignore that; but if we take lines 0-4 at face value, what it says is that the REINDEX is stuck waiting for buffer lock on a buffer for a new empty page it has just added to the new index. This is flatly impossible. There is no other process that could possibly be interested in that buffer, or for that matter even be able to name it (since the new index has a new relfilenode value that isn't even visible to any other process yet). I thought for a little bit that a background CHECKPOINT might be trying to write out the new buffer, but that theory holds no water either, because at this point in the _bt_getbuf sequence the buffer is not marked dirty (I just verified this by stepping through it in 7.4.2).

I can think of lots of reasons why the REINDEX might block at the previous step of the sequence, namely acquiring a fresh buffer ... but once it's got the buffer there is surely no reason to block.

What I'm inclined to think is that the backtrace isn't right at all. Would it be possible for you to install a backend built with --enable-debug and get a more reliable backtrace?

			regards, tom lane
We have another production database which is similar to this one. It has never had the REINDEX blocking problem.

One difference between the two databases is that the one having the REINDEX problem is using the NTFS file system. Could that be the root of the problem?

Thanks,

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> One difference between the two databases is that the one having the
> REINDEX problem is using the NTFS file system.

Oh? That's interesting.

> Could that be the root of the problem?

I would not expect it to show this particular symptom --- if the backtrace is accurate. But there are nearby places that might have FS-dependent behavior. Can you do anything about my request for a stack trace from a debug-enabled build?

			regards, tom lane
Hi,

I have changed "reindex table my_table" to:

psql ... -c "drop index my_index; create index my_index;"

We still experience the same "hang" problem. I was told that this time the hanging process was on "create index my_index;" before the PG server was bounced. When I logged into the database afterwards, I found that my_index was still there.

I do not know what caused this to happen, and I am confused. If "create index my_index" was killed by "-9", then my_index should not be present in the database, because it had been dropped before being recreated. On the other hand, if "drop index my_index;" was killed, then how can DROP INDEX (which is DDL, right?) be blocked? There would have to be another process holding an exclusive lock on my_index, which is not our case according to pg_locks.

Tom, we are in the process of installing a backend built with --enable-debug.

Thanks,

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> I have changed "reindex table my_table" to:
> psql ... -c "drop index my_index; create index my_index;"

> I do not know what caused this to happen, and I am confused. If "create
> index my_index" was killed by "-9", then my_index should not be present
> in the database, because it had been dropped before being recreated.

I believe that the above executes the two commands in a single transaction. So if you kill it midway through the CREATE, everything rolls back and the index is still there.

			regards, tom lane
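A sketch of the alternative Tom's remark points at: keep the DROP and the CREATE in separate transactions, for example as two separate psql -c invocations, so that an interrupted CREATE cannot roll the DROP back. The index, table, and column names below are the thread's placeholders:

    -- First invocation / transaction: the DROP commits on its own.
    DROP INDEX my_index;

    -- Second invocation / transaction: if this one is killed, only the
    -- CREATE is rolled back; the index stays dropped and can simply be
    -- recreated later.
    CREATE INDEX my_index ON my_table (my_column);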
Hi All,

It happened again. This time it hangs when we drop/create the index. Here is the gdb info from a postgres built with --enable-debug.

Thank you for your help!

postgres 24533 24327  2 Jun28 ?  00:39:11 postgres: postgres xxx xxx.xxx.x.xxx COPY waiting
postgres 23508 24327  0 03:23 ?  00:00:00 postgres: postgres xxx xxx.xxx.x.xx SELECT waiting
root     23662 22727  0 03:24 ?  00:00:00 /xxx/bin/psql -t -A -q xxx -U postgres -c set sort_mem=131072; DROP INDEX xxx_mod_ac_did_cre_idx; CREATE INDEX xxx_mod_ac_did_cre_idx ON xxx_module_action USING btree (domain_id, created);
postgres 23663 24327  2 03:24 ?  00:04:40 postgres: postgres xxx [local] CREATE INDEX
postgres 24252 24327  0 03:26 ?  00:00:00 postgres: postgres xxx xxx.xxx.x.xx SELECT waiting

bash-2.05a$ gdb /xxx/bin/postgres
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) attach 23663
Attaching to program: /xxx/bin.Linux/postgres, process 23663
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x420e8bb2 in semop () from /lib/i686/libc.so.6
(gdb) bt
#0  0x420e8bb2 in semop () from /lib/i686/libc.so.6
#1  0x080ff954 in PGSemaphoreLock (sema=0x4a2d83e8, interruptOK=0 '\0') at pg_sema.c:434
#2  0x0811635e in LWLockAcquire (lockid=21335, mode=LW_EXCLUSIVE) at lwlock.c:312
#3  0x0810f49e in LockBuffer (buffer=10657, mode=2) at bufmgr.c:1848
#4  0x0807dea3 in _bt_getbuf (rel=0x40141b10, blkno=4294967295, access=2) at nbtpage.c:337
#5  0x080813d8 in _bt_blnewpage (index=0x40141b10, buf=0xbfffe724, page=0xbfffe728, flags=1) at nbtsort.c:188
#6  0x08081692 in _bt_buildadd (index=0x40141b10, state=0x4e0b3e30, bti=0x4fe20cb8) at nbtsort.c:373
#7  0x08081b77 in _bt_load (index=0x40141b10, btspool=0x82bf7b8, btspool2=0x0) at nbtsort.c:638
#8  0x080813b8 in _bt_leafbuild (btspool=0x82bf7b8, btspool2=0x0) at nbtsort.c:171
#9  0x0807e1d0 in btbuild (fcinfo=0xbfffe820) at nbtree.c:165
#10 0x081630d7 in OidFunctionCall3 (functionId=338, arg1=1075019120, arg2=1075059472, arg3=137095072) at fmgr.c:1275
#11 0x08092093 in index_build (heapRelation=0x40137d70, indexRelation=0x40141b10, indexInfo=0x82be7a0) at index.c:1447
#12 0x080913d7 in index_create (heapRelationId=17618, indexRelationName=0x82b9648 "xxx_mod_ac_did_cre_idx", indexInfo=0x82be7a0, accessMethodObjectId=403, classObjectId=0x82be578, primary=0 '\0', isconstraint=0 '\0', allow_system_table_mods=0 '\0') at index.c:765
#13 0x080b88ae in DefineIndex (heapRelation=0x82b9698, indexRelationName=0x82b9648 "xxx_mod_ac_did_cre_idx", accessMethodName=0x82b96c0 "btree", attributeList=0x82b9718, unique=0 '\0', primary=0 '\0', isconstraint=0 '\0', predicate=0x0, rangetable=0x0) at indexcmds.c:211
#14 0x0811b250 in ProcessUtility (parsetree=0x82b9788, dest=Remote, completionTag=0xbfffea80 "") at utility.c:620
#15 0x08118df6 in pg_exec_query_string (query_string=0x82b91e0, dest=Remote, parse_context=0x82ade58) at postgres.c:789
#16 0x08119f0d in PostgresMain (argc=4, argv=0xbfffecb0, username=0x8240679 "postgres") at postgres.c:2013
#17 0x08102078 in DoBackend (port=0x8240548) at postmaster.c:2302
#18 0x081019ca in BackendStartup (port=0x8240548) at postmaster.c:1924
#19 0x08100bcd in ServerLoop () at postmaster.c:1009
#20 0x0810078e in PostmasterMain (argc=1, argv=0x8227468) at postmaster.c:788
#21 0x080dee2b in main (argc=1, argv=0xbffff644) at main.c:210
#22 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /xxx/bin.Linux/postgres, process 23663

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> It happened again. This time it hangs when we drop/create the index.
> Here is the gdb info from a postgres built with --enable-debug.

Well, that pretty much removes all doubt: something has left the buffer context lock (cntx_lock) set on a buffer that certainly ought to be free.

The problem here is that REINDEX (or CREATE INDEX in this case) is the victim, not the perpetrator, so we still don't know exactly what's causing the error. We need to go backwards in time, so to speak, to identify the code that's leaving the buffer locked when it shouldn't. I don't offhand have a good idea about how to do that. Is there another process that is also getting stuck when REINDEX does (if so, please get a backtrace from it too)?

BTW, what Postgres version are you using again? The line numbers in your trace don't square with any current version of bufmgr.c ...

			regards, tom lane
Hi Tom,

Our PG version is 7.3.2.

The COPY process is always there. Besides the COPY process, there are many SELECT processes waiting as well. That is understandable during a REINDEX, but why do SELECTs wait during a drop/create index? From the Postgres documentation:

  Note: Another approach to dealing with a corrupted user-table index is
  just to drop and recreate it. This may in fact be preferable if you
  would like to maintain some semblance of normal operation on the table
  meanwhile. REINDEX acquires exclusive lock on the table, while CREATE
  INDEX only locks out writes not reads of the table.

Each time this happened, it might hang on a different index. But one thing is sure: the REINDEX or CREATE INDEX is granted its locks while the others wait. If REINDEX / CREATE INDEX is not the perpetrator, how can PG grant it the locks but not the others, like COPY?

Forgive me for not providing the full table and index names, IP addresses, etc., for security reasons. Here is a copy of my first post from June 8, with the lock info from the database and the OS process listing:

> [repeat of the June 8 post above -- snipped]

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> Our PG version is 7.3.2.

Hmm. On general principles you should be using 7.3.6, but I do not see anything in the 7.3.* change logs that looks very likely to cure this.

> The COPY process is always there. Besides the COPY process, there are
> many SELECT processes waiting as well. That is understandable during a
> REINDEX, but why do SELECTs wait during a drop/create index?

DROP INDEX would lock out selects (it has no other way to be sure no select is trying to *use* the index). Once you're past that, selects would work, but if you try something like
	begin; drop index; create index; commit;
then the drop's lock will be held till commit.

I'm not sure about whether COPY is related. In your original post, the COPY was waiting to acquire RowExclusiveLock on the table, so it hadn't actually done anything yet and really couldn't be holding a buffer lock AFAICS.

> But one thing is sure: the REINDEX or CREATE INDEX is granted its locks
> while the others wait. If REINDEX / CREATE INDEX is not the perpetrator,
> how can PG grant it the locks but not the others, like COPY?

The point is that it's waiting for a lower-level lock (namely a buffer LWLock). There's no deadlock detection for LWLocks, because they're not supposed to be used in ways that could cause a deadlock.

Assuming for the moment that this is indeed a deadlock, you could learn something the next time it happens with some manual investigation. You'll need to keep using the debug-enabled build. When you next get a lockup, proceed as follows:

1. Attach to the REINDEX or CREATE INDEX process and find out which LWLock number it is blocked on. (This is the lockid argument of LWLockAcquire, 21335 in your trace of today.)

2. For *each* live backend process (including the REINDEX itself), attach with gdb and look at the held-locks status of lwlock.c. This would go something like

	gdb> p num_held_lwlocks
	if greater than zero:
	gdb> x/10d held_lwlocks
	(replace "10" by the value of num_held_lwlocks)

If you find a backend that is holding the lock number that REINDEX wants, print out its call stack with "bt", and look in pg_locks to see what lock manager locks it is holding or waiting for. If you do not find one, then the deadlock theory is disproved, and we're back to square one.

			regards, tom lane
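As a companion to step 2, once gdb identifies the backend holding the LWLock that the REINDEX wants, its heavyweight locks can be listed from pg_locks. A sketch, with 7.3-era column names assumed and a placeholder PID:

    -- Heavyweight locks held or awaited by one suspect backend.
    SELECT c.relname, l.transaction, l.mode, l.granted
    FROM pg_locks l
    LEFT JOIN pg_class c ON c.oid = l.relation
    WHERE l.pid = 12345;   -- replace with the PID found via gdb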
Thanks! OK, we will do this exercise next time.

Since there are multiple databases, and there were 170 postgres processes this morning, 60 of them accessing the problem database and 57 of those 60 non-idle: do we only need to gdb those 57 processes, or do we need to gdb all 60, or all 170?

Thanks again!

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [snip]
Litao Wu <litaowu@yahoo.com> writes:
> Since there are multiple databases, and there were 170 postgres
> processes this morning, 60 of them accessing the problem database and
> 57 of those 60 non-idle: do we only need to gdb those 57 processes, or
> do we need to gdb all 60, or all 170?

Potentially the deadlock could be anywhere :-(. You should definitely not assume it must be one of the processes connected to the problem database, because the buffer pool is cluster-wide. Might be worth setting up a shell script to help.

			regards, tom lane
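One way such a script could enumerate the backends to attach to is to pull the PIDs from pg_stat_activity. A sketch; it assumes the stats collector is enabled so the view is populated, and that on 7.3 the PID column is named procpid:

    -- List every live backend in the cluster, across all databases,
    -- as candidates for the gdb loop.
    SELECT procpid, datname, usename
    FROM pg_stat_activity
    ORDER BY datname, procpid;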
Hi,

I have this query:

explain
SELECT *
FROM ip_tracking T, ip_map C
WHERE T.source_ip::inet >>= C.net;

                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3894833367750.16 rows=51709297065144 width=111)
   Join Filter: ("outer".source_ip >>= ("inner".net)::inet)
   ->  Seq Scan on ip_tracking t  (cost=0.00..825050.68 rows=31093368 width=34)
   ->  Seq Scan on ip_map c  (cost=0.00..83686.66 rows=3326066 width=77)
(4 rows)

ip_tracking (
  pk_col int,
  source_ip inet,
  .. the rest...
)
There is one index: ip_tracking_ip_idx btree (source_ip)

ip_map (
  net cidr,
  ... the rest...
)
Indexes: map_net_idx hash (net)

If I change ">>=" to "=", the query plan is:

                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..10798882243.63 rows=31093368 width=111)
   ->  Seq Scan on ip_map c  (cost=0.00..83686.66 rows=3326066 width=77)
   ->  Index Scan using ip_tracking_ip_idx on ip_tracking t  (cost=0.00..3236.72 rows=800 width=34)
         Index Cond: (t.source_ip = ("outer".net)::inet)
(4 rows)

This is my first time dealing with the network address types. Is it possible to make a query that uses the ">>=" operator use an index, like the "=" query above does?

Thanks,