Thread: Re: INSERT/UPDATE waiting (another example)

Re: INSERT/UPDATE waiting (another example)

From
Wayne Piekarski
Date:
> Date: Tue,  4 May 1999 11:33:34 +0200 (CEST)
> From: Dirk Lutzebaeck <lutzeb@aeccom.com>
> Subject: INSERT/UPDATE waiting
> 
> Hello,
> 
> somehow the backend is hanging on my system (6.5beta1, Linux 2.2.6):
> 
> postgres 29957  0.0  1.5  4124  2048  ?  S    18:23   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE waiting
> postgres 29980  0.0  1.6  4124  2064  ?  S    18:25   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE waiting
> postgres 30005  0.0  1.6  4124  2088  ?  S    18:27   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs UPDATE waiting
> postgres 30012  0.0  2.1  4532  2696  ?  S    18:28   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs INSERT waiting
> postgres 30030  0.0  3.0  5780  3916  ?  S    18:28   0:00 /usr/local/pgsql/bin/postgres localhost lutzeb cs idle
> 
> [about 34 processes]
> 
> What is happening here? Can't find anything in the documentation.

Hi everyone,

We just deployed a large system a few weeks ago which involves using
PostgreSQL 6.4.2 and CGI based interfaces, and we usually have around
10-20 connections running at all times; the system is very busy during 9-5
hours. The machine is running FreeBSD 2.2.7 and has 256 MB of RAM and
lots of file descriptors.

We are experiencing exactly the same problem as above - during the day,
all of a sudden Postgres will completely jam up, with all processes in
one of the following states (from ps -axwwwwww):

SELECT waiting
DELETE waiting
INSERT waiting
idle waiting
idle

Sometimes the ps output will also show postgres backends with garbage
high-ASCII characters in their names; I have no idea why that happens either.

Originally, we thought it was a bad query somewhere with lock statements
in the wrong order causing a deadlock, so we reduced the constant compiled
into the backend that controls how often deadlock checks are run, setting
it to something like ten seconds.

I then forced the database to go into a real deadlock by doing:

1:   BEGIN
2:   BEGIN
1:   LOCK table a
2:   LOCK table b
1:   LOCK table b
2:   LOCK table a

Ten seconds later, one is aborted due to the deadlock checking - this is
great, as everything clears up and continues with no problems.

However, during operation, every so often for no apparent reason, all the
backends doing work jam up with "waiting" in their names, or else they are
idle - and nothing happens. I let it sit for about 5 minutes one
time, and nothing happened. I had to kill everything so the staff could
actually use the system.

For a while, we went hunting through our code looking for improperly
ordered lock statements and things like that. We found one, which was fixed,
but the problem still happens - there may be more of these in the code
(there probably are, in fact), but I'm under the impression the deadlock
checking is supposed to catch that and abort a backend cleanly to
resolve the conflict?

Also, one other thing we discovered was making the following deadlock:

S1: BEGIN
S1: LOCK table a
S2: SELECT * FROM a, b where a.id = b.id
S1: LOCK table b

If we did EXPLAIN on the SELECT and it chose to scan A first, it would work;
but if we used an index or rearranged the SELECT statement so that B was
scanned first, we would get a deadlock, since the SELECT couldn't complete.
Now I would have hoped that locks for the join would be acquired either all
at once or not at all. I don't want to have to wrap lock statements around
the SELECT, because all I want to do is read some data, not make updates!
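
To be explicit about the workaround I'd rather avoid (using the same
placeholder tables a and b as above), every transaction - readers included -
would have to grab the table locks up front, in one fixed order, before
touching any data:

BEGIN;
-- everyone locks a before b, so the plan's scan order can no longer
-- produce a lock-ordering deadlock; but now a plain read serialises
-- behind every writer
LOCK TABLE a;
LOCK TABLE b;
SELECT * FROM a, b WHERE a.id = b.id;
END;

which defeats the point of just wanting to read some data.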




Anyway, back to the real problem: I enabled a constant in the code, and
whenever the deadlock check runs during this lockup problem, it prints
the following out to the log file:

DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL

Not sure what this means - I'd really like it to show what kind of locks
the postgres processes are waiting for, as it could help me resolve the
problem. Is there anything else I can compile in to do this? I notice
there are quite a few symbols you can define for locks.


So right now I am at a loss on what to do - whenever everything jams up, I
do a ps -axwwww and kill off all postgres processes which are idle
waiting, then SELECT waiting, and sometimes it clears up. Failing that,
I then kill the ones making changes like delete, insert, update - although
I hate doing that, because I found that when I went around ruthlessly
killing backends to resolve conflicts, tables got corrupted and we
experienced problems like BTP_CHAIN errors, and vacuums would start failing,
requiring us to dump the tables and reload, which was pretty bad :(

I've had other problems as well, but I'll save them for another email.

So if anyone can offer any advice, I'd be eternally grateful! Right now
I'm getting lots of flak from people, saying I should have used Oracle, or
mSQL, etc, but Postgres works great and allows me to embed code into the
backend, and all kinds of other cool features. I really am looking forward
to 6.5 as the MVCC stuff sounds great, but right now I need to get this
working reliably until then.

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au



Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Tom Lane
Date:
Wayne Piekarski <wayne@senet.com.au> writes:
> We are experiencing exactly the same problem as above - during the day,
> all of a sudden Postgres will completely jam up, with all processing in
> one of the following states: (from ps -axwwwwww)

It seems possible that the hashtable bugs I fixed a couple months ago
are rising up to bite you.  (Basically, the shared hashtables that
contain things like locks and buffers would go nuts if there got to be
more than 256 entries ... and it sure sounds like your installation is
big enough that it could have, eg, more than 256 active locks when
under load.)  One quick thing you might try to test this is to reduce
the postmaster's -B setting to less than 256 (if you have it set that
high) and see if stability improves.

These bugs are fixed in 6.5-beta1, but it has enough other bugs that
I don't think Wayne would be wise to try moving to 6.5 just yet.
I have a patch for 6.4.2 that I believe also fixes the problems, but
it hasn't gotten quite as much testing as I would like so I haven't
committed it into the REL6_4 tree.  (There's not going to be a 6.4.3
release, according to current plans, so it's hardly worth doing anyway.)

What I will do is send the patch to Wayne in a separate message, and
also cc: it to the PATCHES list --- anyone else who needs it can get it
from there.  Please let us know if this helps, Wayne.
        regards, tom lane


Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Wayne Piekarski
Date:
Tom Lane wrote:
> Wayne Piekarski <wayne@senet.com.au> writes:
> > We are experiencing exactly the same problem as above - during the day,
> > all of a sudden Postgres will completely jam up, with all processing in
> > one of the following states: (from ps -axwwwwww)
> 
> It seems possible that the hashtable bugs I fixed a couple months ago
> are rising up to bite you.  (Basically, the shared hashtables that
> contain things like locks and buffers would go nuts if there got to be
> more than 256 entries ... and it sure sounds like your installation is
> big enough that it could have, eg, more than 256 active locks when
> under load.)  One quick thing you might try to test this is to reduce
> the postmaster's -B setting to less than 256 (if you have it set that
> high) and see if stability improves.

Currently, I start up postmaster with -B 192, which I guess puts it below
the value of 256 which causes problems. Apart from when I got past 256
buffers, does the patch fix anything else that might be causing problems?

Just for everyone's information, the system contains about 80 tables and
129 indexes. There is about 700 MB of data sprayed over all the tables,
although some have more rows than others. At any one time during the day,
we have about 8 to 10 active postgres connections; half of them are
connected to daemons which continuously send updates and inserts into the
system, and the rest of them are very quick queries from CGI programs. The
problems we experience are always during the day, when the CGI programs
are hammering the database - we don't ever have a problem at night when
the staff go home.

The whole thing runs 24 hours a day, 7 days a week. Most of the tables
rarely get vacuumed (they have tens of thousands of rows and only inserts
get done to them - the optimiser makes good choices for most of these) -
however, we have 5 tables which get vacuumed at midnight each day: we drop
all the indexes, vacuum, then recreate them. If we don't do the index thing,
the vacuum can take tens of minutes, which is not acceptable - the tables
contain about 20000 rows, each of which gets updated about 3 times during
the day. I sent an email a while back about vacuum performance, and this
hack is the only way around it.
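
For concreteness, the nightly job for one of these tables boils down to
something like the following (the table and index names match one of the
real tables; the column lists shown are indicative only, not the exact
script):

DROP INDEX test_table_seid_index;
DROP INDEX test_table_id_index;
DROP INDEX test_table_bid_seid_index;
VACUUM ANALYZE test_table;
CREATE INDEX test_table_seid_index ON test_table (seid);
CREATE UNIQUE INDEX test_table_id_index ON test_table (id);
CREATE INDEX test_table_bid_seid_index ON test_table (bid, seid);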

If any other programs try to query the four tables getting vacuumed then I
get into real trouble. I wish I could do something like:

BEGIN;
LOCK TABLE x;
DROP INDEX x_idx;
VACUUM ANALYZE x;
CREATE INDEX x_idx;
END;

I've seen a #define which looked like it enabled this kind of thing, but
I'm not sure if it is safe to use.


> What I will do is send the patch to Wayne in a separate message, and
> also cc: it to the PATCHES list --- anyone else who needs it can get it
> from there.  Please let us know if this helps, Wayne.

During the week when I get a chance I will try the patch and see if it
has any effect on the problems we are having. It is very weird and
impossible to reproduce on demand, as it is related to the number of
queries and the load of the machine at the time.

Hopefully I will have some results for this by the end of the week.



While I'm asking some questions here, I should tell you about some of the
other weird things I've encountered; many of them are related to shared
memory and hash tables, which is making me think more and more that all
the problems I am having are somehow related.

For large tables, when I perform joins, I repeatedly get hash table out of
memory errors. So I have two tables, one called unix, with 20000 rows, and
another called services, with 80000 rows - I am producing a result which
contains about 20000 rows in it as well, so there is lots of data moving
around.

In most cases, the problem occurs when the optimiser mistakenly chooses to
use a seq scan rather than an index scan. To get around these problems, we
initially tried increasing the -B value (this was a long time ago, but we
had problems; it may have been set to more than 256, which fits in with
what Tom Lane said). We kept increasing the number of buffers, but it got
to the point where I was annoyed that the optimiser was making bad
decisions, and I was at a loss on what to do. So I then discovered the
COST_INDEX and COST_HEAP variables, which I set to:

set COST_INDEX = '0'; set COST_HEAP = '99999999';

The optimiser then used an index scan almost everywhere possible; the
explain output looked really expensive, but the queries actually executed
properly even with small -B values. So this is what I do to make these big
queries work. There are a few cases where the above set statements
actually cause hash table out of memory as well, so you set them back to
the defaults and then it usually works ok :)

I know the above is a hack, but I needed to get out of a jam and that was
the only way I could think of doing it. Are there any other join methods
besides hash join? I thought that if, say, I have two tables, A and B,
both with a column called id which is indexed, and I do a join on A.id and
B.id, it could use a more efficient means of joining via the indexes rather
than reading both tables into memory and joining there?
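
In other words, the shape of the case I mean is simply (a throwaway schema
just to illustrate the question):

CREATE TABLE a (id int4, val text);
CREATE TABLE b (id int4, other text);
CREATE INDEX a_id_index ON a (id);
CREATE INDEX b_id_index ON b (id);

-- both join columns are indexed; is there a join method that walks the
-- two indexes rather than building an in-memory hash table?
SELECT a.val, b.other FROM a, b WHERE a.id = b.id;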

Here are some explain statements for a big join:


reactor=> explain select unix.username from unix where unix.snum = services.snum
NOTICE:  QUERY PLAN:

Hash Join  (cost=6191.62 size=43361 width=20)
  ->  Seq Scan on services  (cost=2204.91 size=43361 width=4)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on unix  (cost=1212.26 size=20311 width=16)

reactor=> set COST_INDEX = '0';
SET VARIABLE
reactor=> set COST_HEAP = '999999999';
SET VARIABLE
reactor=> explain select unix.username from unix where unix.snum = services.snum;
NOTICE:  QUERY PLAN:

Hash Join  (cost=30000000.00 size=43361 width=20)
  ->  Index Scan using unix_snum_inv_index on unix  (cost=20311001006080.00 size=20311 width=16)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Index Scan using services_snum_inv_index on services  (cost=43360999964672.00 size=43361 width=4)

I would assume that the above one which uses indexes would be a lot
better, but why did the optimiser choose the seq scan - do the indexes help
when doing joins while all rows are being returned back? I
understand that the optimiser will choose not to use indexes if it feels
that it will return most of the rows anyway and so a seq scan is better.


------


One other problem related to the shared memory buffers is that every so often
the postmaster will die with shared memory errors and "device full". This
happens very rarely (once every one to two weeks), but it happens, and I
figured that it might be related to the number of buffers I've started up
with. Note that this problem is not affected by changing the -B value, so I
don't think it's my FreeBSD setup.




So I hope someone finds the above useful; I've been reading the mailing
lists a lot and I've heard about developers discovering bugs in locking,
indexes, and vacuum in 6.5, but I wasn't sure if they were applicable to
6.4.2 as well, so I figured I should tell someone just in case.


Sorry about the length of this email, but I had a lot of things to cover.  
Thanks for your help everyone, I look forward to hearing from you ...

Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [SQL] Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Bruce Momjian
Date:
> The whole thing runs 24 hours a day, 7 days a week. Most of the tables
> rarely get vacuumed (they have tens of thousands of rows and only inserts
> get done to them - the optimiser makes good choices for most of these) -
> however we have 5 tables which get vacuum at midnight each day, we drop
> all the indexes, vacuum, then recreate. If we don't do the index thing,
> the vacuum can take tens of minutes, which is not acceptable - the tables
> contain about 20000 rows, each of which gets updated about 3 times during 
> the day. I sent an email a while back about vacuum performance, and this
> hack is the only way around it.

6.5 beta speeds up vacuuming with existing indexes, thanks to Vadim.
Also, accessing during vacuuming may be better too.

> While I'm asking some questions here, I should tell you about some of the
> other wierd things I've encountered, many of them are related to shared
> memory and hash tables, which is making me think more and more that all
> the problems I am having are somehow related.

6.5 beta has some _major_ hash fixes.  We always knew there were hash
problems, but now Tom has fixed many of them.

> I would assume that the above one which uses indexes would be a lot
> better, but why did the optimiser chose the seq scan - do the indexes help
> when doing joins and at the same time all rows are being returned back? I
> understand that the optimiser will choose not to use indexes if it feels
> that it will return most of the rows anyway and so a seq scan is better.

6.5 beta also has a faster and smarter optimizer.

It may be wise for you to test 6.5beta to see how many problems we fix.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Tom Lane
Date:
Wayne Piekarski <wayne@senet.com.au> writes:
> Currently, I start up postmaster with -B 192, which I guess puts it below
> the value of 256 which causes problems. Apart from when I got past 256
> buffers, does the patch fix anything else that might be causing problems?

Yes: if you have more than 256 active locks then you also will have
problems with the unpatched code.  I don't know how to relate that to
any easily-measured external information, but since you have a bunch of
concurrently running backends it seems possible that you are running
into locktable problems.  Anyway I do urge you to apply the patch and
see if things get better.

> however we have 5 tables which get vacuum at midnight each day, we drop
> all the indexes, vacuum, then recreate. If we don't do the index thing,
> the vacuum can take tens of minutes, which is not acceptable

Yup, it takes an unreasonable amount of time to vacuum an index for a
table in which a lot of rows have been deleted.  The drop/recreate hack
is a good workaround for now.  (Has anyone looked into why vacuum is so
slow in this case?)

> For large tables, when I perform joins, I repeatedly get hash table out of
> memory errors. So I have two tables, one called unix, with 20000 rows, and
> another called services, with 80000 rows - I am producing a result which
> contains about 20000 rows in it as well, so there is lots of data moving
> around.

Yes, the hashtable code needs work.  As a short-term workaround, you
might try disabling hashjoins entirely, which you can do by passing
the debug option "-fh" ("forbid hash").  For example,
    setenv PGOPTIONS "-fh"
    psql dbase
The optimizer will then usually choose mergejoins, which are reasonable
in performance if you have indexes on the columns being joined by.
(There ought to be a SET variable that controls this, but there isn't.)

> In most cases, the problem occurs when the optimiser mistakenly choses to
> use seq scan rather than index scan. To get around these problems, we
> initially tried increasing the -B value to larger values 

-B doesn't have any direct influence on the optimizer's choices, AFAIR.

> set COST_INDEX = '0'; set COST_HEAP = '99999999';

> The optimiser then used index scan for almost anything where possible, the
> explain output looked really expensive, but the queries actually executed
> properly even with small -B values. So this is what I do to make these big
> queries work. There are a few cases where the above set statements
> actually cause hash table out of memory as well, so you set them back to
> the defaults and then it usually works ok :)

Again, that doesn't directly prevent the optimizer from using hash
joins, it just skews the cost estimates so that index scans will be used
in preference to sequential scans, whereupon you get silly plans like
the one you quoted:

> Hash Join  (cost=30000000.00 size=43361 width=20)
>   ->  Index Scan using unix_snum_inv_index on unix
> (cost=20311001006080.00 size=20311 width=16)
>   ->  Hash  (cost=0.00 size=0 width=0)
>         ->  Index Scan using services_snum_inv_index on services
> (cost=43360999964672.00 size=43361 width=4)

This is silly because hash join doesn't care whether its inputs are
sorted or not --- the extra cost of scanning the index is just being
wasted here.  (Unless you have WHERE conditions that can be combined
with the index to allow not scanning the whole table.  From the
size estimates it looks like that might be happening for services,
but not for the unix table, so the index scan on unix is definitely
a waste of time.)

Using an index scan *is* a good idea for merge join, on the other hand,
because merge join requires sorted inputs.  (So if you don't have a
suitable index, the plan will have to include an explicit sort step
before the join.)
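
To make that concrete with Wayne's tables (just a sketch; I'm assuming
unix_snum_inv_index and services_snum_inv_index are ordinary btree indexes
on the snum columns):

-- with hash joins disabled (-fh), the planner will usually pick a merge
-- join here, and since both join columns are already indexed it can read
-- each input in snum order via an index scan instead of adding explicit
-- sort steps
SELECT unix.username
FROM unix, services
WHERE unix.snum = services.snum;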

> One other problem related to the shared memory buffers is every so often,
> the postmaster will die with shared memory errors, and device full.

Er, could you quote the exact error reports?  This is too vague to allow
any conclusions.

> Quick question: when you do a query with a join why does the hash table
> code need to use shared memory? Can't it do the join within its own memory
> space?

It doesn't use shared memory.  It's just that for historical reasons,
the amount of private memory allocated for a hashjoin table is the same
as the amount of shared memory allocated for buffers (ie, the -B
switch).  I've been thinking of changing it to be driven by the -S
switch instead, since that seems to make more sense.
        regards, tom lane


Re: [SQL] Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Wayne Piekarski
Date:
Hi,

> 6.5 beta speeds up vacuuming with existing indexes, thanks to Vadim.
> Also, accessing during vacuuming may be better too.

That is good news :) When I first heard about MVCC I remember someone
suggesting it would be possible to still do SELECT on tables being
vacuumed - is this right or not in the current 6.5?

When we were developing the system we spent a lot of time working out
ways of getting around vacuum, and I've learned a lot from it. I am going
to try it out on a full dump of our current database and test some
examples to see what kind of improvement there is. 

> 6.5 beta also has a faster and smarter optimizer.
> 
> It may be wise for you to test 6.5beta to see how many problems we fix.

This week I intend to test out the patches I've received, and hopefully
they will fix up my big problems (the one with the backend locking up);
then I will grab the latest 6.5 and try that out with some test data to
see what happens.

Unfortunately, I can't test 6.5 like I would the real thing because many
of my problems only occur when everyone is busy firing off queries and the
box is running an unusually high load and things start waiting on locks.
I'll see what I can do here although the only true way is to go live with
it - but I'm not ready for that yet :)

I should be able to check the optimiser improvements though, I've got a
lot of code which does the SET COST_HEAP/COST_INDEX hack to make things
work :)

thanks,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au


Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)

From
Wayne Piekarski
Date:
Hi everyone!


Tom Lane Writes:

> Wayne Piekarski <wayne@senet.com.au> writes:
> > Currently, I start up postmaster with -B 192, which I guess puts it below
> > the value of 256 which causes problems. Apart from when I got past 256
> > buffers, does the patch fix anything else that might be causing problems?
> 
> Yes: if you have more than 256 active locks then you also will have
> problems with the unpatched code.  I don't know how to relate that to
> any easily-measured external information, but since you have a bunch of
> concurrently running backends it seems possible that you are running
> into locktable problems.  Anyway I do urge you to apply the patch and
> see if things get better.

Ok, that is cool. I will try the patch out in the next few days and get
back with my results. I'll try to stress the machine by bogging it down a
bit with I/O, to see if I can cause the problem to happen on demand, so I
should hopefully be able to give you a definite answer on if it is the
solution or not.

> > however we have 5 tables which get vacuum at midnight each day, we drop
> > all the indexes, vacuum, then recreate. If we don't do the index thing,
> > the vacuum can take tens of minutes, which is not acceptable
> 
> Yup, it takes an unreasonable amount of time to vacuum an index for a
> table in which a lot of rows have been deleted.  The drop/recreate hack
> is a good workaround for now.  (Has anyone looked into why vacuum is so
> slow in this case?)

I don't understand the source code, but I've done some experiments and
found out the following (with 6.4.2 - no comment about 6.5):

I have this table where each row gets updated about 2-3 times. So my
picture of the disk file is this massive file with 80000 rows, of which
only 20000 of them are the real ones, the rest of them being holes that
can't get used. This takes *ages* to vacuum (with indexes that is) - BTW,
my impression of ages is a couple of minutes, so this isn't taking an hour
to run. However, I do have 5 tables to vacuum, so it ends up taking near
10 minutes! 

The performance of vacuum was a real worry as sometimes things were
taking ages, so I added some debugging code to
src/backend/commands/vacuum.c to do the following: 

* I had problems with the elapsed times from getrusage() not truly
indicating how long the vacuum took so I added calls to time(NULL) to
produce stats for real life seconds.

* I wanted to know where it was spending most of its time, so I added elog
statements each time I entered or left the scan, index, etc functions in
vacuum.c as well. I used this to work out where I was losing all my time.

Here is the output:

DEBUG:  ===> VACUUM non-verbose analyze test_table STARTS <===
DEBUG:  Table test_table - Vacuum analyze begins
DEBUG:  Table test_table - Initial stats complete at 0 secs
DEBUG:  Table test_table - vc_scanheap starts
DEBUG:  Table test_table - vc_scanheap ends - Pages 791: Changed 82, Reapped 787, Empty 0, New 0; Tup 20111: Vac 64425, Crash 0, UnUsed 0, MinLen 72, MaxLen 72; Re-using: Free/Avail. Space 4679524/4679524; EndEmpty/Avail. Pages 0/787. Elapsed 0/0 sec. Real 1 sec.
DEBUG:  Table test_table - Heap scan complete at 1 secs
DEBUG:  Index test_table_seid_index: vc_vaconeind begins
DEBUG:  Index test_table_seid_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 64425. Elapsed 0/26 sec. Real 37 sec.
DEBUG:  Index test_table_id_index: vc_vaconeind begins
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  Index test_table_id_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 64425. Elapsed 0/27 sec. Real 31 sec.
DEBUG:  Index test_table_bid_seid_index: vc_vaconeind begins
DEBUG:  Index test_table_bid_seid_index: vc_vaconeind ends - Pages 332; Tuples 20111: Deleted 64425. Elapsed 0/2 sec. Real 5 sec.
DEBUG:  Table test_table - Index clean/scan complete at 74 secs
DEBUG:  Rel test_table: Pages: 791 --> 188; Tuple(s) moved: 19655. Elapsed 2/5 sec. Real 36 sec.
DEBUG:  Index test_table_seid_index: vc_vaconeind begins
DEBUG:  Index test_table_seid_index: vc_vaconeind ends - Pages 253; Tuples 20111: Deleted 19655. Elapsed 0/1 sec. Real 1 sec.
DEBUG:  Index test_table_id_index: vc_vaconeind begins
DEBUG:  Index test_table_id_index: vc_vaconeind ends - Pages 252; Tuples 20111: Deleted 19655. Elapsed 0/0 sec. Real 1 sec.
DEBUG:  Index test_table_bid_seid_index: vc_vaconeind begins
DEBUG:  Index test_table_bid_seid_index: vc_vaconeind ends - Pages 332; Tuples 20111: Deleted 19655. Elapsed 0/1 sec. Real 1 sec.
DEBUG:  Table test_table - Processing complete at 114 secs
DEBUG:  ===> VACUUM non-verbose analyze test_table COMPLETE (115 sec) <===

So vc_vaconeind is taking a while to run, but the weird part is that when
processing test_table_bid_seid_index, it took only 5 seconds compared to
test_table_seid_index or test_table_id_index. These indexes are all normal
ones except for id_index, which is a unique index.

So I don't know why they take differing amounts of time. I don't think it's
a caching thing, because that would have affected the second call for
test_table_id_index as well. I've seen cases where the first index
vaconeind is fast and the remaining ones are slower.

The weird part for me is that it processes each index one by one, then the
relation, and then it is done. I would have thought it did everything row
by row, updating the indexes as required.

Hmmm, this is way beyond me. I thought the above might be useful for
someone, although you may already know this, in which case please ignore it.

> Yes, the hashtable code needs work.  As a short-term workaround, you
> might try disabling hashjoins entirely, which you can do by passing
> the debug option "-fh" ("forbid hash").  For example,
>     setenv PGOPTIONS "-fh"
>     psql dbase
> The optimizer will then usually choose mergejoins, which are reasonable
> in performance if you have indexes on the columns being joined by.
> (There ought to be a SET variable that controls this, but there isn't.)

I have indexes on almost every one of my columns, but in many cases the
optimiser still chooses hash joins. Just today, I found a query which
wouldn't run with the SET cost hacks, and lo and behold it works perfectly
with -fh. Do the values for the cost of merge join need changing?

The merge join is great - does 6.5 use it more or less than 6.4 for joins
with indexes? From what I've read, 6.5 is using more hash joins. I'm going
to put the -fh switch into my little toolbox of hacks to try for special
occasions when all else fails :)


> Again, that doesn't directly prevent the optimizer from using hash
> joins, it just skews the cost estimates so that index scans will be used
> in preference to sequential scans, whereupon you get silly plans like
> the one you quoted:
> 
> > Hash Join  (cost=30000000.00 size=43361 width=20)
> >   ->  Index Scan using unix_snum_inv_index on unix
> > (cost=20311001006080.00 size=20311 width=16)
> >   ->  Hash  (cost=0.00 size=0 width=0)
> >         ->  Index Scan using services_snum_inv_index on services
> > (cost=43360999964672.00 size=43361 width=4)
> 
> This is silly because hash join doesn't care whether its inputs are
> sorted or not --- the extra cost of scanning the index is just being
> wasted here.  (Unless you have WHERE conditions that can be combined
> with the index to allow not scanning the whole table.  From the
> size estimates it looks like that might be happening for services,
> but not for the unix table, so the index scan on unix is definitely
> a waste of time.)

I did it as a way of tricking the optimiser into doing something that made
it work. When I first tried it, I didn't really understand what I was
doing, but when it made my queries run that previously wouldn't work, I
just started using it without question because I needed to get out of a
jam :)

Hmmm, I'll have to go through some more examples to check on this, maybe
the above example I gave was wrong ....

I did a more complicated query which does a join plus some extra
conditions:

*** Hash table out of memory
Hash Join  (cost=4662.25 size=4317 width=320)
  ->  Seq Scan on unix  (cost=1212.26 size=20310 width=204)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on services  (cost=2204.91 size=4317 width=116)

*** Works with set cost hack
Hash Join  (cost=30000000.00 size=4317 width=320)
  ->  Seq Scan on services  (cost=4336099929358336.00 size=4317 width=116)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on unix  (cost=2031099966390272.00 size=20310 width=204)

*** Works with forbid hash "-fh"
Nested Loop  (cost=11054.74 size=4317 width=320)
  ->  Seq Scan on services  (cost=2204.91 size=4317 width=116)
  ->  Index Scan using unix_snum_inv_index on unix  (cost=2.05 size=20310 width=204)


The extra conditions are that services.snum = services.id and
services.snum = unix.snum and services.invalid = 0 and unix.invalid = 0.

So the only difference is that it is scanning the tables in a different order,
which is making it work ok. Could this be caused by the size/cost estimates
not being correct for the data actually there?

I guess with the hash table fixes this problem should go away, although I
am thinking about disabling hash joins in a few places to make things more
reliable for now .... Hmmm, something to think about :) Others
who are having trouble might want to try this as well?


> Using an index scan *is* a good idea for merge join, on the other hand,
> because merge join requires sorted inputs.  (So if you don't have a
> suitable index, the plan will have to include an explicit sort step
> before the join.)

Is merge join immune from the hash table out of memory thing? If so, then
I can just pay the cost of the sort and live with it, knowing that my
queries will always work?

> > One other problem related to the shared memory buffers is every so often,
> > the postmaster will die with shared memory errors, and device full.
> 
> Er, could you quote the exact error reports?  This is too vague to allow
> any conclusions.

I couldn't find any log files when I wrote the last email, but I had a bad
day today (the postmaster failed 3 times in quick succession) - I had waiting
backends, then I restarted, then had the problem again, then restarted,
then another restart :) and the output was:

DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL

FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.
FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.
DEBUG:  DumpAllLocks: xidLook->tag.lock = NULL



At this point, the postmaster is dead, and there are lots of postgres
children lying around, all still happy to process queries by the way,
which is pretty neat - but bad, because there is no postmaster to control
them.

So I kill everything off, and restart, and everything is ok again (The
indexes on a table got messed up, but I fixed that by dropping them and
recreating them).

The DumpAllLocks output implies that things are pretty busy and the deadlock
code is performing checks .... Any reason why it prints NULL? I would have
thought it would print something there, like the name of a lock?



And another one:


IpcSemaphoreCreate: semget failed (No space left on device) key=5432015, num=16, permission=600
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.

FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.





Then another one after restarting everything:

ERROR:  cannot open segment 1 of relation sessions_done_id_index

FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting.

FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting.
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.


This is where I dropped the index sessions_done_id_index.

I didn't run the ipcclean program, but I've never had to run it before,
and everything always starts up ok. After the third restart, everything
was ok. I think the dying was caused by too many things trying to happen
at once.


Note that the above errors are from a normal 6.4.2 with no patches - I am not
running Tom's patch yet. The whole affair was caused by me running a
program which moves 20000 or so tuples from one table to another with
500000 tuples. It takes a few minutes and thrashes the I/O on the machine
- postgres died during this process.



> > Quick question: when you do a query with a join why does the hash table
> > code need to use shared memory? Can't it do the join within its own memory
> > space?
> 
> It doesn't use shared memory.  It's just that for historical reasons,
> the amount of private memory allocated for a hashjoin table is the same
> as the amount of shared memory allocated for buffers (ie, the -B
> switch).  I've been thinking of changing it to be driven by the -S
> switch instead, since that seems to make more sense.

Ahhh, cool ... Ok, that explains it then :)


Thank you very much for your help again; I hope the above is useful for
everyone to look at. I'm not sure how much has been fixed in 6.5, but I
want to make sure that if these problems still exist in 6.5beta we can
flush them out.

bye,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                               Tel:     (08) 8221 5221
Research & Development Manager                Fax:     (08) 8221 5220
SE Network Access Pty Ltd                     Mob:     0407 395 889
222 Grote Street                              Email:   wayne@senet.com.au
Adelaide SA 5000                              WWW:     http://www.senet.com.au



[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another example))

From
"Hiroshi Inoue"
Date:
Hello all,

>
> Hi everyone!
>
>
> Tom Lane Writes:
>
> > Wayne Piekarski <wayne@senet.com.au> writes:
> > > Currently, I start up postmaster with -B 192, which I guess puts it below
> > > the value of 256 which causes problems. Apart from when I got past 256
> > > buffers, does the patch fix anything else that might be causing problems?
> >

[snip]

>
> Then another one after restarting everything:
>
> ERROR:  cannot open segment 1 of relation sessions_done_id_index
>

I got the same error in my test cases.
I don't understand the cause of this error.

But it seems I found another problem instead.
   The spinlock io_in_progress_lock of a buffer page is not
   released by the operations called by elog(), such as
   ProcReleaseSpins(), ResetBufferPool(), etc.

   For example, the error we have encountered probably occurred
   in ReadBufferWithBufferLock(). When elog(ERROR/FATAL) occurs
   in smgrread/extend(), which is called from
   ReadBufferWithBufferLock(), smgrread/extend() doesn't release
   the io_in_progress_lock spinlock of the page. If other
   transactions get that page as a free buffer page, those
   transactions wait for the release of the io_in_progress_lock
   spinlock and would abort with a message such as

> FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting.
>
> FATAL: s_lock(1800d37c) at bufmgr.c:657, stuck spinlock. Aborting.

Comments ?

I don't know the details of the spinlock stuff.
Sorry if my thought is off the point.

And I have another question.

It seems elog(FATAL) doesn't release allocated buffer pages.
Is that OK?
AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't
call ResetBufferPool().

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp



DROP TABLE does not drop a table completely

From
"Hiroshi Inoue"
Date:
Hi all,
This mail is about the original cause of [HACKERS] spinlock freeze ?(Re:
INSERT/UPDATE waiting (another example)).

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Hiroshi Inoue
> Sent: Thursday, May 13, 1999 7:29 PM
> To: Tom Lane; Wayne Piekarski
> Cc: pgsql-hackers@postgreSQL.org
> Subject: [HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another
> example))
>
>

[snip]

> >
> > Then another one after restarting everything:
> >
> > ERROR:  cannot open segment 1 of relation sessions_done_id_index
> >
>
> I got the same error in my test cases.
> I don't understand the cause of this error.
>

I got this error message by dropping a table while concurrent transactions
were inserting rows into the same table.

I think the other transactions should abort with the message "Table does not
exist", but in most cases that is not what happens.

It seems that other transactions can proceed before the DROP TABLE
command is completed.
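
For illustration only (the table name t and the INSERT are placeholders,
not the exact statements from my test), the interleaving is roughly:

-- session 2, running while session 1 executes DROP TABLE t
INSERT INTO t VALUES (1);
-- expected: wait on the DROP's AccessExclusiveLock, then fail with
--           "Table does not exist"
-- observed: the INSERT can proceed before the DROP TABLE has really
--           finished, and later fails strangely (e.g. with the
--           "cannot open segment 1 of relation ..." error above)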

AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and
releases the lock inside the function.

I think that heap_destroy_with_catalog() (or an upper-level function) should
not release the lock.

Comments ?

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] DROP TABLE does not drop a table completely

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> > >
> > > ERROR:  cannot open segment 1 of relation sessions_done_id_index
> > >
> >
> > I got the same error in my test cases.
> > I don't understand the cause of this error.
> >
> 
> I got this error message by dropping a table while concurrent transactions
> inserting rows to the same table.
> 
> I think other transactions should abort with message "Table does not
> exist". But in most cases the result is not so.
> 
> It seems that other transactions could proceed before DROP TABLE
> command is completed.
> 
> AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and
> releases the lock inside the function.
> 
> I think that heap_destroy_with_catalog() (or upper level function) should
> not
> release the lock.

You're right - this should be done keeping in mind that DROP is allowed
inside BEGIN/END (long transactions), but I'm not sure that this
will help generally: does it matter when we unlock the dropped relation -
in heap_destroy_with_catalog() or at commit? The real problem is
that heap/index_open opens the file _before_ acquiring any locks and
doesn't check the t_xmax of the relation/index tuple. I believe that
this is an old problem. There are other ones, for sure.
The catalog cache code must be re-designed.

Vadim


RE: [HACKERS] DROP TABLE does not drop a table completely

From
"Hiroshi Inoue"
Date:

> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Monday, May 17, 1999 4:01 PM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] DROP TABLE does not drop a table completely
> 
> 
> Hiroshi Inoue wrote:
> > 
> > > >
> > > > ERROR:  cannot open segment 1 of relation sessions_done_id_index
> > > >
> > >
> > > I got the same error in my test cases.
> > > I don't understand the cause of this error.
> > >
> > 
> > I got this error message by dropping a table while concurrent 
> transactions
> > inserting rows to the same table.
> > 
> > I think other transactions should abort with message "Table does not
> > exist". But in most cases the result is not so.
> > 
> > It seems that other transactions could proceed before DROP TABLE
> > command is completed.
> > 
> > AFAIC heap_destroy_with_catalog() acquires AccessExclusiveLock and
> > releases the lock inside the function.
> > 
> > I think that heap_destroy_with_catalog() (or upper level 
> function) should
> > not
> > release the lock.
> 
> You're right - this should be done keeping in mind that DROP is allowed
> inside BEGIN/END (long transactions), but I'm not sure that this
> will help generally: does it matter when unlock dropped relation -
> in heap_destroy_with_catalog() or in commit?

Unlocking the dropped relation before commit enables other transactions to
proceed, and those transactions regard the relation as still alive before the
commit of the DROP TABLE command (which is right, I think). As a result, those
transactions behave strangely, though I don't know more about the
reason.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp


spinlock freeze again

From
"Hiroshi Inoue"
Date:
Hi all,

ProcReleaseSpins() does nothing unless MyProc is set.
So both elog(ERROR/FATAL) and proc_exit(0) before 
InitProcess() don't release spinlocks.

Comments ?

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] spinlock freeze again

From
Bruce Momjian
Date:
> Hi all,
> 
> ProcReleaseSpins() does nothing unless MyProc is set.
> So both elog(ERROR/FATAL) and proc_exit(0) before 
> InitProcess() don't release spinlocks.

Are there any locks acquired before InitProcess()?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] spinlock freeze again

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Thursday, July 08, 1999 3:21 AM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] spinlock freeze again
> 
> 
> [Charset iso-8859-1 unsupported, filtering to ASCII...]
> > Hi all,
> > 
> > ProcReleaseSpins() does nothing unless MyProc is set.
> > So both elog(ERROR/FATAL) and proc_exit(0) before 
> > InitProcess() don't release spinlocks.
> 
> Are there any locks acquired before InitProcess()?
>

OidGenLockId spinlock is acquired in InitTransactionLog().

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] spinlock freeze again

From
Bruce Momjian
Date:
> > Are there any locks acquired before InitProcess()?
> >
> 
> OidGenLockId spinlock is acquired in InitTransactionLog().
> 
> Regards.
> 

Well, seems we have a Proc queue that holds locks, but for these other
cases, we don't.  We could use the on_shmexit queue to add a cleanup
handler once we get the lock, and remove it from the queue once we
release the lock.  We don't currently have the ability to remove
specific queue entries, but we could easily do that.

Is the lock failure a problem that happens a lot?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] spinlock freeze again

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Thursday, July 08, 1999 10:09 AM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] spinlock freeze again
> 
> 
> > > Are there any locks acquired before InitProcess()?
> > >
> > 
> > OidGenLockId spinlock is acquired in InitTransactionLog().
> > 
> > Regards.
> > 
> 
> Well, seems we have a Proc queue that holds locks, but for these other
> cases, we don't.  We could use the on_shmexit queue to add a cleanup
> handler once we get the lock, and remove it from the queue once we
> release the lock.  We don't currently have the ability to remove
> specific queue entries, but we could easily do that.
> 
> Is the lock failure a problem that happens a lot?
>

It doesn't happen ordinarily.
I don't remember well how it happened.
Probably it was caused by another spinlock (io_in_progress_lock?)
freezing while testing 6.5-beta.
Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] spinlock freeze again

From
Bruce Momjian
Date:
> It doesn't happen ordinarily.
> I don't remember well how it happened.
> Probably it was caused by another spinlock (io_in_progress_lock?)
> freezing while testing 6.5-beta.

OK, let's see if it happens again.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

