Thread: Dead Lock problem with 8.1.3

Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Hi Folks,

I have a strange Problem (to be honest there are more than one, but this
is one of it) after Upgrading to Postgres 8.1.3. The following SQL seems
to produce a deadlock while doing an endless reading of a temp table:

SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );

It should give me a List of sids which are connected to veraid 34 but
not to veraid 2 at the same time. When removing the 'AND v.veraid = 34':

SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND s.sid NOT
IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );

everything works really fine (even if I include an additional criteria
like 'AND stud_vera.status > 0' ). On Postgres 8.0.8 this problem
doesn't exist. The first clause works with a lot of additional criteria
very fast.

Any Idea?

*greets*
Kai

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
No one any idea? *sigh*

Kai Hessing wrote:
> Hi Folks,
>
> I have a strange Problem (to be honest there are more than one, but this
> is one of it) after Upgrading to Postgres 8.1.3. The following SQL seems
> to produce a deadlock while doing an endless reading of a temp table:
>
> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
> 34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );
>
> It should give me a List of sids which are connected to veraid 34 but
> not to veraid 2 at the same time. When removing the 'AND v.veraid = 34':
>
> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND s.sid NOT
> IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );
>
> everything works really fine (even if I include an additional criteria
> like 'AND stud_vera.status > 0' ). On Postgres 8.0.8 this problem
> doesn't exist. The first clause works with a lot of additional criteria
> very fast.
>
> Any Idea?
>
> *greets*
> Kai

Re: Dead Lock problem with 8.1.3

From
Martijn van Oosterhout
Date:
On Tue, Sep 26, 2006 at 03:59:48PM +0200, Kai Hessing wrote:
> No one any idea? *sigh*

It probably has something to with the fact that you didn't explain what
you meant by "deadlock". Also, you refer to a temp table, yet don't
indicate which table it is.

You'll need to be a lot more specific about your problem before we can
help. For example, a script to reproduce the problem, or a copy of psql
output demonstrating it.

Hope this helps,

> Kai Hessing wrote:
> > Hi Folks,
> >
> > I have a strange Problem (to be honest there are more than one, but this
> > is one of it) after Upgrading to Postgres 8.1.3. The following SQL seems
> > to produce a deadlock while doing an endless reading of a temp table:
> >
> > SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
> > 34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Dead Lock problem with 8.1.3

From
Alban Hertroys
Date:
Kai Hessing wrote:
> No one any idea? *sigh*
>
> Kai Hessing wrote:
>> Hi Folks,
>>
>> I have a strange Problem (to be honest there are more than one, but this
>> is one of it) after Upgrading to Postgres 8.1.3. The following SQL seems
>> to produce a deadlock while doing an endless reading of a temp table:
>>
>> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
>> 34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );

I'm pretty sure it's not a deadlock. It probably takes very long for
some reason; maybe an explain of that query will give some insight. You
probably lack some indices.

Alternatively, have you tried an EXCEPT query? I think this is what you
try to query:

SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid = 34
EXCEPT
SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid = 2


--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Dead Lock problem with 8.1.3

From
Joe Conway
Date:
Martijn van Oosterhout wrote:
> On Tue, Sep 26, 2006 at 03:59:48PM +0200, Kai Hessing wrote:
>
>>No one any idea? *sigh*
>
>
> It probably has something to with the fact that you didn't explain what
> you meant by "deadlock". Also, you refer to a temp table, yet don't
> indicate which table it is.
>
> You'll need to be a lot more specific about your problem before we can
> help. For example, a script to reproduce the problem, or a copy of psql
> output demonstrating it.

Also EXPLAIN output, and information regarding the number of rows in
each table, and the number of rows matching veraid = 2 and veraid = 34
might help.

While the query is running, how much CPU is the process consuming, and
what does vmstat show for disk and swap I/O?

Joe


Re: Dead Lock problem with 8.1.3

From
Tom Lane
Date:
Kai Hessing <kai.hessing@hobsons.de> writes:
> No one any idea? *sigh*

What makes you think it's a deadlock and not a very slow query?  I'd be
checking if the tables were all ANALYZEd and comparing EXPLAIN output
to the old database ...

            regards, tom lane

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Alban Hertroys wrote:
>>> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
>>> 34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );
>
> I'm pretty sure it's not a deadlock. It probably takes very long for
> some reason; maybe an explain of that query will give some insight. You
> probably lack some indices.

No. The system goes into an endless loop. The part ('SELECT sid FROM
stud_vera WHERE veraid = 2') seems to create a temporary table again and
again and again ....
The same clause needs around 5 seconds under Postgres 8.0.8. On 8.1.3 we
killed the process after 40 hours while using constantly 80% CPU power.
(Btw. Explain leads to the same problem, it just hangs up...)

We're currently trying to separate the problem into a simple setup.

> Alternatively, have you tried an EXCEPT query? I think this is what you
> try to query:
>
> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid = 34
> EXCEPT
> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid = 2

Thanks, I will try this out. To be honest, I didn't know the EXCEPT
query (and I think the programmer also doesn't *g*).

*greets*
Kai

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Joe Conway wrote:
> Martijn van Oosterhout wrote:
>> It probably has something to with the fact that you didn't explain what
>> you meant by "deadlock". Also, you refer to a temp table, yet don't
>> indicate which table it is.

Deadlock means it hangs up and doesn't terminate through timeout.
The system goes into an endless loop. The part ('SELECT sid FROM
stud_vera WHERE veraid = 2') seems to create a temporary table again and
again and again ....

>> You'll need to be a lot more specific about your problem before we can
>> help. For example, a script to reproduce the problem, or a copy of psql
>> output demonstrating it.

There is no output. It just takes forever. We're currently working to
seperate this problem into a simple setup. If we have one, I will post it.

> Also EXPLAIN output, and information regarding the number of rows in
> each table, and the number of rows matching veraid = 2 and veraid = 34
> might help.

Explain produces the same problem. It just takes forever...

> While the query is running, how much CPU is the process consuming, and
> what does vmstat show for disk and swap I/O?

~80% CPU power. Disk usage is not noticable.

Thanks and
*greets*
Kai

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Tom Lane wrote:
> Kai Hessing <kai.hessing@hobsons.de> writes:
>> No one any idea? *sigh*
>
> What makes you think it's a deadlock and not a very slow query?  I'd be
> checking if the tables were all ANALYZEd and comparing EXPLAIN output
> to the old database ...

*hmm* the difference of 5 seconds on 8.0.8 compared to killing the
process after 40 hours?
All databases are ANALYZEd and EXPLAIN output cannot be compared because
it hangs also up on 8.0.8.

Re: Dead Lock problem with 8.1.3

From
Joe Conway
Date:
Kai Hessing wrote:
> Deadlock means it hangs up and doesn't terminate through timeout.
> The system goes into an endless loop. The part ('SELECT sid FROM
> stud_vera WHERE veraid = 2') seems to create a temporary table again and
> again and again ....

It is possible for a query to run for many days, and still finish. This
classifies as slow, not hung. The difference is important in
troubleshooting to determine the cause.

>>Also EXPLAIN output, and information regarding the number of rows in
>>each table, and the number of rows matching veraid = 2 and veraid = 34
>>might help.
>
> Explain produces the same problem. It just takes forever...

Did you try EXPLAIN, or EXPLAIN ANALYZE? The former only does the
planning, the latter actually materializes the result in order to get
actual timings, and will therefore take at least as long as the query
itself.

>>While the query is running, how much CPU is the process consuming, and
>>what does vmstat show for disk and swap I/O?
>
> ~80% CPU power. Disk usage is not noticable.

Can you attach with a debugger and see exactly what's going on? If not
we'd need that self contained test case to reproduce the problem.

Joe

Re: Dead Lock problem with 8.1.3

From
Andrew Sullivan
Date:
On Wed, Sep 27, 2006 at 06:46:42PM +0200, Kai Hessing wrote:
>
> Deadlock means it hangs up and doesn't terminate through timeout.

No, it doesn't.  Deadlock means, for the two deadlocked queries, both
cannot possibly finish because each waits on a lock that the other
one holds.

You can cause such deadlocks in your application, too, of course, but
they're not database deadlocks.

Also. . .

> There is no output. It just takes forever.

. . .define "forever".  Is it doing any work?  Do you see i/o?  Is it
in SELECT WAITING state?

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
        --Dennis Ritchie

Re: Dead Lock problem with 8.1.3

From
Alban Hertroys
Date:
Kai Hessing wrote:
> Alban Hertroys wrote:
>>>> SELECT s.sid FROM stud s, stud_vera v WHERE s.sid = v.sid AND v.veraid =
>>>> 34 AND s.sid NOT IN ( SELECT sid FROM stud_vera WHERE veraid = 2 );
>> I'm pretty sure it's not a deadlock. It probably takes very long for
>> some reason; maybe an explain of that query will give some insight. You
>> probably lack some indices.
>
> No. The system goes into an endless loop. The part ('SELECT sid FROM
> stud_vera WHERE veraid = 2') seems to create a temporary table again and
> again and again ....

Ah, this is where part of the confusion stems from. That subquery just
results in a resultset being created (maybe not even that, depends a bit
on the query planner). That's not a temporary table.

I rather doubt that postgres would repeat that query, it's results won't
change between comparisons with rows from your main query (this depends
a bit on what type of transaction isolation you use).

> The same clause needs around 5 seconds under Postgres 8.0.8. On 8.1.3 we
> killed the process after 40 hours while using constantly 80% CPU power.
> (Btw. Explain leads to the same problem, it just hangs up...)

EXPLAIN without ANALYZE locking up?!? Maybe some application is holding
a lock on a record in your result set. Did you try this query without
any other applications connecting to that database?

If you can't do that, you could dump that database and restore it in a
different one for testing cases like this.

Lastly, considering the odd behaviour and the huge differences between
minor versions of the database, you might have a corrupted index
somewhere. You can fix those with REINDEX.
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
> EXPLAIN without ANALYZE locking up?!? Maybe some application is holding
> a lock on a record in your result set.

OK, this was a good hint. I got EXPLAIN working. My mistake was, that I
tried explain the last time, while the 'Killer-SQL' was running. Then
EXPLAIN didn't answer until I killed the CPU-eating process. Interesting
is, that it seems to be not possible to access pg_locks while this
process is running. A 'select * from pg_locks' takes also for... a very
long time...

> Did you try this query without
> any other applications connecting to that database?
>
> If you can't do that, you could dump that database and restore it in a
> different one for testing cases like this.

Yes, I surely have a testing environment with the same setup as the live
system. The Query does not work. But after terminating all running
queries the EXPLAIN did work. This is the result (The 8.0.8 database has
a dataset which is around 2 weeks older):

PostgreSQL 8.1.4 (The one which is taking... a very long time...)

Merge Join  (cost=7751.81..50026810.45 rows=7364 width=4)
  Merge Cond: ("outer".sid = "inner".sid)
  ->  Index Scan using stud_pkey on stud s  (cost=7751.81..49994210.01
rows=56607 width=4)
        Filter: (NOT (subplan))
        SubPlan
          ->  Materialize  (cost=7751.81..8497.94 rows=53613 width=4)
                ->  Seq Scan on stud_vera  (cost=0.00..7488.20
rows=53613 width=4)
                      Filter: (veraid = 2)
  ->  Index Scan using stud_vera_sid_veraid_idx on stud_vera v
(cost=0.00..37646.74 rows=14729 width=4)
        Index Cond: (veraid = 34)


PostgreSQL 8.0.8 (The one which takes only a few seconds... on a much
slower system...)

Merge Join  (cost=22724.05..23019.46 rows=731 width=4)
  Merge Cond: ("outer".sid = "inner".sid)
  ->  Sort  (cost=15223.02..15363.42 rows=56159 width=4)
        Sort Key: s.sid
        ->  Seq Scan on stud s  (cost=7427.89..10792.85 rows=56159 width=4)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Seq Scan on stud_vera  (cost=0.00..7424.24 rows=1461
width=4)
                      Filter: (veraid = 2)
  ->  Sort  (cost=7501.03..7504.69 rows=1461 width=4)
        Sort Key: v.sid
        ->  Seq Scan on stud_vera v  (cost=0.00..7424.24 rows=1461 width=4)
              Filter: (veraid = 34)

> Lastly, considering the odd behaviour and the huge differences between
> minor versions of the database, you might have a corrupted index
> somewhere. You can fix those with REINDEX.

I did REINDEX and REINDEX FORCE and VACUUM FULL ANALYZE and all of this
doesn't take any noticable effect neither in calling the query nor in
explaining it.

Another thing I did was to do strace to the problematic process and
noticed that it is also repeatedly accessing a file called (for the
whole time): postgresql/8.1/main/base/1740468/pgsql_tmp/pgsql_tmp21938.0
It doesn't cost much disk performance but is permanently working.


Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Andrew Sullivan wrote:
> No, it doesn't.  Deadlock means, for the two deadlocked queries, both
> cannot possibly finish because each waits on a lock that the other
> one holds.

Thanks for the clarification. I thought a deadlock is also, when the
system runs into an endless loop.

>> There is no output. It just takes forever.
>
> . . .define "forever".  Is it doing any work?  Do you see i/o?  Is it
> in SELECT WAITING state?

No it is not in WAITING state, it is permanently working. strace is
telling me that it is constantly accessing a file:
'postgresql/8.1/main/base/1740468/pgsql_tmp/pgsql_tmp21938.0' It doesn't
cost much performance, but it is running all the time.

Please see the full update at Message-ID: <4o2bkgFclmdhU1@individual.net>

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Joe Conway wrote:
> It is possible for a query to run for many days, and still finish. This
> classifies as slow, not hung. The difference is important in
> troubleshooting to determine the cause.

OK, what do you suggest, how long should the process run, until I can
except it not to end?

>>>Also EXPLAIN output, and information regarding the number of rows in
>>>each table, and the number of rows matching veraid = 2 and veraid = 34
>>>might help.
>>
>> Explain produces the same problem. It just takes forever...
>
> Did you try EXPLAIN, or EXPLAIN ANALYZE? The former only does the
> planning, the latter actually materializes the result in order to get
> actual timings, and will therefore take at least as long as the query
> itself.

I got EXPLAIN to work. Please see the full update at Message-ID:
<4o2bkgFclmdhU1@individual.net>
So, I don't have to post on different Thread-Branches and can keep the
information together. Thanks ;)

>>>While the query is running, how much CPU is the process consuming, and
>>>what does vmstat show for disk and swap I/O?
>>
>> ~80% CPU power. Disk usage is not noticable.
>
> Can you attach with a debugger and see exactly what's going on? If not
> we'd need that self contained test case to reproduce the problem.

Sorry, I'm currently not able to attach a debugger. I think I'm right
that this means a recompilation of the package would be necessary. The
only thing I was seeing with strace was that the Process is permanently
accessing a file called:
postgresql/8.1/main/base/1740468/pgsql_tmp/pgsql_tmp21938.0

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
One Addition:

> PostgreSQL 8.1.4 (The one which is taking... a very long time...)
>
> Merge Join  (cost=7751.81..50026810.45 rows=7364 width=4)
>   Merge Cond: ("outer".sid = "inner".sid)
>   ->  Index Scan using stud_pkey on stud s  (cost=7751.81..49994210.01
> rows=56607 width=4)
>         Filter: (NOT (subplan))
>         SubPlan
>           ->  Materialize  (cost=7751.81..8497.94 rows=53613 width=4)
>                 ->  Seq Scan on stud_vera  (cost=0.00..7488.20
> rows=53613 width=4)
>                       Filter: (veraid = 2)
>   ->  Index Scan using stud_vera_sid_veraid_idx on stud_vera v
> (cost=0.00..37646.74 rows=14729 width=4)
>         Index Cond: (veraid = 34)

The interesting thing is that (like I've written in the main post) if I
remove the 'AND veraid=34' it working fine (0.3seconds). The explain for
this working query is:

Merge Join  (cost=45500.73..46741955.20 rows=147288 width=4)
  Merge Cond: ("outer".sid = "inner".sid)
  ->  Index Scan using stud_pkey on stud s  (cost=7734.28..46701101.71
rows=56607 width=4)
        Filter: (NOT (subplan))
        SubPlan
          ->  Materialize  (cost=7734.28..8431.06 rows=50078 width=4)
                ->  Seq Scan on stud_vera  (cost=0.00..7488.20
rows=50078 width=4)
                      Filter: (veraid = 2)
  ->  Sort  (cost=37766.46..38502.90 rows=294576 width=4)
        Sort Key: v.sid
        ->  Seq Scan on stud_vera v  (cost=0.00..6751.76 rows=294576
width=4)

Re: Dead Lock problem with 8.1.3

From
Tom Lane
Date:
Kai Hessing <kai.hessing@hobsons.de> writes:
> PostgreSQL 8.1.4 (The one which is taking... a very long time...)

>         Filter: (NOT (subplan))
>         SubPlan
>           ->  Materialize  (cost=7751.81..8497.94 rows=53613 width=4)

> PostgreSQL 8.0.8 (The one which takes only a few seconds... on a much
> slower system...)

>               Filter: (NOT (hashed subplan))
>               SubPlan
>                 ->  Seq Scan on stud_vera  (cost=0.00..7424.24 rows=1461
> width=4)
>                       Filter: (veraid = 2)

I bet the problem is that you're not getting a hashed subplan in 8.1.
What do you have work_mem set to on the two systems?  The fact that the
rowcount estimate for the subplan is much larger may be causing the
change, too.  Why is that --- perhaps you haven't ANALYZEd stud_vera
recently on one system or the other?  How many rows will really come out
of the sub-select (ie, what's "select count(*) from stud_vera
where veraid = 2")?

            regards, tom lane

Re: Dead Lock problem with 8.1.3

From
Kai Hessing
Date:
Tom Lane wrote:
> I bet the problem is that you're not getting a hashed subplan in 8.1.
> What do you have work_mem set to on the two systems?  The fact that the
> rowcount estimate for the subplan is much larger may be causing the
> change, too.  Why is that --- perhaps you haven't ANALYZEd stud_vera
> recently on one system or the other?  How many rows will really come out
> of the sub-select (ie, what's "select count(*) from stud_vera
> where veraid = 2")?

HEUREKA...
Increasing the work_mem had solved the problem. Result: 3170 rows in
1,487.927ms.
Working_mem was set to the default of 1MB. Increasing it to 4 solved it.
Are there any suggestions for the size. Maybe depending on database size?

Many thanks!

But I still wonder why it didn't work with the lower size of working_mem
and the temporary file on the disk? I had a similar problem in the past
where I forget to ANALYZE and before ANALYZEing also a temporary file
was used. The time difference had been from ten seconds to five minutes
and not from one second to over 40 hours. The temporary file is created
and has a size of around 2MB and doesn't change size while working. I
fear to run into the same problem, if I use more complex queries.

And, the problem with the estimated rows is still very interesting. The
Table is FULLy ANALYZEd and the actual count of stud_vera with verid=2
is 49176, so the 53000 are real close. Don't now, why it is so low on 8.0.8?

One other interesting thing is that the query plan has changed after
increasing the working mem:

Hash Join  (cost=12991.28..21472.83 rows=7512 width=4)
  Hash Cond: ("outer".sid = "inner".sid)
  ->  Bitmap Heap Scan on stud_vera v  (cost=1841.02..5834.80 rows=15023
width=4)
        Recheck Cond: (veraid = 34)
        ->  Bitmap Index Scan on stud_vera_sid_veraid_idx
(cost=0.00..1841.02 rows=15023 width=0)
              Index Cond: (veraid = 34)
  ->  Hash  (cost=11008.74..11008.74 rows=56607 width=4)
        ->  Seq Scan on stud s  (cost=7617.57..11008.74 rows=56607 width=4)
              Filter: (NOT (hashed subplan))
              SubPlan
                ->  Seq Scan on stud_vera  (cost=0.00..7488.20
rows=51747 width=4)
                      Filter: (veraid = 2)

*greets*
Kai