Join query crashes 7.3.1 backend... - Mailing list pgsql-general

From Mark Cave-Ayland
Subject Join query crashes 7.3.1 backend...
Date
Msg-id C1379626F9C09A4C821D6977AA6A54570634F1@webbased8.wb8.webbased.co.uk
Whole thread Raw
Responses Re: Join query crashes 7.3.1 backend...
List pgsql-general
Hi everyone,

I'm having a problem executing a join query which causes the postgresql
backend to crash. The query I'm trying to execute is:

select t2.resid from res t2, line t1 where t2.resid=t1.resid and
t1.layer=355::bigint;

When this runs then it causes the backend to die and disconnect from
psql :(. I did some initial tests with a limit clause to ensure to see
where things were going wrong and I managed to locate roughly where the
problem was since I could find where the query failed.

select t2.resid from res t2, line t1 where t2.resid=t1.resid and
t1.layer=355::bigint limit 710050;

...worked fine whereas....

select t2.resid from res t2, line t1 where t2.resid=t1.resid and
t1.layer=355::bigint limit 710075;

...caused the backend to die. So I was beginning to suspect that somehow
the data in the table had been corrupted as I've been executing the same
query with many different values of layer before without any problems.
So firstly I tried the line table:

select resid from line where layer=355::bigint;

...which worked without any problems whatsoever. So I was guessing that
the corrupted rows were in the res table. Using the \o command, I dumped
all the resids from the above query into a file and wrote a libpq
program to read in all the resids from the file and query them one by
one in the form 'select resid,* from res where resid=x::bigint' where x
was each line from the above query.

This is where things start getting strange because the program happily
returned all rows produced by the 'select resid from line where
layer=355::bigint' on line far beyond the problem area... up until the
2M+ mark when I decided to stop it. So I was now thinking that the data
in res was not corrupted either. My next step was to verify that the
rows in both the joined and unjoined queries on the line table were
being returned in the same order.

I did an explain which returned the following:

                                                       QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------
 Hash Join  (cost=113489207.76..119550368.76 rows=28567936 width=16)
   Hash Cond: ("outer".resid = "inner".resid)
   ->  Seq Scan on res t2  (cost=100000000.00..103939122.92
rows=72475392 width=8)
   ->  Hash  (cost=13306193.92..13306193.92 rows=28567936 width=8)
         ->  Index Scan using line_resid_idx on line t1
(cost=0.00..13306193.92 rows=28567936 width=8)
               Filter: (layer = 355::bigint)
(6 rows)


So my assumption from this was that the data was being returned in the
order of the line table. To check this, I executed the last query that
worked again:

select t2.resid from res t2, line t1 where t2.resid=t1.resid and
t1.layer=355::bigint limit 710050;

...but this time recorded the output into a file. And then the
following:

select resid from line where layer=355::bigint;

I then did a diff against the output of both queries to make sure that
the resid rows were being returned in the same order.

As suspected, the diff showed that the resids were being returned in the
same order in both cases up to the 710050th result (which is within 50
rows of whatever was causing it to crash).

So now I'm stuck as it seems pulling out the data around offset 710050
separately from both tables does not cause a problem whereas executing
the join directly causes the backend to crash. I have verified (to
within 25 rows) that the rows are being accessed in the same order in
both cases. Can anyone suggest why this is happening? I can supply
additional debug information if people can direct me as to what to do.


Many thanks,

Mark.


---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.


pgsql-general by date:

Previous
From: "Peter Gibbs"
Date:
Subject: Re: how to detect psql's errors?
Next
From: "Richard Sand"
Date:
Subject: error compiling 7.3.2 on solaris 8- library conflict