Join query crashes 7.3.1 backend... - Mailing list pgsql-general
From | Mark Cave-Ayland |
---|---|
Subject | Join query crashes 7.3.1 backend... |
Date | |
Msg-id | C1379626F9C09A4C821D6977AA6A54570634F1@webbased8.wb8.webbased.co.uk Whole thread Raw |
Responses |
Re: Join query crashes 7.3.1 backend...
|
List | pgsql-general |
Hi everyone, I'm having a problem executing a join query which causes the postgresql backend to crash. The query I'm trying to execute is: select t2.resid from res t2, line t1 where t2.resid=t1.resid and t1.layer=355::bigint; When this runs then it causes the backend to die and disconnect from psql :(. I did some initial tests with a limit clause to ensure to see where things were going wrong and I managed to locate roughly where the problem was since I could find where the query failed. select t2.resid from res t2, line t1 where t2.resid=t1.resid and t1.layer=355::bigint limit 710050; ...worked fine whereas.... select t2.resid from res t2, line t1 where t2.resid=t1.resid and t1.layer=355::bigint limit 710075; ...caused the backend to die. So I was beginning to suspect that somehow the data in the table had been corrupted as I've been executing the same query with many different values of layer before without any problems. So firstly I tried the line table: select resid from line where layer=355::bigint; ...which worked without any problems whatsoever. So I was guessing that the corrupted rows were in the res table. Using the \o command, I dumped all the resids from the above query into a file and wrote a libpq program to read in all the resids from the file and query them one by one in the form 'select resid,* from res where resid=x::bigint' where x was each line from the above query. This is where things start getting strange because the program happily returned all rows produced by the 'select resid from line where layer=355::bigint' on line far beyond the problem area... up until the 2M+ mark when I decided to stop it. So I was now thinking that the data in res was not corrupted either. My next step was to verify that the rows in both the joined and unjoined queries on the line table were being returned in the same order. I did an explain which returned the following: QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------ Hash Join (cost=113489207.76..119550368.76 rows=28567936 width=16) Hash Cond: ("outer".resid = "inner".resid) -> Seq Scan on res t2 (cost=100000000.00..103939122.92 rows=72475392 width=8) -> Hash (cost=13306193.92..13306193.92 rows=28567936 width=8) -> Index Scan using line_resid_idx on line t1 (cost=0.00..13306193.92 rows=28567936 width=8) Filter: (layer = 355::bigint) (6 rows) So my assumption from this was that the data was being returned in the order of the line table. To check this, I executed the last query that worked again: select t2.resid from res t2, line t1 where t2.resid=t1.resid and t1.layer=355::bigint limit 710050; ...but this time recorded the output into a file. And then the following: select resid from line where layer=355::bigint; I then did a diff against the output of both queries to make sure that the resid rows were being returned in the same order. As suspected, the diff showed that the resids were being returned in the same order in both cases up to the 710050th result (which is within 50 rows of whatever was causing it to crash). So now I'm stuck as it seems pulling out the data around offset 710050 separately from both tables does not cause a problem whereas executing the join directly causes the backend to crash. I have verified (to within 25 rows) that the rows are being accessed in the same order in both cases. Can anyone suggest why this is happening? I can supply additional debug information if people can direct me as to what to do. Many thanks, Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
pgsql-general by date: