Thread: Trouble running nested select - Join query
Hi,
I have been working with Postgresxc 1.1 on a cluster, each cluster compute node having 48GB RAM.
I have instantiated the coordinator on one of the compute nodes and have 4 datanodes (2+2) running on two other compute nodes.
I have been trying to run the following query which is a select on two large, same size tables - miami_2d(head int, tail int) and miami_directednetwork(head int, tail int) (both of them have 105400516 rows).
miami_2d has been replicated on all datanodes and miami_directednetwork has been hash distributed on the head column across all 4 datanodes.
I have indexes on both the tables.
The query is:
copy (select * from (select s1.head as h1,s1.tail as t1 ,s2.tail as neighbor from miami_2d s1, miami_directednetwork s2 where s1.tail=s2.head and s2.tail not in (select tail from miami_2d where head=s1.head)) as O where O.h1!=O.neighbor) to '/tmp/tmp.txt'
Basically, the two tables are just a copy of each other having 'tail' as neighbor of 'head'. I am trying to find the distance-2 or 2 hop neighbor of the head column.
For example:
miami_directednetwork
head | tail
400 | 336
400 | 209
400 | 487
336 | 400
336 | 209
336 | 500
487 | 400
487 | 391
miami_2d
head | tail
400 | 336
400 | 209
400 | 487
336 | 400
336 | 209
336 | 500
487 | 400
487 | 391
Result:
head | tail | neighbor
400 | 336 | 500
400 | 209 | 391
The issue is that every time I run the above query, it either gets killed by the system or it just keeps running for hours( I kill it after I see it running for more than a day).
I am using the following settings in the postgers.conf file:
For the datanodes:
effective_cache_size=10480MB
shared_buffers=3072MB
work_mem=10480MB
For the coordinator:
effective_cache_size=1048MB
shared_buffers=512MB
work_mem=1024MB
My question is: Is there anything wrong with the way I have framed the query or perhaps something else wrong?
Any help is appreciated!
Regards,
Rushi
Rushi wrote > copy (select * from (select s1.head as h1,s1.tail as t1 ,s2.tail as > neighbor from miami_2d s1, miami_directednetwork s2 where s1.tail=s2.head > and s2.tail not in (select tail from miami_2d where head=s1.head)) as O > where O.h1!=O.neighbor) to '/tmp/tmp.txt' Some thoughts: The O sub-query could probably be top-level A correlated sub-query in the where clause should generally use EXISTS, not IN ... ... Though if you can get rid of the whole thing that would be better You might want to use the statement_timeout GUC to kill the query if it takes too long You could separate the the select and the copy into two statements via use of a temporary table - depending on how large the output is that would split the action into two statements that could be independently monitored. Though, to be honest now that I re-read I have no clue how this query inter-plays with PostgresXC... Can you run a smaller input size and use that estimate how long a 100M^2 query should take to complete? Can you calculate in smaller block sizes on the input? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Trouble-running-nested-select-Join-query-tp5804938p5804995.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.