Re: max length of sql select statement (long!) - Mailing list pgsql-sql

From markus brosch
Subject Re: max length of sql select statement (long!)
Date
Msg-id 1058187439.5983.2.camel@sam
Whole thread Raw
In response to Re: max length of sql select statement ?  (markus brosch <brosch@gmx.de>)
Responses Re: max length of sql select statement (long!)
List pgsql-sql
It's me again - with the easy and complex statement at the same time ;-)
Who can help me?!

As far, my solution is still unsatisfactory. 
I'll explain my full task this time, 
otherwise it is hard to explain!

My tables: 
(redundant for speed reasons - doesn't matter - only requests!)

===============================================================

table: goGraph
primarykey(parent,child)
index (B-tree) on parent column and on child column.

parent     |   child
--------------------------
GO:0000001 |   GO:0000002
GO:0000001 |   GO:0000003
GO:0000002 |   GO:0000005
GO:0000002 |   GO:0000006
GO:0000003 |   GO:0000006
GO:0000003 |   GO:0000007
...        |   ...
about 15000 elements more!

This represents a direct acyclic graph:
                GO:0000001                /        \               v          v       GO:0000002        GO:0000003
/        \        /        \      v          v      v          v
 
GO:0000005       GO:0000006        GO:0000007

===============================================================

table: binaryInteraction
primarykey(baitProtein, preyProtein, gogoKey)
index (B-tree) on gogokey.

baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--------------------------------------------------------------
123         | 123         | GO:0000001 | GO:0000001 | 11
123         | 123         | GO:0000020 | GO:0000001 | 120
123         | 123         | GO:0000020 | GO:0000020 | 2020
123         | 345         | GO:0000001 | GO:0000001 | 11 
123         | 345         | GO:0000020 | GO:0000001 | 120
...         | ...         | ...        | ...        | ...
up to several millions entries!

===============================================================

So, what I have to do is:
Someone choose 2 goIds. For these ids and all their children
(information from goGraph) I want the count(*) of "fits" within the
table binaryInteraction. That means, baitGoId and preyGoId must be
either the two choosen goIds or one of the related children. I use the
gogoKey for this task: I compute Integers (within Java) from the goIds
and sort smallerGoId.concatenate(biggerGoId) - so I've unique keys for
baitGo / preyGo pairs. One more problem: The goGraph (see table) is a
graph ... and for that reason one node can have more parents. If I
choose e.g. GO:000002 and GO:000003 from the example above and want to
compute their children, I don't want to take GO:000006 into account (two
parents in which I am intested in). That menas, whenever I ask for
children of two nodes, I want a DISTINCT SET of children.  

Example how I am processing the data at the moment:

User chosse GoId1: GO:0000005 and GO:0000008;

Java: 
- get all childs of GO:000005 by jdbc from goGraph - collection "one"
- get all childs of GO:000006 by jdbc from goGraph - collection "two"
- compute intersection of the two collections
- substract of each collection the intersection
- result: distinct children of two goIds

Now the binaryInteraction table came into our game ;-)
I apply the one distinct collection "one" and collection "two" to
gogoKey:

-----------------------

1. SELECT count(*)   FROM (SELECT DISTINCT bait, prey         FROM binaryInteraction         WHERE gogoKey IN
(gogokey1,gogokey2, gogokey3, ... )       )  AS foo;
 

gogokeyX will be processed by java -> it generates for each
go-go-combination the unique gogokey. 

Speed: medium

Limit: only 10000 Elements within the IN statement can be processed by
default. In postgres.conf I can change this limit, as Rod Taylor already
posted. But in the manual there is a warning of BufferOverflow if the
value it too high ...

If I reach the limit, I get something like:
java.sql.SQLException: ERROR:  Expression too complex: nesting depth
exceeds max_expr_depth = 10000

BTW: Why nested depth? Within IN I compute thousands of "OR"s, but why
nested ... ???

-----------------------

2. for all gogokeys I generate a temporary table and do a join with the
binaryInteraction table.

SELECT COUNT(*)
FROM (SELECT DISTINCT binaryInteraction.bait binaryInteraction.prey      FROM binaryInteraction, tempTable      WHERE
binaryInteraction.gogokey= tempTable.gogokey) 
 
AS temp)

Speed: very slow, but no limit!

-----------------------

After all, I want to compute and precalculate every possible
goId-goId-combination ;-) So, each of these things I explained here,
I've to do millions of time! I tried to find a recursive solution -
impossible!

Who can help? Maybe there is a way to compute it in one but more complex
sql-statemnt. Me as far I am still new to sql I don't know the "tricks"!
If there would be a way to calculate the intersection ... then maybe a
solution is not that far ... Also I was thinking about a denormalized
goGraph table: parent -> allChilds. Then I compute (within SQL?) the
intersection between the 2 GoIds and then I apply it to
binaryInteraction.

Thanks for any ideas and thanks at least 
for reading that long story ;-)

Cheers Markus

PS: If you have any additional questions: please feel free to ask!




pgsql-sql by date:

Previous
From: Steve Wampler
Date:
Subject: Re: Replacing a simple nested query?
Next
From: Matt Tenenbaum
Date:
Subject: ad hoc referential integrity