Thread: max length of SQL SELECT statement?
Hi All!

I was searching the archive and was wondering why nobody has asked this strange(!) question (or I just haven't found it?):

"What is the max allowed length of an SQL statement or query?"

I want to combine hundreds or thousands of ORs within a SELECT statement. Possible or not? E.g.:

  SELECT * FROM table WHERE col='a' OR col='b' OR col='c' OR ...

This would be a very "dirty" or, some would say, a "horrible" solution; but how do you search through your table with SQL when your SELECT is against a collection of different elements (e.g. an array of strings, each of which should match one column, returning all records that fit)?

Hope for help,
cheers Markus - SQL beginner ;-)
Depending on your original problem, EXISTS or IN may be usable. EXISTS is efficient, and IN can be used efficiently in the 7.4 version of PostgreSQL.

regds
mallah.

> "What is the max allowed length of a sql statement or query?"
> I want to combine hundreds or thousands of 'OR' within a select
> statement. Possible or not (e.g: SELECT * FROM table WHERE col='a'
> OR col='b' OR col='c' OR ...... )

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/
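To illustrate the suggestion, here is a small runnable sketch of IN (fed by a subselect) and the equivalent EXISTS. It uses SQLite via Python purely so it can be executed; the table and column names (`t`, `wanted`) are made up, and the SQL itself is the same shape you would use in PostgreSQL.

```python
import sqlite3

# Toy schema: "t" is the table being searched; "wanted" holds the values
# that would otherwise become a long chain of ORs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (col TEXT);
    INSERT INTO t VALUES ('a'), ('b'), ('x');
    CREATE TABLE wanted (v TEXT);
    INSERT INTO wanted VALUES ('a'), ('b'), ('c');
""")

# IN with a subselect instead of thousands of ORs
in_rows = con.execute(
    "SELECT col FROM t WHERE col IN (SELECT v FROM wanted) "
    "ORDER BY col").fetchall()

# The equivalent correlated EXISTS
exists_rows = con.execute(
    "SELECT col FROM t WHERE EXISTS "
    "(SELECT 1 FROM wanted WHERE wanted.v = t.col) "
    "ORDER BY col").fetchall()

print(in_rows)       # both queries match 'a' and 'b'
print(exists_rows)
```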
On Mon, 2003-07-07 at 17:57, mallah@trade-india.com wrote:
> Depending on ur original problem EXISTS or IN may be usable
> EXISTS is efficient and IN can be used efficiently in 7.4 version
> of postgresql

That could be a solution! The question is: how long can the IN list be?

I mean, if I write something like:

  SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );

How long can the collection (list) within IN be? Thousands of elements, too?

And what does "efficient" mean here? Does the DB go through the table only once?

Cheers, Markus
> On Mon, 2003-07-07 at 17:57, mallah@trade-india.com wrote:
>> Depending on ur original problem EXISTS or IN may be usable
>> EXISTS is efficient and IN can be used efficiently in 7.4 version of
>> postgresql
>
> Could be a solution?!
> The question is - how long could the IN be?
>
> I mean, if I write something like:
> SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );
> How long can the collection (list) within IN be? Also thousands of
> elements?

Well, I do NOT know the exact limit; maybe someone else can answer that accurately. But you could produce the list within IN using a subselect - that again depends on the exact problem.

regds
Mallah.
> Could be a solution?!
> The question is - how long could the IN be?

I'm not sure about IN specifically, but I know you can do:

  SELECT * FROM table WHERE col = '<1GB long file>';

It tends not to be friendly to RAM, though :)
On Mon, 2003-07-07 at 18:13, mallah@trade-india.com wrote:
> Well i DO NOT know the exact limit.
> May be someone else can answer it accurately.
>
> But you could produce the list within IN using a subselect
> that again depends on the exact problem.

Maybe somebody knows the limit?

Anyway, my exact problem is, in words, quite easy:

  col1 | col2
  ------------
   123 |  958
   143 |  394
   124 |  345
   324 |  345
   346 |  541
   743 |  144
   346 |  986

Imagine this table is really big (millions of records). Now I want to retrieve all records where either col1 or col2 matches one of 123, 124, 144, 541 (and a few thousand more ...).

As far as I understood you:

  SELECT * FROM table WHERE col1 IN (123,124,144,541,...)
                         OR col2 IN (123,124,144,541,...);

Cheers, Markus
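A runnable sketch of that two-column IN lookup, using the sample rows above in an in-memory SQLite database (the query shape is the same in PostgreSQL). It also shows building the IN list with bound parameters rather than string concatenation, which is what a JDBC/Java caller would typically do.

```python
import sqlite3

# The toy table from the example above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE pairs (col1 INTEGER, col2 INTEGER);
    INSERT INTO pairs VALUES (123, 958), (143, 394), (124, 345),
                             (324, 345), (346, 541), (743, 144), (346, 986);
""")

ids = (123, 124, 144, 541)
placeholders = ",".join("?" for _ in ids)   # "?,?,?,?"
sql = (f"SELECT col1, col2 FROM pairs "
       f"WHERE col1 IN ({placeholders}) OR col2 IN ({placeholders}) "
       f"ORDER BY col1, col2")
rows = con.execute(sql, ids + ids).fetchall()
print(rows)
```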
On 7 Jul 2003, markus brosch wrote:

> I was searching the archive and was wondering why nobody asked this
> strange(!) question (or I've not found it?):
>
> "What is the max allowed length of a sql statement or query?"

AFAIR, in recent versions it's effectively limited only by resources (how much bandwidth/memory do you want to use).

> I want to combine hundrets or thousands 'OR' within a select statement.
> Possible or not (e.g: SELECT * FROM table WHERE col='a' OR col='b' OR
> col='c' OR ...... )

It should be possible.

> This would be a very "dirty" or some would say, a "horrible" solution;
>
> but how are you searching through your table with sql, when your SELECT
> is against a collection of different elements (e.g. Array of Strings,
> which should fit on one column and return all records which fit)

Are the 'a', 'b', etc. fixed or generated on the fly? I'm not sure that the plan for a thousand OR clauses (or, equivalently, a thousand entries in a static IN) is necessarily going to be terribly good. You might have better luck setting it up to do a join with a table (possibly a temporary table, if they're generated on the fly).
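The join-with-a-table alternative can be sketched like this: load the generated values into a (temporary) table once, then join against it instead of emitting thousands of OR clauses. SQLite via Python is used here only to make the sketch runnable; table names (`t`, `idlist`) are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (col TEXT);
    INSERT INTO t VALUES ('a'), ('b'), ('x'), ('b');
    CREATE TEMP TABLE idlist (v TEXT PRIMARY KEY);
""")

# Values generated on the fly get bulk-loaded into the temp table.
con.executemany("INSERT OR IGNORE INTO idlist VALUES (?)",
                [("a",), ("b",), ("c",)])

# One join replaces the OR-chain; duplicates in t are preserved.
rows = con.execute("""
    SELECT t.col FROM t JOIN idlist ON t.col = idlist.v ORDER BY t.col
""").fetchall()
print(rows)   # 'a' once, 'b' twice
```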
> > but how are you searching through your table with sql, when your SELECT
> > is against a collection of different elements (e.g. Array of Strings,
> > which should fit on one column and return all records which fit)
>
> Are the 'a', 'b', etc... fixed or generated on the fly? I'm not sure
> that the plan for a thousand OR clauses (or equivalently a thousand
> entries in a static IN) is going to necessarily be terribly good. You
> might have better luck setting it up to do a join with a table (possibly a
> temporary table if they're generated on the fly).

Generated "on the fly" by JDBC within Java ;-)

I already tried the join, but it takes ages *g*
On Mon, 2003-07-07 at 18:20, Rod Taylor wrote:
> > Could be a solution?!
> > The question is - how long could the IN be?
>
> I'm not sure about IN specifically, but I know you can do:
> SELECT * FROM table WHERE col = '<1GB long file>';
> It tends not to be friendly for Ram though :)

Hi again!

After I decided to use this statement (discussed two days ago!):

  SELECT * FROM table WHERE columnX IN ('a', 'b', 'c', 'd', 'e', ... );

I got this error :-(((

  java.sql.SQLException: ERROR: Expression too complex: nesting depth exceeds max_expr_depth = 10000

I searched the web and only found this discussion:

http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&oe=UTF-8&threadm=3B2EA1E9.843D940A%40selectacast.net&rnum=1&prev=/groups%3Fhl%3Dde%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3DExpression%2Btoo%2Bcomplex%253A%2Bnesting%2Bdepth%2Bexceeds%2Bmax_expr_depth%2B%253D%2B10000%26btnG%3DGoogle%2BSuche

Maybe we can discuss that problem here again?! What exactly does "max_expr_depth" mean?

Thanks for any help.
Cheers Markus
> Maybe we can disuss that problem here again?! What exactly means
> "max_expr_depth"? Thanks for any help

If I'm not mistaken, max_expr_depth is used to catch runaway recursion (view a is "select * from b", view b is "select * from a"). It's a tunable in postgresql.conf: toss a couple of 0's onto the number and uncomment the line.
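For reference, that change would look like the fragment below in postgresql.conf (the setting exists in the 7.x releases discussed in this thread; the exact default line may differ by version):

```
# postgresql.conf
#max_expr_depth = 10000        # shipped default, commented out
max_expr_depth = 1000000       # raised by a couple of zeros
```

A reload or restart of the server is needed for the new value to take effect.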
On Wed, 2003-07-09 at 17:45, Rod Taylor wrote:
> > Nobody a better idea? Why is the join of a temporary table (the IN
> > parameters) and the original table so slow? Any tricks here?
>
> Did you index and ANALYZE the temporary table?

No!

I have to run this "strange" and "long" statement a few thousand times ... so I would always have to create the temp table again and again, and if I index the table it costs me more time.

As I am relatively new to SQL, I ask you: where's the difference between an indexed temp (!!!) table and an unindexed one? The main table is of course fully indexed ;-)

Cheers, Markus
On Wed, 2003-07-09 at 15:51, markus brosch wrote:
> On Wed, 2003-07-09 at 17:45, Rod Taylor wrote:
> > > Nobody a better idea? Why is the join of a temporary table (the IN
> > > parameters) and the original table so slow? Any tricks here?
> >
> > Did you index and ANALYZE the temporary table?
>
> No!
>
> I have to run this "strange" and "long" statement a few thousand
> times ... so I would always have to create the temp table again and
> again, and if I index the table it costs me more time.

So... which costs more: building the index plus the fast query, or doing the slow query? If you have anything more than a thousand rows in the temp table, I bet indexing it would be the better way to go.

> As I am relatively new to SQL, I ask you:
> Where's the difference between an indexed temp (!!!) table and an
> unindexed one? The main table is of course fully indexed ;-)

One has an index, which means that other (much faster) access methods may be available for the database to use.
It's me again - with the easy and complex statement at the same time ;-) Who can help me?! So far, my solution is still unsatisfactory. I'll explain my full task this time, otherwise it is hard to explain!

My tables (redundant for speed reasons - doesn't matter - only requests!):

===============================================================
table: goGraph
primary key (parent, child)
index (B-tree) on the parent column and on the child column.

parent     | child
--------------------------
GO:0000001 | GO:0000002
GO:0000001 | GO:0000003
GO:0000002 | GO:0000005
GO:0000002 | GO:0000006
GO:0000003 | GO:0000006
GO:0000003 | GO:0000007
...        | ...
about 15000 elements more!

This represents a directed acyclic graph:

            GO:0000001
             /      \
            v        v
     GO:0000002    GO:0000003
      /     \       /     \
     v       v     v       v
GO:0000005   GO:0000006   GO:0000007

===============================================================
table: binaryInteraction
primary key (baitProtein, preyProtein, gogoKey)
index (B-tree) on gogoKey.

baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--------------------------------------------------------------
123         | 123         | GO:0000001 | GO:0000001 | 11
123         | 123         | GO:0000020 | GO:0000001 | 120
123         | 123         | GO:0000020 | GO:0000020 | 2020
123         | 345         | GO:0000001 | GO:0000001 | 11
123         | 345         | GO:0000020 | GO:0000001 | 120
...         | ...         | ...        | ...        | ...
up to several million entries!
===============================================================

So, what I have to do: someone chooses 2 goIds. For these ids and all their children (information from goGraph) I want the count(*) of "fits" within the table binaryInteraction. That means baitGoId and preyGoId must each be either one of the two chosen goIds or one of the related children.

I use the gogoKey for this task: I compute integers (within Java) from the goIds and sort them, so that smallerGoId.concatenate(biggerGoId) gives me a unique key for each baitGo/preyGo pair.

One more problem: the goGraph (see table) is a graph ... and for that reason one node can have more than one parent. If I choose e.g. 
GO:0000002 and GO:0000003 from the example above and want to compute their children, I don't want to take GO:0000006 into account (it has two parents in which I am interested). That means: whenever I ask for the children of two nodes, I want a DISTINCT SET of children.

Example of how I am processing the data at the moment:

User chooses goId1 = GO:0000005 and goId2 = GO:0000008;

Java:
- get all children of GO:0000005 via JDBC from goGraph -> collection "one"
- get all children of GO:0000006 via JDBC from goGraph -> collection "two"
- compute the intersection of the two collections
- subtract the intersection from each collection
- result: distinct children of the two goIds

Now the binaryInteraction table comes into our game ;-) I apply collection "one" and collection "two" to gogoKey:

-----------------------
1.

SELECT count(*) FROM
  (SELECT DISTINCT bait, prey FROM binaryInteraction
   WHERE gogoKey IN (gogokey1, gogokey2, gogokey3, ...)) AS foo;

gogokeyX is produced by Java -> it generates the unique gogokey for each go-go combination.

Speed: medium.
Limit: only 10000 elements within the IN statement can be processed by default. In postgresql.conf I can change this limit, as Rod Taylor already posted. But the manual warns of a buffer overflow if the value is too high ... If I reach the limit, I get something like:

  java.sql.SQLException: ERROR: Expression too complex: nesting depth exceeds max_expr_depth = 10000

BTW: why "nesting depth"? Within IN I compute thousands of "OR"s, but why nested ... ???

-----------------------
2. For all gogokeys I generate a temporary table and do a join with the binaryInteraction table:

SELECT count(*) FROM
  (SELECT DISTINCT binaryInteraction.bait, binaryInteraction.prey
   FROM binaryInteraction, tempTable
   WHERE binaryInteraction.gogokey = tempTable.gogokey) AS temp;

Speed: very slow, but no limit! 
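The two approaches can be compared on a toy binaryInteraction table; a runnable sketch using SQLite via Python (the data is made up, the table and column names follow the post). Both the IN variant and the temp-table join should count the same set of distinct (bait, prey) pairs.

```python
import sqlite3

# Toy binaryInteraction rows with duplicate (bait, prey) pairs across
# several gogoKeys, to exercise the DISTINCT.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE binaryInteraction (bait INTEGER, prey INTEGER, gogoKey INTEGER);
    INSERT INTO binaryInteraction VALUES
        (123, 123, 11), (123, 123, 120), (123, 123, 2020),
        (123, 345, 11), (123, 345, 120);
    CREATE TEMP TABLE tempTable (gogokey INTEGER PRIMARY KEY);
    INSERT INTO tempTable VALUES (11), (120);
""")

# Approach 1: static IN list.
(n_in,) = con.execute("""
    SELECT count(*) FROM (SELECT DISTINCT bait, prey
                          FROM binaryInteraction
                          WHERE gogoKey IN (11, 120)) AS foo
""").fetchone()

# Approach 2: join against the temp table of gogokeys.
(n_join,) = con.execute("""
    SELECT count(*) FROM (SELECT DISTINCT b.bait, b.prey
                          FROM binaryInteraction b, tempTable t
                          WHERE b.gogoKey = t.gogokey) AS foo
""").fetchone()

print(n_in, n_join)   # the two counts agree
```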
-----------------------
After all, I want to compute and precalculate every possible goId-goId combination ;-) So each of the things I explained here, I have to do millions of times! I tried to find a recursive solution - impossible!

Who can help? Maybe there is a way to compute it in one, but more complex, SQL statement. As I am still new to SQL, I don't know the "tricks"! If there were a way to calculate the intersection ... then maybe a solution is not that far away ...

I was also thinking about a denormalized goGraph table: parent -> allChilds. Then I compute (within SQL?) the intersection between the 2 goIds and then I apply it to binaryInteraction.

Thanks for any ideas, and thanks at least for reading that long story ;-)

Cheers Markus

PS: If you have any additional questions, please feel free to ask!
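The denormalized parent -> allChilds mapping mentioned above is a transitive closure of the goGraph edge list, and it could be precomputed outside SQL and loaded back into a table. A sketch in Python, using the toy graph from earlier in the thread (the edge data and function name are illustrative):

```python
# Edge list of the toy goGraph: parent -> direct children.
edges = {
    "GO:0000001": ["GO:0000002", "GO:0000003"],
    "GO:0000002": ["GO:0000005", "GO:0000006"],
    "GO:0000003": ["GO:0000006", "GO:0000007"],
}

def descendants(node):
    """All children, grandchildren, ... of node (node itself excluded)."""
    seen, stack = set(), list(edges.get(node, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(edges.get(child, []))
    return seen

# parent -> set of all descendants; each (parent, descendant) pair would
# become one row of the denormalized table.
closure = {parent: descendants(parent) for parent in edges}
print(sorted(closure["GO:0000001"]))
```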
> ... I don't want to take GO:000006 into account (two parents in which
> I am intested in). That menas, whenever I ask for children of two nodes,
> I want a DISTINCT SET of children.

To start with, you can avoid the Java and do this in SQL:

SELECT child FROM gograph
 WHERE parent='GO:0000002' OR parent='GO:0000005'
EXCEPT
  (SELECT child FROM gograph WHERE parent='GO:0000002'
   INTERSECT
   SELECT child FROM gograph WHERE parent='GO:0000005');

And yes, I would certainly start by normalizing things a little bit:

CREATE SEQUENCE goid_seq;

CREATE TABLE goid (
  idname TEXT,
  id     INTEGER NOT NULL DEFAULT nextval('goid_seq')
);

INSERT INTO goid(idname) SELECT DISTINCT parent FROM gograph;

INSERT INTO goid(idname) SELECT DISTINCT child FROM gograph
  WHERE NOT EXISTS (SELECT 1 FROM goid WHERE idname = child);

CREATE TABLE gomap (
  parent INTEGER,
  child  INTEGER
);

INSERT INTO gomap SELECT
  (SELECT id FROM goid WHERE idname=parent),
  (SELECT id FROM goid WHERE idname=child)
FROM gograph;

As far as the binaryInteraction table goes, a little more information is needed: how is each of these tables being populated? And why the DISTINCT - is it because there may be duplicate rows in the table? The reason I ask is that it might be better to use triggers to compute some of the information as it comes in, depending on which tables change and how often.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8
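The EXCEPT/INTERSECT idea can be demonstrated end to end; a runnable sketch using SQLite via Python (SQLite also supports EXCEPT and INTERSECT) on the toy gograph from earlier in the thread, here with the parents GO:0000002 and GO:0000003, whose shared child GO:0000006 drops out. Because SQLite evaluates compound operators left to right, the INTERSECT is wrapped in a subquery.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gograph (parent TEXT, child TEXT);
    INSERT INTO gograph VALUES
        ('GO:0000001', 'GO:0000002'), ('GO:0000001', 'GO:0000003'),
        ('GO:0000002', 'GO:0000005'), ('GO:0000002', 'GO:0000006'),
        ('GO:0000003', 'GO:0000006'), ('GO:0000003', 'GO:0000007');
""")

# Children of either parent, minus the children they share.
rows = con.execute("""
    SELECT child FROM gograph
     WHERE parent = 'GO:0000002' OR parent = 'GO:0000003'
    EXCEPT
    SELECT * FROM (SELECT child FROM gograph WHERE parent = 'GO:0000002'
                   INTERSECT
                   SELECT child FROM gograph WHERE parent = 'GO:0000003')
""").fetchall()
print(sorted(rows))   # GO:0000006 is excluded
```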