Thread: Multiple IN
Maybe this is a silly question. I have a table t with two fields f1 and f2. Those fields together form the primary key. I want to select a list of pairs from that table. The simplest SQL clause would be SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4) ...; When that list is long, on the one hand I suppose it won't have a good performance; on the other, the resultant clause is clumsy and too long. Is there any other (more elegant) solution for this? If the field would be just one, a simple IN would do better, but as far as I know there is no IN for multiple fields. Thank you in advance. Leandro Fanzone
On Tue, Dec 11, 2001 at 12:43:08PM -0300, Leandro Fanzone <leandro@hasar.com> wrote: > > When that list is long, on the one hand I suppose it won't have a good > performance; on the other, the resultant clause is clumsy and too long. > Is there any other (more elegant) solution for this? If the field would > be just one, a simple IN would do better, but as far as I know there is > no IN for multiple fields. I am not sure if this is a new feature or just more complete documenation, but you might want to take a look at this from the 7.2 docs: http://developer.postgresql.org/docs/postgres/functions-subquery.html
Leandro, > Maybe this is a silly question. I have a table t with two fields f1 > and > f2. Those fields together form the primary key. I want to select a > list > of pairs from that table. The simplest SQL clause would be > > SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4) > ...; We'll need database schema (table designs) if you want us to help you. The above isn't enough information to go on. For example, where did "v1" come from? Is this a field? An interface language variable? Please be explicit! Also, consider picking up Joe Celko's SQL for Smarties, reviewed at http://techdocs.postgresql.org/bookreviews.php -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
I'm sorry for not being precise. The table is simple enough: it has only those two integer fields (f1, f2) and a third text field f3 that has the value I'm looking for. Fields f1 and f2 together are the primary key. An example: f1 f2 f3 1 1 'a' 1 2 'b' 2 1 'c' 2 4 'd' .. .. ... 8 5 'z' etc. Now I have a list of integer pairs to match f1 and f2 from that table; let's say I have those values in a vector. For example: { (1, 2), (2, 1), (8, 5) } I want the corresponding return values for f3: { b, c, z } The problem is that I have to match the pair, so my SQL again would be SELECT f3 FROM t WHERE (f1 = 1 AND f2 = 2) OR (f1 = 2 AND f2 = 1) OR (f1 = 8 OR f2 = 5); for the former example. I can guarantee that all the values I put in the OR list exist in the table, and I want to know the f3 of each one, exactly. The values from the list are arbitrary or random, and are not coming from a SELECTion. I want to avoid the AND/OR list, because the vector could be very long. Thank you again for your help. Leandro Fanzone. Josh Berkus wrote: > Leandro, > > > Maybe this is a silly question. I have a table t with two fields f1 > > and > > f2. Those fields together form the primary key. I want to select a > > list > > of pairs from that table. The simplest SQL clause would be > > > > SELECT * FROM t WHERE (f1 = v1 AND f2 = v2) OR (f1 = v3 AND f2 = v4) > > ...; > > We'll need database schema (table designs) if you want us to help you. > The above isn't enough information to go on. For example, where did > "v1" come from? Is this a field? An interface language variable? > > Please be explicit! Also, consider picking up Joe Celko's SQL for > Smarties, reviewed at http://techdocs.postgresql.org/bookreviews.php > > -Josh > > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 565-7293 > for law firms, small businesses fax 621-2533 > and non-profit organizations. San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Leandro Fanzone <leandro@hasar.com> writes: > The problem is that I have to match the pair, so my SQL again would be > SELECT f3 FROM t > WHERE (f1 = 1 AND f2 = 2) OR (f1 = 2 AND f2 = 1) OR (f1 = 8 OR f2 = 5); > for the former example. I can guarantee that all the values I put in the OR > list exist in the table, and I want to know the f3 of each one, exactly. The > values from the list are arbitrary or random, and are not coming from a > SELECTion. I want to avoid the AND/OR list, because the vector could be very > long. Actually, that is the standard way of doing it, and the performance is likely to be less bad than you think. If f1/f2 are indexed (which they would be, since they're the primary key) you should get a plan like this: regression=# create table foo (f1 int, f2 int, primary key (f1,f2)); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'foo_pkey' for table 'foo' CREATE regression=# explain regression-# SELECT * FROM foo WHERE regression-# (f1 = 1 and f2 = 2) or regression-# (f1 = 3 and f2 = 4) or regression-# (f1 = 1 and f2 = 4); NOTICE: QUERY PLAN: Index Scan using foo_pkey, foo_pkey, foo_pkey on foo (cost=0.00..14.51 rows=1 width=8) EXPLAIN The multiple index scan represents three successive index probes using the three OR'd parts of the WHERE condition. You will eventually see it switch over to a seqscan if you get up into many hundreds or thousands of OR'd conditions, but at that point I'd suggest that you need to stick the probe values into a temp table and do a join. regards, tom lane
Leandro, There are several ways you can do your 2-column comparison. as Tom pointed out, there is nothing wrong with the query as you suggested it. Let me outline a few of the options: 1. Use your interface language (what are you using?) to generate an ad-hoc query along the line you already suggested, using a loop in the interface language to generate the query. SELECT tablea.text FROM tablea WHERE (f1=v1 and f2=v2) OR (f1=v3 and f2=v4) OR ... This query will work fine, with the caveat that a few interface technologies (Microsoft ODBC for one) will restrict the total length of your query, so test for that. Certainly the above query formation is likely to be the fastest to execute on large tables (> 10000 rows), as it involves no iterative loops or subqueries. 2. Load all the values into a "lookups" temporary table from the interface. Then match against that: CREATE TEMPORARY TABLE lookups ( v1 INT, v2 INT ); interface language loop: INSERT INTO lookups VALUES ( $v1, $v2 ) end loop; SELECT tablea.text FROM tablea JOIN lookups ON tablea.f1=lookups.v1 and tablea.f2=lookups.v2 ORDER BY tablea.text; This has the advantage of requiring less dynamic query generation from you. It also would probably be a better approach if you are going to display the results in sets (SELECT ... LIMIT 25 OFFSET 50;). However, on a straight query, you are likely to get much slower results from this method; it requires 4 to dozens of database commands instead of one, and temporary tables cannot be effectively indexed. There are also some headaches associated with temporary table management. 3. You could create a PL/pgSQL function which would accept two arrays of values and return true as soon as it found a match: CREATE FUNCTION match_dual_array ( INT, INT, INT[], INT[]) RETURNS BOOLEAN AS ' ... compare each f1 + f2 pair against each v1[] + v2[] array. Return True if any match, and false if no match. Then: SELECT tablea.text FROM tablea WHERE match_dual_array(f1, f2, v1[], v2[]); While this function approach has a certain amount of elegance about it (much simpler queries, for one thing), performance-wise, it's likely to be the slowest of the three approaches on very large datasets due to the necessity of procedurally looping through the array for each row in tablea. If, however, you have relatively few rows in tablea (a few hundred) this might be the approach for you. FURTHER READING: Joe Celko's "SQL for Smarties" see: http://techdocs.postgresql.org/bookreviews.php Roberto Mello's and others' contributions to PL/pgSQL documentation and sample functions: http://techdocs.postgresql.org/ Have fun! -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Well, thank you very much for all your help. I was certainly worried for the OR list, but I see there's no fundamental problem with it. I think I stick with the point one of your list. I want it to be as portable as possible, so that discards PL/pgSQL, and as you said, temporary tables for point two are tricky. I'm working with the libpq++ interface, so there's no problem on looping the vector to construct dynamically the query. Thanks again! Leandro Fanzone Josh Berkus wrote: > Leandro, > > There are several ways you can do your 2-column comparison. as Tom > pointed out, there is nothing wrong with the query as you suggested it. > Let me outline a few of the options: > > 1. Use your interface language (what are you using?) to generate an > ad-hoc query along the line you already suggested, using a loop in the > interface language to generate the query. > SELECT tablea.text FROM tablea WHERE (f1=v1 and f2=v2) OR (f1=v3 and > f2=v4) OR ... > This query will work fine, with the caveat that a few interface > technologies (Microsoft ODBC for one) will restrict the total length of > your query, so test for that. Certainly the above query formation is > likely to be the fastest to execute on large tables (> 10000 rows), as > it involves no iterative loops or subqueries. > > 2. Load all the values into a "lookups" temporary table from the > interface. Then match against that: > CREATE TEMPORARY TABLE lookups ( > v1 INT, v2 INT ); > interface language loop: > INSERT INTO lookups VALUES ( $v1, $v2 ) > end loop; > SELECT tablea.text > FROM tablea JOIN lookups ON tablea.f1=lookups.v1 and > tablea.f2=lookups.v2 > ORDER BY tablea.text; > This has the advantage of requiring less dynamic query generation from > you. It also would probably be a better approach if you are going to > display the results in sets (SELECT ... LIMIT 25 OFFSET 50;). However, > on a straight query, you are likely to get much slower results from this > method; it requires 4 to dozens of database commands instead of one, and > temporary tables cannot be effectively indexed. There are also some > headaches associated with temporary table management. > > 3. You could create a PL/pgSQL function which would accept two arrays of > values and return true as soon as it found a match: > CREATE FUNCTION match_dual_array ( INT, INT, INT[], INT[]) RETURNS > BOOLEAN > AS ' ... compare each f1 + f2 pair against each v1[] + v2[] array. > Return True if any match, and false if no match. > Then: > SELECT tablea.text FROM tablea > WHERE match_dual_array(f1, f2, v1[], v2[]); > While this function approach has a certain amount of elegance about it > (much simpler queries, for one thing), performance-wise, it's likely to > be the slowest of the three approaches on very large datasets due to the > necessity of procedurally looping through the array for each row in > tablea. If, however, you have relatively few rows in tablea (a few > hundred) this might be the approach for you. > > FURTHER READING: > Joe Celko's "SQL for Smarties" see: > http://techdocs.postgresql.org/bookreviews.php > > Roberto Mello's and others' contributions to PL/pgSQL documentation and > sample functions: > http://techdocs.postgresql.org/ > > Have fun! > > -Josh > > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 565-7293 > for law firms, small businesses fax 621-2533 > and non-profit organizations. San Francisco