Re: [SQL] Joining bug???? - Mailing list pgsql-interfaces
From: Gene Selkov Jr.
Subject: Re: [SQL] Joining bug????
Msg-id: 199810300110.TAA04192@antares.mcs.anl.gov
List: pgsql-interfaces
> hi Gene,
> Would you mind sharing some code samples, so i could attempt to do
> the same thing...

OK, but first let me describe my database setup. Yours may be entirely different, to the extent that it would be difficult to justify using such a method.

Basically, I use the tables in a postgres database not as primary storage but as multi-key indices to objects stored elsewhere. That's similar to how postgres handles large objects. The difference is that the objects I have to deal with are rather complex and usually don't have an adequate relational representation, but they can be parsed to extract and index individual attributes. Each attribute is stored in a separate table together with its location data. It is sort of a keyword search system.

In this setup, the task of retrieving an object that has an attribute with a particular value, or a range of values, is trivial, but it is not so if you are interested in using multi-attribute constraints. That task would be better addressed by a multi-column table, but since my objects do not conform to any particular format and there are random combinations of more than 400 attributes, I can't afford a single 400-column table with millions of variable-length cells, some of them recursively structured.

So, it seems the only way of retrieving the right objects is to use multiple joins on the tables representing individual attributes. I still have to find out whether I can trust the postgres optimizer to do a join between 20 tables, or even 10 -- that's the most typical constraint size that I have to deal with.

> I am in the process of getting more in tune with perl hashes.

~~~~~~~~~~~~~~~~~~~~
1.
If you did not use perl hashes before, here's the idea: a hash object in perl is a two-column table, with unique hash-indexed values in the first column, from which you can extract one row at a time, using an equivalent of the '=' operator. It's pretty restrictive, but the advantage of using hashes in perl to post-process your SQL query is that they are incredibly fast.

To place a value in the hash %h, you use the assignment $h{'key'} = 'value', or $h{$key} = $value if you want to use variables. To retrieve the value back, use an expression like $val = $h{'a'}; to test for the existence of a key, use something like this:

    if ($h{'a'}) { ... }

This illustrates the second advantage of hashes: easy programming (although a bit cryptic for beginners).

Now, here is the simplest example: your client program obtained one set of keys in array @a and the second set in array @b (using Pg.pm or any other client library). The intersection of @a and @b is computed like this (a classical perl example):

    http://www.pun.org/bram/Class/Perl/2/Lab_Hash.html

~~~~~~~~~~~~~~~~~~~~
2. For a real-life example of the use of intersection with multiple attributes, please refer to my metabolic pathways page: http://wit.mcs.anl.gov/MPW/ The CGI that is doing this job is here: http://wit.mcs.anl.gov/~selkovjr/intersect.tgz

The script is done in an extremely ad hoc fashion, with multiple repetitions in the code, but the essence is this (the rest is preparation of the queries and error handling). For each query term, you get the result (key set) and place it into a hash, %match:

    if ( $result->ntuples > 0 ) {
        for $i ( 0 .. $result->ntuples - 1 ) {
            $match{$result->getvalue($i, 0)}++;
        }
    }
    $terms++;

After all individual queries are executed, each key in %match will point to the number of times that key was returned by the queries.
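The classical intersection idiom behind that Lab_Hash link can be sketched as follows (my reconstruction with made-up key sets, not the linked example itself): mark every element of @a in a hash, then keep the elements of @b that are marked.

```perl
# Classical perl set intersection (cf. perlfaq4): hash lookup is O(1),
# so intersecting two key sets costs one pass over each array.
my @a = qw(obj1 obj2 obj3 obj5);
my @b = qw(obj2 obj4 obj5);

my %seen;                       # mark every key present in @a
$seen{$_} = 1 for @a;

my @intersection = grep { $seen{$_} } @b;

print "@intersection\n";        # obj2 obj5
```

The result order follows @b, and keys appearing in neither array are never touched, which is what makes the hash approach so fast for post-processing query results.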
Naturally, this number must be equal to the number of query terms in order for that key to be sent to the output:

    $cgi->print( "<p><pre>\n");
    foreach ( keys %match ) {
        if ( $match{$_} == $terms ) {
            $cgi->print( "<a href=\"/WIT2/CGI/pw.cgi?&pw=$_\">$_</a>\n");
        }
    }

This example is very simple, and although it does its job for the fixed-type query form, where all criteria are supposed to be ANDed, it is not generic enough. Here is a more involved one:

~~~~~~~~~~~~~~~~~~~~~~
3. Using boolean expressions to control set operations on the query results. An example is this (I hope your mailer does not mangle the url because of its length):

    http://wit.mcs.anl.gov/EMP/select_emp_advanced.cgi?E1.ec_code=ec&E1.ec_code.op=%7E%09is+in+range&E1.ec_code.patt=1.1&ec_code.count=1&T1.text=ccc&T1.text.op=%7E*%09matches+regex.%2C+case-insensitive&T1.text.patt=bacteria&T2.text=tax&T2.text.op=%7E*%09matches+regex.%2C+case-insensitive&T2.text.patt=fungi&T3.text=phd&T3.text.op=%7E*%09matches+regex.%2C+case-insensitive&T3.text.patt=v&text.count=1&N1.seg=pho&N1.seg.op=%7E%09contained+in&N1.seg.patt=7.5+..+8&seg.count=1&constraint=%28T1+%7C+T2%29+%26+%28N1+%26%26+%21T3%29+%26+E1&do=Run+the+query

The CGI for doing this kind of query and its support modules can be obtained here: http://wit.mcs.anl.gov/~selkovjr/boolean.tgz

To build the parser, unpack the archive and cd to boolean/, then type:

    perl Makefile.PL; make

It will build the module BoolParser.pm, also available at http://wit.mcs.anl.gov/~selkovjr/BoolParser.pm

The upper directory (where the CGI runs) has a symlink to BoolParser.pm (to avoid global installation), but you can substitute it with the real BoolParser.pm.
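The idea the boolean constraint implements can be sketched in plain perl (a toy illustration with hypothetical key sets, not the module's code): each named term (T1, T2, E1, ...) yields a set of keys from its query, and the operators of an expression such as (T1 | T2) & E1 map onto hash-based set operations.

```perl
# Toy sketch of boolean set operations over per-term query results.
# The key sets here are invented; a real run would fill them from SQL.
my %set = (
    T1 => [qw(k1 k2 k3)],
    T2 => [qw(k3 k4)],
    E1 => [qw(k2 k3 k4)],
);

sub set_or {                    # union: collect members of either set
    my ($a, $b) = @_;
    my %u;
    $u{$_}++ for @$a, @$b;
    return [ sort keys %u ];
}

sub set_and {                   # intersection: keep keys seen in both
    my ($a, $b) = @_;
    my %tmp;
    $tmp{$_}++ for @$b;
    return [ grep { $tmp{$_} } @$a ];
}

# Evaluate (T1 | T2) & E1 by hand, the way a parser action would:
my $result = set_and(set_or($set{T1}, $set{T2}), $set{E1});
print "@$result\n";             # k2 k3 k4
```

A parser for the constraint expression only has to call operations like these in the order dictated by the grammar; the hashes do all the set-theoretic work.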
This module is used by Parser.pm, which is used by the cgi script and is a sort of object-oriented encapsulation for BoolParser.pm; this is where the user actions for set operations are implemented. For example, this rule in BoolParser.y

    | expr STRICT_AND expr { $$ = scalar:SELF->actionStrictAnd($1, $3); }

invokes the method actionStrictAnd() in Parser.pm, which does the actual job (doesn't it look like an example from the Camel book?):

    sub actionStrictAnd {
        my ($self, $ar1, $ar2) = @_;
        my (%tmp, @result);   # lexical, so counts don't leak between calls
        grep ($tmp{$_}++, @$ar2);
        @result = grep ($tmp{$_}, @$ar1);
        $self->{SIZE} = $#result;
        return \@result;
    }

Oops! I almost forgot to say that you will need http://wit.mcs.anl.gov/~selkovjr/camel-1.24.tar.gz to build BoolParser.pm from BoolParser.y (it is actually bison-1.24 disguised as "camel"; the original bison can't build parsers in perl).

This is a very quick hack and I did not give it a lot of testing (imagine, I had two days left before the publication deadline when I realized that multiple joins were doing me no good). Also, doing such things on the client side is complete heresy as far as SQL goes, but, considering the circumstances, I think it is not an unreasonable solution. It never let me down, and I never had to wait more than a few seconds for the most complex queries.

--Gene