Re: [SQL] Joining bug???? - Mailing list pgsql-interfaces

From Gene Selkov Jr.
Subject Re: [SQL] Joining bug????
Date
Msg-id 199810300110.TAA04192@antares.mcs.anl.gov
List pgsql-interfaces
> hi Gene,
>    Would you mind sharing some code samples, so i could attempt to do
> the same thing...

OK, but first let me describe my database setup. Yours may be entirely different, to the extent that it would be difficult to justify using such a method. Basically, I use the tables in the postgres database not as primary storage but as multi-key indices to objects stored elsewhere. That's similar to how postgres handles large objects. The difference is that the objects I have to deal with are rather complex and usually don't have an adequate relational representation, but they can be parsed to extract and index individual attributes. Each attribute is stored in a separate table together with its location data. It is sort of a keyword search system.

In this setup, the task of retrieving an object that has an attribute with a particular value, or a range of values, is trivial, but it is not so if you are interested in using multi-attribute constraints. That task would be better addressed by a multi-column table, but since my objects do not conform to any particular format and there are random combinations of more than 400 attributes, I can't afford a single 400-column table with millions of variable-length cells, some of them recursively structured.

So, it seems like the only way of retrieving the right objects is to use multiple joins on the tables representing individual attributes. I still have to find out whether I can trust the postgres optimizer to do a join between 20 tables, or even 10 -- that's the most typical constraint size that I have to deal with.
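To make the multi-join idea concrete, here is a sketch of how such a query could be assembled on the client side. The table and column names (attr_color, attr_size, object_id) are purely illustrative, not from my actual schema:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sketch: each attribute lives in its own table keyed by
# object_id, so an N-attribute constraint becomes an N-way join.
sub build_join_query {
    my @constraints = @_;             # list of [table, condition] pairs
    my ($first) = @{$constraints[0]};
    my $sql = "SELECT $first.object_id FROM " .
              join(", ", map { $_->[0] } @constraints);
    my @where = map { $_->[1] } @constraints;
    # every attribute table must agree on the object id
    for my $i (1 .. $#constraints) {
        push @where, "$first.object_id = $constraints[$i][0].object_id";
    }
    return $sql . " WHERE " . join(" AND ", @where);
}

print build_join_query(
    ["attr_color", "attr_color.value = 'red'"],
    ["attr_size",  "attr_size.value > 10"]
), "\n";
```

The resulting string would be handed to the backend through Pg.pm or any other client library; with 10 or 20 such tables, the WHERE clause grows the same way.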

> I am in the process of getting more in tune with perl hashes.

~~~~~~~~~~~~~~~~~~~~
1. If you did not use perl hashes before, here's the idea: a hash in perl is a two-column table, with unique hash-indexed values in the first column, from which you can extract one row at a time, using an equivalent of the '=' operator. It's pretty restrictive, but the advantage of using hashes in perl to post-process your SQL query is that they are incredibly fast. To place a value in the hash %h, you use the assignment $h{'key'} = 'value', or $h{$key} = $value if you want to use variables. To retrieve the value back, use an expression like $val = $h{'a'}; to test for the existence of a key, use something like this: if ($h{'a'}) { ... }

This illustrates the second advantage of hashes: easy programming (although a bit cryptic for beginners).
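Put together, the operations above make a complete runnable snippet:

```perl
#!/usr/bin/perl -w
use strict;

my %h;                        # an empty hash
$h{'a'} = 'value';            # store under a literal key
my ($key, $value) = ('b', 42);
$h{$key} = $value;            # or under a variable key

my $val = $h{'a'};            # retrieve: $val is now 'value'
print "found 'a'\n" if $h{'a'};          # truth test on the value
print "no 'c'\n" unless exists $h{'c'};  # exists() checks the key itself
```

One caveat: if ($h{'a'}) tests the stored value for truth, so it is false when the value is 0, '' or undef; exists() is the stricter test when such values are possible.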

Now, here is the simplest example: your client program obtained one set of keys in array @a and the second set in array @b (using Pg.pm or any other client library). The intersection of @a and @b is computed like this (a classical perl example): http://www.pun.org/bram/Class/Perl/2/Lab_Hash.html

~~~~~~~~~~~~~~~~~~~~
2. For a real-life example of the use of intersection with multiple attributes, please refer to my metabolic pathways page: http://wit.mcs.anl.gov/MPW/
The CGI that is doing this job is here: http://wit.mcs.anl.gov/~selkovjr/intersect.tgz
The script was done in an extremely ad hoc fashion, with multiple repetitions in the code, but the essence is this (the rest is preparation of the queries and error handling):

For each query term, you get the result (key set) and place it into a hash, %match:

  if ( $result->ntuples > 0 ) {
    for $i ( 0 .. $result->ntuples - 1 ) {
      $match{$result->getvalue($i, 0)}++;   # count each returned key
    }
  }
  $terms++;                                 # one more query term seen

After all individual queries are executed, each key in %match will point to the number of times that key was returned by the queries. Naturally, this number must be equal to the number of query terms in order for that key to be supplied to the output:

$cgi->print( "<p><pre>\n");
foreach ( keys %match ) {
  if ( $match{$_} == $terms ) {
    $cgi->print( "<a href=\"/WIT2/CGI/pw.cgi?&pw=$_\">$_</a>\n");
  }
}
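The same counting logic can be exercised without a live database by substituting plain arrays for the query results (the key names here are made up):

```perl
#!/usr/bin/perl -w
use strict;

# Mock key sets standing in for three query results
my @results = (
    [qw(obj1 obj2 obj3)],
    [qw(obj2 obj3 obj4)],
    [qw(obj3 obj2 obj9)],
);

my (%match, $terms);
for my $keys (@results) {
    $match{$_}++ for @$keys;   # count each returned key
    $terms++;                  # one more query term seen
}

# keys returned by every query satisfy all ANDed terms
my @hits = sort grep { $match{$_} == $terms } keys %match;
print "@hits\n";               # prints "obj2 obj3"
```

Note this assumes each query returns a key at most once; a duplicate within one result set would inflate the count.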

This example is very simple, and although it does its job for the fixed-type query form, where all criteria are supposed to be ANDed, it is not generic enough. Here is a more involved one:

~~~~~~~~~~~~~~~~~~~~~~
3. Using boolean expressions to control set operations on the query results. An example is this (I hope your mailer does not screw up the url because of its length):


http://wit.mcs.anl.gov/EMP/select_emp_advanced.cgi?E1.ec_code=ec&E1.ec_code.op=%7E%09is+in+range&E1.ec_code.patt=1.1&ec_code.count=1&T1.text=ccc&T1.text.op=%7E*%09matches+regex.%2C+case-insensitive&T1.text.patt=bacteria&T2.text=tax&T2.text.op=%7E*%09matches+regex.%2C+case-insensitive&T2.text.patt=fungi&T3.text=phd&T3.text.op=%7E*%09matches+regex.%2C+case-insensitive&T3.text.patt=v&text.count=1&N1.seg=pho&N1.seg.op=%7E%09contained+in&N1.seg.patt=7.5+..+8&seg.count=1&constraint=%28T1+%7C+T2%29+%26+%28N1+%26%26+%21T3%29+%26+E1&do=Run+the+query

The CGI for doing this kind of query and its support modules can be obtained here:
http://wit.mcs.anl.gov/~selkovjr/boolean.tgz
To build the parser, unpack the archive and cd to boolean/, then type

    perl Makefile.PL; make

It will build the module BoolParser.pm, also available at http://wit.mcs.anl.gov/~selkovjr/BoolParser.pm

The upper directory (where the CGI runs) has a symlink to BoolParser.pm (to avoid a global installation), but you can substitute it with the real BoolParser.pm. This module is used by Parser.pm, which is used by the cgi script and is a sort of object-oriented encapsulation for BoolParser.pm; this is where the user actions for set operations are implemented. For example, this rule in BoolParser.y

  | expr STRICT_AND expr {
    $$ = $self->actionStrictAnd($1, $3);
  }

invokes the method actionStrictAnd() in Parser.pm, which does the actual job (doesn't it look like an example from the Camel book?):

sub actionStrictAnd {
  my ($self, $ar1, $ar2) = @_;
  my (%tmp, @result);
  grep ($tmp{$_}++, @$ar2);           # index the second key set
  @result = grep ($tmp{$_}, @$ar1);   # keep first-set keys present in both
  $self->{SIZE} = $#result;           # NB: last index, i.e. count - 1
  return \@result;
}
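Pulled out of the parser, the method can be exercised on its own; this stub Parser package and the key values are made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

# Minimal stand-in for Parser.pm, just enough to call actionStrictAnd()
package Parser;
sub new { return bless {}, shift }

sub actionStrictAnd {
    my ($self, $ar1, $ar2) = @_;
    my %tmp;
    $tmp{$_}++ for @$ar2;                     # index the second key set
    my @result = grep { $tmp{$_} } @$ar1;     # keep keys present in both
    $self->{SIZE} = $#result;                 # last index, as in the original
    return \@result;
}

package main;
my $p = Parser->new;
my $r = $p->actionStrictAnd([qw(a b c)], [qw(b c d)]);
print "@$r\n";    # prints "b c"
```

Each grammar action returns an arrayref, so the parser can thread partial result sets through arbitrarily nested boolean expressions.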

Oops! Almost forgot to say that you will need http://wit.mcs.anl.gov/~selkovjr/camel-1.24.tar.gz to build BoolParser.pm from BoolParser.y (it is actually bison-1.24 disguised as "camel"; the original bison can't build parsers in perl).

This is a very quick hack and I did not give it a lot of testing (imagine, I had two days left before the publication deadline when I realized that multiple joins were doing me no good). Also, doing such things on the client side is complete heresy as far as SQL goes, but, considering the circumstances, I think it is not an unreasonable solution. It has never let me down, and I have never had to wait more than a few seconds for the most complex queries.


--Gene
