Re: Q on views and performance - Mailing list pgsql-performance

From Kynn Jones
Subject Re: Q on views and performance
Date
Msg-id c2350ba40802230821t2446a151j3279b8ea901c31b3@mail.gmail.com
Whole thread Raw
In response to Re: Q on views and performance  ("Dean Gibson (DB Administrator)" <postgresql@ultimeth.com>)
Responses Re: Q on views and performance  ("Dean Gibson (DB Administrator)" <postgresql@ultimeth.com>)
List pgsql-performance
Hi, Dean.  The system I'm working with is very similar "in spirit" to a large multilingual dictionary covering 100 languages.  Using this analogy, the "type" column would correspond to the language, and the zipk column would correspond to some language-independent key associated with a concept ("concept key" for short).  So, if it were indeed a multilingual dictionary, records in T would look like

  word   | zipk | language
---------+------+-----------
 house   | 1234 | <english>
 casa    | 1234 | <spanish>
 haus    | 1234 | <german>
 piano   | 2345 | <english>
 piano   | 2345 | <spanish>
 cat     | 3456 | <english>
 chat    | 3456 | <french>
 chat    | 4567 | <english>
 plausch | 4567 | <german>

...where I used the notation <lang> to denote "the integer id assigned to language lang".  Therefore typically there are about 100 records in T for any given zipk, one for each language.  But the correspondence is not perfect, since, for example, some languages have, proverbially, more than one word for snow, and some (maybe from some tropical island in the South Pacific) have none.  (This last case, BTW, is what accounts for the use of left joins, as will become clear in a minute.)

The table S can be thought of a table consisting of a collection of words to be translated to some target language.  In the first type of query (Q1), all the words in S are effectively declared to belong to the same source language, whereas in the second type of query (Q2) the source language for the words in S is left unspecified (in this case S may contain words from various languages, or words--like "piano" or "chat" in the example above--that belong simultaneously to different languages, and which may (e.g. piano) or may not (e.g. chat) have the same zipk [concept key] for each of these languages).

So, regarding your question about (Q1**) and (Q2**):

(Q1**)  SELECT a1.word, sq.word FROM
               S      JOIN T a1 USING ( word )
                 LEFT JOIN ( SELECT * FROM T a2
                             WHERE a2.type = <int2> ) sq USING ( zipk )
         WHERE a1.type = <int1>;

(Q2**)  SELECT a1.word, sq.word FROM
               S      JOIN T a1 USING ( word )
                 LEFT JOIN ( SELECT * FROM T a2
                             WHERE a2.type = <int2> ) sq USING ( zipk )

...the inner join with S is intended to pick out all the records in the source table (either T<int1> in Q1** or T in Q2**) corresponding to words in S, while the second (left) join, is there to find all the "translations" in the target language.  I use a left join so that even those words in S for which no translations exist will show up in the query results.

3. Why not write:

CREATE VIEW txt AS
  SELECT a1.word AS word1, a1.type AS type1, a2.word AS word2, a2.type AS type2
    FROM T a1 [LEFT] JOIN T a2 USING( zipk );  -- Use "LEFT" if appropriate
SELECT word1, word1
  FROM S JOIN txt ON word = word1

  WHERE type1 = <int1> AND type2 = <int2>;


This is would indeed produce the same results as Q1, but this approach would require defining about 10,000 views, one for each possible pair of int1 and int2 (or pair of languages, to continue the multilingual dictionary analogy), which freaks me out for some reason.  (Actually, the number of such views would be many more than that, because in the actual application there is not just one T but several dozen, similar to what would happen to the schema in the multilingual dictionary analogy if we wanted to pre-segregate the words according to some categories, say a T for animals, a T for fruits, a T for verbs, a T for professions, etc.)

(I need to do a bit more work before I can post the EXPLAIN results.)

kynn

pgsql-performance by date:

Previous
From: "Dean Gibson (DB Administrator)"
Date:
Subject: Re: Q on views and performance
Next
From: "Dean Gibson (DB Administrator)"
Date:
Subject: Re: Q on views and performance