How to speed up this "translation" query? - Mailing list pgsql-performance

From tlm
Subject How to speed up this "translation" query?
Date
Msg-id 4d6d2c130608011109k2759ae60wd6dd89bff2e8e4a9@mail.gmail.com
Whole thread Raw
Responses Re: How to speed up this "translation" query?
List pgsql-performance
I need some expert advice on how to optimize a "translation" query (this word choice will become clear shortly, I hope).
 
Say I have a HUMONGOUS table of foreign language "translations" (call it TRANS) with records like these:
 
meaning_id: 1
language_id: 5
translation: jidoosha
 
meaning_id: 1
language_id: 2
translation: voiture
 
meaning_id: 1
language_id: 5
translation: kuruma
 
meaning_id: 2
language_id: 2
translation: chat
 
meaning_id: 2
language_id: 5
translation: neko
 
meaning_id: 2
language_id: 3
translation: katz
 
meaning_id: 3
language_id: 4
translation: pesce
 
meaning_id: 3
language_id: 2
translation: poisson
 
meaning_id: 3
language_id: 5
translation: sakana
 
For the sake of this description, let's assume that the records above are all the records in TRANS (though in fact the number of records in TRANS is really about ten million times greater).
 
Now suppose I have a tiny table called INPUT consisting of single text field (say, word).  E.g. suppose that INPUT looks like this:
 
katz
voiture
pesce
 
Now, let's fix a language_id, say 5.  This is the "target" language_id.  Given this target language_id, and this particular INPUT table, I want the results of the query to be something like this:
 
neko
jidoosha
kuruma
sakana
 
I.e. for each word W in INPUT, the query must first find each record R in TRANS that has W as its translation field; then find each record Q in TRANS whose language_id is 5 (the target language_id) AND has the same meaning_id as R does.  E.g. if W is 'katz', then R is
 
meaning_id: 2
language_id: 3
translation: katz
 
and therefore the desired Q is
 
meaning_id: 2
language_id: 5
translation: neko
 
...and so on.
 
The only difficulty here is that performance is critical, and in real life, TRANS has around 50M records (and growing), while INPUT has typically between 500 and 1000 records.
 
Any advice on how to make this as fast as possible would be much appreciated.
 
Thanks!
 
G.
 
P.S.  Just to show that this post is not just from a college student trying to get around doing homework, below I post my most successful query so far.  It works, but it's performance isn't great.  And it is annoyingly complex, to boot; I'm very much the SQL noob, and if nothing else, at least I'd like to learn to write "better" ( i.e. more elegant, more legible, more clueful) SQL that this:
 
SELECT q3.translation, q2.otherstuff
FROM
(
  SELECT INPUT.word, q1.meaning_id, INPUT.otherstuff
  FROM
  INPUT
  INNER JOIN
  (
    SELECT translation, meaning_id
    FROM TRANS
    WHERE translation IN (SELECT word FROM INPUT)
  ) AS q1
  ON INPUT.word = q1.translation
) AS q2
LEFT JOIN
(
  SELECT translation, meaning_id
  FROM TRANS
  WHERE language_id=5
) AS q3
ON q2.meaning_id=q3.meaning_id;
 
As you can see, there are additional fields that I didn't mention in my original description (e.g. INPUT.otherstuff).  Also the above is actually a subquery in a larger query, but it is by far, the worst bottleneck.  Last, there's an index on TRANS(translation).
 

pgsql-performance by date:

Previous
From: Arjen van der Meijden
Date:
Subject: Re: PostgreSQL scalability on Sun UltraSparc T1
Next
From: Niklas Johansson
Date:
Subject: Re: How to speed up this "translation" query?