>>> The goal of my query is: given a book, what did other people who
>>> bought this book also buy? I plan the list the 5 most popular such
>>> books.
You can use the table listing ordered products directly, for example :
table ordered_products: order_id, product_id, quantity
SELECT b.product_id, sum(quantity) as rank FROM ordered_products a,
ordered_products b WHERE a.product_id=(the product id) AND
b.order_id=a.order_id AND b.product_id != a.product_id GROUP BY
b.product_id ORDER BY rank DESC LIMIT 6;
This will need indexes on order_id and product_id that you probably
already have.
It will also be slow.
You can also have a cache table :
cache prod_id_a, prod_id_b, quantity
With a constraint that prod_id_a < prod_id_b
You add a trigger on insert, update or delete to ordered_products to
insert or update rows in this table, modifying the quantity according to
the purchase.
To select you do :
SELECT * FROM
(
(SELECT prod_id_b as pid, quantity FROM cache WHERE prod_id_a=(your id)
ORDER BY prod_id_a DESC, quantity DESC LIMIT 5)
UNION ALL
(SELECT prod_id_a as pid, quantity FROM cache WHERE prod_id_b=(your id)
ORDER BY prod_id_b DESC, quantity DESC LIMIT 5)
) as foo
ORDER BY quantity DESC
LIMIT 5;
It will be probably very fast but the table will grow huge and need
various indexes :
(prod_id_a, quantity)
(prod_id_b quantity)
(prod_id_a, prod_id_b) (the primary key)
You'll get 1/2 * N * (N-1) rows, N being the number of products on your
site. If you remove the constraint prod_id_a < prod_id_b you'll get N^2 rows which is worse.
Another solution :
Table cache : product_id integer, also_purchased integer[]
After every order, update also_purchased with the results of the query
using the self join on ordered_products tables above.
This query should not be fast enough to use in a product webpage but it
shouldn't be slow enough to be used like thi, only when orders are made.
To get the "also purchased products" all you have to do is read a line in
this table.