Re: people who buy A, also buy C, D, E - Mailing list pgsql-sql

From PFC
Subject Re: people who buy A, also buy C, D, E
Date
Msg-id op.sszjw2thth1vuj@localhost
Whole thread Raw
In response to Re: people who buy A, also buy C, D, E  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: people who buy A, also buy C, D, E
List pgsql-sql

>>> The goal of my query is: given a book, what did other people who
>>> bought this book also buy?  I plan the list the 5 most popular such
>>> books.
You can use the table listing ordered products directly, for example :

table ordered_products:    order_id,  product_id,  quantity

SELECT b.product_id, sum(quantity) as rank FROM ordered_products a,  
ordered_products b WHERE a.product_id=(the product id) AND  
b.order_id=a.order_id AND b.product_id != a.product_id GROUP BY  
b.product_id ORDER BY rank DESC LIMIT 6;

This will need indexes on order_id and product_id that you probably  
already have.
It will also be slow.
You can also have a cache table :

cache    prod_id_a, prod_id_b, quantity
With a constraint that prod_id_a < prod_id_b

You add a trigger on insert, update or delete to ordered_products to  
insert or update rows in this table, modifying the quantity according to  
the purchase.

To select you do :

SELECT * FROM
(
(SELECT prod_id_b as pid, quantity FROM cache WHERE prod_id_a=(your id)  
ORDER BY prod_id_a DESC, quantity DESC LIMIT 5)
UNION ALL
(SELECT prod_id_a as pid, quantity FROM cache WHERE prod_id_b=(your id)  
ORDER BY prod_id_b DESC, quantity DESC LIMIT 5)
) as foo
ORDER BY quantity DESC
LIMIT 5;

It will be probably very fast but the table will grow huge and need  
various indexes :
(prod_id_a, quantity)
(prod_id_b quantity)
(prod_id_a, prod_id_b)    (the primary key)

You'll get 1/2 * N * (N-1) rows, N being the number of products on your  
site. If you remove the constraint  prod_id_a < prod_id_b you'll get N^2 rows which is worse.
Another solution :

Table cache : product_id integer, also_purchased integer[]

After every order, update also_purchased with the results of the query  
using the self join on ordered_products tables above.
This query should not be fast enough to use in a product webpage but it  
shouldn't be slow enough to be used like thi, only when orders are made.

To get the "also purchased products" all you have to do is read a line in  
this table.














pgsql-sql by date:

Previous
From: "Jim Buttafuoco"
Date:
Subject: Re: multiple PK with a non UNIQUE field
Next
From: PFC
Date:
Subject: assorted problems with intarray and other GiST contribs.