Thread: Need some advice on a difficult query

Need some advice on a difficult query

From
Mike Christensen
Date:
Hi all - I'd like some advice on how to write a rather complicated (for me, anyway) query and if there's any nifty Postgres features I can take advantage of in this situation.  Imagine a database, if you will, used to store recipes.  I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the search results and don't load the full recipe until the user clicks on the link.  This works great.  However, I'm now in the process of implementing an ingredient blacklist.  This means NEVER show me any recipes which have one of my blacklisted ingredients, as if I ingest any I will die a painful death.  Or maybe my god will smite me or something.  This table would look like:

UserId
IngredientId

Ok, now the question.  What's the best way to query for my pasta dish above while excluding any recipe that contains one or more of my blacklisted ingredients?  I don't want to join in Ingredients because I'd then have a row for each ingredient of each recipe (which would cause me massive headache and redesign)..   I thought about some weird nested query, but I'm concerned it might be slow (searching for recipes is the central feature of my site and must be fast, fast and fast).  This basically boils down to some weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted ingredients with.  I could store this as an array on the Users table, or anything else that might make things easier.  Any advice on this?

Mike

Re: Need some advice on a difficult query

From
Mike Christensen
Date:
Awesome, I'll give this a shot..  Blacklist.UserId will be indexed and all the recipe links are of course already indexed, but I'll run it under the query analyzer to see if there's any other fine tuning needed.  I appreciate your help!

Mike

On Wed, Dec 16, 2009 at 1:27 AM, Timo Klecker <klecker@decoit.de> wrote:

Hi Mike,

 

here is an untested "weird nested query" for your problem:

 

SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')

and not exists

(select 1 from ingredients inner join blacklist using (IngredientId) where RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);

 

Should be quite fast with the right indexes.

 

Timo

 

 

Von: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Im Auftrag von Mike Christensen
Gesendet: Mittwoch, 16. Dezember 2009 09:48
An: pgsql-general@postgresql.org
Betreff: [GENERAL] Need some advice on a difficult query

 

Hi all - I'd like some advice on how to write a rather complicated (for me, anyway) query and if there's any nifty Postgres features I can take advantage of in this situation.  Imagine a database, if you will, used to store recipes.  I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the search results and don't load the full recipe until the user clicks on the link.  This works great.  However, I'm now in the process of implementing an ingredient blacklist.  This means NEVER show me any recipes which have one of my blacklisted ingredients, as if I ingest any I will die a painful death.  Or maybe my god will smite me or something.  This table would look like:

UserId
IngredientId

Ok, now the question.  What's the best way to query for my pasta dish above while excluding any recipe that contains one or more of my blacklisted ingredients?  I don't want to join in Ingredients because I'd then have a row for each ingredient of each recipe (which would cause me massive headache and redesign)..   I thought about some weird nested query, but I'm concerned it might be slow (searching for recipes is the central feature of my site and must be fast, fast and fast).  This basically boils down to some weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted ingredients with.  I could store this as an array on the Users table, or anything else that might make things easier.  Any advice on this?

Mike


Re: Need some advice on a difficult query

From
"Timo Klecker"
Date:

Hi Mike,

 

here is an untested "weird nested query" for your problem:

 

SELECT * FROM Recipes r where lower(RecipeTitle) like lower('%pasta%')

and not exists

(select 1 from ingredients inner join blacklist using (IngredientId) where RecipeId = r.RecipeId and blacklist.UserId = 123 limit 1);

 

Should be quite fast with the right indexes.

 

Timo

 

 

Von: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] Im Auftrag von Mike Christensen
Gesendet: Mittwoch, 16. Dezember 2009 09:48
An: pgsql-general@postgresql.org
Betreff: [GENERAL] Need some advice on a difficult query

 

Hi all - I'd like some advice on how to write a rather complicated (for me, anyway) query and if there's any nifty Postgres features I can take advantage of in this situation.  Imagine a database, if you will, used to store recipes.  I have a recipes table:

RecipeId
RecipeTitle
RecipeRating

And an ingredients table:

RecipeId
IngredientId
Amount

When the user searches for a new pasta dish, the UI would generate a query something like this:

SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';

I only need the data from the recipes table since I display a summary of the search results and don't load the full recipe until the user clicks on the link.  This works great.  However, I'm now in the process of implementing an ingredient blacklist.  This means NEVER show me any recipes which have one of my blacklisted ingredients, as if I ingest any I will die a painful death.  Or maybe my god will smite me or something.  This table would look like:

UserId
IngredientId

Ok, now the question.  What's the best way to query for my pasta dish above while excluding any recipe that contains one or more of my blacklisted ingredients?  I don't want to join in Ingredients because I'd then have a row for each ingredient of each recipe (which would cause me massive headache and redesign)..   I thought about some weird nested query, but I'm concerned it might be slow (searching for recipes is the central feature of my site and must be fast, fast and fast).  This basically boils down to some weird Cartesian product thing.

I am, however, willing to redesign the mechanism I store blacklisted ingredients with.  I could store this as an array on the Users table, or anything else that might make things easier.  Any advice on this?

Mike

Re: Need some advice on a difficult query

From
Sam Mason
Date:
On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:
> When the user searches for a new pasta dish, the UI would generate a query
> something like this:
>
> SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
>
> I only need the data from the recipes table since I display a summary of the
> search results and don't load the full recipe until the user clicks on the
> link.  This works great.  However, I'm now in the process of implementing an
> ingredient blacklist.  This means NEVER show me any recipes which have one
> of my blacklisted ingredients, as if I ingest any I will die a painful
> death.

If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:

  SELECT r.*
  FROM recipes r LEFT JOIN (
    SELECT i.recipeid FROM ingredients i, blacklist b
    WHERE i.ingredientid = b.ingredientid
      AND b.userid = 123
    GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
  WHERE r.recipetitle ILIKE '%pasta%'
    AND x.recipeid IS NULL;

Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response.  Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.

Also, you may want to consider using full text search when searching the
titles.  That ILIKE requires a full table scan and will slow down as
more recipes get added.

--
  Sam  http://samason.me.uk/

Re: Need some advice on a difficult query

From
Mike Christensen
Date:
Thanks!  The queries I wrote in my email were just an example, my actual implementation specifies all column names required and also uses full text search.  I just didn't want to paste in that much cruft :)

I'll do some tests with your technique below and see which works better..

Mike

On Wed, Dec 16, 2009 at 7:01 AM, Sam Mason <sam@samason.me.uk> wrote:
On Wed, Dec 16, 2009 at 12:47:36AM -0800, Mike Christensen wrote:
> When the user searches for a new pasta dish, the UI would generate a query
> something like this:
>
> SELECT * FROM Recipes where RecipeTitle ilike '%pasta%';
>
> I only need the data from the recipes table since I display a summary of the
> search results and don't load the full recipe until the user clicks on the
> link.  This works great.  However, I'm now in the process of implementing an
> ingredient blacklist.  This means NEVER show me any recipes which have one
> of my blacklisted ingredients, as if I ingest any I will die a painful
> death.

If you expect the number of blacklisted recipes to be low, the
following may be a good alternative as well:

 SELECT r.*
 FROM recipes r LEFT JOIN (
   SELECT i.recipeid FROM ingredients i, blacklist b
   WHERE i.ingredientid = b.ingredientid
     AND b.userid = 123
   GROUP BY i.recipeid) x ON r.recipeid = x.recipeid
 WHERE r.recipetitle ILIKE '%pasta%'
   AND x.recipeid IS NULL;

Note that it's generally considered bad form to include "*" in the
return of a query when it's code dealing with the response.  Humans can
deal with the columns coming back differently, but code has the habit of
getting confused.

Also, you may want to consider using full text search when searching the
titles.  That ILIKE requires a full table scan and will slow down as
more recipes get added.

--
 Sam  http://samason.me.uk/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general