Re: faster way to calculate top "tags" for a "resource" based on a column - Mailing list pgsql-general

From Marc Mamin
Subject Re: faster way to calculate top "tags" for a "resource" based on a column
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828AF19D3@jenmbs01.ad.intershop.net
Whole thread Raw
In response to faster way to calculate top "tags" for a "resource" based on a column  (Jonathan Vanasco <postgres@2xlp.com>)
Responses Re: faster way to calculate top "tags" for a "resource" based on a column
List pgsql-general
>I've been able to fix most of my slow queries into something more acceptable, but I haven't been able to shave any
timeoff this one.  I'm hoping someone has another strategy. 
>
>I have 2 tables:
>        resource
>        resource_2_tag
>
>I want to calculate the top 25 "tag_ids" in "resource_2_tag " for resources that match a given attribute on the
"resource"table. 
>
>both tables have around 1.6million records.
>
>If the database needs to warm up and read into cache, this can take 60seconds to read the data off disk.
>If the database doesn't need to warm up, it averages 1.76seconds.
>
>The 1.76s time is troubling me.
>Searching for the discrete elements of this is pretty lightweight.
>
>here's an explain --  http://explain.depesz.com/s/PndC
>
>I tried a subquery instead of a join, and the query optimized the plan to the same.
>
>i'm hoping someone will see something that I just don't see.
>
>
>
>      Table "public.resource_2_tag"
>        Column         |  Type   | Modifiers
>-----------------------+---------+-----------
> resource_id           | integer |
> tag_id                | integer |
>Indexes:
>    "_idx_speed_resource_2_tag__resource_id" btree (resource_id)
>    "_idx_speed_resource_2_tag__tag_id" btree (tag_id)
>
>                                                  Table "public.resource"
>               Column                |            Type             |                        Modifiers
      

>-------------------------------------+-----------------------------+----------------------------------------------------------
> id                                  | integer                     | not null default
nextval('resource_id_seq'::regclass)
>resource_attribute1_id               | integer                     |
>lots of other columns                |                             |
>Indexes:
>        "resource_attribute1_idx" btree (resource_attribute1_id)
>
>--------------------------------------------------------------------------------
>
>select count(*) from resource;
>-- 1669729
>
>select count(*) from resource_2_tag;
>-- 1676594
>
>select count(*) from resource where resource_attribute1_id = 614;
>-- 5184
>-- 4.386ms
>
>select id from resource where resource_attribute1_id = 614;
>-- 5184
>-- 87.303ms
>
>popping the 5k elements into an "in" clause, will run the query in around 100ms.
>
>
>EXPLAIN ANALYZE
>SELECT
>        resource_2_tag.tag_id AS resource_2_tag_tag_id,
>        count(resource_2_tag.tag_id) AS counted
>FROM
>        resource_2_tag
>JOIN resource ON resource.id = resource_2_tag.resource_id
>WHERE
>        resource.resource_attribute1_id = 614
>GROUP BY resource_2_tag.tag_id
>ORDER BY counted DESC
>LIMIT 25 OFFSET 0;

Hi,
it seems to me that your subquery may deliver duplicate ids.
And with the selectivity of your example, I would expect an index usage
instead of a table scan. You may check how up to date your statistics are
and try to raise the statistic target on the column resource_2_tag.tag_id.
Also try a CTE form for your query:


WITH A as (SELECT DISTINCT id FROM resource
           WHERE resource_attribute1_id = 614
          )
SELECT
        resource_2_tag.tag_id AS resource_2_tag_tag_id,
        count(resource_2_tag.tag_id) AS counted
FROM
        resource_2_tag
JOIN A ON A.id = resource_2_tag.resource_id
ORDER BY counted DESC
LIMIT 25;

regards,
Marc Mamin

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Really strange foreign key constraint problem blocking delete
Next
From: Sergey Konoplev
Date:
Subject: Re: Converting char to varchar automatically