index choosing problem - Mailing list pgsql-performance

From Rural Hunter
Subject index choosing problem
Date
Msg-id 4F61A909.20109@gmail.com
Whole thread Raw
Responses Re: index choosing problem  (Ants Aasma <ants.aasma@eesti.ee>)
List pgsql-performance
I have a table with serveral million records. they are divided into
about one hundred catagory(column cid). I created index includes the cid
as the first column. I had a problem with some cids they only have few
records comparing with other cids. Some of them only have serveral
thousand rows. Some queries are not using index on the cids. I got the
explain for the queries.
Note:
article_others_cid_time_style_idx is the index contains cid as the first
column
article_others_pkey is the primary key on an auto incremented column aid.

# select count(*) from article_others;
   count
---------
  6888459
(1 row)

# select count(*) from article_others where cid=74;
  count
-------
   4199
(1 row)

1. # explain select count(*) from article_others where cid=74;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=32941.95..32941.96 rows=1 width=0)
    ->  Index Scan using article_others_cid_time_style_idx on
article_others  (cost=0.00..32909.34 rows=13047 width=0)
          Index Cond: (cid = 74)
(3 rows)

2. # explain select aid from article_others where cid=74 limit 10;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..25.22 rows=10 width=8)
    ->  Index Scan using article_others_cid_time_style_idx on
article_others  (cost=0.00..32909.34 rows=13047 width=8)
          Index Cond: (cid = 74)
(3 rows)

3. # explain select aid from article_others where cid=74 order by aid
desc limit 10;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..1034.00 rows=10 width=8)
    ->  Index Scan Backward using article_others_pkey on article_others
(cost=0.00..1349056.65 rows=13047 width=8)
          Filter: (cid = 74)
(3 rows)

4. # explain select aid from article_others where cid=74 order by aid
desc limit 1;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..103.40 rows=1 width=8)
    ->  Index Scan Backward using article_others_pkey on article_others
(cost=0.00..1349060.65 rows=13047 width=8)
          Filter: (cid = 74)
(3 rows)

5. # explain select max(aid) from article_others where cid=74;
                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
  Result  (cost=104.70..104.71 rows=1 width=0)
    InitPlan 1 (returns $0)
      ->  Limit  (cost=0.00..104.70 rows=1 width=8)
            ->  Index Scan Backward using article_others_pkey on
article_others  (cost=0.00..1365988.55 rows=13047 width=8)
                  Index Cond: (aid IS NOT NULL)
                  Filter: (cid = 74)
(6 rows)

Now the query 3-5 using article_others_pkey are quite slow. The rows for
cid 74 are very old and seldom get updated. I think pg needs to scan
quite a lot on article_others_pkey before it gets the rows for cid 74.
The same query for other cids with new and majority of rows runs very
fast. for example:
# explain select max(aid) from article_others where cid=258;
                                                          QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
  Result  (cost=1.54..1.55 rows=1 width=0)
    InitPlan 1 (returns $0)
      ->  Limit  (cost=0.00..1.54 rows=1 width=8)
            ->  Index Scan Backward using article_others_pkey on
article_others  (cost=0.00..1366260.55 rows=889520 width=8)
                  Index Cond: (aid IS NOT NULL)
                  Filter: (cid = 258)

So I think if pg chooses to use index article_others_cid_time_style_idx
the performance would be much better. or any other solution I can take
to improve the query performance for those cids like 74?
Another question, why the plan shows rows=13047 for cid=74 while
actually it only has 4199 rows? There is almost no data changes for cid
74 and I just vacuum/analyzed the table this morning.

pgsql-performance by date:

Previous
From: Rural Hunter
Date:
Subject: Re: Gin index insert performance issue
Next
From: Ants Aasma
Date:
Subject: Re: index choosing problem