Home > mailing lists

Re: The standard 'why does it take so long' question - Mailing list pgsql-general

From	Nigel J. Andrews
Subject	Re: The standard 'why does it take so long' question
Date	August 9, 2002 07:12:53
Msg-id	Pine.LNX.4.21.0208091136240.3235-100000@ponder.fairway2k.co.uk Whole thread
In response to	Re: The standard 'why does it take so long' question (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: The standard 'why does it take so long' question
List	pgsql-general

Tree view


First I'll just say thanks to Joe Conway for the suggestion of an index on the
poster_id column. Although I didn't show it I do have an index on that field as
well, however, as I understand it the index on (poster_id,time) should fulfil
this same function.


On Fri, 9 Aug 2002, Tom Lane wrote:

> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > The first question is why would the index scan on chat_user take
> > significantly longer than before?
>
> I think the indexscan is having to skip over more irrelevant data in the
> second case.  Think about it: you can scan an index range consisting of
> a single user's posts between times T1 and T2, or you can scan an index
> range consisting of all posts between times T1 and T2.  The second
> column of the index will save you from actually going to the heap for
> posts from other users, but you'll still have to pass over those index
> entries, because the contiguous range of index entries that covers the
> data you want will include a a lot of posts from other users.

That was the idea that popped into my head while I was writing the questions so
that certainly seems reasonable to me.

>
> What interests me is why the planner chose the second index when it
> had a choice; I'd have thought its cost models were good enough to
> handle this subtlety.  Can you post the pg_stats rows for the columns
> in question?

I really don't know sufficient to be able to see why the stats would favour one
index over the other. Although looking at the pg_stats entries below now I
notice that the correlation for the time column is 1, compared to 0.058 for the
poster_id. As there is a natural order to the data, i.e. it's loaded in time
order, I am not entirely surprised that there's a high correlation, just
slightly surprised the stats gatherer picked up on it.

Note that the poster_ids for the example query are also shown below and are not
listed in the most_common_vals array. I've also shown the human understandable
values for the time limits in the query and due to the nature/distribution of
this column it's unsurprising to find that I managed to pick a time period that
didn't include one of the more common times.

So, what's making the planner choose the slower index is the correlation figure
right?


archive=> select * from pg_stats where attname = 'time' or attname = 'poster_id' and tablename = 'chat_post';
 tablename |  attname  | null_frac | avg_width | n_distinct |
                     most_common_vals                                                                                 |
                                  most_common_freqs                                    |
                                                                                                  histogram_bounds

   | correlation 

-----------+-----------+-----------+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 chat_post | poster_id |         0 |         2 |        338 | {2149,731,1130,2595,1879,1656,688,1842,2473,1747}
                                                                                                                      |
{0.0603333,0.0306667,0.0306667,0.03,0.0286667,0.0276667,0.0273333,0.025,0.02,0.0196667}|
{4,252,521,819,1112,1423,1639,1820,2096,2427,2879}

                                  |   0.0583958 
 chat_post | time      |         0 |         8 |  -0.350654 | {"1998-03-26 11:09:00-05","1999-05-13
00:49:00-04","2000-12-1413:05:00-05","2001-01-25 12:18:00-05","2002-04-04 11:37:00-05","2002-05-28
15:21:00-04","2002-07-1009:53:00-04"} |
{0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}  | {"1998-03-06
01:26:00-05","1998-08-2111:00:00-04","1999-02-22 15:15:00-05","1999-07-22 05:37:00-04","2000-01-24
09:53:00-05","2000-10-2317:45:00-04","2001-02-26 16:22:00-05","2001-06-28 20:19:00-04","2001-12-06
14:06:00-05","2002-04-2616:05:00-04","2002-08-09 00:10:00-04"} |           1 
(2 rows)

archive=> select * from chat_user where name in ('thrifty', 'hope1');
  id  |  name   | lower_name
------+---------+------------
 2494 | thrifty | thrifty
 1112 | hope1   | hope1
(2 rows)


archive=> select 'epoch'::timestamptz + '959904000 seconds'::interval as from, 'epoch'::timestamptz + '1023667200
seconds'::intervalas to; 
          from          |           to
------------------------+------------------------
 2000-06-02 01:00:00+01 | 2002-06-10 01:00:00+01
(1 row)


Thanks for the input Tom.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

pgsql-general by date:

From: Nigel J. Andrews
Date: 09 August 2002, 06:49:00
Subject: [GENERAL] The standard 'why does it take so long' question

From: "Cornelia Boenigk"
Date: 09 August 2002, 07:24:29
Subject: Re: Warning: PostgreSQL query failed: ERROR: [my_tabel]: Permission denied in [my_php_program]

Re: The standard 'why does it take so long' question - Mailing list pgsql-general

Previous

Next