Re: Moving postgresql.conf tunables into 2003... - Mailing list pgsql-performance

From Sean Chittenden
Subject Re: Moving postgresql.conf tunables into 2003...
Date
Msg-id 20030807202426.GI94710@perrin.int.nxad.com
Whole thread Raw
In response to Re: Moving postgresql.conf tunables into 2003...  (Manfred Koizar <mkoi-pg@aon.at>)
Responses Re: Moving postgresql.conf tunables into 2003...
List pgsql-performance
> >> I have an experimental patch lying around somewhere that tries to
> >> work around these problems by offering different estimation methods
> >> for index scans.  If you are interested, I'll dig it out.
> >
> >Sure, I'll take a gander... had my head in enough Knuth recently to
> >even hopefully have some kind of a useful response to the patch.
>
> Sean, the patch is at http://www.pivot.at/pg/16-correlation-732.diff.
> A short description of its usage can be found at
> http://archives.postgresql.org/pgsql-performance/2002-11/msg00256.php.
> If you are interested how the different interpolation methods work,
> read the source - it shouldn't be too hard to find.
>
> You might also want to read the thread starting at
> http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php.

Hrm... let me bop back in my archives and reply there...  very
interesting work though.  I hope a reasonable algorythm can be found
in time for 7.5, or even 7.4 as this seems to be biting many people
and the current algo is clearly not right.

> >>  does this mean that sensorid, evtime, and action are not
> >> independent?
> >
> >Hrm...  sensorid is sequence and grows proportional with evtime,
> >obviously.
>
> So a *low* sensorid (7) is quite uncommon for a *late* evtime?  This
> would help understand the problem.  Unfortunately I have no clue what
> to do about it.  :-(

Correct.

> >Having spent a fair amount of time looking at the two following plans,
> >it seems as though an additional statistic is needed to change the
> >cost of doing an index lookup when the index is linearly ordered.
>
> I'm not sure I understand what you mean by "index is linearly
> ordered",  but I guess correlation is that statistic you are talking
> about.  However, it is calculated per column, not per index.

If two rows are id's 123456 and 123457, what are the odds that the
tuples are going to be on the same page?  ie, if 123456 is read, is
123457 already in the OS or PostgreSQL's disk cache?

> >Whether CLUSTER does this or not, I don't know,
>
> If you CLUSTER on an index and then ANALYSE, you get a correlation of
> 1.0 (== optimum) for the first column of the index.

Correlating of what to what?  Of data to nearby data?  Of data to
related data (ie, multi-column index?)? Of related data to pages on
disk?  Not 100% sure in what context you're using the word
correlation...

But that value will degrade after time and at what rate?  Does ANALYZE
maintain that value so that it's kept acurrate?  The ANALYZE page was
lacking in terms of implementation details in terms of how many rows
ANALYZE actually scans on big tables, which could dramatically affect
the correlation of a table after time if ANALYZE is maintaining the
correlation for a column.

> > I never heard back from him after getting the runtime down to a
> > few ms.  :-/
>
> Pity!  I'd have liked to see EXPLAIN ANALYSE for
>
>     SELECT *
>       FROM mss_fwevent
>      WHERE sensorid = 7
>        AND evtime > (now() - '6 hours'::INTERVAL)
>        AND NOT action;
>
>     SELECT *
>       FROM mss_fwevent
>      WHERE sensorid = 7
>        AND evtime > (now() - '6 hours'::INTERVAL);
>
>     SELECT *
>       FROM mss_fwevent
>      WHERE evtime > (now() - '6 hours'::INTERVAL);
>
>     SELECT *
>       FROM mss_fwevent
>      WHERE sensorid = 7;

ditto

> > Are indexes
> >on linearly ordered data rebalanced somehow?  I thought CLUSTER only
> >reordered data on disk.  -sc
>
> AFAIK CLUSTER re-creates all indices belonging to the table.

As of 7.3 or 7.4, yes.  -sc

--
Sean Chittenden

pgsql-performance by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: PostgreSQL performance problem -> tuning
Next
From: Scott Cain
Date:
Subject: Re: EXTERNAL storage and substring on long strings