Re: Weird indices - Mailing list pgsql-general

From Joseph Shraibman
Subject Re: Weird indices
Date
Msg-id 3A93103B.8F549B94@selectacast.net
Whole thread Raw
In response to Re[2]: Weird indices  (Jean-Christophe Boggio <cat@thefreecat.org>)
Responses Re: Weird indices  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Jean-Christophe Boggio wrote:
>
> Joseph,
>
> I think you're going a bit too far... Tom and Stephan have been very
> patient explaining you the basics of indices.
>
They are being patient.  I thank them.  I'm not trying to point fingers
at anybody and say 'you idiot! why aren't you thinking like me?'.  It's
just that this whole visibility meets index meets planner thing is still
eluding me.

> >> The name of the game here is to make a plan *without* actually going
> >> out and expending large amounts of time to find out the true state of
> >> affairs; by the time you know for sure, you've already done the query.
>
> Believe this. All the best DB engines including PostgreSQL work that
> way. This is based on measures, on real life.
>
I believe it.  I just seemed to me that that looking at the index should
have been before the cutoff point.  And actually seemed to be in my
little experiment.

> JS> Well I'd hope that extracting the count from the index should be very
> JS> low cost. That is what indecies are for.
>
> No, indices are made for finding a record in one go or for isolating a
> small range of values.
>

When I learned data structures in cs 102 I learned how to use an index
to quickly get a count.  I assume the postgres developers know this as
well.

> JS> But certain things could be done.  Like planning for the case of there
> JS> being a single not null value, and updating the indecies not to point at
> JS> expired rows.
>
> And then you'll ask when there are 2 not null values...?

I think I mispoke myself there.  I meant the case of there being the
single repeated value.  I can think offhand of a few ways to do it but I
don't know how postgres uses indeces.
>
> JS> Isn't the point of a vacuum to get rid of old rows?  Then
> JS> why doesn't it update the index as well?
>
> It does. Look at vacuum verbose.
>
In my test case it didn't.  My database is vacuumed every night, and I
hadn't touched it that day before I did my test.  Therefore it should
have known there were only 16 matches and not 52.

> JS> I mean the explain shows that getting the count(*) from the field that
> JS> is indexed has to do a seq scan, presumably to determine if the rows are
> JS> in fact valid.
>
> count(*) means you want all the rows that have all the fields "not
> null". Read carefully : ALL THE FIELDS.

Uh, no.  Tom said so in a different reply.

I understand that keeping different views for different open
transactions can be difficult, but after a transaction  that updates a
row is over why isn't the row marked as 'universally visible' for all
new transactions until another update occurs?

Maybe I'm not making myself understood.  Another way of asking the same
thing:
Say there is a transaction that is looking at a non-current version of a
row.  'non-current' could be the value it was at the start of the
transaction (and was updated by another transaction) or was updated by
this transaction but not committed yet.  When this transaction is over
is it really that hard to get rid of the refrence to the old version of
the row?  There should be a 1 bit field 'is old value and isn't being
used by any transaction'.  Is that really hard?

Maybe this is part of the whole 'vacuum later' vs. 'update now'
philosophy.  If the point of vacuum later is to put off the performance
hit until later if it is causing these performance hits on queries
because index scans aren't being used then doesn't that mean 'update
now' is more likely to pay off in the short run?


--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

pgsql-general by date:

Previous
From: Joseph Shraibman
Date:
Subject: Re: Weird indices
Next
From: Stephan Szabo
Date:
Subject: Re: Weird indices