Re: usage of indexes for inner joins - Mailing list pgsql-general

From Jan Theodore Galkowski
Subject Re: usage of indexes for inner joins
Msg-id 1191263733.5341.1213549179@webmail.messagingengine.com
In response to Re: usage of indexes for inner joins  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
thanks for all your useful comments.  i will study all of them.

some inline comments and sketches below, just for clarification to the
group, marked with asterisks.

On Mon, 1 Oct 2007 13:13:23 -0500, "Scott Marlowe"
<scott.marlowe@gmail.com> said:
> On 10/1/07, Jan Theodore Galkowski <bayesianlogic@acm.org> wrote:
> > Scott,
> >
> > i didn't think this belonged in the general list, but the example i
> > gave for discussion was a toy, for illustration.  i could not very
> > well post the actual example for many reasons, including proprietary
> > ones and, given this is how things work, because the 1.5 million row
> > table in question is its own smallest description.
>
> This is the exact kind of question that belongs on -general.  But it
> does get asked a lot, especially by people coming from other
> databases.
>
> > while indexes are being used on that table, there's a companion
> > table which is much smaller -- a "mere" 75000 rows -- which is
> > suffering a sequential scan, and i was trying to eliminate those.
>
> Well, don't think of sequential scans as plain bad.  Sometimes they're
> the best choice, sometimes they're not.
>
> Now, if an index scan is provably and noticeably faster than the
> sequential scan, then the planner is making the wrong decision.  Have
> you tried running your query with
>
> set enable_seqscan=off;

***actually, yes.  the engine just ignored it.  (a sketch of that test,
and why "ignored" can happen, is below.)***
>
> to see how it behaves?  I've found many somewhat slow queries got
> really fast or really slow when I did that.
>
> Note that you shouldn't blindly run a query all the time with that
> setting, as there are many instances where seqscan is the right
> answer.  Also, your explain cost estimates will all be way off.
>
> > perhaps it is true that ANALYZE isn't being done often enough.
> > perhaps VACUUMs aren't being done often enough either.  we're leery
> > of scheduling repeated VACUUMs having encountered a case where the
> > VACUUM took over an hour to complete.
>
> Run "analyze verbose" on your db and see what it says about number of
> page slots needed versus used.  that will help you tell if you're
> vacuuming enough.
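
*** for the archive, on 8.x the tail of that output looks roughly like
this (numbers invented):

    vacuum verbose;
    ...
    INFO:  free space map contains 908 pages in 109 relations
    DETAIL:  A total of 1728 page slots are in use (including overhead).
    1728 page slots are required to track all free space.

if the "required" number exceeds max_fsm_pages in postgresql.conf,
vacuum isn't keeping up. ***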
>
> How long vacuum takes isn't really that important.  What is important
> is how much of an impact it's having on the system.  there are
> several vacuum parameters in the postgresql.conf file that can lower
> the impact vacuum has on your system I/O wise while increasing its
> run time.
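
*** the postgresql.conf knobs meant here are the cost-based vacuum delay
settings (8.x names; the values are illustrative, not recommendations):

    vacuum_cost_delay = 10    # ms to sleep when the cost limit is hit;
                              # 0 (the default) disables the throttling
    vacuum_cost_limit = 200   # accumulated I/O cost that triggers a sleep

a higher delay or lower limit makes vacuum gentler on I/O but stretches
its run time. ***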
>
> Vacuum full is another story.  Think of it as a recovery tool, not a
> periodic maintenance tool.
>
> > it may, too, be because the tables use user-defined types heavily
> > and the original UPDATE involved a SELECT ... IN ... having a GROUP
> > BY with a few references to columns deep within user-defined types.
>
> Hard to say without a query and an explain analyze output.  It's
> common for user defined functions to produce estimates in the
> planner that are way off.  user defined types, not so much.  But the
> more complex the query the more likely it is that the query planner
> will make a bad estimate of the number of rows somewhere and choose
> a bad method.
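
*** in case it helps others reading: the thing to compare in EXPLAIN
ANALYZE output is each node's estimated rows against its actual rows
(numbers below are made up):

    ->  Seq Scan on foo  (cost=0.00..1693.00 rows=100 width=8)
                         (actual time=0.024..85.312 rows=74000 loops=1)

rows=100 estimated versus rows=74000 actual is exactly the kind of
mismatch that leads the planner to a bad join method. ***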
>
> >  that wouldn't have been my choice, but, then, they were given to
> >  me to work, not my design.  in fact, PG is the first relational
> >  database implementation i've used that offered such things in a
> >  big way.
>
> Extensibility is quite a useful tool.
>
> > i also don't understand some other things, which are surprising,
> > like why some UPDATEs take so much longer when wrapped in a BEGIN
> > TRANSACTION ... COMMIT block than when the transaction is at the
> > statement level.
>
> that is strange.  I'd expect that maybe you've got something happening
> with the transaction waiting on other transactions, so that it's not
> so much running hard as just tapping its toe waiting for the other
> transaction to commit or roll back.

*** yes, i thought it was odd, too.  there wasn't anything else in that
transaction, and the table was set up for an experiment.  of course, the
experiment was one of those "UPDATE foo SET x = 1 + foo.x WHERE
foo.y < k" things. ***
>
> > I come from an Oracle, DB2, Informix world, and in my experience
> > plans for queries are more stable.   i have loitered in and around
> > MySQL for a while.   i'm not surprised there's a learning curve with
> > PG.  i am surprised it breaks so markedly with mainstay database
> > experience.
>
> Oh, I've seen Oracle get stupid due to lack of proper statistics as
> well.  You likely had a good DBA who kept all that stuff hidden from
> you, though.

No comment, actually, since I worked right alongside the DBA, and
sometimes did things myself.

> .... But PostgreSQL and mainline db experience are often
> incompatible.  The very things that people are used to using to make
> other dbs fast (forcing index usage for instance) can make postgresql
> noticeably slower.
>
> You might find that partial indexes help in some circumstances.  If you
> are using a query that has a where clause that looks at a field that
> has one value 99% of the time and another value 1% of the time, you
> can index that 1% only, and an index scan will be ultra quick.  The
> standard case for that is a boolean field.
>
> create table test (id int, info text, btest bool);
> -- load 100,000 rows, roughly 1% with btest = true, the rest false
> insert into test
>   select i, 'row ' || i, random() < 0.01
>   from generate_series(1, 100000) i;
> create index test_btest_true on test (btest) where btest is true;
> analyze test;
> explain analyze select * from test where btest is true;
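
*** with that partial index in place, the explain should show an index
scan using test_btest_true instead of a seq scan, since the index only
covers the ~1% of rows the query touches. ***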
>
> Generally, postgresql offers different ways to solve the same problems
> as other databases, and knowing those ways can really help troubleshoot
> and fix poorly performing queries.
