Thread: index skipped in favor of seq scan.

index skipped in favor of seq scan.

From
ryan.a.roemmich@mail.sprint.com
Date:
I am working with putting syslog logs into a database, I'm parsing the
logs and using the key information for my fields.  With my test data of
~200K rows the optimizer used my b-tree index that I created for an
oft-used where clause.  When the table grew to over 800K rows the index
was no longer used.  The field in question contains IP addresses, but
uses varchar.  The values are _not_ unique.  One particular address has
150K entries.  How can I keep my where queries speedy?
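
For reference, a setup like the one described might look like this (table and column names are hypothetical; the actual schema isn't given in the thread):

```sql
-- Hypothetical schema: parsed syslog entries keyed by source IP,
-- with the address stored as varchar rather than a network type
CREATE TABLE syslog (
    logged_at  timestamp,
    src_ip     varchar(15),   -- IP addresses as text; values not unique
    message    text
);

-- B-tree index for the oft-used WHERE clause
CREATE INDEX syslog_src_ip_idx ON syslog (src_ip);

-- The kind of query the index was created for
SELECT count(*) FROM syslog WHERE src_ip = '10.0.0.1';
```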

-ryan


Re: index skipped in favor of seq scan.

From
Peter Eisentraut
Date:
ryan.a.roemmich@mail.sprint.com writes:

> I am working with putting syslog logs into a database, I'm parsing the
> logs and using the key information for my fields.  With my test data of
> ~200K rows the optimizer used my b-tree index that I created for an
> oft-used where clause.  When the table grew to over 800K rows the index
> was no longer used.  The field in question contains IP addresses, but
> uses varchar.  The values are _not_ unique.  One particular address has
> 150K entries.  How can I keep my where queries speedy?

For 150k out of 800k rows, a sequential scan is definitely the better
choice.  If you can prove otherwise, please post data.

For problems with the optimizer in general you should post the schema, the
queries, and the explain output.
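
Concretely, the useful report looks something like this (names are illustrative, not from the thread):

```sql
-- In psql, \d syslog prints the table's columns and indexes.

-- Run the problem query under EXPLAIN so the planner's choice is visible:
EXPLAIN SELECT * FROM syslog WHERE src_ip = '10.0.0.1';
-- A sequential-scan plan shows "Seq Scan on syslog";
-- an index plan shows "Index Scan using syslog_src_ip_idx".
```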

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


RE: index skipped in favor of seq scan.

From
ryan.a.roemmich@mail.sprint.com
Date:
On the other end of the spectrum there are many addresses with only one
entry.  When I use one of these addresses in the WHERE clause it takes
just as long as the address with 150k rows.  If the sequential scan is
better for 150k rows out of 800k rows, what about 1 out of 800k?  It
seems that when my table grew to this size the index was no longer used.
If that's true, is there any point in having the index?

-----Original Message-----
From: peter.e [mailto:peter_e@gmx.net]
Sent: Monday, July 09, 2001 4:26 PM
To: ryan.a.roemmich
Cc: pgsql-general
Subject: Re: [GENERAL] index skipped in favor of seq scan.


ryan.a.roemmich@mail.sprint.com writes:

> I am working with putting syslog logs into a database, I'm parsing the
> logs and using the key information for my fields.  With my test data of
> ~200K rows the optimizer used my b-tree index that I created for an
> oft-used where clause.  When the table grew to over 800K rows the index
> was no longer used.  The field in question contains IP addresses, but
> uses varchar.  The values are _not_ unique.  One particular address has
> 150K entries.  How can I keep my where queries speedy?

For 150k out of 800k rows, a sequential scan is definitely the better
choice.  If you can prove otherwise, please post data.

For problems with the optimizer in general you should post the schema, the
queries, and the explain output.

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter




Re: index skipped in favor of seq scan.

From
Tom Lane
Date:
ryan.a.roemmich@mail.sprint.com writes:
> On the other end of the spectrum there are many addresses with only one
> entry.  When I use one of these addresses in the WHERE clause it takes
> just as long as the address with 150k rows.  If the sequential scan is
> better for 150k rows out of 800k rows, what about 1 out of 800k?  It
> seems that when my table grew to this size the index was no longer used.

The problem is that the 150k-duplicates value is dominating the
planner's rather inadequate statistics, and causing it to believe that
the table contains only a few values that all occur many times.  If that
were the true scenario then the use of seq scan would be the correct
choice.

This is fixed (I hope) for 7.2, but there's not much to be done about
it in current releases, unless you can avoid storing the 150k-duplicates
value.  Is that a real value, or just a dummy?  If you could replace it
with NULL then the right things would happen, because the statistics do
already distinguish NULL from regular data values.
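
A sketch of that workaround (hypothetical names; assumes the one noisy address is known in advance):

```sql
-- Store NULL instead of the hugely duplicated address, so that one value
-- stops skewing the planner's statistics (NULLs are tracked separately).
UPDATE syslog SET src_ip = NULL WHERE src_ip = '10.1.2.3';

-- Queries for the noisy host must then test IS NULL instead of equality:
SELECT count(*) FROM syslog WHERE src_ip IS NULL;
```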

            regards, tom lane

RE: index skipped in favor of seq scan.

From
ryan.a.roemmich@mail.sprint.com
Date:
> -----Original Message-----
> From: tgl [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, July 10, 2001 10:56 AM
> To: ryan.a.roemmich
> Cc: pgsql-general
> Subject: Re: [GENERAL] index skipped in favor of seq scan.
>
>
> ryan.a.roemmich@mail.sprint.com writes:
> > On the other end of the spectrum there are many addresses with only one
> > entry.  When I use one of these addresses in the WHERE clause it takes
> > just as long as the address with 150k rows.  If the sequential scan is
> > better for 150k rows out of 800k rows, what about 1 out of 800k?  It
> > seems that when my table grew to this size the index was no longer used.
>
> The problem is that the 150k-duplicates value is dominating the
> planner's rather inadequate statistics, and causing it to believe that
> the table contains only a few values that all occur many times.  If that
> were the true scenario then the use of seq scan would be the correct
> choice.
>
> This is fixed (I hope) for 7.2, but there's not much to be done about
> it in current releases, unless you can avoid storing the 150k-duplicates
> value.  Is that a real value, or just a dummy?  If you could replace it
> with NULL then the right things would happen, because the statistics do
> already distinguish NULL from regular data values.
>

It's a real IP address.  I'm logging from a Cisco PIX firewall and the
system at the address has been hammering the system.  Once we get the
problem resolved with the machine in question I'll be able to remove all
of its entries from the table, and regain my precious indexes.
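
A sketch of that cleanup (hypothetical names), including a statistics refresh so the planner sees the new value distribution:

```sql
-- Drop the offending host's rows once the underlying problem is fixed
DELETE FROM syslog WHERE src_ip = '10.1.2.3';

-- Reclaim space and regenerate the planner's statistics
VACUUM ANALYZE syslog;
```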

Does the planner make the choice based on a percentage?  You said that
with the 150k rows out of 800k rows a seq scan is better.  What if the
total number of rows was a few million?  Does the planner ever consider
using the index again, or is it dependent on the "slices" as opposed to
the whole?


Re: index skipped in favor of seq scan.

From
Tom Lane
Date:
ryan.a.roemmich@mail.sprint.com writes:
> Does the planner make the choice based on a percentage?

Essentially (actually, it's a nonlinear cost model, so "percentage" is
too simplistic).

It sounds like the only solution available to you until 7.2 comes out
is to use a sledgehammer:

    set enable_seqscan to off;

See the documentation.
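
For example, the setting can be flipped around a single query in one session (table name illustrative):

```sql
SET enable_seqscan TO off;   -- discourage sequential scans for this session
SELECT * FROM syslog WHERE src_ip = '10.9.9.9';
SET enable_seqscan TO on;    -- restore the default afterwards
```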

            regards, tom lane

Foreign Keys

From
Ryan Mahoney
Date:
How can you get a listing of foreign keys to a table?

TIA

-r
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01

Re: Foreign Keys

From
Bruce Momjian
Date:
> How can you get a listing of foreign keys to a table?
>

We haven't figured out a good way yet.  The pg_depend discussion on
hackers may lead to a solution if we ever implement it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Foreign Keys

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> How can you get a listing of foreign keys to a table?

> We haven't figure out a good way yet.

You can figure it out by rooting through pg_trigger looking for RI
triggers that reference your table ... but of course this is pretty
painful and ugly.  Sometime we should build a view to do it for you.
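
One way to do that rooting (a sketch only; it relies on pg_trigger's tgconstrrelid column, which for an RI trigger holds the OID of the table on the other end of the constraint, and 'mytable' stands in for your table's name):

```sql
-- List triggers whose constraint references the given table,
-- along with the table each trigger is attached to.
SELECT t.tgname, c.relname AS trigger_table
FROM pg_trigger t, pg_class c
WHERE t.tgrelid = c.oid
  AND t.tgconstrrelid = (SELECT oid FROM pg_class
                         WHERE relname = 'mytable');
```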

            regards, tom lane