Re: Why it is using/not using index scan? - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Why it is using/not using index scan?
Date
Msg-id BANLkTinCy32=+At5czcUgUXHC0r-YxTEwg@mail.gmail.com
Whole thread Raw
In response to Why it is using/not using index scan?  (Laszlo Nagy <gandalf@shopzeus.com>)
List pgsql-performance
On Thu, Mar 31, 2011 at 12:26 PM, Laszlo Nagy <gandalf@shopzeus.com> wrote:
> For this query:
>
> select pp.id,pp.product_id,pp.selling_site_id,pp.asin
> from product_price pp
> where
> (pp.asin is not null and pp.asin<>'')
> and (pp.upload_status_id<>1)
> and pp.selling_site_id in (8,7,35,6,9)
> and (pp.last_od < 'now'::timestamp - '1 week'::interval )
> limit 5000
>
> Query plan is:
>
> "Limit  (cost=9182.41..77384.80 rows=3290 width=35)"
> "  ->  Bitmap Heap Scan on product_price pp  (cost=9182.41..77384.80
> rows=3290 width=35)"
> "        Recheck Cond: ((last_od < '2011-03-24 13:05:09.540025'::timestamp
> without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))"
> "        Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND
> (upload_status_id <> 1))"
> "        ->  Bitmap Index Scan on idx_product_price_last_od_ss
>  (cost=0.00..9181.59 rows=24666 width=0)"
> "              Index Cond: ((last_od < '2011-03-24
> 13:05:09.540025'::timestamp without time zone) AND (selling_site_id = ANY
> ('{8,7,35,6,9}'::bigint[])))"
>
> For this query:
>
> select pp.id,pp.product_id,pp.selling_site_id,pp.asin
> from product_price pp
> where
> (pp.asin is not null and pp.asin<>'')
> and (pp.upload_status_id<>1)
> and pp.selling_site_id in (8,7,35,6,9)
> and (pp.last_od + '1 week'::interval < 'now'::timestamp )
> limit 5000
>
> Query plan is:
>
> "Limit  (cost=0.00..13890.67 rows=5000 width=35)"
> "  ->  Seq Scan on product_price pp  (cost=0.00..485889.97 rows=174898
> width=35)"
> "        Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND
> (upload_status_id <> 1) AND ((last_od + '7 days'::interval) < '2011-03-31
> 13:06:17.460013'::timestamp without time zone) AND (selling_site_id = ANY
> ('{8,7,35,6,9}'::bigint[])))"
>
>
> The only difference is this: instead of (pp.last_od < 'now'::timestamp - '1
> week'::interval ) I have used (pp.last_od + '1 week'::interval <
> 'now'::timestamp )
>
> First query with index scan opens in 440msec. The second query with seq scan
> opens in about 22 seconds. So the first one is about 50x faster.
>
> My concern is that we are working on a huge set of applications that use
> thousands of different queries on a database. There are programs that we
> wrote years ago. The database structure continuously changing. We are adding
> new indexes and columns, and of course we are upgrading PostgreSQL when a
> new stable version comes out. There are cases when a change in a table
> affects 500+ queries in 50+ programs. I really did not think that I have to
> be THAT CAREFUL with writing conditions in SQL. Do I really have to manually
> analyze all those queries and "correct" conditions like this?
>
> If so, then at least I would like to know if there is a documentation or
> wiki page where I can learn about "how not to write conditions". I just
> figured out that I need to put constant expressions on one side of any
> comparison, if possible. But probably there are other rules I wouldn't think
> of.
>
> Might it be possible to change the optimizer so that it tries to rally
> constant expressions in the first place? That cannot be bad, right?

It's pretty well understood by database developers that indexable
expressions are such that the expression being compared is in the same
form being used in 'create index'.  Even if you did not understand
that, simple trial and error gave the answer immediately using the
standard tools (explain,timing etc) provided by the database.   If you
are concerned, just start logging slow queries
(log_min_duration_statement) and fix them if the sql is bad.

merlin

pgsql-performance by date:

Previous
From: Radhya sahal
Date:
Subject: postgresql benchmark
Next
From: Ireneusz Pluta
Date:
Subject: Re: Background fsck