Thread: Query planner from 6 seconds TO DAYS

Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Hi,

I've got a very strange problem on PostgreSQL 8.4, where the query planner goes absolutely haywire when one parameter is changed slightly.

First, the tables involved:
1.                                                Table "public.spsdata"
           Column            |            Type             |                           Modifiers
-----------------------------+-----------------------------+---------------------------------------------------------------
 data_id                     | bigint                      | not null default nextval('spsdata_data_id_seq'::regclass)
 machine_id                  | integer                     |
 timestamp                   | timestamp with time zone    |
 value1                      | ….
 value2                      | ….
 errorcode                   | integer
...

This table is partitioned (per month) and holds about 3.86203 * 10^9 records (the machines generate data every 5 seconds).
Every partition (= one month) has about 36 * 10^6 records and the following indexes/constraints:
Indexes:
    "spsdata_2012m09_machine_id_key" UNIQUE, btree (machine_id, "timestamp")
Check constraints:
    "spsdata_2012m09_timestamp_check" CHECK ("timestamp" >= '2012-09-01 00:00:00+02'::timestamp with time zone AND
"timestamp"< '2012-10-01 00:00:00+02'::timestamp with time zone) 
Inherits: spsdata

constraint_exclusion is set to 'partition'
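For reference, a single monthly partition under this 8.4-style inheritance scheme looks roughly like the following sketch (column list abbreviated; names taken from the index/constraint listing above):

```sql
-- Sketch only: one monthly partition as a child table.  The CHECK
-- constraint is what constraint_exclusion = 'partition' uses to prune
-- partitions when the planner sees constant timestamp bounds.
CREATE TABLE spsdata_2012m09 (
    CHECK ("timestamp" >= '2012-09-01 00:00:00+02'
       AND "timestamp" <  '2012-10-01 00:00:00+02')
) INHERITS (spsdata);

CREATE UNIQUE INDEX spsdata_2012m09_machine_id_key
    ON spsdata_2012m09 (machine_id, "timestamp");
```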

2.                                               Table "public.events"
        Column         |            Type             |                           Modifiers
-----------------------+-----------------------------+----------------------------------------------------------------
 event_id              | bigint                      | not null default nextval('events_event_id_seq'::regclass)
 machine_id            | integer                     |
 timestamp             | timestamp without time zone |
 code                  | integer                     |
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "events_unique_key" UNIQUE, btree (machine_id, "timestamp", code)
    "events_code" btree (code)
    "events_timestamp" btree ("timestamp");

THE PROBLEM:
We're trying to select certain rows from the spsdata table that occurred shortly before an event. The event is filtered by
code. Because the timestamps of events and data are not in sync, we look at the preceding 30 seconds. Here is the SELECT:
db=# SELECT  m.machine_id, s.timestamp, s.errorcode
FROM events m INNER JOIN spsdata as s ON (m.machine_id= m.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30 seconds' AND m.timestamp)
WHERE m.code IN 2024 AND m.timestamp BETWEEN '2012-08-14' AND '2012-08-29' AND s.errorcode in '2024';
 machine_id |       timestamp        | errorcode
------------+------------------------+-----------
        183 | 2012-08-18 18:21:29+02 |      2024
        216 | 2012-08-20 15:40:39+02 |      2024
        183 | 2012-08-21 12:56:49+02 |      2024
        183 | 2012-08-27 17:04:34+02 |      2024
        214 | 2012-08-27 23:33:44+02 |      2024
(5 rows)

Time: 6087.911 ms

When I change "m.timestamp BETWEEN '2012-08-14' AND '2012-08-29'" to "m.timestamp BETWEEN '2012-08-13' AND
'2012-08-29'", the query takes HOURS.
Here are some statistics for different ranges:
'2012-08-14' AND '2012-08-29' -> ca. 4 sec
'2012-08-14' AND '2012-09-30' -> ca. 4 sec
'2012-08-13' AND '2012-08-15' -> ca. 4 sec
'2012-08-13' AND '2012-08-22' -> ca. 4 sec
'2012-08-13' AND '2012-08-25' -> ca. 4 sec
'2012-08-13' AND '2012-08-26' -> FOREVER
'2012-08-14' AND '2012-08-26' -> ca. 4 sec
'2012-08-13' AND ( > '2012-08-26' ) -> FOREVER

The problem is the change of the query plan.
FAST:
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..144979241.24 rows=42662 width=14)
   Join Filter: ((s."timestamp" <= m."timestamp") AND (m.machine_id = s.machine_id) AND (s."timestamp" >= (m."timestamp" - '00:00:30'::interval)))
   ->  Index Scan using events_code on events m  (cost=0.00..4911.18 rows=25 width=12)
         Index Cond: (code = 2024)
         Filter: (("timestamp" >= '2012-08-14 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-08-26 00:00:00'::timestamp without time zone))
   ->  Append  (cost=0.00..5770958.44 rows=1400738 width=14)
         ->  Index Scan using spsdata_machine_id on spsdata s  (cost=0.00..4.11 rows=1 width=14)
               Index Cond: (s.machine_id = m.machine_id)

SLOW:
                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=631.37..158275670.34 rows=47782 width=14)
   Hash Cond: (s.machine_id = m.machine_id)
   Join Filter: ((s."timestamp" <= m."timestamp") AND (s."timestamp" >= (m."timestamp" - '00:00:30'::interval)))
   ->  Append  (cost=0.00..158152325.56 rows=3071675 width=14)
         ->  Seq Scan on spsdata s  (cost=0.00..10.75 rows=1 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2009m11 s  (cost=0.00..10.75 rows=1 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2009m12 s  (cost=0.00..24897.60 rows=32231 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m01 s  (cost=0.00..113650.43 rows=153779 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m02 s  (cost=0.00..451577.41 rows=9952 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m03 s  (cost=0.00..732979.41 rows=16001 width=14)
               Filter: (errorcode = 2024::smallint)
         ->  Seq Scan on spsdata_2010m04 s  (cost=0.00..940208.95 rows=17699 width=14)

As you can imagine, seq-scanning tables with 3.86203 * 10^9 records is not a good idea.
What can I do to prevent that behavior?

Thanks

Andy

--
Andreas Böckler
andy@boeckler.org



Re: Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Hi Jeff,

thanks for your answer!

Am 24.10.2012 um 19:00 schrieb Jeff Janes:

> On Wed, Oct 24, 2012 at 8:41 AM, Böckler Andreas <andy@boeckler.org> wrote:
>
>> SELECT  m.machine_id, s.timestamp, s.errorcode
>> FROM events m INNER JOIN spsdata as s ON (m.machine_id= s.machine_id
>
> m.machine_id is equal to itself?  you must be retyping the query by hand…
Yes, I did … I changed the variable names from German to English.
That should be m.machine_id = s.machine_id.
>
> You should report the results of "EXPLAIN ANALYZE" rather than merely
> EXPLAIN, as that would make it much easier to verify where the
> selectivity estimates are off.
>
OK …
I can do that for the FAST query.
But the other one would take days (see below).

>
>> FAST:
>>                                                                            QUERY PLAN
>>
------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop  (cost=0.00..144979241.24 rows=42662 width=14)
>>   Join Filter: ((s."timestamp" <= m."timestamp") AND (m.machine_id = s.machine_id) AND (s."timestamp" >= (m."timestamp" - '00:00:30'::interval)))
>>   ->  Index Scan using events_code on events m  (cost=0.00..4911.18 rows=25 width=12)
>>         Index Cond: (code = 2024)
>>         Filter: (("timestamp" >= '2012-08-14 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-08-26 00:00:00'::timestamp without time zone))
>>   ->  Append  (cost=0.00..5770958.44 rows=1400738 width=14)
>>         ->  Index Scan using spsdata_machine_id on spsdata s  (cost=0.00..4.11 rows=1 width=14)
>>               Index Cond: (s.machine_id = m.machine_id)
>
> Was there more to the plan that you snipped?  If not, why isn't it
> checking all the other partitions?

You're right, it's checking all partitions! So constraint exclusion doesn't kick in.
This can be fixed with
SELECT
    m.machine_id, s.timestamp, s.errorcode
FROM
    events m
    INNER JOIN spsdata as s ON (m.machine_id=s.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30 seconds' AND m.timestamp)
WHERE
    m.code IN (2024)
    AND m.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
    AND s.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
    AND s.errorcode in ('2024');

It doesn't take hours to finish, but it's not the performance gain you would expect.

I've changed the query to target one partition, spsdata_2012m08, and attached the slow and fast cases with EXPLAIN ANALYZE.

The difference is one day in the WHERE clause:
290.581 ms vs. 687887.674 ms!
That's about 2370 times slower.

How can I force the fast query plan in a SELECT?

At least I know that spsdata_2012m08 has way more records than events:
spsdata_2012m08: reltuples -> 5.74082 * 10^7
events: count(1) for that time range -> 51383

>
> If you can't fix the selectivity estimates, one thing you could do to
> drive it to the faster query is to decrease random_page_cost to be the
> same seq_page_cost.  That should push the cross-over point to the
> sequential scan out to a region you might not care about.  However, it
> could also drive other queries in your system to use worse plans than
> they currently are.
> Or, you could "set enable_seqscan = off" before running this
> particular query, then reset it afterwards.
>
> Cheers,
>
> Jeff
I've played with seq_page_cost and enable_seqscan already, but you have to know the right values before the SELECT to get good results ;)

Cheers,

Andy


--
Andreas Böckler
andy@boeckler.org

Attachment

Re: Query planner from 6 seconds TO DAYS

From
Jeff Janes
Date:
On Wed, Oct 24, 2012 at 11:51 AM, Böckler Andreas <andy@boeckler.org> wrote:
>>
>> Was there more to the plan that you snipped?  If not, why isn't it
>> checking all the other partitions?
>
> You're right, it's checking all partitions! So constraint exclusion doesn't kick in.
> This can be fixed with
> SELECT
>         m.machine_id, s.timestamp, s.errorcode
> FROM
>         events m
>         INNER JOIN spsdata as s ON (m.machine_id=s.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30 seconds' AND m.timestamp)
> WHERE
>         m.code IN (2024)
>         AND m.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
>         AND s.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
>         AND s.errorcode in ('2024');

Even checking all the partitions it seemed to be pretty fast (78 ms).
Is it worth adding all of that spinach (which could easily get out of
date) just to improve a query that is already fast?



>
> It doesn't take hours to finish, but it's not the performance gain you would expect.
>
> I've changed the query to target one partition, spsdata_2012m08, and attached the slow and fast cases with EXPLAIN ANALYZE.
>
> The difference is one day in the WHERE clause:
> 290.581 ms vs. 687887.674 ms!
> That's about 2370 times slower.

From the fast case:

         ->  Bitmap Index Scan on spsdata_2012m08_machine_id_key
(cost=0.00..2338.28 rows=56026 width=0) (actual time=0.262..0.262
rows=6 loops=186)
               Index Cond: ((s.machine_id = m.machine_id) AND
(s."timestamp" > (m."timestamp" - '00:00:30'::interval)) AND
(s."timestamp" <= m."timestamp"))

The difference in predicted rows to actual rows, 56026 to 6, is pretty
impressive.  That is why the cost of the fast method is vastly
overestimated, and making it just slightly bigger yet pushes it over
the edge to looking more expensive than the slower sequential scan.
It does seem to be a case of the range selectivity not being
estimated correctly.

> How can I force the fast query plan in a SELECT?

I'd probably punt and do it in the application code.  Do the select on
the events table, then loop over the results, issuing the queries on the
spsdata table.  That way the range endpoints would be constants rather
than coming from joins, and the planner should do a better job.
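As a sketch (not the poster's actual code) of what this two-step approach would look like, with the literal values borrowed from the result set shown earlier in the thread:

```sql
-- Step 1: fetch the matching events (a small result set).
SELECT machine_id, "timestamp"
FROM events
WHERE code = 2024
  AND "timestamp" BETWEEN '2012-08-01' AND '2012-08-29';

-- Step 2: in the application, issue one query per event row with
-- constant endpoints, e.g. for machine 183 at 2012-08-18 18:21:29.
-- Constant bounds let constraint exclusion prune partitions and make
-- the range selectivity trivial to estimate.
SELECT "timestamp", errorcode
FROM spsdata
WHERE machine_id = 183
  AND "timestamp" BETWEEN timestamp with time zone '2012-08-18 18:21:29+02' - interval '30 seconds'
                      AND timestamp with time zone '2012-08-18 18:21:29+02'
  AND errorcode = 2024;
```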

Can you load the data into 9.2 and see if it does better?  (I'm not
optimistic that it will be.)


> I've played with seq_page_cost and enable_seqscan already, but you have to know the right values before the SELECT to get good results ;)

Not sure what you mean here.  If you change the settings just for the
query, it should be safe because when the query is already fast it is
not using the seq scan, so discouraging it from using one even further
is not going to do any harm.

Or do you mean you have lots of queries which are slow other than the
one shown, and you can't track all of them down?

Cheers,

Jeff


Re: Query planner from 6 seconds TO DAYS

From
"Kevin Grittner"
Date:
Böckler Andreas wrote:

> I've played with seq_page_cost and enable_seqscan already, but you
> have to know the right values before SELECT to get good results ;)

The idea is to model actual costs on your system.  You don't show
your configuration or describe your hardware, but you show an
estimate of retrieving over 4000 rows through an index and describe a
response time of 4 seconds, so you must have some significant part of
the data cached.

I would see how the workload behaves with the following settings:

effective_cache_size = <your shared_buffers setting plus what the OS
                        shows as cached pages>
seq_page_cost = 1
random_page_cost = 2
cpu_tuple_cost = 0.05

You can set these in a session and check the plan with EXPLAIN. Try
various other important queries with these settings and
variations on them. Once you hit the right factors to model your
actual costs, the optimizer will make better choices without needing
to tinker with it each time.
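Trying these settings out in a session would look roughly like this (session-local only, nothing is persisted; the values are the ones suggested above and should be adjusted to the actual hardware):

```sql
SET effective_cache_size = '10GB';  -- shared_buffers + OS cached pages
SET seq_page_cost = 1;
SET random_page_cost = 2;
SET cpu_tuple_cost = 0.05;

EXPLAIN SELECT ...;  -- re-run the problem query and compare the plan
```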

-Kevin


Re: Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Hi,


Am 25.10.2012 um 20:22 schrieb Kevin Grittner:

>
> The idea is to model actual costs on your system.  You don't show
> your configuration or describe your hardware, but you show an
> estimate of retrieving over 4000 rows through an index and describe a
> response time of 4 seconds, so you must have some significant part of
> the data cached.
Sure, my effective_cache_size is 10 GB.
But my right table is 1.2 TB (yes, tera) at the moment (partitioned in 40 GB slices) and has 3 * 10^9 records.

My left table is only 227 MB with 1 million records. Peanuts.
> I would see how the workload behaves with the following settings:
>
> effective_cache_size = <your shared_buffers setting plus what the OS
>                        shows as cached pages>
> seq_page_cost = 1
> random_page_cost = 2
> cpu_tuple_cost = 0.05
>
> You can set these in a session and check the plan with EXPLAIN. Try
> various other important queries with these settings and
> variations on them. Once you hit the right factors to model your
> actual costs, the optimizer will make better choices without needing
> to tinker with it each time.

I've played with that already …

NESTED LOOP -> GOOD
SEQSCAN -> VERY BAD

SET random_page_cost = 4;
2012-08-14' AND '2012-08-30' -> NESTED LOOP
2012-08-13' AND '2012-08-30' -> SEQSCAN
SET random_page_cost = 2;
2012-08-14' AND '2012-08-30' -> NESTED LOOP
2012-08-07' AND '2012-08-30' -> NESTED LOOP
2012-08-06' AND '2012-08-30' -> SEQSCAN
SET random_page_cost = 1;
2012-08-14' AND '2012-08-30' -> NESTED LOOP
2012-08-07' AND '2012-08-30' -> NESTED LOOP
2012-07-07' AND '2012-08-30' -> NESTED LOOP
2012-07-06' AND '2012-08-30' -> SEQSCAN

The thing is:
- You can alter whatever you want; the planner will switch at a certain time range.
- There is not one case where the SEQSCAN method will be better. It's not possible.

So the only way to tell the planner that it's doomed is
SET enable_seqscan = 0;
which is not very elegant. (Query hints would be, BTW, jehovah!)

You would be forced to write something like this:
SHOW enable_seqscan;             -- note the current value
SET enable_seqscan = off;
SELECT ...;
SET enable_seqscan = on;         -- restore the noted value
(or SET LOCAL enable_seqscan = off inside a transaction, which reverts automatically at commit)

Kind regards

Andy

--
Andreas Böckler
andy@boeckler.org



Re: Query planner from 6 seconds TO DAYS

From
"ktm@rice.edu"
Date:
On Fri, Oct 26, 2012 at 04:37:33PM +0200, Böckler Andreas wrote:
> Hi,
>
>
> Am 25.10.2012 um 20:22 schrieb Kevin Grittner:
>
> >
> > The idea is to model actual costs on your system.  You don't show
> > your configuration or describe your hardware, but you show an
> > estimate of retrieving over 4000 rows through an index and describe a
> > response time of 4 seconds, so you must have some significant part of
> > the data cached.
> Sure, my effective_cache_size is 10 GB.
> But my right table is 1.2 TB (yes, tera) at the moment (partitioned in 40 GB slices) and has 3 * 10^9 records.
>
> My left table is only 227 MB with 1 million records. Peanuts.
> > I would see how the workload behaves with the following settings:
> >
> > effective_cache_size = <your shared_buffers setting plus what the OS
> >                        shows as cached pages>
> > seq_page_cost = 1
> > random_page_cost = 2
> > cpu_tuple_cost = 0.05
> >
> > You can set these in a session and check the plan with EXPLAIN. Try
> > various other important queries with these settings and
> > variations on them. Once you hit the right factors to model your
> > actual costs, the optimizer will make better choices without needing
> > to tinker with it each time.
>
>  I've played with that already …
>
> NESTED LOOP -> GOOD
> SEQSCAN -> VERY BAD
>
> SET random_page_cost = 4;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-13' AND '2012-08-30' -> SEQSCAN
> SET random_page_cost = 2;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-07' AND '2012-08-30' -> NESTED LOOP
> 2012-08-06' AND '2012-08-30' -> SEQSCAN
> SET random_page_cost = 1;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-07' AND '2012-08-30' -> NESTED LOOP
> 2012-07-07' AND '2012-08-30' -> NESTED LOOP
> 2012-07-06' AND '2012-08-30' -> SEQSCAN
>
> The thing is:
> - You can alter whatever you want; the planner will switch at a certain time range.
> - There is not one case where the SEQSCAN method will be better. It's not possible.
>
> So the only way to tell the planner that it's doomed is
> SET enable_seqscan = 0;
> which is not very elegant. (Query hints would be, BTW, jehovah!)
>
> You would be forced to write something like this:
> SHOW enable_seqscan;             -- note the current value
> SET enable_seqscan = off;
> SELECT ...;
> SET enable_seqscan = on;         -- restore the noted value
>
> Kind regards
>
> Andy
>

Hi Andy,

You have seq_page_cost = 1, which is less than or equal to
the random_page_cost in all of your examples. It sounds like you need
a seq_page_cost of 5, 10, 20 or more.

Regards,
Ken


Re: Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Hi Ken,

Am 26.10.2012 um 16:55 schrieb ktm@rice.edu:

> Hi Andy,
>
> You have seq_page_cost = 1, which is less than or equal to
> the random_page_cost in all of your examples.
> It sounds like you need
> a seq_page_cost of 5, 10, 20 or more.

You're right, it was seq_page_cost = 1, but it's really irrelevant what I do here:
set random_page_cost=2;
set seq_page_cost=5;
'2012-05-01' AND '2012-08-30' -> NESTEDLOOP
'2012-04-01' AND '2012-08-30' -> SEQSCAN

a) There will be a point where things go bad.
 This is like patching up a roof till you find the next hole, instead of making it right at the beginning of the construction process.
b) The high seq costs might be true for that table (a 40 GB partition), but not for the rest of the database.
 Per-table seqscan costs would be great.

Regards,

Andy


--
Andreas Böckler
andy@boeckler.org


Re: Query planner from 6 seconds TO DAYS

From
"ktm@rice.edu"
Date:
On Fri, Oct 26, 2012 at 05:15:05PM +0200, Böckler Andreas wrote:
> Hi Ken,
>
> Am 26.10.2012 um 16:55 schrieb ktm@rice.edu:
>
> > Hi Andy,
> >
> > You have seq_page_cost = 1, which is less than or equal to
> > the random_page_cost in all of your examples.
> > It sounds like you need
> > a seq_page_cost of 5, 10, 20 or more.
>
> You're right, it was seq_page_cost = 1, but it's really irrelevant what I do here:
> set random_page_cost=2;
> set seq_page_cost=5;
> '2012-05-01' AND '2012-08-30' -> NESTEDLOOP
> '2012-04-01' AND '2012-08-30' -> SEQSCAN
>
> a) There will be a point where things go bad.
>  This is like patching up a roof till you find the next hole, instead of making it right at the beginning of the construction process.
> b) The high seq costs might be true for that table (a 40 GB partition), but not for the rest of the database.
>  Per-table seqscan costs would be great.
>
> Regards,
>
> Andy
>

Hi Andy,

You can set them per tablespace. Maybe you could put the appropriate tables
that need the higher costing on the same one.
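For what it's worth, per-tablespace cost overrides arrived in PostgreSQL 9.0, so this won't help on the poster's 8.4; on a newer release it would look like this ('bulk_data' is a hypothetical tablespace name):

```sql
-- Requires PostgreSQL 9.0 or later.
ALTER TABLESPACE bulk_data
    SET (seq_page_cost = 5, random_page_cost = 2);
```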

Regards,
Ken


Re: Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Am 25.10.2012 um 18:20 schrieb Jeff Janes:

> Can you load the data into 9.2 and see if it does better?  (I'm not
> optimistic that it will be.)

This takes months, the customer has to pay us for that ;)
There are already talks about moving it to a new server, but this is for next year.

And it will be no child's play to migrate about 1.6TB of data from 8.4 to 9.2.

Cheers,

Andy
--
Andreas Böckler
andy@boeckler.org



Re: Query planner from 6 seconds TO DAYS

From
"Kevin Grittner"
Date:
Böckler Andreas wrote:
> Am 25.10.2012 um 20:22 schrieb Kevin Grittner:

>> The idea is to model actual costs on your system. You don't show
>> your configuration or describe your hardware, but you show an
>> estimate of retrieving over 4000 rows through an index and
>> describe a response time of 4 seconds, so you must have some
>> significant part of the data cached.
> Sure, my effective_cache_size is 10 GB.
> But my right table is 1.2 TB (yes, tera) at the
> moment (partitioned in 40 GB slices) and has 3 * 10^9 records

You're getting up to a third of the size of what I've managed, so
we're in the same ballpark. I've gone to hundreds of millions of rows
in a table without partitioning with good performance. I realize in
raw rowcount for one table you're at almost ten times what I've run
that way, but I have no reason to think that it falls over between
those points. There are situations where partitioning helps, but I
have not found raw rowcount to be a very good basis for making the
call. What are your reasons for going that way?

> My left table is only 227 MB with 1 million records.
> Peanuts.

Absolutely.

>> I would see how the workload behaves with the following settings:
>>
>> effective_cache_size = <your shared_buffers setting plus what the
>> OS shows as cached pages>
>> seq_page_cost = 1
>> random_page_cost = 2
>> cpu_tuple_cost = 0.05
>>
>> You can set these in a session and check the plan with EXPLAIN.
>> Try various other important queries with these settings
>> and variations on them. Once you hit the right factors to model
>> your actual costs, the optimizer will make better choices without
>> needing to tinker with it each time.
>
>  I've played with that already …
>
> NESTED LOOP -> GOOD
> SEQSCAN -> VERY BAD
>
> SET random_page_cost = 4;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-13' AND '2012-08-30' -> SEQSCAN
> SET random_page_cost = 2;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-07' AND '2012-08-30' -> NESTED LOOP
> 2012-08-06' AND '2012-08-30' -> SEQSCAN
> SET random_page_cost = 1;
> 2012-08-14' AND '2012-08-30' -> NESTED LOOP
> 2012-08-07' AND '2012-08-30' -> NESTED LOOP
> 2012-07-07' AND '2012-08-30' -> NESTED LOOP
> 2012-07-06' AND '2012-08-30' -> SEQSCAN

What impact did setting cpu_tuple_cost have?

> The thing is:
> - You can alter whatever you want; the planner will switch at a
>   certain time range.
> - There is not one case where the SEQSCAN method will be better.
>   It's not possible.

I would be interested to see what you consider to be the proof of
that. In most benchmarks where people have actually measured it, the
seqscan becomes faster when you are selecting more than about 10% of
a table, since the index scan will be jumping all over the disk to
read the index pages and the actual data in the heap, whereas a seqscan
can take advantage of the OS's readahead. (Perhaps you need to tweak
that OS setting?) Of course, the more the data is cached, the less
penalty there is for random access and the less attractive seqscans
become.

Any attempt to force plans using sequential scans to always be
ignored is sure to make some types of queries slower -- sometimes
much slower. Hints or other ways to force a plan are far inferior to
modelling costs better. You might want to give that a try.

-Kevin


Re: Query planner from 6 seconds TO DAYS

From
"Kevin Grittner"
Date:
Böckler Andreas wrote:

> b) The high seq costs might be true for that table (a 40 GB
>    partition), but not for the rest of the database. Per-table
>    seqscan costs would be great.

You can set those per tablespace. Again, with about 40 spindles in
our RAID, we got about ten times the speed with a sequential scan as
random access, and an index scan has to hit more pages (index and
heap rather than just the heap), so you can easily shoot yourself in
the foot by assuming that accessing a large portion of the table by
index is faster.

Really, if you stop focusing on what you think the solution is and
provide a clearer statement of your problem, with sufficient
detail, you are likely to get a real solution.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin


Re: Query planner from 6 seconds TO DAYS

From
"Kevin Grittner"
Date:
ktm@rice.edu wrote:

> You have seq_page_cost = 1, which is less than or equal
> to the random_page_cost in all of your examples. It sounds like you
> need a seq_page_cost of 5, 10, 20 or more.

The goal should be to set the cost factors so that they model actual
costs for your workload in your environment. In what cases have you
seen the sequential scan of a large number of adjacent pages from
disk take longer than randomly reading the same number of pages from
disk? (I would love to see the bonnie++ numbers for that, if you have
them.)

-Kevin


Re: Query planner from 6 seconds TO DAYS

From
Jeff Janes
Date:
On Fri, Oct 26, 2012 at 8:15 AM, Böckler Andreas <andy@boeckler.org> wrote:
> Hi Ken,
>
> Am 26.10.2012 um 16:55 schrieb ktm@rice.edu:
>
>> Hi Andy,
>>
>> You have seq_page_cost = 1, which is less than or equal to
>> the random_page_cost in all of your examples.
>> It sounds like you need
>> a seq_page_cost of 5, 10, 20 or more.
>
> You're right, it was seq_page_cost = 1, but it's really irrelevant what I do here:
> set random_page_cost=2;
> set seq_page_cost=5;
> '2012-05-01' AND '2012-08-30' -> NESTEDLOOP
> '2012-04-01' AND '2012-08-30' -> SEQSCAN
>
> a) There will be a point where things go bad.

Sure.  And there truly is some point at which the sequential scan
actually will become faster.

>  This is like patching up a roof till you find the next hole, instead of making it right at the beginning of the construction process.

We are not at the beginning of the construction process.  You are
already living in the house.

Version 9.3 is currently under construction.  Maybe there will be a fix
for this problem in that release.  The hackers mailing list would be
the place to discuss that.

Cheers,

Jeff


Re: Query planner from 6 seconds TO DAYS

From
Jeff Janes
Date:
On Fri, Oct 26, 2012 at 8:30 AM, Böckler Andreas <andy@boeckler.org> wrote:
>
> Am 25.10.2012 um 18:20 schrieb Jeff Janes:
>
>> Can you load the data into 9.2 and see if it does better?  (I'm not
>> optimistic that it will be.)
>
> This takes months, the customer has to pay us for that ;)

You probably only need to load one partition to figure out if it does a
better job there.

Once you know if it solves the problem, then you can make an informed
decision on whether migration might be worthwhile.

Cheers,

Jeff


Re: Query planner from 6 seconds TO DAYS

From
Böckler Andreas
Date:
Am 26.10.2012 um 20:00 schrieb Jeff Janes:

> You probably only need to load one partition to figure out if it does a
> better job there.
>
> Once you know if it solves the problem, then you can make an informed
> decision on whether migration might be worthwhile.
>
> Cheers,
>
> Jeff
ok .. i'll give it a try ...
--
Andreas Böckler
andy@boeckler.org



Re: Query planner from 6 seconds TO DAYS

From
Jeff Janes
Date:
On Fri, Oct 26, 2012 at 8:30 AM, Kevin Grittner <kgrittn@mail.com> wrote:
> ktm@rice.edu wrote:
>
>> You have seq_page_cost = 1, which is less than or equal
>> to the random_page_cost in all of your examples. It sounds like you
>> need a seq_page_cost of 5, 10, 20 or more.
>
> The goal should be to set the cost factors so that they model actual
> costs for your workload in your environment.

Unfortunately the random_page_cost is getting multiplied by an
estimated page count which is 4 orders of magnitude too high.
random_page_cost and seq_page_cost (and enable_seqscan) might not be
the right knobs, but they are the knobs that currently exist.

Cheers,

Jeff