Thread: Planner cost adjustments

Planner cost adjustments

From: Daniel Begin

Hi all,

Running some queries, I found that the planner often selects a sequential scan instead of an index scan, even when the latter is way faster (an order of magnitude faster in some tests I made by setting enable_seqscan = ON/OFF). How can I figure out which parameters I should adjust to get the planner to select an appropriate plan that better fits my DB setup?

 

I had a look at http://www.postgresql.org/docs/9.3/static/runtime-config-query.html but at this point it is too much information for me;-)  Are there any rules of thumb or recipes I could use to decide which parameters to look at first?

 

Daniel

Re: Planner cost adjustments

From: Melvin Davidson
I have found that setting

enable_seqscan = off

will remedy that situation. Basically, it forces the planner to choose the index. However, if no correct index is available, it will use sequential scan anyway. The only time it will have a negative effect is if the seqscan is actually faster, which doesn't happen very often.


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Planner cost adjustments

From: Tomas Vondra
Hi,

On 05/29/15 17:22, Melvin Davidson wrote:
> I have found that setting
>
> enable_seqscan = off
>
> will remedy that situation. Basically, it forces the planner to
> choose the index. However, if no correct index is available, it will
> use sequential scan anyway. The only time it will have a negative
> effect is if the seqscan is actually faster, which doesn't happen
> very often.

Actually, this is quite a poor choice, exactly because it forces the
planner to use an index scan even when a seq scan would be much faster.
The correct way to fix this is by tweaking the cost variables, for
example by setting random_page_cost = 2 (instead of the default 4).
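
For experimenting, you can change it just for your session, something
like this (and make it permanent in postgresql.conf once you find a
value that gives good plans):

SET random_page_cost = 2;   -- affects only the current session
-- then re-run EXPLAIN ANALYZE on the affected queries and compare plans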

Sadly, the original post provides very little information about the
context, e.g. database size, storage system, explain plans etc.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Planner cost adjustments

From: PT
On Fri, 29 May 2015 09:39:00 -0400
Daniel Begin <jfd553@hotmail.com> wrote:

> Hi all,
>
> Running some queries, I found that the planner often selects sequential scan
> instead of an index scan, even if the latter is way faster (one order of
> magnitude faster if I consider some tests I made by setting enable_seqscan =
> ON/OFF). How can I figure out what parameter I should adjust to get the
> planner select an appropriate plan that would better consider my DB setup?
>
> I had a look at
> http://www.postgresql.org/docs/9.3/static/runtime-config-query.html but at
> this point it is too much information for me;-)  Any rules of thumb, recipes
> I could use to select which parameters I should look at first?

Here's the correct way to handle this:

1) As mentioned elsewhere, first take the time to ensure that your
   cost estimate settings are reasonable for your hardware.  See
   section 18.7.2:
   http://www.postgresql.org/docs/9.4/static/runtime-config-query.html
2) If #1 doesn't fix it, don't change enable_seqscan. Run a bunch of
   tests on the query (or queries) to see how well they perform. Then
   run ANALYZE on the whole database and run all the tests again.
   If performance/planning improves, then the analyze settings on
   your server aren't aggressive enough; adjust the related config
   settings to fix that.
3) If #2 doesn't uncover the problem, run EXPLAIN ANALYZE on all the
   queries in your test. It takes a bit of understanding to do this
   step, so you'll want to read up a bit and possibly ask questions
   if you have trouble interpreting the output, but you're looking
   for discrepancies between the estimated and actual rows and times
   for any particular table. If you find them, that tends to indicate
   that you'll need to raise the statistics targets on the tables
   with the problem. See:
   http://www.postgresql.org/docs/9.4/static/planner-stats.html
4) If #3 doesn't fix things, then the PostgreSQL developers want to
   know about your problem so they can improve the planner. First,
   if there are queries that are causing you problems, update the
   application to disable sequential scans _for_those_particular_
   _queries_ (see the sketch right after this list) so your
   application continues to trundle along, but don't disable
   sequential scans globally, as that may cause other queries to
   perform badly. Once that immediate problem is out of the way, put
   together a test case that demonstrates the problem you're having
   (but doesn't contain any proprietary data, etc.) and post it to
   the list so the developers can figure out what to do to improve
   Postgres.
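
For the per-query workaround in #4, something along these lines keeps
the setting from leaking into the rest of the application (just a
sketch; wrap it around the problem query):

BEGIN;
SET LOCAL enable_seqscan = off;   -- reverts automatically at COMMIT/ROLLBACK
-- run the problem query here
COMMIT;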

Hope this helps.

--
Bill Moran <wmoran@potentialtech.com>


Re: Planner cost adjustments

From: Daniel Begin
Omg! I was not expecting such a step-by-step procedure, thanks!
I'll follow the guide :-)

Since I was about to provide a bit of context as asked by Tomas, here it is
for those who are interested...
Best regards,
Daniel

A bit of the required context...
I am running all this on my personal PC: Windows 64-bit, i7 chip, 16GB RAM.
The PostgreSQL 9.3 cluster is spread over 3 x 3TB external drives with write
caching enabled. Most tables are static (no inserts).

My largest table looks like this...
Records composed of:  3 bigint, 2 boolean, 1 timestamp and 1 geography type.
Number of records: 3870130000
Table size: 369GB
Indexes size: 425GB
 - btree(primary key): 125GB
 - btree(another field): 86GB
 - gist(geography): 241GB

Overall, 40% of my tables and 30% of my indexes do not fit in cache
(effective_cache_size = 10GB), but looking at the most heavily used tables
and indexes, more than 90% of what I actually use doesn't fit.
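
For reference, the table and index sizes can be pulled with a query along
these lines (the table name here is just an example):

SELECT pg_size_pretty(pg_table_size('nodes'))   AS table_size,
       pg_size_pretty(pg_indexes_size('nodes')) AS index_size;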

On one hand, according to the documentation
(http://www.postgresql.org/docs/9.3/static/runtime-config-query.html), with
a cache rate like mine, I should probably increase random_page_cost to
better reflect the true cost of random storage reads.

On the other hand, however, I found the following
(https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server):
"This is not where you should start to search for plan problems. The
random_page_cost is pretty far down this list (at the end in fact). If you
are getting bad plans, this shouldn't be the first thing you look at, even
though lowering this value may be effective. Instead, you should start by
making sure autovacuum is working properly, that you are collecting enough
statistics, and that you have correctly sized the memory parameters for your
server--all the things gone over above. After you've done all those much
more important things, if you're still getting bad plans then you should see
if lowering random_page_cost is still useful."

Please find below some of the database config parameters that might be of
interest...

Best regards,
Daniel

General config parameters I have modified
temp_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
checkpoint_segments = 64
checkpoint_completion_target = 0.8
effective_cache_size = 10GB
logging_collector = on
track_counts = on
autovacuum = on

Here are config parameters related to autovacuum
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_freeze_max_age = 200000000
autovacuum_max_workers = 3
autovacuum_multixact_freeze_max_age = 400000000
autovacuum_naptime = 1min
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
vacuum_cost_delay = 0
vacuum_cost_limit = 200
vacuum_cost_page_dirty = 20
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_defer_cleanup_age = 0
vacuum_freeze_min_age = 50000000
vacuum_freeze_table_age = 150000000
vacuum_multixact_freeze_min_age = 5000000
vacuum_multixact_freeze_table_age = 150000000


Re: Planner cost adjustments

From: Tomas Vondra
Hi,

On 05/29/15 22:56, Daniel Begin wrote:
> Omg! I was not expecting such a step-by-step procedure, thanks!
> I'll follow the guide :-)
>
> Since I was about to provide a bit of context as asked by Tomas, here it is
> for those who are interested...
> Best regards,
> Daniel
>
> A bit of the required context...
> I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.
> The PostgreSQL 9.3 cluster is spread over 3X3TB external drives with write
> caching. Most tables are static (no insert).
>
> My largest table looks like this...
> Records composed of:  3 bigint, 2 boolean, 1 timestamp and 1 geography type.
> Number of records: 3870130000
> Table size: 369GB
> Indexes size: 425GB
>   - btree(primary key): 125GB
>   - btree(another field): 86GB
>   - gist(geography): 241GB
>

Huh, I didn't really expect that. Especially on a Windows laptop with
external drives (I assume 7.2k SATA drives connected using USB or maybe
eSATA?). Is the write cache the on-drive write cache? It's not really a
good idea to leave that enabled (a volatile cache, so there's a risk of
data loss or data corruption).

Also, what do you mean by "spread over"? Are you using tablespaces or
some sort of RAID?

 > Overall, 40% of my table and 30% of indexes do not fit in cache
 > (effective_cache_size=10GB) but looking at mostly used tables and
 > indexes, more than 90% of what I use doesn't fit.

I don't really understand how you computed the 40% and 30%. You have
~800GB of data+indexes and only 16GB of RAM, so that's more like 2% of
the database size. Or do you measure the hit ratios somehow?

> On one hand, according to the documentation
> (http://www.postgresql.org/docs/9.3/static/runtime-config-query.html),
> with a cache rate like mine, I should probably increase random_page_cost to
> better reflect the true cost of random storage reads.

I don't follow. Haven't you said in the first post that the database
often chooses sequential scans while index scans are way faster?
Increasing random_page_cost will only push it further towards sequential
scans, making it worse.

> On the other hand however, I found that...
> (https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
> "This is not where you should start to search for plan problems.
> The random_page_cost is pretty far down this list (at the end in
> fact). If you are getting bad plans, this shouldn't be the first
> thing you look at, even though lowering this value may be effective.
> Instead, you should start by making sure autovacuum is working
> properly, that you are collecting enough statistics, and that you
> have correctly sized the memory parameters for your server--all the
> things gone over above. After you've done all those much more
> important things, if you're still getting bad plans then
> you should see if lowering random_page_cost is still useful."

Well, so maybe you're at the point where tuning random_page_cost is the
right next step ... but sadly you haven't provided any example queries,
so it's hard to say. Can you choose a few queries, run EXPLAIN ANALYZE
on them, post the output to explain.depesz.com, and put only the links
here?


> Please find below some the database config's parameters that might
> be of interest...

> Best regards,
> Daniel
>
> General config parameters I have modified
> temp_buffers = 512MB

Why are you tuning temp_buffers? Shouldn't you tune shared_buffers
instead? I'm not very familiar with Windows, and I vaguely remember
issues with larger shared_buffers values, but AFAIK that improved in the
recent releases.

> work_mem = 16MB
> maintenance_work_mem = 256MB
> checkpoint_segments = 64
> checkpoint_completion_target = 0.8
> effective_cache_size = 10GB
> logging_collector = on
> track_counts = on
> autovacuum = on

Otherwise, I don't see anything terribly misconfigured.
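
By the way, an easy way to list everything that differs from the
built-in defaults is something like:

SELECT name, setting, unit
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');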

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Planner cost adjustments

From: Daniel Begin
Thanks Tomas,
Here are a few answers to your questions/comments.

- The write cache is the on-drive write cache, and I agree about the
risks... I plugged the drives into a UPS.
- "Spread over" is done with tablespaces.
- The 30% and 40% ratios mean that 3 of 10 indexes and 4 of 10 tables are
larger than 10GB.
- Confusing comments about the random_page_cost setting? Well, I was just
citing the doc I found, which was confusing to me (at this stage!-)
- Sadly I haven't provided any example queries? Well, you are right. When I
found this strange behavior a couple of weeks ago, I did not question it too
much and did not keep comparison stats, but as things move forward I expect
the problem to come up again, so I asked the list hoping there was a simple
"just do that" answer floating around. However, as PT suggested, at some
point I will have to run new tests and I'll then let you know :-)
- Why am I tuning temp_buffers? Simply because, looking at other configs on
the web, they tend to use more than the default value when dealing with
large and complex queries.
- Shouldn't I tune shared_buffers instead? Yes, but I forgot to mention
that...
shared_buffers = 128MB, since it seems that "on Windows the useful range is
64MB to 512MB".


Best regards,
Daniel




Re: Planner cost adjustments

From: Daniel Begin
Here is a follow-up on the step-by-step procedure proposed by PT

#1 - Set up the PostgreSQL planner's cost estimate settings for my hardware.
--------------------------------------------------------------------------------
The parameter values described in section 18.7.2 haven't been changed,
except for effective_cache_size:
seq_page_cost = 1
random_page_cost = 4
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
effective_cache_size = 10GB

I did a bunch of tests on frequently used queries to see how well they
perform - using SET enable_seqscan = ON/OFF.
As described earlier in this thread, the planner uses a Seq Scan on tables
even though an Index Scan is, in this case, 5 times faster!
Here are the logs of EXPLAIN ANALYSE on a query...

osmdump=# SET enable_seqscan = ON;
osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
---------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24) (actual time=1574.914..7444.938 rows=338568 loops=1)
   Hash Cond: (changesets.user_id = users.id)
   ->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722 width=24) (actual time=0.002..4724.578 rows=25133929 loops=1)
   ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual time=0.165..0.165 rows=600 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) (actual time=0.003..0.073 rows=600 loops=1)
 Total runtime: 7658.715 ms
(7 rows)

osmdump=# SET enable_seqscan = OFF;
osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000015.94..10001072613.45 rows=726722 width=24) (actual time=0.268..1490.515 rows=338568 loops=1)
   ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600 width=8) (actual time=0.207..0.531 rows=600 loops=1)
         ->  Seq Scan on users  (cost=10000000000.00..10000000014.00 rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
   ->  Index Scan using changesets_useridndx on changesets  (cost=0.44..1775.54 rows=1211 width=24) (actual time=0.038..2.357 rows=564 loops=600)
         Index Cond: (user_id = users.id)
 Total runtime: 1715.517 ms
(6 rows)


#2 - Run ANALYZE on the whole database and look for performance/planning improvement.
--------------------------------------------------------------------------------
I ran a database-wide ANALYZE and then reran the query. It did not produce
any significant improvement, according to the EXPLAIN ANALYSE below...

osmdump=# SET enable_seqscan = ON;
osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
---------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=21.50..819511.42 rows=729133 width=24) (actual time=1538.100..7307.743 rows=338568 loops=1)
   Hash Cond: (changesets.user_id = users.id)
   ->  Seq Scan on changesets  (cost=0.00..745390.84 rows=25138084 width=24) (actual time=0.027..4620.691 rows=25133929 loops=1)
   ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual time=0.300..0.300 rows=600 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) (actual time=0.022..0.187 rows=600 loops=1)
 Total runtime: 7519.254 ms
(7 rows)

osmdump=# SET enable_seqscan = OFF;
osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000015.94..10001090810.49 rows=729133 width=24) (actual time=0.268..1466.248 rows=338568 loops=1)
   ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600 width=8) (actual time=0.205..0.530 rows=600 loops=1)
         ->  Seq Scan on users  (cost=10000000000.00..10000000014.00 rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
   ->  Index Scan using changesets_useridndx on changesets  (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314 rows=564 loops=600)
         Index Cond: (user_id = users.id)
 Total runtime: 1677.447 ms
(6 rows)

#3 - Run EXPLAIN ANALYZE and look for discrepancies between the estimated and actual times.
--------------------------------------------------------------------------------
Looking at the above results, there are obvious discrepancies between the
expected and actual rows and times!
I dug in a bit, exploring and trying to understand the different concepts
explained in...

http://www.postgresql.org/docs/9.4/static/planner-stats.html
http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
http://www.postgresql.org/docs/9.2/static/view-pg-stats.html

Concerning the discrepancies between the actual number of rows and the
predicted value, I looked at what pg_stats says about user_id in the
changesets table. Here are the values provided to the planner...
avg_width = 8
histogram_bounds: the size of the bins varies between 50 and 150000, which
makes sense because if I had divided the column's values into groups of
approximately equal population, I would have produced bins between 1 and
100000 (if sorted by frequency)
n_distinct = 20686 (there are actually 464858 distinct values of user_id in
the table)
most_common_vals: the values make sense (I checked the frequency count of a
couple of the most common user_id values)
correlation = 0.617782 (?)
most_common_elems, most_common_elem_freqs and elem_count_histogram were
empty
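
For reference, these values can be read straight from pg_stats with
something like:

SELECT avg_width, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'changesets' AND attname = 'user_id';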

At this point, I wonder whether the assumptions behind the planner's
statistics may produce such problems, since the distribution of my data is
not uniform but follows a power law (some user_id values return millions of
records while others return only one).
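
A back-of-envelope check suggests the underestimated n_distinct is what
drives the bad row estimate (assuming the planner roughly scales the outer
row count by the ratio of distinct values): 25,138,084 rows x 600 sampled
users / 20,686 is about 729,133, which matches the post-ANALYZE estimate
above almost exactly, whereas those 600 users actually account for only
338,568 rows.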
This is as far as I can go at this point. Maybe someone can provide me with
more explanation of the planner's behavior and ways to go further to make it
work properly?

Best regards,
Daniel





Re: Planner cost adjustments

From: Bill Moran
On Tue, 2 Jun 2015 14:01:35 -0400
Daniel Begin <jfd553@hotmail.com> wrote:

> Here is a follow-up on the step-by-step procedure proposed by PT
>
> #1 - setup postgresql planner's cost estimate settings for my hardware.
> ----------------------------------------------------------------------------
> ----------
> Current parameters values described in section 18.7.2 haven't been changed
> except for the effective_cache_size
> seq_page_cost = 1
> random_page_cost = 4
> cpu_tuple_cost = 0.01
> cpu_index_tuple_cost = 0.005
> cpu_operator_cost = 0.0025
> effective_cache_size = 10GB
>
> I did a bunch of tests on frequently used queries to see how well they
> perform - using SET enable_seqscan = ON/OFF.
> As described earlier in this tread, the planner use Seq Scan on tables even
> if using an Index Scan is in this case 5 times faster!
> Here are the logs of EXPLAIN ANALYSE on a query...

In an earlier message you mentioned that the drives are external to the
computer. I don't remember the details, but external drives can be quite
dicey as far as performance goes, depending on the specific technology
(USB vs. external SATA vs. NAS, for example), with some pretty wild
variance between different brands of the same technology.
See:
http://www.databasesoup.com/2012/05/random-page-cost-revisited.html

As a result, I'm suspicious that the default values you're using for
random_page_cost and seq_page_cost are throwing things off because
your disks aren't performing like internally connected disks.

Correct me if I'm wrong on any of the assumptions I'm making here,
but I got the impression that you can tweak values and restart
Postgres without any hardship. If that's the case, I'm guessing that
raising seq_page_cost (possibly to 2) will cause Postgres to make
better decisions about which plans are good. My suggestion is to try
some different values for those two settings, doing several tests
after each change, and see if you can find a set of values that
starts getting you good plans. It appears that the planner thinks
it can get better performance by reading from the disk in sequence
than by picking random pages, and that makes me think that the gap
between seq_page_cost and random_page_cost is bigger than the actual
difference in the drives' behavior.

More comments below.

> osmdump=# SET enable_seqscan = ON;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM
> changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
> ----------------------------------------------------------------------------
> -----------------------------------------------------
>  Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24) (actual
> time=1574.914..7444.938 rows=338568 loops=1)
>    Hash Cond: (changesets.user_id = users.id)
>    ->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722 width=24)
> (actual time=0.002..4724.578 rows=25133929 loops=1)
>    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual time=0.165..0.165
> rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 24kB
>          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) (actual
> time=0.003..0.073 rows=600 loops=1)
>  Total runtime: 7658.715 ms
> (7 rows)
>
> osmdump=# SET enable_seqscan = OFF;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM
> changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
> ----------------------------------------------------------------------------
> --------------------------------------------------------------------
>  Nested Loop  (cost=10000000015.94..10001072613.45 rows=726722 width=24)
> (actual time=0.268..1490.515 rows=338568 loops=1)
>    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600 width=8)
> (actual time=0.207..0.531 rows=600 loops=1)
>          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
>    ->  Index Scan using changesets_useridndx on changesets
> (cost=0.44..1775.54 rows=1211 width=24) (actual time=0.038..2.357 rows=564
> loops=600
>          Index Cond: (user_id = users.id)
>  Total runtime: 1715.517 ms
> (6 rows)
>
>
> #2 - Run ANALYZE DATABASE and look at performance/planning improvement.
> ----------------------------------------------------------------------------
> ----------
> I ran ANALYZE DATABASE then rerun the query. It did not produce any
> significant improvement according to the EXPLAIN ANALYSE below...
>
> osmdump=# SET enable_seqscan = ON;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM
> changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
> ----------------------------------------------------------------------------
> -----------------------------------------------------
>  Hash Semi Join  (cost=21.50..819511.42 rows=729133 width=24) (actual
> time=1538.100..7307.743 rows=338568 loops=1)
>    Hash Cond: (changesets.user_id = users.id)
>    ->  Seq Scan on changesets  (cost=0.00..745390.84 rows=25138084 width=24)
> (actual time=0.027..4620.691 rows=25133929 loops=1)
>    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual time=0.300..0.300
> rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 24kB
>          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) (actual
> time=0.022..0.187 rows=600 loops=1)
>  Total runtime: 7519.254 ms
> (7 rows)
>
> osmdump=# SET enable_seqscan = OFF;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM
> changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
> ----------------------------------------------------------------------------
> --------------------------------------------------------------------
>  Nested Loop  (cost=10000000015.94..10001090810.49 rows=729133 width=24)
> (actual time=0.268..1466.248 rows=338568 loops=1)
>    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600 width=8)
> (actual time=0.205..0.530 rows=600 loops=1)
>          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
>    ->  Index Scan using changesets_useridndx on changesets
> (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314 rows=564
> loops=600)
>          Index Cond: (user_id = users.id)
>  Total runtime: 1677.447 ms
> (6 rows)
>
> #3 - Run EXPLAIN ANALYZE and look for discrepancies between the estimated
> and actual times
> ----------------------------------------------------------------------------
> ----------
> Looking at above results, there are obvious discrepancies between
> expected/actual rows and time!
> I dug a bit by exploring/trying to understand the different concepts
> explained in...
>
> http://www.postgresql.org/docs/9.4/static/planner-stats.html
> http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
> http://www.postgresql.org/docs/9.2/static/view-pg-stats.html
>
> Concerning discrepancies between the actual number of rows and predicted
> value, I looked at what pg_stats was saying about user_id in table
> changesets.
> Here are the values provided to the planner...
> Average_width=8
> histogram_bounds: the size of the bins varies between 50 and 150000, which
> make sense because if I had divided the column's values into groups of
> approximately equal population, I would have produced bins between 1 and
> 100000 (if sorted by frequency)
> n_distinct= 20686 (there is actually 464858 distinct values for user_id in
> the table)
> most_common_vals: values make sense (I checked the frequency count of a
> couple most common users_id)
> correlation=0.617782 (?)
> most_common_elems, most_common_elem_freqs and elem_count_histogram were
> empty
>
> At this point, I wonder if the assumptions behind the planner's statistics
> may produce such problems since the distribution of my data is not uniform
> but follows a power law (some user_id would return millions of records while
> others only one).
> This is the farthest I can go at this point. Maybe someone can provide me
> with more explanations regarding planner's behavior and ways to go further
> to make it work properly?

You may also benefit from increasing the statistics targets and
running ANALYZE again. It certainly looks like some of those stats are
pretty far off. Raising the statistics target will cause ANALYZE to
examine more rows (which takes longer but might produce more accurate
results).
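
For example, per column (a sketch using the column from your plans; pick
a target that fits - the new value only takes effect at the next ANALYZE):

ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE changesets;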

I suggest experimenting with the cost settings first, though.

--
Bill Moran <wmoran@potentialtech.com>


Re: Planner cost adjustments

From: Daniel Begin
Thanks Bill,

About disk performance, all drives are identical and connected over USB3,
and yes, I can tweak values and restart Postgres without any hardship!-)
About seq_page_cost and random_page_cost, I am about to test different
lower values as you and Tomas propose.
Raising the statistics target is a good idea. Since most of the data have a
power-law distribution, it might affect the statistics.

I will do as suggested and get back to the community for further comments.
I wanted to follow the procedure proposed by PT, just in case I eventually
had to go on to step 4 (contact the PostgreSQL developers so they can
improve the planner).

And I am still open to other proposals.
Daniel


Re: Planner cost adjustments

From: Kevin Grittner
Daniel Begin <jfd553@hotmail.com> wrote:

> I can tweak values and restart Postgres without any hardship!

Many of the important performance-related settings (especially cost
factors) can be adjusted with the SET command to affect just the
one connection.  This can make experimenting a lot easier.

> About seq_page_cost and random_page_cost, I am about to test
> different lower values as you and Thomas propose.

I have often been unable to get optimal plans without boosting the
value of cpu_tuple_cost (in addition to adjusting the page costs
and setting effective_cache_size).  Generally 0.03 is enough,
although I have personally never seen problems with going up to
0.05, and that sometimes fixes a few plans that 0.03 misses.
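
For example, in a single session (a sketch - adjust the numbers to your
hardware and re-test):

SET cpu_tuple_cost = 0.03;
SET random_page_cost = 2;
SET effective_cache_size = '10GB';
-- then re-run EXPLAIN ANALYZE on a problem query in this same session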

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Planner cost adjustments

From: Daniel Begin
Here is a follow-up on adjusting the planner's cost calculation.

- The statistics target of the problematic columns was increased from 100 to 200.
- ANALYZE was run on all the tables concerned (actually on the whole DB).
- random_page_cost was decreased from 4 to 2.

As a result, about 80% of queries now use what I would consider an optimal
plan. However, the planner keeps using full table scans on large tables...

For instance, I ran a query that should have used an index scan against two
similar test tables. The planner selected an index scan for the smaller one
and a Seq Scan for the larger one. Except for their sizes and for one field
not used in the exercise, the test tables were identical and indexed on the
same field. The smaller test table had 3.26E+10 records and the larger one
3.78E+11 records.

The query looked like...
SELECT cs.user_id, cs.changeset_id, nd.id, nd.version
FROM changesets_selection cs, a_test_table nd
WHERE nd.changeset_id=cs.changeset_id;

In order to understand why the planner selected the Seq Scan instead of an
Index Scan on the large table (nodes), I ran an EXPLAIN ANALYSE (on warm
cache) using enable_seqscan set to OFF/ON.

-- QUERY PLAN on larger table (nodes) with SET enable_seqscan=OFF
------------------------------------------------------------------
 Nested Loop  (cost=10000000000.58..10210480648.03 rows=194204592 width=40) (actual time=74.088..278008.579 rows=140112 loops=1)
   ->  Seq Scan on changesets_selection cs  (cost=10000000000.00..10000000110.44 rows=6644 width=24) (actual time=0.015..4.904 rows=6903 loops=1)
   ->  Index Scan using nodes_changesetidndx on nodes nd  (cost=0.58..31387.49 rows=29230 width=24) (actual time=12.655..40.259 rows=20 loops=6903)
         Index Cond: (changeset_id = cs.changeset_id)
 Total runtime: 278026.196 ms
(5 rows)

- Completed in less than 5 minutes of processing.
- I assume that cost=10000000000.00..10000000110.44 for the Seq Scan on
changesets_selection is an artefact of setting enable_seqscan=OFF.
- From what I see, the row estimate is still way off (about 1400x) for the
large table, even though the statistics target was doubled to 200.

-- QUERY PLAN on larger table (nodes) with SET enable_seqscan=ON
------------------------------------------------------------------
 Hash Join  (cost=156171782.28..185673195.13 rows=194204592 width=40)
   Hash Cond: (cs.changeset_id = n.changeset_id)
   ->  Seq Scan on changesets_selection cs  (cost=0.00..110.44 rows=6644 width=24)
   ->  Hash  (cost=84959952.68..84959952.68 rows=3878771968 width=24)
         ->  Seq Scan on nodes nd  (cost=0.00..84959952.68 rows=3878771968 width=24)
(5 rows)

- Still running after 2:30 hours of processing! That is why I have not
provided the actual times and rows (actual rows are, however, available in
the first query plan).
- Not surprisingly, the row estimate is way off again for the large table -
same stats, same result...

It seems there is a problem with my large table's statistics, even after
increasing the target to 200. Should I increase the statistics target to
500, or even to 1000?
Is there something else I can do to get the appropriate plan?
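
One thing I have not tried yet is overriding the n_distinct estimate
directly (assuming I read the docs right, it takes effect at the next
ANALYZE; the figure below is hypothetical and would be replaced by the
real count):

SELECT count(DISTINCT changeset_id) FROM nodes;  -- slow, but gives the real figure
ALTER TABLE nodes ALTER COLUMN changeset_id SET (n_distinct = 25000000);
ANALYZE nodes;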

Comments/explanations would be appreciated
Daniel


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Daniel Begin
Sent: June-03-15 06:32
To: 'Bill Moran'
Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
Subject: Re: [GENERAL] Planner cost adjustments

Thank Bill,

About disks performance, all drives are identical and connected using USB3
connections and yes, I can tweak values and restart Postgres without any
hardship!-)
About seq_page_cost and random_page_cost, I am about to test different lower
values as you and Thomas propose.
Raising the statistics target is a good idea. Since most of the data have a
power law distribution it might affect the statistics.

I will do as suggested and get back to the community for further comments. I
wished to follow the procedure proposed by PT, just in case I eventually had
to on step 4 (contact PostgreSQL developers so they can improve the
planner).

And I am still open to other proposal
Daniel

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Moran
Sent: June-02-15 23:31
To: Daniel Begin
Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
Subject: Re: [GENERAL] Planner cost adjustments

On Tue, 2 Jun 2015 14:01:35 -0400
Daniel Begin <jfd553@hotmail.com> wrote:

> Here is a follow-up on the step-by-step procedure proposed by PT
>
> #1 - setup postgresql planner's cost estimate settings for my hardware.
> ----------------------------------------------------------------------
> ------
> ----------
> Current parameters values described in section 18.7.2 haven't been
> changed except for the effective_cache_size seq_page_cost = 1
> random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost =
> 0.005 cpu_operator_cost = 0.0025 effective_cache_size = 10GB
>
> I did a bunch of tests on frequently used queries to see how well they
> perform - using SET enable_seqscan = ON/OFF.
> As described earlier in this tread, the planner use Seq Scan on tables
> even if using an Index Scan is in this case 5 times faster!
> Here are the logs of EXPLAIN ANALYSE on a query...

In an earlier message you mention that the drives are external to the
computer. I don't remember details, but external drives can be quite dicey
as far as performance goes, depending on the specific technlogy (USB vs.
external SATA vs. NAS, for example) as well as some pretty wild variances
between different brands of the same technology.
See:
http://www.databasesoup.com/2012/05/random-page-cost-revisited.html

As a result, I'm suspicious that the default values you're using for
random_page_cost and seq_page_cost are throwing things off becuase your
disks aren't performing like internally connected disks.

Correct me if I'm wrong on any of the assumptions I'm making here, but I got
the impression that you can tweak values and restart Postgres without any
hardship. If that's the case, I'm guessing that raising seq_page_cost
(possible to 2) will cause Postgres to make better decisions about what are
good plans. My suggestion is to try some different values for those two
settings, doing several tests after each change, and see if you can find a
set of values that starts getting you good plans. It appears that the
planner thinks that it can get better performance by reading from the disk
in sequence than by picking random pages, and that makes me think that the
difference between seq_page_cost and random_page_cost is bigger than the
actual behavior of the drives.

More comments below.

> osmdump=# SET enable_seqscan = ON;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> sample.users);
> ----------------------------------------------------------------------
> ------
> -----------------------------------------------------
>  Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24) (actual
> time=1574.914..7444.938 rows=338568 loops=1)
>    Hash Cond: (changesets.user_id = users.id)
>    ->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722
> width=24) (actual time=0.002..4724.578 rows=25133929 loops=1)
>    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> time=0.165..0.165
> rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 24kB
>          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> (actual
> time=0.003..0.073 rows=600 loops=1)
>  Total runtime: 7658.715 ms
> (7 rows)
>
> osmdump=# SET enable_seqscan = OFF;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> sample.users);
> ----------------------------------------------------------------------
> ------
> --------------------------------------------------------------------
>  Nested Loop  (cost=10000000015.94..10001072613.45 rows=726722
> width=24) (actual time=0.268..1490.515 rows=338568 loops=1)
>    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> width=8) (actual time=0.207..0.531 rows=600 loops=1)
>          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
>    ->  Index Scan using changesets_useridndx on changesets
> (cost=0.44..1775.54 rows=1211 width=24) (actual time=0.038..2.357
> rows=564
> loops=600
>          Index Cond: (user_id = users.id)  Total runtime: 1715.517 ms
> (6 rows)
>
>
> #2 - Run ANALYZE DATABASE and look at performance/planning improvement.
> ----------------------------------------------------------------------
> ------
> ----------
> I ran ANALYZE DATABASE then rerun the query. It did not produce any
> significant improvement according to the EXPLAIN ANALYSE below...
>
> osmdump=# SET enable_seqscan = ON;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> sample.users);
> ----------------------------------------------------------------------
> ------
> -----------------------------------------------------
>  Hash Semi Join  (cost=21.50..819511.42 rows=729133 width=24) (actual
> time=1538.100..7307.743 rows=338568 loops=1)
>    Hash Cond: (changesets.user_id = users.id)
>    ->  Seq Scan on changesets  (cost=0.00..745390.84 rows=25138084
> width=24) (actual time=0.027..4620.691 rows=25133929 loops=1)
>    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> time=0.300..0.300
> rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 24kB
>          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> (actual
> time=0.022..0.187 rows=600 loops=1)
>  Total runtime: 7519.254 ms
> (7 rows)
>
> osmdump=# SET enable_seqscan = OFF;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> sample.users);
> ----------------------------------------------------------------------
> ------
> --------------------------------------------------------------------
>  Nested Loop  (cost=10000000015.94..10001090810.49 rows=729133
> width=24) (actual time=0.268..1466.248 rows=338568 loops=1)
>    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> width=8) (actual time=0.205..0.530 rows=600 loops=1)
>          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
>    ->  Index Scan using changesets_useridndx on changesets
> (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314
> rows=564
> loops=600)
>          Index Cond: (user_id = users.id)
>  Total runtime: 1677.447 ms
> (6 rows)
>
> #3 - Run EXPLAIN ANALYZE and look for discrepancies between the
> estimated and actual times
> ----------------------------------------------------------------------
> ------
> ----------
> Looking at above results, there are obvious discrepancies between
> expected/actual rows and time!
> I dug a bit by exploring/trying to understand the different concepts
> explained in...
>
> http://www.postgresql.org/docs/9.4/static/planner-stats.html
> http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
> http://www.postgresql.org/docs/9.2/static/view-pg-stats.html
>
> Concerning discrepancies between the actual number of rows and
> predicted value, I looked at what pg_stats was saying about user_id in
> table changesets.
> Here are the values provided to the planner...
> Average_width=8
> histogram_bounds: the size of the bins varies between 50 and 150000,
> which make sense because if I had divided the column's values into
> groups of approximately equal population, I would have produced bins
> between 1 and
> 100000 (if sorted by frequency)
> n_distinct= 20686 (there is actually 464858 distinct values for
> user_id in the table)
> most_common_vals: values make sense (I checked the frequency count of
> a couple most common users_id)
> correlation=0.617782 (?)
> most_common_elems, most_common_elem_freqs and elem_count_histogram
> were empty
>
> At this point, I wonder if the assumptions behind the planner's
> statistics may produce such problems since the distribution of my data
> is not uniform but follows a power law (some user_id would return
> millions of records while others only one).
> This is the farthest I can go at this point. Maybe someone can provide
> me with more explanations regarding planner's behavior and ways to go
> further to make it work properly?

You may also benefit from increasing the statistics targets and running
ANALYZE again. It certainly looks like some of those stats are pretty far
off. Raising the statistics target will cause ANALYZE to sample more rows,
which takes longer but might produce more accurate estimates.

I suggest experimenting with the cost settings first, though.
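For the record, raising a per-column target looks roughly like this (a sketch
only; 200 is an arbitrary value, and user_id on changesets is simply the column
whose stats you quoted):

-- raise the per-column sample size (the default statistics target is 100)
ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 200;
ANALYZE changesets;
-- then check whether the planner's view of the column looks more plausible
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'changesets' AND attname = 'user_id';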

--
Bill Moran <wmoran@potentialtech.com>


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general






Re: Planner cost adjustments

From
Bill Moran
Date:
On Wed, 10 Jun 2015 17:20:00 -0400
Daniel Begin <jfd553@hotmail.com> wrote:

> Here is a follow-up on adjusting the planner costs calculation
>
> -Statistics target of problematic columns were increased from 100 to 200.
> -Analyse was ran on all concerned tables (actually ran on the whole DB)
> -Random_page_cost was decreased from 4 to 2.

Since you changed two things, there's no way to be sure which change led
to the improvement. You seem to be assuming that changing the statistics
target was what helped. While that _may_ be right, it might also have
been the change to random_page_cost.
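It's easy enough to check (sketching it here): put random_page_cost back to its
old value for one session and re-run a couple of the problem queries. If the
plans stay good, the statistics change was what mattered; otherwise it was the
cost change.

SET random_page_cost = 4;    -- the old value, this session only
EXPLAIN ANALYSE
SELECT user_id, id AS changeset_id, closed
FROM changesets
WHERE changesets.user_id IN (SELECT id FROM sample.users);
RESET random_page_cost;      -- back to the configured value (2)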

> As a result, about 80% of queries are now using what I would consider an
> optimal plan. However, the planner keeps using full table scans for large
> tables...
>
> For instance, I ran a query that should have used an index scan on two
> similar test tables. The planner had selected an index scan for the smaller
> one and a Seq Scan for larger one. Except for their sizes and for one field
> not used in the exercise, the test tables were identical and indexed on the
> same field. The smaller test table had 3.26E+10 records and the larger one
> 3.78E+11 records.
>
> The query looked like...
> SELECT cs.user_id, cs.changeset_id, nd.id, nd.version
> FROM changesets_selection cs, a_test_table nd
> WHERE nd.changeset_id=cs.changeset_id;
>
> In order to understand why the planner selected the Seq Scan instead of an
> Index Scan on the large table (nodes), I ran an EXPLAIN ANALYSE (on warm
> cache) using enable_seqscan set to OFF/ON.
>
> -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=OFF
> -----------------------------;
>  Nested Loop  (cost=10000000000.58..10210480648.03 rows=194204592 width=40)
> (actual time=74.088..278008.579 rows=140112 loops=1)
>    ->  Seq Scan on changesets_selection cs
> (cost=10000000000.00..10000000110.44 rows=6644 width=24) (actual
> time=0.015..4.904 rows=6903 loops=1)
>    ->  Index Scan using nodes_changesetidndx on nodes nd
> (cost=0.58..31387.49 rows=29230 width=24) (actual time=12.655..40.259
> rows=20 loops=6903)
>          Index Cond: (changeset_id = cs.changeset_id)
>  Total runtime: 278026.196 ms
> (5 rows)
>
> -Completed after less than 5 minutes processing
> -I assume that cost=10000000000.00..10000000110.44 for the Seq Scan of
> changesets_selection is an artefact of setting enable_seqscan=OFF.
> -From what I see, the evaluation of rows number is still way off (1400X) for
> the large table, even if the statistics target was doubled to 200.
>
> -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=ON
> ------------------------------;
>  Hash Join  (cost=156171782.28..185673195.13 rows=194204592 width=40)
>    Hash Cond: (cs.changeset_id = n.changeset_id)
>    ->  Seq Scan on changesets_selection cs  (cost=0.00..110.44 rows=6644
> width=24)
>    ->  Hash  (cost=84959952.68..84959952.68 rows=3878771968 width=24)
>          ->  Seq Scan on nodes nd  (cost=0.00..84959952.68 rows=3878771968
> width=24)
> (5 rows)
>
> -Still running after 2:30 hours processing! That is why I did not provided
> the actual time and rows (however, actual rows are provided on first query
> plan)
> -Not surprisingly, the evaluation of rows number is way off again for the
> large table - same stats, same results...
>
> It seems there is a problem with my large table statistics, even after
> increase them to 200. Should I increase the statistic target to 500, or even
> to 1000?
> Is there something else I can trigger to get the appropriate plan?
>
> Comments/explanations would be appreciated
> Daniel
>
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Daniel Begin
> Sent: June-03-15 06:32
> To: 'Bill Moran'
> Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
> Subject: Re: [GENERAL] Planner cost adjustments
>
> Thank Bill,
>
> About disks performance, all drives are identical and connected using USB3
> connections and yes, I can tweak values and restart Postgres without any
> hardship!-)
> About seq_page_cost and random_page_cost, I am about to test different lower
> values as you and Thomas propose.
> Raising the statistics target is a good idea. Since most of the data have a
> power law distribution it might affect the statistics.
>
> I will do as suggested and get back to the community for further comments. I
> wished to follow the procedure proposed by PT, just in case I eventually had
> to on step 4 (contact PostgreSQL developers so they can improve the
> planner).
>
> And I am still open to other proposal
> Daniel
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Moran
> Sent: June-02-15 23:31
> To: Daniel Begin
> Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
> Subject: Re: [GENERAL] Planner cost adjustments
>
> On Tue, 2 Jun 2015 14:01:35 -0400
> Daniel Begin <jfd553@hotmail.com> wrote:
>
> > Here is a follow-up on the step-by-step procedure proposed by PT
> >
> > #1 - setup postgresql planner's cost estimate settings for my hardware.
> > ----------------------------------------------------------------------
> > ------
> > ----------
> > Current parameters values described in section 18.7.2 haven't been
> > changed except for the effective_cache_size seq_page_cost = 1
> > random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost =
> > 0.005 cpu_operator_cost = 0.0025 effective_cache_size = 10GB
> >
> > I did a bunch of tests on frequently used queries to see how well they
> > perform - using SET enable_seqscan = ON/OFF.
> > As described earlier in this tread, the planner use Seq Scan on tables
> > even if using an Index Scan is in this case 5 times faster!
> > Here are the logs of EXPLAIN ANALYSE on a query...
>
> In an earlier message you mention that the drives are external to the
> computer. I don't remember details, but external drives can be quite dicey
> as far as performance goes, depending on the specific technology (USB vs.
> external SATA vs. NAS, for example) as well as some pretty wild variances
> between different brands of the same technology.
> See:
> http://www.databasesoup.com/2012/05/random-page-cost-revisited.html
>
> As a result, I'm suspicious that the default values you're using for
> random_page_cost and seq_page_cost are throwing things off because your
> disks aren't performing like internally connected disks.
>
> Correct me if I'm wrong on any of the assumptions I'm making here, but I got
> the impression that you can tweak values and restart Postgres without any
> hardship. If that's the case, I'm guessing that raising seq_page_cost
> (possibly to 2) will cause Postgres to make better decisions about what are
> good plans. My suggestion is to try some different values for those two
> settings, doing several tests after each change, and see if you can find a
> set of values that starts getting you good plans. It appears that the
> planner thinks that it can get better performance by reading from the disk
> in sequence than by picking random pages, and that makes me think that the
> difference between seq_page_cost and random_page_cost is bigger than the
> actual behavior of the drives.
>
> More comments below.
>
> > osmdump=# SET enable_seqscan = ON;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > ----------------------------------------------------------------------
> > ------
> > -----------------------------------------------------
> >  Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24) (actual
> > time=1574.914..7444.938 rows=338568 loops=1)
> >    Hash Cond: (changesets.user_id = users.id)
> >    ->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722
> > width=24) (actual time=0.002..4724.578 rows=25133929 loops=1)
> >    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> > time=0.165..0.165
> > rows=600 loops=1)
> >          Buckets: 1024  Batches: 1  Memory Usage: 24kB
> >          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> > (actual
> > time=0.003..0.073 rows=600 loops=1)
> >  Total runtime: 7658.715 ms
> > (7 rows)
> >
> > osmdump=# SET enable_seqscan = OFF;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > ----------------------------------------------------------------------
> > ------
> > --------------------------------------------------------------------
> >  Nested Loop  (cost=10000000015.94..10001072613.45 rows=726722
> > width=24) (actual time=0.268..1490.515 rows=338568 loops=1)
> >    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> > width=8) (actual time=0.207..0.531 rows=600 loops=1)
> >          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> > rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
> >    ->  Index Scan using changesets_useridndx on changesets
> > (cost=0.44..1775.54 rows=1211 width=24) (actual time=0.038..2.357
> > rows=564
> > loops=600)
> >          Index Cond: (user_id = users.id)
> >  Total runtime: 1715.517 ms
> > (6 rows)
> >
> >
> > #2 - Run ANALYZE DATABASE and look at performance/planning improvement.
> > ----------------------------------------------------------------------
> > ------
> > ----------
> > I ran ANALYZE DATABASE then rerun the query. It did not produce any
> > significant improvement according to the EXPLAIN ANALYSE below...
> >
> > osmdump=# SET enable_seqscan = ON;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > ----------------------------------------------------------------------
> > ------
> > -----------------------------------------------------
> >  Hash Semi Join  (cost=21.50..819511.42 rows=729133 width=24) (actual
> > time=1538.100..7307.743 rows=338568 loops=1)
> >    Hash Cond: (changesets.user_id = users.id)
> >    ->  Seq Scan on changesets  (cost=0.00..745390.84 rows=25138084
> > width=24) (actual time=0.027..4620.691 rows=25133929 loops=1)
> >    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> > time=0.300..0.300
> > rows=600 loops=1)
> >          Buckets: 1024  Batches: 1  Memory Usage: 24kB
> >          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> > (actual
> > time=0.022..0.187 rows=600 loops=1)
> >  Total runtime: 7519.254 ms
> > (7 rows)
> >
> > osmdump=# SET enable_seqscan = OFF;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > ----------------------------------------------------------------------
> > ------
> > --------------------------------------------------------------------
> >  Nested Loop  (cost=10000000015.94..10001090810.49 rows=729133
> > width=24) (actual time=0.268..1466.248 rows=338568 loops=1)
> >    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> > width=8) (actual time=0.205..0.530 rows=600 loops=1)
> >          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> > rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
> >    ->  Index Scan using changesets_useridndx on changesets
> > (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314
> > rows=564
> > loops=600)
> >          Index Cond: (user_id = users.id)
> >  Total runtime: 1677.447 ms
> > (6 rows)
> >
> > #3 - Run EXPLAIN ANALYZE and look for discrepancies between the
> > estimated and actual times
> > ----------------------------------------------------------------------
> > ------
> > ----------
> > Looking at above results, there are obvious discrepancies between
> > expected/actual rows and time!
> > I dug a bit by exploring/trying to understand the different concepts
> > explained in...
> >
> > http://www.postgresql.org/docs/9.4/static/planner-stats.html
> > http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
> > http://www.postgresql.org/docs/9.2/static/view-pg-stats.html
> >
> > Concerning discrepancies between the actual number of rows and
> > predicted value, I looked at what pg_stats was saying about user_id in
> > table changesets.
> > Here are the values provided to the planner...
> > Average_width=8
> > histogram_bounds: the size of the bins varies between 50 and 150000,
> > which make sense because if I had divided the column's values into
> > groups of approximately equal population, I would have produced bins
> > between 1 and
> > 100000 (if sorted by frequency)
> > n_distinct= 20686 (there is actually 464858 distinct values for
> > user_id in the table)
> > most_common_vals: values make sense (I checked the frequency count of
> > a couple most common users_id)
> > correlation=0.617782 (?)
> > most_common_elems, most_common_elem_freqs and elem_count_histogram
> > were empty
> >
> > At this point, I wonder if the assumptions behind the planner's
> > statistics may produce such problems since the distribution of my data
> > is not uniform but follows a power law (some user_id would return
> > millions of records while others only one).
> > This is the farthest I can go at this point. Maybe someone can provide
> > me with more explanations regarding planner's behavior and ways to go
> > further to make it work properly?
>
> You may also benefit from increasing the statistics targets and running
> ANALYZE again. It certainly looks like some of those stats are pretty far
> off. Raising the statistics target will cause ANALYZE to investigate more
> rows (which takes longer but might produce more accurate results)
>
> I suggest experimenting with the cost settings first, though.
>
> --
> Bill Moran <wmoran@potentialtech.com>
>
>


--
Bill Moran <wmoran@potentialtech.com>


Re: Planner cost adjustments

From
Daniel Begin
Date:
Hi Bill, you are right about not changing two variables at the same time.

I first increased the statistics target for the problematic columns. It helped
but did not provide a significant improvement. I then changed random_page_cost,
which really improved the planner's choices between Seq and Index scans.

Since I now get the expected Seq/Index scan selection on most tables except the
larger ones, I am not sure that lowering random_page_cost any further is a good
idea. I wonder if it might trigger index scans on smaller tables where a Seq
Scan would have been less expensive.

The remaining problem seems related to the statistics of some large tables.
On one hand, I might increase the statistics target for these tables to 500,
or even to 1000 and look at the results (but I have doubts it will help). On
the other hand, I could simply force enable_seqscan to OFF for queries
dealing with them.
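Either way, I can at least keep the change narrow. Roughly what I have in mind
(a sketch only, using nodes and its changeset_id column as an example of one of
the problematic large tables, and 500 as an arbitrary target):

-- option 1: larger statistics sample, only for the join column of the big table
ALTER TABLE nodes ALTER COLUMN changeset_id SET STATISTICS 500;
ANALYZE nodes;

-- option 2: disable seq scans only for the problem query, inside one transaction
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT cs.user_id, cs.changeset_id, nd.id, nd.version
FROM changesets_selection cs, nodes nd
WHERE nd.changeset_id = cs.changeset_id;
COMMIT;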

Still not sure about the best solution but the problem is definitely
narrower :-)
Daniel


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Moran
Sent: June-11-15 06:51
To: Daniel Begin
Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
Subject: Re: [GENERAL] Planner cost adjustments

On Wed, 10 Jun 2015 17:20:00 -0400
Daniel Begin <jfd553@hotmail.com> wrote:

> Here is a follow-up on adjusting the planner costs calculation
>
> -Statistics target of problematic columns were increased from 100 to 200.
> -Analyse was ran on all concerned tables (actually ran on the whole
> DB)
> -Random_page_cost was decreased from 4 to 2.

Since you changed two things, there's no way to be sure which change led to
the improvement. You seem to be assuming that changing the statistics target
was what helped. While that _may_ be right, it might also have been the
change to random_page_cost.

> As a result, about 80% of queries are now using what I would consider
> an optimal plan. However, the planner keeps using full table scans for
> large tables...
>
> For instance, I ran a query that should have used an index scan on two
> similar test tables. The planner had selected an index scan for the
> smaller one and a Seq Scan for larger one. Except for their sizes and
> for one field not used in the exercise, the test tables were identical
> and indexed on the same field. The smaller test table had 3.26E+10
> records and the larger one
> 3.78E+11 records.
>
> The query looked like...
> SELECT cs.user_id, cs.changeset_id, nd.id, nd.version FROM
> changesets_selection cs, a_test_table nd WHERE
> nd.changeset_id=cs.changeset_id;
>
> In order to understand why the planner selected the Seq Scan instead
> of an Index Scan on the large table (nodes), I ran an EXPLAIN ANALYSE
> (on warm
> cache) using enable_seqscan set to OFF/ON.
>
> -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=OFF
> -----------------------------;
>  Nested Loop
> (cost=10000000000.58..10210480648.03 rows=194204592 width=40) (actual
> time=74.088..278008.579 rows=140112 loops=1)
>    ->  Seq Scan on changesets_selection cs
> (cost=10000000000.00..10000000110.44 rows=6644 width=24) (actual
> time=0.015..4.904 rows=6903 loops=1)
>    ->  Index Scan using nodes_changesetidndx on nodes nd
> (cost=0.58..31387.49 rows=29230 width=24) (actual time=12.655..40.259
> rows=20 loops=6903)
>          Index Cond: (changeset_id = cs.changeset_id)
>  Total runtime: 278026.196 ms
> (5 rows)
>
> -Completed after less than 5 minutes processing
> -I assume that
> cost=10000000000.00..10000000110.44 for the Seq Scan of
> changesets_selection is an artefact of setting enable_seqscan=OFF.
> -From what I see, the evaluation of rows number is still way off
> (1400X) for the large table, even if the statistics target was doubled to 200.
>
> -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=ON
> ------------------------------;  Hash Join
> (cost=156171782.28..185673195.13 rows=194204592 width=40)
>    Hash Cond: (cs.changeset_id = n.changeset_id)
>    ->  Seq Scan on changesets_selection cs  (cost=0.00..110.44
> rows=6644
> width=24)
>    ->  Hash  (cost=84959952.68..84959952.68 rows=3878771968 width=24)
>          ->  Seq Scan on nodes nd  (cost=0.00..84959952.68
> rows=3878771968
> width=24)
> (5 rows)
>
> -Still running after 2:30 hours processing! That is why I did not
> provided the actual time and rows (however, actual rows are provided
> on first query
> plan)
> -Not surprisingly, the evaluation of rows number is way off again for
> the large table - same stats, same results...
>
> It seems there is a problem with my large table statistics, even after
> increase them to 200. Should I increase the statistic target to 500,
> or even to 1000?
> Is there something else I can trigger to get the appropriate plan?
>
> Comments/explanations would be appreciated
> Daniel
>
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Daniel Begin
> Sent: June-03-15 06:32
> To: 'Bill Moran'
> Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
> Subject: Re: [GENERAL] Planner cost adjustments
>
> Thank Bill,
>
> About disks performance, all drives are identical and connected using
> USB3 connections and yes, I can tweak values and restart Postgres
> without any
> hardship!-)
> About seq_page_cost and random_page_cost, I am about to test different
> lower values as you and Thomas propose.
> Raising the statistics target is a good idea. Since most of the data
> have a power law distribution it might affect the statistics.
>
> I will do as suggested and get back to the community for further
> comments. I wished to follow the procedure proposed by PT, just in
> case I eventually had to on step 4 (contact PostgreSQL developers so
> they can improve the planner).
>
> And I am still open to other proposal
> Daniel
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Moran
> Sent: June-02-15 23:31
> To: Daniel Begin
> Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
> Subject: Re: [GENERAL] Planner cost adjustments
>
> On Tue, 2 Jun 2015 14:01:35 -0400
> Daniel Begin <jfd553@hotmail.com> wrote:
>
> > Here is a follow-up on the step-by-step procedure proposed by PT
> >
> > #1 - setup postgresql planner's cost estimate settings for my hardware.
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------
> > Current parameters values described in section 18.7.2 haven't been
> > changed except for the effective_cache_size seq_page_cost = 1
> > random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost =
> > 0.005 cpu_operator_cost = 0.0025 effective_cache_size = 10GB
> >
> > I did a bunch of tests on frequently used queries to see how well
> > they perform - using SET enable_seqscan = ON/OFF.
> > As described earlier in this tread, the planner use Seq Scan on
> > tables even if using an Index Scan is in this case 5 times faster!
> > Here are the logs of EXPLAIN ANALYSE on a query...
>
> In an earlier message you mention that the drives are external to the
> computer. I don't remember details, but external drives can be quite
> dicey as far as performance goes, depending on the specific technology (USB vs.
> external SATA vs. NAS, for example) as well as some pretty wild
> variances between different brands of the same technology.
> See:
> http://www.databasesoup.com/2012/05/random-page-cost-revisited.html
>
> As a result, I'm suspicious that the default values you're using for
> random_page_cost and seq_page_cost are throwing things off because
> your disks aren't performing like internally connected disks.
>
> Correct me if I'm wrong on any of the assumptions I'm making here, but
> I got the impression that you can tweak values and restart Postgres
> without any hardship. If that's the case, I'm guessing that raising
> seq_page_cost (possibly to 2) will cause Postgres to make better
> decisions about what are good plans. My suggestion is to try some
> different values for those two settings, doing several tests after
> each change, and see if you can find a set of values that starts
> getting you good plans. It appears that the planner thinks that it can
> get better performance by reading from the disk in sequence than by
> picking random pages, and that makes me think that the difference
> between seq_page_cost and random_page_cost is bigger than the actual
> behavior of the drives.
>
> More comments below.
>
> > osmdump=# SET enable_seqscan = ON;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > --------------------------------------------------------------------
> > --
> > ------
> > -----------------------------------------------------
> >  Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24)
> > (actual
> > time=1574.914..7444.938 rows=338568 loops=1)
> >    Hash Cond: (changesets.user_id = users.id)
> >    ->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722
> > width=24) (actual time=0.002..4724.578 rows=25133929 loops=1)
> >    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> > time=0.165..0.165
> > rows=600 loops=1)
> >          Buckets: 1024  Batches: 1  Memory Usage: 24kB
> >          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> > (actual
> > time=0.003..0.073 rows=600 loops=1)
> >  Total runtime: 7658.715 ms
> > (7 rows)
> >
> > osmdump=# SET enable_seqscan = OFF;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > --------------------------------------------------------------------
> > --
> > ------
> > --------------------------------------------------------------------
> >  Nested Loop  (cost=10000000015.94..10001072613.45 rows=726722
> > width=24) (actual time=0.268..1490.515 rows=338568 loops=1)
> >    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> > width=8) (actual time=0.207..0.531 rows=600 loops=1)
> >          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> > rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
> >    ->  Index Scan using changesets_useridndx on changesets
> > (cost=0.44..1775.54 rows=1211 width=24) (actual time=0.038..2.357
> > rows=564
> > loops=600)
> >          Index Cond: (user_id = users.id)
> >  Total runtime: 1715.517 ms
> > (6 rows)
> >
> >
> > #2 - Run ANALYZE DATABASE and look at performance/planning improvement.
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------
> > I ran ANALYZE DATABASE then rerun the query. It did not produce any
> > significant improvement according to the EXPLAIN ANALYSE below...
> >
> > osmdump=# SET enable_seqscan = ON;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > --------------------------------------------------------------------
> > --
> > ------
> > -----------------------------------------------------
> >  Hash Semi Join  (cost=21.50..819511.42 rows=729133 width=24)
> > (actual
> > time=1538.100..7307.743 rows=338568 loops=1)
> >    Hash Cond: (changesets.user_id = users.id)
> >    ->  Seq Scan on changesets  (cost=0.00..745390.84 rows=25138084
> > width=24) (actual time=0.027..4620.691 rows=25133929 loops=1)
> >    ->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual
> > time=0.300..0.300
> > rows=600 loops=1)
> >          Buckets: 1024  Batches: 1  Memory Usage: 24kB
> >          ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8)
> > (actual
> > time=0.022..0.187 rows=600 loops=1)
> >  Total runtime: 7519.254 ms
> > (7 rows)
> >
> > osmdump=# SET enable_seqscan = OFF;
> > osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed
> > FROM changesets WHERE changesets.user_id IN(SELECT id FROM
> > sample.users);
> > --------------------------------------------------------------------
> > --
> > ------
> > --------------------------------------------------------------------
> >  Nested Loop  (cost=10000000015.94..10001090810.49 rows=729133
> > width=24) (actual time=0.268..1466.248 rows=338568 loops=1)
> >    ->  HashAggregate  (cost=10000000015.50..10000000021.50 rows=600
> > width=8) (actual time=0.205..0.530 rows=600 loops=1)
> >          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> > rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
> >    ->  Index Scan using changesets_useridndx on changesets
> > (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314
> > rows=564
> > loops=600)
> >          Index Cond: (user_id = users.id)
> >  Total runtime: 1677.447 ms
> > (6 rows)
> >
> > #3 - Run EXPLAIN ANALYZE and look for discrepancies between the
> > estimated and actual times
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------
> > Looking at above results, there are obvious discrepancies between
> > expected/actual rows and time!
> > I dug a bit by exploring/trying to understand the different concepts
> > explained in...
> >
> > http://www.postgresql.org/docs/9.4/static/planner-stats.html
> > http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
> > http://www.postgresql.org/docs/9.2/static/view-pg-stats.html
> >
> > Concerning discrepancies between the actual number of rows and
> > predicted value, I looked at what pg_stats was saying about user_id
> > in table changesets.
> > Here are the values provided to the planner...
> > Average_width=8
> > histogram_bounds: the size of the bins varies between 50 and 150000,
> > which make sense because if I had divided the column's values into
> > groups of approximately equal population, I would have produced bins
> > between 1 and
> > 100000 (if sorted by frequency)
> > n_distinct= 20686 (there is actually 464858 distinct values for
> > user_id in the table)
> > most_common_vals: values make sense (I checked the frequency count
> > of a couple most common users_id)
> > correlation=0.617782 (?)
> > most_common_elems, most_common_elem_freqs and elem_count_histogram
> > were empty
> >
> > At this point, I wonder if the assumptions behind the planner's
> > statistics may produce such problems since the distribution of my
> > data is not uniform but follows a power law (some user_id would
> > return millions of records while others only one).
> > This is the farthest I can go at this point. Maybe someone can
> > provide me with more explanations regarding planner's behavior and
> > ways to go further to make it work properly?
>
> You may also benefit from increasing the statistics targets and
> running ANALYZE again. It certainly looks like some of those stats are
> pretty far off. Raising the statistics target will cause ANALYZE to
> investigate more rows (which takes longer but might produce more
> accurate results)
>
> I suggest experimenting with the cost settings first, though.
>
> --
> Bill Moran <wmoran@potentialtech.com>
>
>


--
Bill Moran <wmoran@potentialtech.com>


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Planner cost adjustments

From
Francisco Olarte
Date:
Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd553@hotmail.com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistic target for these tables to 500,
> or even to 1000 and look at the results (but I have doubts it will help). On
> the other hand, I could simply force enable_seqscan to OFF for queries
> dealing with them.
>
> Still not sure about the best solution but the problem is definitely
> narrower :-)

One somewhat unrelated point. IIRC your problems were related to
queries joining a selected set of indexed IDs from a smaller table
against some very large tables (I'm not able to follow too well which
is which, users or changesets, I'm a bit lost). Given that your
runtimes are always high (in the seconds range, so wire speed and
latencies are not much of an issue) and that selectivity estimates on
huge tables are always problematic and may be thwarting your plans,
you may be able to get faster results by splitting your query.

 If I read your plans correctly, that would mean selecting your 600
users in one query, then preparing the changeset query for a single
user_id (which should be indexed) and looping it for every user. Given
that current machines can easily send and receive 600 queries in a
second, it may lead to a simpler solution. It means you're using the
DB as a somewhat intelligent plain old indexed file, but sometimes
that is the simpler approach (heck, some of my code still uses
algorithms from the tape era because they were the fastest way I could
do it).
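Something like this is what I have in mind (only a sketch, borrowing the
column and index names from your earlier mails; the loop itself would live
in your client code):

-- prepared once per connection; user_id is assumed to be a bigint here
PREPARE changesets_for_user (bigint) AS
  SELECT user_id, id AS changeset_id, closed
  FROM changesets
  WHERE user_id = $1;        -- a plain lookup on changesets_useridndx

-- fetch the ~600 ids once...
SELECT id FROM sample.users;
-- ...then, from the application, run the prepared query once per id, e.g.:
EXECUTE changesets_for_user(42);   -- 42 stands in for each id returned above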

I needed to do this in one of my programs: the optimizer kept
selecting bad plans, so I ran the first query, held the results in
memory, and then prepared and executed the second in a loop from the
app. My query was selecting a couple thousand values from a
sub-million-row table and joining them with a multi-million-row one,
getting a couple hundred matches per original value. Splitting it made
the queries on the big table always run indexed and fast (and as a
side bonus avoided duplicating the fields of the first record on the
wire a hundred times, which was nice since the short table was wide
and I only needed 3 short fields from the second one; that made the
first query run at wire speed and the second at disk speed).

Francisco Olarte.


Re: Planner cost adjustments

From
Daniel Begin
Date:
Here is a last follow-up on adjusting the planner cost calculations

1 - Francisco, you provided me with the clue I was looking for at this point!

I learned enough PLpgsql over the last week to reprogram all my processes (not always elegant but it works). By
processing individual records in a loop, I will be able to move forward with my work and save individual results as
they are processed.
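The skeleton of those loops looks roughly like this (simplified; results_table is just a placeholder for wherever I
actually store each record's output):

DO $$
DECLARE
    u record;
BEGIN
    FOR u IN SELECT id FROM sample.users LOOP
        -- one small, index-friendly query per record; results accumulate as we go
        INSERT INTO results_table (user_id, changeset_id, closed)
        SELECT user_id, id, closed
        FROM changesets
        WHERE user_id = u.id;
    END LOOP;
END;
$$;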

Furthermore, it is now easier to predict job completion times, since the DB uses a similar plan for each record. I did
a test on a stratified sampling of records and I expect the jobs will run for about two weeks (as opposed to months in
previous estimates!-)

2 - PLpgsql helped me to bypass the problem. As PT suggested, I will try to set up a test case that demonstrates the
problem and post it to the developers' list so they might figure out what to do in such a situation.
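Roughly what I have in mind for the test case (a sketch only; the generator below just approximates the power-law
shape of my user_id distribution, it is not my real data):

-- skewed data set: low user_ids get most of the rows, high ones only a handful
CREATE TABLE test_changesets AS
SELECT (floor(random() ^ 3 * 100000) + 1)::bigint AS user_id,
       g AS id
FROM generate_series(1, 5000000) AS g;

CREATE INDEX test_changesets_useridndx ON test_changesets (user_id);
ANALYZE test_changesets;

-- compare estimated vs actual rows for a semi-join similar to my real queries
EXPLAIN ANALYSE
SELECT user_id, id
FROM test_changesets
WHERE user_id IN (SELECT (random() * 100000)::bigint + 1 FROM generate_series(1, 600));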

Thanks to all,
Daniel

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Francisco Olarte
Sent: June-11-15 10:05
To: Daniel Begin
Cc: Bill Moran; pgsql-general@postgresql.org; Tomas Vondra; Melvin Davidson
Subject: Re: [GENERAL] Planner cost adjustments

Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd553@hotmail.com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistic target for these tables to
> 500, or even to 1000 and look at the results (but I have doubts it
> will help). On the other hand, I could simply force enable_seqscan to
> OFF for queries dealing with them.
>
> Still not sure about the best solution but the problem is definitely
> narrower :-)

One somewhat unrelated point. IIRC your problems were related to queries joining a selected set of indexed IDs from a
smaller table against some very large tables (I'm not able to follow too well which is which, users or changesets, I'm
a bit lost). Given that your runtimes are always high (in the seconds range, so wire speed and latencies are not much
of an issue) and that selectivity estimates on huge tables are always problematic and may be thwarting your plans, you
may be able to get faster results by splitting your query.

 If I read your plans correctly, that would mean selecting your 600 users in one query, then preparing the changeset
query for a single user_id (which should be indexed) and looping it for every user. Given that current machines can
easily send and receive 600 queries in a second, it may lead to a simpler solution. It means you're using the DB as a
somewhat intelligent plain old indexed file, but sometimes that is the simpler approach (heck, some of my code still
uses algorithms from the tape era because they were the fastest way I could do it).

I needed to do this in one of my programs: the optimizer kept selecting bad plans, so I ran the first query, held the
results in memory, and then prepared and executed the second in a loop from the app. My query was selecting a couple
thousand values from a sub-million-row table and joining them with a multi-million-row one, getting a couple hundred
matches per original value. Splitting it made the queries on the big table always run indexed and fast (and as a side
bonus avoided duplicating the fields of the first record on the wire a hundred times, which was nice since the short
table was wide and I only needed 3 short fields from the second one; that made the first query run at wire speed and
the second at disk speed).

Francisco Olarte.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general