Home > mailing lists

Thread: NestedLoops over BitmapScan question

NestedLoops over BitmapScan question

From

Виктор Егоров

Date:

29 September 2012, 02:12:04

Greetings.

I have a small monitoring query on the following tables:

select relname,relpages,reltuples::numeric(12) from pg_class where relname in ('meta_version','account') order by 1;

relname | relpages | reltuples

--------------+----------+-----------

account | 3235 | 197723

meta_version | 710068 | 32561200

(2 rows)

The logical “body” of the query is:

select count(*) from meta_version where account_id in (select account_id from account where customer_id = 8608064);

I know that due to the data distribution (above customer's accounts are used in 45% of the meta_version table) I

cannot expect fast results. But I have another question.

With default default_statistics_target I get the following plan:

http://explain.depesz.com/s/jri

In order to get better estimates, I've increased statistics targets to 200 for account.customer_id and meta_version.account_id.

Now I have the following plan:

http://explain.depesz.com/s/YZJ

Second query takes twice more time.

My questions are:

- why with better statistics planner chooses to do a SeqScan in favor of BitmapIndexScan inside the NestedLoops?

- is it possible to adjust this decision by changing other GUCs, perhaps costs?

- would it be correct to adjust seq_page_cost and random_page_cost based on the IOPS of the underlying disks?

any other metrics should be considered?

I'm running on a:

name | current_setting

----------------------------+---------------------------------------------------------------------------------------------------------------

version | PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit

archive_command | test ! -f $PG_WAL/%f && cp %p $PG_WAL/%f

archive_mode | on

bgwriter_delay | 50ms

bgwriter_lru_maxpages | 200

checkpoint_segments | 25

checkpoint_timeout | 30min

client_encoding | UTF8

effective_cache_size | 8GB

hot_standby | on

lc_collate | en_US.UTF-8

lc_ctype | en_US.UTF-8

listen_addresses | *

log_checkpoints | on

log_connections | on

log_destination | csvlog

log_directory | ../../log/CLUSTER

log_disconnections | on

log_file_mode | 0640

log_filename | pg-%Y%m%d_%H%M%S.log

log_line_prefix | %u:%d:%a:%h:%c:%x:%t>

log_lock_waits | on

log_min_duration_statement | 300ms

log_rotation_age | 1d

log_rotation_size | 0

log_temp_files | 20MB

logging_collector | on

maintenance_work_mem | 512MB

max_connections | 200

max_prepared_transactions | 0

max_stack_depth | 2MB

max_wal_senders | 2

port | 9120

server_encoding | UTF8

shared_buffers | 5GB

silent_mode | on

ssl | on

ssl_renegotiation_limit | 0

tcp_keepalives_idle | 0

temp_buffers | 256MB

TimeZone | US/Eastern

wal_buffers | 512kB

wal_keep_segments | 0

wal_level | hot_standby

wal_sender_delay | 1s

work_mem | 32MB

Regards.

--
Victor Y. Yegorov

Re: NestedLoops over BitmapScan question

From

Виктор Егоров

Date:

02 October 2012, 02:09:49

Well, I've managed to track down the cause of improper plans.

Due to the data distribution n_distinct had been estimated way too low.
I've manually set it to be 195300 instead of 15500 (with stats_target=200):
select tablename,attname,null_frac,avg_width,n_distinct,correlation
  from pg_stats
 where (tablename,attname) IN
 (VALUES ('meta_version','account_id'),('account','customer_id'));
  tablename   |   attname   | null_frac | avg_width | n_distinct | correlation
--------------+-------------+-----------+-----------+------------+-------------
 account      | customer_id |         0 |         4 |         57 |    0.998553
 meta_version | account_id  |         0 |         4 |     195300 |   0.0262315
(2 rows)

Still, optimizer underestimates rows returned by the IndexScan heavily:
http://explain.depesz.com/s/pDw

Is it possible to get correct estimates for the IndexScan on the right side
of the NestedLoops? I assume estimation is done by the B-tree AM and
it is seems to be not affected by the STATISTICS parameter of the
column.


2012/9/29 Виктор Егоров <vyegorov@gmail.com>:
> Now I have the following plan:
> http://explain.depesz.com/s/YZJ
>
> Second query takes twice more time.


--
Victor Y. Yegorov