Thread: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
Hi,

Postgres does a seqscan, even though there is an index present and using it should be much more efficient.
I tried to synthetically reproduce it, but it won't make the same choice when I do.
I can reproduce it with a simplified set of the data itself, though.

here's the query, and the analyzed plan:
select count(*)
from d2
join g2 on g2.gid=d2.gid
where g2.k=1942

Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual time=481.526..481.526 rows=1 loops=1)
  ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual time=317.403..481.513 rows=17 loops=1)
        Hash Cond: (d2.gid = g2.gid)
        ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8) (actual time=0.013..231.707 rows=3107454 loops=1)
        ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual time=0.207..0.207 rows=121 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 5kB
              ->  Index Scan using g_blok on g2  (cost=0.00..1290.24 rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
                    Index Cond: (k = 1942)
Total runtime: 481.600 ms

Here's the DDL:
create table g2 (gid bigint primary key, k integer);
create table d2 (id bigint primary key, gid bigint);
--insert into g2 (...)
--insert into d2 (...)
create index g_blok on g2(k);
create index d_gid on d2(gid);
alter table d2 add constraint d_g_fk foreign key (gid) references g2 (gid);
analyze d2;
analyze g2;


Any advice?

Cheers,

Willy-Bas Loos
--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Igor Neyman
Date:
So, did you try to set:

enable_seqscan = off

and see if different execution plan is more efficient?

Igor Neyman
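For the archives, the experiment Igor suggests can be run at the session level; a minimal sketch against the tables from the DDL above:

```sql
-- Disable sequential scans for this session only; the setting reverts
-- when the session ends (or after RESET enable_seqscan).
SET enable_seqscan = off;

-- Re-run the same query and compare the plan and timings.
EXPLAIN ANALYZE
SELECT count(*)
FROM d2
JOIN g2 ON g2.gid = d2.gid
WHERE g2.k = 1942;

-- Restore the default afterwards.
RESET enable_seqscan;
```

Note that enable_seqscan=off does not forbid seqscans outright; it only adds a large cost penalty, so it is a diagnostic tool rather than a production setting.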


Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
nope
$ grep ^[^#] /etc/postgresql/9.1/main/postgresql.conf|grep -e ^[^[:space:]]
data_directory = '/var/lib/postgresql/9.1/main'        # use data in another directory
hba_file = '/etc/postgresql/9.1/main/pg_hba.conf'    # host-based authentication file
ident_file = '/etc/postgresql/9.1/main/pg_ident.conf'    # ident configuration file
external_pid_file = '/var/run/postgresql/9.1-main.pid'        # write an extra PID file
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)
unix_socket_directory = '/var/run/postgresql'        # (change requires restart)
ssl = true                # (change requires restart)
shared_buffers = 2GB            # min 128kB
work_mem = 100MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
synchronous_commit = off        # synchronization level; on, off, or local
checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
log_line_prefix = '%t '            # special values:
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'            # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'



On Wed, Jun 26, 2013 at 7:35 PM, Igor Neyman <ineyman@perceptron.com> wrote:
So, did you try to set:

enable_seqscan = off

and see if different execution plan is more efficient?

Igor Neyman



--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Scott Marlowe
Date:
On Wed, Jun 26, 2013 at 9:45 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> Hi,
>
> postgres does a seqscan, even though there is an index present and it should
> be much more efficient to use it.
> I tried to synthetically reproduce it, but it won't make the same choice
> when i do.
> I can reproduce it with a simplified set of the data itself though.
>
> here's the query, and the analyzed plan:
> select count(*)
> from d2
> join g2 on g2.gid=d2.gid
> where g2.k=1942
>
> Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual
> time=481.526..481.526 rows=1 loops=1)
>   ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual
> time=317.403..481.513 rows=17 loops=1)
>         Hash Cond: (d2.gid = g2.gid)
>         ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8)
> (actual time=0.013..231.707 rows=3107454 loops=1)

But this plan isn't retrieving just a few rows from d2, it's
retrieving 3.1 million rows.


Re: seqscan for 100 out of 3M rows, index present

From
Igor Neyman
Date:

From: Willy-Bas Loos [mailto:willybas@gmail.com]
Sent: Wednesday, June 26, 2013 3:04 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

nope

You could change this setting at the session level, and prove yourself or the query optimizer right (or wrong :)

Igor Neyman



Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
plan with enable_seqscan off:

Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual time=208.681..208.681 rows=1 loops=1)
  ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual time=69.403..208.647 rows=17 loops=1)
        ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43 rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
              Index Cond: (blok = 1942)
        ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179 width=8) (actual time=1.340..1.341 rows=0 loops=121)
              Recheck Cond: (geo_id = g.geo_id)
              ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82 rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
                    Index Cond: (geo_id = g.geo_id)
Total runtime: 208.850 ms




On Wed, Jun 26, 2013 at 9:08 PM, Igor Neyman <ineyman@perceptron.com> wrote:
> You could change this setting on session level, and prove yourself or query optimizer right (or wrong :)
--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Igor Neyman
Date:

From: Willy-Bas Loos [mailto:willybas@gmail.com]
Sent: Wednesday, June 26, 2013 3:19 PM
To: Igor Neyman
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] seqscan for 100 out of 3M rows, index present

How much RAM you have on this machine?
What else is this machine being used for (besides being a db server)?
And, what is your setting for effective_cache_size?  It looks like you didn't change it from default (128MB).
You need to adjust effective_cache_size to somewhere between 60% and 75% of RAM, if the database is the main process running on this machine.

Again, effective_cache_size could be set on session level, so you could try it before changing GUC in postgresql.conf.
When trying it, don't forget to change enable_seqscan back to "on" (if it's still "off").

Igor Neyman
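A sketch of the session-level experiment Igor describes, reusing the query from the original post (the 12GB value is illustrative, roughly 75% of a 16GB machine):

```sql
-- Session-level only; does not touch postgresql.conf.
-- effective_cache_size is a planner hint about how much of the data
-- the OS and shared_buffers can cache; it allocates nothing itself.
SET effective_cache_size = '12GB';

-- Make sure the earlier seqscan experiment is undone first.
SET enable_seqscan = on;

EXPLAIN ANALYZE
SELECT count(*)
FROM d2
JOIN g2 ON g2.gid = d2.gid
WHERE g2.k = 1942;
```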







Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:


On Wed, Jun 26, 2013 at 9:30 PM, Igor Neyman <ineyman@perceptron.com> wrote:

How much RAM you have on this machine?
16 GB
What else is this machine being used for (besides being a db server)?
It's my laptop for now, but I was working on a server before that. The laptop gives me some liberty to play around.
I could reproduce it well on my laptop, so I thought it would do.

And, what is your setting for effective_cache_size?  It looks like you didn't change it from default (128MB).
You need to adjust effective_cache_size to somewhere between 60% and 75% of RAM, if the database is the main process running on this machine.
correct, it was 128MB; changed it to 12GB, to no avail.





--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Jeff Janes
Date:
On Wed, Jun 26, 2013 at 12:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Jun 26, 2013 at 9:45 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
> >
> > Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual
> > time=481.526..481.526 rows=1 loops=1)
> >   ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual
> > time=317.403..481.513 rows=17 loops=1)
> >         Hash Cond: (d2.gid = g2.gid)
> >         ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8)
> > (actual time=0.013..231.707 rows=3107454 loops=1)
>
> But this plan isn't retrieving just a few rows from d2, it's
> retrieving 3.1 million rows.

But I think that is the point.  Why is it retrieving 3.1 million rows, when it only needs 17?
 
Cheers,

Jeff

Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
On Wed, Jun 26, 2013 at 10:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Why is it retrieving 3.1 million, when it only needs 17?
 

that's because of the sequential scan: it reads all the data.

cheers,

willy-bas
--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Victor Yegorov
Date:
2013/6/26 Willy-Bas Loos <willybas@gmail.com>
postgres does a seqscan, even though there is an index present and it should be much more efficient to use it.
I tried to synthetically reproduce it, but it won't make the same choice when i do.
I can reproduce it with a simplified set of the data itself though.

here's the query, and the analyzed plan:
select count(*)
from d2
join g2 on g2.gid=d2.gid
where g2.k=1942

1) Could you show the output of the following queries, please?
select relname,relpages,reltuples::numeric
  from pg_class where oid in ('d2'::regclass, 'g2'::regclass);
select attrelid::regclass, attname,
       CASE WHEN attstattarget<0 THEN current_setting('default_statistics_target')::int4 ELSE attstattarget END
  from pg_attribute
 where attrelid in ('d2'::regclass, 'g2'::regclass) and attname='gid';

2) Will it help running the following?:
ALTER TABLE d2 ALTER gid SET STATISTICS 500;
VACUUM ANALYZE d2;
EXPLAIN (ANALYZE, BUFFERS) ...
SET enable_seqscan TO 'off';
EXPLAIN (ANALYZE, BUFFERS) ...


--
Victor Y. Yegorov

Re: seqscan for 100 out of 3M rows, index present

From
"ktm@rice.edu"
Date:
On Wed, Jun 26, 2013 at 10:36:10PM +0200, Willy-Bas Loos wrote:
> On Wed, Jun 26, 2013 at 10:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> >
> > Why is it retrieving 3.1 million, when it only needs 17?
> >
> >
> that's because of the sequential scan, it reads all the data.
>
> cheers,
>
> willy-bas

Well, the two plans' timings were pretty close together. Maybe your
cost model is off. Try adjusting the various cost parameters to
favor random I/O more.

Regards,
Ken
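Ken's suggestion can also be tried per session; a sketch with an example value (random_page_cost defaults to 4.0, and lowering it relative to seq_page_cost makes the planner favor index scans on well-cached data):

```sql
-- Tell the planner random I/O is cheaper than the default assumes,
-- e.g. for data that is mostly cached or on fast storage.
-- 1.5 is an example value, not a tuned recommendation.
SET random_page_cost = 1.5;

EXPLAIN ANALYZE
SELECT count(*)
FROM d2
JOIN g2 ON g2.gid = d2.gid
WHERE g2.k = 1942;
```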


Re: seqscan for 100 out of 3M rows, index present

From
Sergey Konoplev
Date:
On Wed, Jun 26, 2013 at 12:18 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
> plan with enable_seqscan off:
>
> Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual
> time=208.681..208.681 rows=1 loops=1)
>   ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual
> time=69.403..208.647 rows=17 loops=1)
>         ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43
> rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
>               Index Cond: (blok = 1942)
>         ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179
> width=8) (actual time=1.340..1.341 rows=0 loops=121)
>               Recheck Cond: (geo_id = g.geo_id)
>               ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82
> rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
>                     Index Cond: (geo_id = g.geo_id)
> Total runtime: 208.850 ms
>
> On Wed, Jun 26, 2013 at 9:08 PM, Igor Neyman <ineyman@perceptron.com> wrote:
>> Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual
>> time=481.526..481.526 rows=1 loops=1)
>>   ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual
>> time=317.403..481.513 rows=17 loops=1)
>>         Hash Cond: (d2.gid = g2.gid)
>>         ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8)
>> (actual time=0.013..231.707 rows=3107454 loops=1)
>>         ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual
>> time=0.207..0.207 rows=121 loops=1)
>>               Buckets: 1024  Batches: 1  Memory Usage: 5kB
>>               ->  Index Scan using g_blok on g2  (cost=0.00..1290.24
>> rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
>>                     Index Cond: (k = 1942)
>> Total runtime: 481.600 ms

These are plans of two different queries. Please show the second one
(the one with d2, g2, etc.) with seqscans off.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Re: seqscan for 100 out of 3M rows, index present

From
Jeff Janes
Date:
On Wed, Jun 26, 2013 at 12:18 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
plan with enable_seqscan off:

Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual time=208.681..208.681 rows=1 loops=1)


The estimated cost of this is ~4x times greater than the estimated cost for the sequential scan.  It should be easy to tweak things to get those to reverse, but will doing so mess up other queries that are currently OK?

 
  ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual time=69.403..208.647 rows=17 loops=1)

The estimated number of rows is off by 70 fold.  Most of this is probably due to cross-column correlations, which you probably can't do much about.
 
        ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43 rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
              Index Cond: (blok = 1942)

It thinks it will find 500 rows (a suspiciously round number?) but actually finds 121.  That is off by a factor of 4.  Why does it not produce a better estimate on such a simple histogram-based estimation?  Was the table analyzed recently?  Have you tried increasing default_statistics_target?  If you choose values of blok other than 1942, what are the results like?  This estimate feeds into the inner loop estimates multiplicatively, so this is a powerful factor in driving the choice.

 
        ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179 width=8) (actual time=1.340..1.341 rows=0 loops=121)
              Recheck Cond: (geo_id = g.geo_id)
              ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82 rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
                    Index Cond: (geo_id = g.geo_id)
Total runtime: 208.850 ms


So it is only twice as fast as the sequential scan anyway.  Were you expecting it to be even faster?  Unless it is the dominant query in your database, I would usually not consider a factor-of-2 improvement to be worth worrying about, as it is too likely you will make something else worse in the process.

Cheers,

Jeff

Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
On Wed, Jun 26, 2013 at 11:20 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
On Wed, Jun 26, 2013 at 10:55 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:


These are plans of two different queries. Please show the second one
(where d2, g2, etc are) with secscans off.


yes, you're right, sorry for the confusion.
Here's the plan with enable_seqscan=off for the same query as the OP (same deal, though).

Aggregate  (cost=59704.95..59704.96 rows=1 width=0) (actual time=41.612..41.613 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..59701.99 rows=1184 width=0) (actual time=40.451..41.591 rows=17 loops=1)
        ->  Index Scan using g_blok on g2  (cost=0.00..1290.24 rows=494 width=8) (actual time=40.209..40.472 rows=121 loops=1)

              Index Cond: (k = 1942)
        ->  Index Scan using d_gid on d2  (cost=0.00..117.62 rows=50 width=8) (actual time=0.008..0.008 rows=0 loops=121)
              Index Cond: (gid = g2.gid)
Total runtime: 41.746 ms

Cheers,

WBL

forgot the list
--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

Re: seqscan for 100 out of 3M rows, index present

From
Willy-Bas Loos
Date:
1)
 attrelid | attname | attstattarget
----------+---------+---------------
 g2       | gid     |           100
 d2       | gid     |           100
(2 rows)

Setting statistics to 500 works!
I already tried overruling pg_statistic.stadistinct, but that didn't work.
thank you all for your help!!

Cheers,

Willy-Bas


On Wed, Jun 26, 2013 at 10:46 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
> 2) Will it help running the following?:
> ALTER TABLE d2 ALTER gid SET STATISTICS 500;
> VACUUM ANALYZE d2;



--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
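For completeness, the fix that resolved the thread, from Victor's step 2 above:

```sql
-- Collect a larger statistics sample for d2.gid (the per-column default
-- target was 100), giving the planner a better estimate for the join column.
ALTER TABLE d2 ALTER COLUMN gid SET STATISTICS 500;
VACUUM ANALYZE d2;

-- After this, the planner picks the nested-loop/index-scan plan on its own,
-- with enable_seqscan left at its default of 'on'.
```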