Thread: Request for help with slow query
From: "Woolcock, Sean" <Sean.Woolcock@emc.com>
To: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Sent: Monday, October 29, 2012 6:41 PM
Subject: [PERFORM] Request for help with slow query
Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.
A description of what you are trying to achieve and what results you expect:
I have a large (3 million row) table called "tape" that represents files,
which I join to a small (100 row) table called "filesystem" that represents
filesystems. I have a web interface that allows you to sort by a number of
fields in the tape table and view the results 100 at a time (using LIMIT
and OFFSET).
The data only changes hourly and I do a "vacuum analyze" after all changes.
The tables are defined as:
create table filesystem (
    id            serial primary key,
    host          varchar(256),
    storage_path  varchar(2048) not null check (storage_path != ''),
    mounted_on    varchar(2048) not null check (mounted_on != ''),
    constraint unique_fs unique (host, storage_path)
);

create table tape (
    id               serial primary key,
    volser           char(255) not null check (volser != ''),
    path             varchar(2048) not null check (path != ''),
    scratched        boolean not null default FALSE,
    last_write_date  timestamp not null default current_timestamp,
    last_access_date timestamp not null default current_timestamp,
    filesystem_id    integer references filesystem not null,
    size             bigint not null check (size >= 0),
    worm_status      char,
    encryption       char,
    job_name         char(8),
    job_step         char(8),
    dsname           char(17),
    recfm            char(3),
    block_size       int,
    lrecl            int,
    constraint filesystem_already_has_that_volser unique (filesystem_id, volser)
);
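A schema note in passing, as an aside rather than something raised in the thread: char(n) columns are blank-padded to their full declared length, so volser char(255) carries trailing spaces out to 255 characters for every row. If the padding isn't wanted, a sketch of the change (this rewrites the table, so treat it as an off-hours operation on 3 million rows):

```sql
-- volser is declared char(255), which blank-pads every value to 255
-- characters; varchar(255) stores only the characters supplied.
alter table tape alter column volser type varchar(255);
```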
An example query that's running slowly for me is:
select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem
    on (tape.filesystem_id = filesystem.id)
order by last_write_date desc
limit 100
offset 100;
On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.
Here's the explain output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
   ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
         Sort Key: date_part('epoch'::text, tape.last_write_date)
         ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
               Hash Cond: ("outer".filesystem_id = "inner".id)
               ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
               ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                     ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
 Total runtime: 66553.643 ms
Here's a depesz link with that output: http://explain.depesz.com/s/AUR
Things I've tried:
1. I added an index on last_write_date with:
create index tape_last_write_date_idx on tape(last_write_date);
and there was no improvement in query time.
2. I bumped:
effective_cache_size to 1/2 system RAM (1GB)
shared_buffers to 1/4 system RAM (512MB)
work_mem to 10MB
and there was no improvement in query time.
3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
using the same hardware and it was about 5 times faster (nice work,
whoever did that!). Unfortunately upgrading is not an option, so this
is more of an anecdote. I would think the query could go much faster
in either environment with some optimization.
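An observation on the plan above: the Sort Key is date_part('epoch'::text, tape.last_write_date), because the bare name in the ORDER BY resolves to the extract(...) output column called last_write_date, which shadows the raw timestamp column. That is why the plain index from step 1 goes unused: the sort runs on a computed value. Two possible fixes, sketched here and not tested on 8.1:

```sql
-- Option 1: qualify the sort column so it refers to the raw timestamp
-- (epoch order and timestamp order are identical), leaving the
-- extracted epoch only in the select list:
select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem on (tape.filesystem_id = filesystem.id)
order by tape.last_write_date desc
limit 100 offset 100;

-- Option 2: index the expression the sort actually uses:
create index tape_last_write_epoch_idx
    on tape (extract(epoch from last_write_date));
```

A table-qualified name in ORDER BY always means the column, never the output alias, so Option 1 is a one-line change to the existing query.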
The EXACT PostgreSQL version you are running:
PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)
How you installed PostgreSQL:
Standard SuSE SLES 10-SP3 RPMs:
postgresql-devel-8.1.17-0.3
postgresql-pl-8.1.17-0.4
postgresql-libs-8.1.17-0.3
postgresql-8.1.17-0.3
postgresql-server-8.1.17-0.3
postgresql-contrib-8.1.17-0.3
Changes made to the settings in the postgresql.conf file:
Only the memory changes mentioned above.
Operating system and version:
Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux
SLES 10-SP3
What program you're using to connect to PostgreSQL:
Perl DBI
Perl v5.8.8
What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
perl-DBD-Pg 1.43
If you're using a connection pool, load balancer or application server, which one you're using and its version:
None.
Is there anything remotely unusual in the PostgreSQL server logs?
No, they're empty.
CPU manufacturer and model:
Intel Celeron CPU 440 @ 2.00GHz
Amount and size of RAM installed:
2GB RAM
Storage details (important for performance and corruption questions):
Do you use a RAID controller?
No.
How many hard disks are connected to the system and what types are they?
We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
How are your disks arranged for storage?
Postgres lives on the same 100GB ext3 partition as the OS.
Thanks,
Sean
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
I thought that an index was implicitly created for foreign keys, but I see
that that's not true. I've just created one now and re-ran the query, but it
did not change the query plan or run time.

Thanks,
Sean

________________________________________
From: salah jubeh [s_jubeh@yahoo.com]
Sent: Monday, October 29, 2012 3:18 PM
To: Woolcock, Sean; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Request for help with slow query

Did you try to add an index on filesystem_id?
On 10/29/2012 12:25 PM, Woolcock, Sean wrote:
> I thought that an index was implicitly created for foreign keys, but I see
> that that's not true. I've just created one now and re-ran the query but it
> did not change the query plan or run time.

1. Explain analyze, not explain, please. Check to see if estimated rows
   differ wildly from actual.

2. Seriously... 8.1? That is not supported. Please upgrade to a supported
   version of PostgreSQL: http://www.postgresql.org/support/versioning/

3. Simple things:
   A. Have you run analyze on the two tables?
   B. What is your default_statistics_target?

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
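Concretely, Joshua's points translate to something like the following (a sketch; the statistics target shown is just being inspected, and the sample query is the one from the original post):

```sql
-- 1. EXPLAIN ANALYZE reports estimated and actual row counts side by
--    side; estimates that differ wildly from actuals suggest stale or
--    insufficient statistics.
explain analyze
select tape.volser, tape.path
from tape
inner join filesystem on (tape.filesystem_id = filesystem.id)
order by last_write_date desc
limit 100 offset 100;

-- 3A. Refresh statistics on both tables:
analyze filesystem;
analyze tape;

-- 3B. Check the sampling target (8.1 defaults to 10, which is low):
show default_statistics_target;
```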
As Shaun has indicated, there is no need for the join, and as Joshua suggested, it would be good to upgrade your server. Also add indexes for your predicates and foreign keys, and you should get the desired result.
Regards
From: Shaun Thomas <sthomas@optionshouse.com>
To: "Woolcock, Sean" <Sean.Woolcock@emc.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Sent: Monday, October 29, 2012 8:36 PM
Subject: Re: [PERFORM] Request for help with slow query
On 10/29/2012 12:41 PM, Woolcock, Sean wrote:
> An example query that's running slowly for me is:
>
> select tape.volser,
> tape.path,
> tape.scratched,
> tape.size,
> extract(epoch from tape.last_write_date) as last_write_date,
> extract(epoch from tape.last_access_date) as last_access_date
> from tape
> inner join filesystem
> on (tape.filesystem_id = filesystem.id)
> order by last_write_date desc
> limit 100
> offset 100;
Is this a representative example? From the looks of this, you could entirely drop the join against the filesystem table, because you're not using it in the SELECT or WHERE sections at all. You don't need that join in this example.
> -> Seq Scan on tape (cost=0.00..178550.57 rows=3219757 width=312)
> (actual time=2.824..18175.863 rows=3219757 loops=1)
> -> Hash (cost=3.01..3.01 rows=101 width=4) (actual
> time=0.204..0.204 rows=101 loops=1)
> -> Seq Scan on filesystem (cost=0.00..3.01 rows=101 width=4)
> (actual time=0.004..0.116 rows=101 loops=1)
> Total runtime: 66553.643 ms
I think we can stop looking at this point. Because of the ORDER BY clause, it has to read the entire tape table, since you have no information on last_write_date it can use. Then it has to read the entire filesystem table because you asked it to do a join, even though you threw away the results.
> 1. I added an index on last_write_date with:
> and there was no improvement in query time.
I'm not sure 8.1 knows what to do with that. But I can guarantee newer versions would do a reverse index scan on this index to find the top 100 rows, even with the offset. You can also do this with newer versions, since it's the most common query you run:
create index tape_last_write_date_idx on tape (last_write_date DESC);
Which would at least give you forward read order when addressing this index.
> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
> using the same hardware and it was about 5 times faster (nice work,
It would be an order of magnitude faster than that if you add the index also.
> Unfortunately upgrading is not an option, so this is more of an
> anecdote. I would think the query could go much faster in either
> environment with some optimization.
You desperately need to reconsider this. PostgreSQL 8.1 is no longer supported, and was last updated in late 2010. Any bug fixes, including known corruption and security bugs, are no longer being backported. Every day you run on an 8.1 install is a risk. The story is similar with 8.2. Even 8.3 is on the way to retirement. You're *six* major versions behind the main release.
At the very least, you need to upgrade PostgreSQL from 8.1.17 to 8.1.23. You'd still be on a version of PG that's almost 7 years old, but at least you'd have the most recent patch level.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
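Putting Shaun's suggestions together, the page query might be reshaped like this (a sketch, not verified; the DESC index syntax requires 8.3 or later, matching his point that newer versions benefit most, and the select aliases are renamed here so the bare ORDER BY name resolves to the column rather than the alias):

```sql
-- A descending index matching the most common sort:
create index tape_last_write_date_desc_idx
    on tape (last_write_date desc);

-- The join is dropped and the sort names the raw column, so the
-- index can supply the order and the LIMIT can stop the scan early:
select volser, path, scratched, size,
       extract(epoch from last_write_date) as last_write_epoch,
       extract(epoch from last_access_date) as last_access_epoch
from tape
order by last_write_date desc
limit 100 offset 100;
```

For pages deep in the listing, keyset pagination avoids the OFFSET skip entirely: remember the last last_write_date shown (a hypothetical :last_seen parameter) and fetch with "where last_write_date < :last_seen order by last_write_date desc limit 100", adding id as a tiebreaker if timestamps can repeat.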
I'm going to push for the upgrade and make the other suggested changes.

Thanks to all for the help,
Sean

________________________________________
From: salah jubeh [s_jubeh@yahoo.com]
Sent: Monday, October 29, 2012 3:49 PM
To: sthomas@optionshouse.com; Woolcock, Sean
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Request for help with slow query

As Shaun has indicated, there is no need for the join. Also, as Joshua suggested, it is good to upgrade your server. Add indexes for your predicates and foreign keys and you will get the desired result.

Regards
Woolcock, Sean wrote:

> A description of what you are trying to achieve and what results
> you expect:
> I have a large (3 million row) table called "tape" that represents
> files, which I join to a small (100 row) table called "filesystem"
> that represents filesystems. I have a web interface that allows
> you to sort by a number of fields in the tape table and view the
> results 100 at a time (using LIMIT and OFFSET).

Higher OFFSET settings may be slow because it has to read through OFFSET result rows before returning anything. There are other ways this problem can be solved, like saving key values at both ends of the displayed range.

> On Postgres 8.1.17 this takes about 60 seconds. I would like it to
> be faster.

There was a major overall speed improvement in 8.2. And another in 8.3. Etc. 8.1 has been out of support for about two years now.

http://www.postgresql.org/support/versioning/

> 1. I added an index on last_write_date with:
>
>     create index tape_last_write_date_idx on tape(last_write_date);
>
> and there was no improvement in query time.

I was going to ask whether you tried an index on tape (last_write_date DESC) -- but that feature was added in 8.3. Never mind.

> 2. I bumped:
>      effective_cache_size to 1/2 system RAM (1GB)
>      shared_buffers to 1/4 system RAM (512MB)
>      work_mem to 10MB
> and there was no improvement in query time.

Not bad adjustments, probably, anyway.

> 3. I ran the query against the same data in Postgres 9.1.6 rather
> than 8.1.17 using the same hardware and it was about 5 times
> faster (nice work, whoever did that!). Unfortunately upgrading is
> not an option,

That is unfortunate.

> CPU manufacturer and model:
>     Intel Celeron CPU 440 @ 2.00GHz
> Amount and size of RAM installed:
>     2GB RAM
> Storage details (important for performance and corruption questions):
> Do you use a RAID controller?
>     No.
> How many hard disks are connected to the system and what types are they?
>     We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
> How are your disks arranged for storage?
>     Postgres lives on the same 100GB ext3 partition as the OS.

That's not exactly blazingly fast hardware. If you value that data at all, I hope you have paid a lot of attention to backups, because that sounds like a machine likely to have a drive over 5 years old, which makes it highly likely to fail hard without a lot of advance warning.

You seem to be heavily cached. Have you tried these settings?

    seq_page_cost = 0.1
    random_page_cost = 0.1
    cpu_tuple_cost = 0.03

That might encourage it to use that index you added. Well, if a version of PostgreSQL that old did reverse index scans. If not, you might be able to add a functional index and coax it into use.

-Kevin
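Kevin's "saving key values" idea is keyset pagination: instead of OFFSET, remember the sort key of the last row on the current page and fetch the rows that sort strictly after it. A sketch against the original schema (the :prev_* placeholders and the write_epoch alias are illustrative, and row-value comparison like this needs a release newer than 8.1):

```sql
-- Page forward without OFFSET: the (last_write_date, id) pair breaks
-- ties between rows with identical timestamps.
select volser, path, scratched, size,
       extract(epoch from last_write_date) as write_epoch
  from tape
 where (last_write_date, id) < (:prev_last_write_date, :prev_id)
 order by last_write_date desc, id desc
 limit 100;
```

With an index on tape (last_write_date desc, id desc), this reads only about the 100 rows it returns, no matter how deep the user has paged.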
Sean Woolcock wrote:

> I have a large (3 million row) table called "tape" that represents files,
> which I join to a small (100 row) table called "filesystem" that represents
> filesystems. I have a web interface that allows you to sort by a number of
> fields in the tape table and view the results 100 at a time (using LIMIT
> and OFFSET).
>
> The data only changes hourly and I do a "vacuum analyze" after all changes.
>
> An example query that's running slowly for me is:
>
>     select tape.volser,
>            tape.path,
>            tape.scratched,
>            tape.size,
>            extract(epoch from tape.last_write_date) as last_write_date,
>            extract(epoch from tape.last_access_date) as last_access_date
>       from tape
>      inner join filesystem
>         on (tape.filesystem_id = filesystem.id)
>      order by last_write_date desc
>      limit 100
>      offset 100;
>
> On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.
> Here's a depesz link with that output: http://explain.depesz.com/s/AUR

I don't see anything obviously wrong there. At least the sequential scan on "tape" is necessary.

> Things I've tried: [...]
> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
>    using the same hardware and it was about 5 times faster (nice work,
>    whoever did that!). Unfortunately upgrading is not an option, so this
>    is more of an anecdote. I would think the query could go much faster
>    in either environment with some optimization.

Can you post EXPLAIN ANALYZE for the query on 9.1.6? Staying on 8.1 is not a good idea, but I guess you know that.

> Storage details (important for performance and corruption questions):
> Do you use a RAID controller?
>     No.
> How many hard disks are connected to the system and what types are they?
>     We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
> How are your disks arranged for storage?
>     Postgres lives on the same 100GB ext3 partition as the OS.

I'd say that a query like this will always be disk bound. Getting faster storage should help.

Yours,
Laurenz Albe