Thread: Request for help with slow query

Request for help with slow query

From
"Woolcock, Sean"
Date:
Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
    I have a large (3 million row) table called "tape" that represents files,
    which I join to a small (100 row) table called "filesystem" that represents
    filesystems.  I have a web interface that allows you to sort by a number of
    fields in the tape table and view the results 100 at a time (using LIMIT
    and OFFSET).

    The data only changes hourly and I do a "vacuum analyze" after all changes.

    The tables are defined as:

        create table filesystem (
            id           serial primary key,
            host         varchar(256),
            storage_path varchar(2048) not null check (storage_path != ''),
            mounted_on   varchar(2048) not null check (mounted_on != ''),
            constraint   unique_fs unique(host, storage_path)
        );
        create table tape (
            id               serial primary key,
            volser           char(255) not null check (volser != ''),
            path             varchar(2048)   not null check (path != ''),
            scratched        boolean         not null default FALSE,
            last_write_date  timestamp       not null default current_timestamp,
            last_access_date timestamp       not null default current_timestamp,
            filesystem_id    integer references filesystem not null,
            size             bigint not null check (size >= 0),
            worm_status      char,
            encryption       char,
            job_name         char(8),
            job_step         char(8),
            dsname           char(17),
            recfm            char(3),
            block_size       int,
            lrecl            int,
            constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
        );

    An example query that's running slowly for me is:

        select tape.volser,
               tape.path,
               tape.scratched,
               tape.size,
               extract(epoch from tape.last_write_date) as last_write_date,
               extract(epoch from tape.last_access_date) as last_access_date
            from tape
            inner join filesystem
                on (tape.filesystem_id = filesystem.id)
            order by last_write_date desc
            limit 100
            offset 100;

    On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

    Here's the explain output:
                                                                    QUERY PLAN
        ---------------------------------------------------------------------------------------------------------------------------------------
        Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
        ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
                Sort Key: date_part('epoch'::text, tape.last_write_date)
                ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
                    Hash Cond: ("outer".filesystem_id = "inner".id)
                    ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
                    ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                            ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
        Total runtime: 66553.643 ms

    Here's a depesz link with that output: http://explain.depesz.com/s/AUR


Things I've tried:

    1. I added an index on last_write_date with:

            create index tape_last_write_date_idx on tape(last_write_date);

       and there was no improvement in query time.

    2. I bumped:
             effective_cache_size to 1/2 system RAM (1GB)
             shared_buffers to 1/4 system RAM (512MB)
             work_mem to 10MB
       and there was no improvement in query time.

    3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
       using the same hardware and it was about 5 times faster (nice work,
       whoever did that!).  Unfortunately upgrading is not an option, so this
       is more of an anecdote. I would think the query could go much faster
       in either environment with some optimization.
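
    For reference, the memory bumps in item 2 look roughly like this in
    postgresql.conf (8.1 takes these as raw numbers; the MB/GB unit
    suffixes only arrived in 8.2):

        shared_buffers = 65536          # 512MB, in 8kB pages
        effective_cache_size = 131072   # 1GB, in 8kB pages
        work_mem = 10240                # 10MB, in kB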



The EXACT PostgreSQL version you are running:
    PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
    Standard SuSE SLES 10-SP3 RPMs:
        postgresql-devel-8.1.17-0.3
        postgresql-pl-8.1.17-0.4
        postgresql-libs-8.1.17-0.3
        postgresql-8.1.17-0.3
        postgresql-server-8.1.17-0.3
        postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
    Only the memory changes mentioned above.

Operating system and version:
    Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

    SLES 10-SP3

What program you're using to connect to PostgreSQL:
    Perl DBI
    Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
    perl-DBD-Pg 1.43

If you're using a connection pool, load balancer or application server, which one you're using and its version:
    None.

Is there anything remotely unusual in the PostgreSQL server logs?
    No, they're empty.


CPU manufacturer and model:
    Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
    2GB RAM

Storage details (important for performance and corruption questions):

    Do you use a RAID controller?
        No.
    How many hard disks are connected to the system and what types are they?
        We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
    How are your disks arranged for storage?
        Postgres lives on the same 100GB ext3 partition as the OS.


Thanks,
Sean


Re: Request for help with slow query

From
salah jubeh
Date:
Did you try to add an index on filesystem_id?




Re: Request for help with slow query

From
"Woolcock, Sean"
Date:
I thought that an index was implicitly created for foreign keys, but I see that that's not true. I've just created one
now and re-ran the query, but it did not change the query plan or run time.
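
For reference, that index is along these lines (the name here is illustrative):

    -- plain b-tree index on the foreign key column; index name is illustrative
    create index tape_filesystem_id_idx on tape(filesystem_id);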

Thanks,
Sean

________________________________________
From: salah jubeh [s_jubeh@yahoo.com]
Sent: Monday, October 29, 2012 3:18 PM
To: Woolcock, Sean; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Request for help with slow query

Did you try to add an index on filesystem_id






Re: Request for help with slow query

From
"Joshua D. Drake"
Date:
On 10/29/2012 12:25 PM, Woolcock, Sean wrote:
>
> I thought that an index was implicitly created for foreign keys, but I see that that's not true. I've just created
> one now and re-ran the query but it did not change the query plan or run time.

1. EXPLAIN ANALYZE, not EXPLAIN, please.

Check to see if estimated rows differs wildly from actual.

2. Seriously... 8.1? That is not supported. Please upgrade to a
supported version of PostgreSQL.

http://www.postgresql.org/support/versioning/

3. Simple things:

  A. Have you run analyze on the two tables?
  B. What is your default_statistics_target?

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


Re: Request for help with slow query

From
Shaun Thomas
Date:
On 10/29/2012 12:41 PM, Woolcock, Sean wrote:

>      An example query that's running slowly for me is:
>
>          select tape.volser,
>                 tape.path,
>                 tape.scratched,
>                 tape.size,
>                 extract(epoch from tape.last_write_date) as last_write_date,
>                 extract(epoch from tape.last_access_date) as last_access_date
>              from tape
>              inner join filesystem
>                  on (tape.filesystem_id = filesystem.id)
>              order by last_write_date desc
>              limit 100
>              offset 100;

Is this a representative example? From the looks of it, you could
entirely drop the join against the filesystem table, because you're not
using it in the SELECT or WHERE clauses at all. You don't need that
join in this example.
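
That is, a sketch of the same query with the join dropped:

select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
    from tape
    order by last_write_date desc
    limit 100
    offset 100;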

> ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312)
> (actual time=2.824..18175.863 rows=3219757 loops=1)
> ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual
> time=0.204..0.204 rows=101 loops=1)
> ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4)
> (actual time=0.004..0.116 rows=101 loops=1)
>          Total runtime: 66553.643 ms

I think we can stop looking at this point. Because of the ORDER BY
clause, it has to read the entire tape table, since it has no usable
information on last_write_date. Then it has to read the entire
filesystem table because you asked it to do a join, even though you
threw away the results.

>      1. I added an index on last_write_date with:
>         and there was no improvement in query time.

I'm not sure 8.1 knows what to do with that. But I can guarantee newer
versions would do a reverse index scan on this index to find the top 100
rows, even with the offset. You can also do this with newer versions,
since it's the most common query you run:

create index tape_last_write_date_idx on tape (last_write_date DESC);

Which would at least give you forward read order when addressing this index.

> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
>    using the same hardware and it was about 5 times faster (nice work,

It would be an order of magnitude faster than that if you add the index
also.

> Unfortunately upgrading is not an option, so this is more of an
> anecdote. I would think the query could go much faster in either
> environment with some optimization.

You desperately need to reconsider this. PostgreSQL 8.1 is no longer
supported, and was last updated in late 2010. Any bug fixes, including
known corruption and security bugs, are no longer being backported.
Every day you run on an 8.1 install is a risk. The story is similar with
8.2. Even 8.3 is on the way to retirement. You're *six* major versions
behind the main release.

At the very least, you need to upgrade PostgreSQL from 8.1.17 to 8.1.23.
You'd still be on a version of PG that's almost seven years old, but at
least you'd have the most recent patch level.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Request for help with slow query

From
salah jubeh
Date:
As Shaun has indicated, there is no need for the join. Also, as Joshua suggested, it would be good to upgrade your server. Add indexes for your predicates and foreign keys as well, and you should get the desired result.

Regards





Re: Request for help with slow query

From
"Woolcock, Sean"
Date:
I'm going to push for the upgrade and make the other suggested changes.

Thanks to all for the help,
Sean

________________________________________
From: salah jubeh [s_jubeh@yahoo.com]
Sent: Monday, October 29, 2012 3:49 PM
To: sthomas@optionshouse.com; Woolcock, Sean
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Request for help with slow query

As Shaun has indicated, there is no need for the join. Also, as Joshua suggested, it would be good to upgrade your
server. Add indexes for your predicates and foreign keys as well, and you should get the desired result.

Regards


________________________________
From: Shaun Thomas <sthomas@optionshouse.com>
To: "Woolcock, Sean" <Sean.Woolcock@emc.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Sent: Monday, October 29, 2012 8:36 PM
Subject: Re: [PERFORM] Request for help with slow query

On 10/29/2012 12:41 PM, Woolcock, Sean wrote:

>      An example query that's running slowly for me is:
>
>          select tape.volser,
>                tape.path,
>                tape.scratched,
>                tape.size,
>                extract(epoch from tape.last_write_date) as last_write_date,
>                extract(epoch from tape.last_access_date) as last_access_date
>              from tape
>              inner join filesystem
>                  on (tape.filesystem_id = filesystem.id<http://filesystem.id/>)
>              order by last_write_date desc
>              limit 100
>              offset 100;

Is this a representative example? From the looks of this, you could entirely drop the join against the filesystems
table,because you're not using it in the SELECT or WHERE sections at all. You don't need that join in this example. 

> ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312)
> (actual time=2.824..18175.863 rows=3219757 loops=1)
> ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual
> time=0.204..0.204 rows=101 loops=1)
> ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4)
> (actual time=0.004..0.116 rows=101 loops=1)
>          Total runtime: 66553.643 ms

I think we can stop looking at this point. Because of the ORDER clause, it has to read the entire tape table because
youhave no information on last_write_date it can use. Then, it has to read the entire filesystem table because you
askedit to do a join, even if you threw away the results. 

>      1. I added an index on last_write_date with:
>        and there was no improvement in query time.

I'm not sure 8.1 knows what to do with that. But I can guarantee newer versions would do a reverse index scan on this
indexto find the top 100 rows, even with the offset. You can also do this with newer versions, since it's the most
commonquery you run: 

create index tape_last_write_date_idx on tape (last_write_date DESC);

Which would at least give you forward read order when addressing this index.

> 3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
>    using the same hardware and it was about 5 times faster (nice work,

It would be an order of magnitude faster than that if you add the index also.

> Unfortunately upgrading is not an option, so this is more of an
> anecdote. I would think the query could go much faster in either
> environment with some optimization.

You desperately need to reconsider this. PostgreSQL 8.1 is no longer supported, and was last updated in late 2010. Any
bugfixes, including known corruption and security bugs, are no longer being backported. Every day you run on an 8.1
installis a risk. The story is similar with 8.2. Even 8.3 is on the way to retirement. You're *six* major versions
behindthe main release. 

At the very least, you need to upgrade PostgreSQL from 8.1.17 to 8.1.23. You're still on a version of PG that's almost
7 years old, but at least you'd have the most recent patch level.


-- Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com






Re: Request for help with slow query

From
"Kevin Grittner"
Date:
Woolcock, Sean wrote:

> A description of what you are trying to achieve and what results
> you expect:
>  I have a large (3 million row) table called "tape" that represents
>  files, which I join to a small (100 row) table called "filesystem"
>  that represents filesystems. I have a web interface that allows
>  you to sort by a number of fields in the tape table and view the
>  results 100 at a time (using LIMIT and OFFSET).

Higher OFFSET settings may be slow because it has to read through
OFFSET result rows before returning anything. There are other ways
this problem can be solved, like saving key values at both ends of
the displayed range.
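The "saving key values" approach (keyset pagination) might look roughly like the sketch below; `:last_seen_date` and `:last_seen_id` are hypothetical placeholders for values remembered from the final row of the previous page. Note the row-value comparison works correctly on recent versions; on very old releases it may need to be spelled out with AND/OR.

```sql
-- First page: plain LIMIT, no OFFSET
select id, volser, path, last_write_date
    from tape
    order by last_write_date desc, id desc
    limit 100;

-- Next page: seek past the last row already shown instead of using
-- OFFSET, so earlier rows are never re-read.  The id tiebreaker keeps
-- the ordering stable when several rows share a last_write_date.
select id, volser, path, last_write_date
    from tape
    where (last_write_date, id) < (:last_seen_date, :last_seen_id)
    order by last_write_date desc, id desc
    limit 100;
```

With an index on (last_write_date, id), each page is then a short index scan regardless of how deep into the result set the user has paged.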

>  On Postgres 8.1.17 this takes about 60 seconds. I would like it to
>  be faster.

There was a major overall speed improvement in 8.2. And another in
8.3. Etc. 8.1 has been out of support for about two years now.

http://www.postgresql.org/support/versioning/

>  1. I added an index on last_write_date with:
>
>  create index tape_last_write_date_idx on tape(last_write_date);
>
>  and there was no improvement in query time.

I was going to ask whether you tried an index on tape
(last_write_date DESC) -- but that feature was added in 8.3.  Never
mind.

>  2. I bumped:
>  effective_cache_size to 1/2 system RAM (1GB)
>  shared_buffers to 1/4 system RAM (512MB)
>  work_mem to 10MB
>  and there was no improvement in query time.

Probably not bad adjustments, in any case.

>  3. I ran the query against the same data in Postgres 9.1.6 rather
>  than 8.1.17 using the same hardware and it was about 5 times
>  faster (nice work, whoever did that!). Unfortunately upgrading is
>  not an option,

That is unfortunate.

> CPU manufacturer and model:
>  Intel Celeron CPU 440 @ 2.00GHz
>
> Amount and size of RAM installed:
>  2GB RAM
>
> Storage details (important for performance and corruption
> questions):
>
>  Do you use a RAID controller?
>  No.
>  How many hard disks are connected to the system and what types are
>  they?
>  We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
>  How are your disks arranged for storage?
>  Postgres lives on the same 100GB ext3 partition as the OS.

That's not exactly blazingly fast hardware. If you value that data at
all, I hope you have paid a lot of attention to backups, because that
sounds like a machine likely to have a drive over 5 years old, which
makes it highly likely to fail hard without a lot of advance warning.

You seem to be heavily cached. Have you tried these settings?:

seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03

That might encourage it to use that index you added. Well, if a
version of PostgreSQL that old did reverse index scans. If not you
might be able to add a functional index and coax it into use.
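One guess at what that functional-index coaxing could look like on a release that can't use a DESC index (this is a sketch, not something tested on 8.1): index a negated expression, so an ordinary ascending scan of the index returns rows in descending date order.

```sql
-- Ascending expression index on the negated epoch value
create index tape_last_write_desc_idx
    on tape ((-extract(epoch from last_write_date)));

-- Order by the exact same expression, ascending, so the planner can
-- walk the index front-to-back and stop after limit+offset rows
select volser, path, size
    from tape
    order by -extract(epoch from last_write_date)
    limit 100 offset 100;
```

The ORDER BY expression must match the indexed expression exactly, or the planner will fall back to a full sort.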

-Kevin


Re: Request for help with slow query

From
"Albe Laurenz"
Date:
Sean Woolcock wrote:
>     I have a large (3 million row) table called "tape" that represents files,
>     which I join to a small (100 row) table called "filesystem" that represents
>     filesystems.  I have a web interface that allows you to sort by a number of
>     fields in the tape table and view the results 100 at a time (using LIMIT
>     and OFFSET).
>
>     The data only changes hourly and I do a "vacuum analyze" after all changes.

>     An example query that's running slowly for me is:
>
>         select tape.volser,
>                tape.path,
>                tape.scratched,
>                tape.size,
>                extract(epoch from tape.last_write_date) as last_write_date,
>                extract(epoch from tape.last_access_date) as last_access_date
>             from tape
>             inner join filesystem
>                 on (tape.filesystem_id = filesystem.id)
>             order by last_write_date desc
>             limit 100
>             offset 100;
>
>     On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

>     Here's a depesz link with that output: http://explain.depesz.com/s/AUR

I don't see anything obviously wrong there.

At least the sequential scan on "tape" is necessary.

> Things I've tried:
[...]
>     3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
>        using the same hardware and it was about 5 times faster (nice work,
>        whoever did that!).  Unfortunately upgrading is not an option, so this
>        is more of an anecdote. I would think the query could go much faster
>        in either environment with some optimization.

Can you post EXPLAIN ANALYZE for the query on 9.1.6?

Staying on 8.1 is not a good idea, but I guess you know that.

> Storage details (important for performance and corruption questions):
>     Do you use a RAID controller?
>         No.
>     How many hard disks are connected to the system and what types are they?
>         We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
>     How are your disks arranged for storage?
>         Postgres lives on the same 100GB ext3 partition as the OS.

I'd say that a query like this will always be disk bound.
Getting faster storage should help.

Yours,
Laurenz Albe