Re: Query performance discontinuity - Mailing list pgsql-performance

From Mike Nielsen
Subject Re: Query performance discontinuity
Date
Msg-id 1037074231.29703.234.camel@CPE-144-132-182-167
Whole thread Raw
In response to Query performance discontinuity  (Mike Nielsen <miken@bigpond.net.au>)
Responses Re: Query performance discontinuity
Re: Query performance discontinuity
List pgsql-performance
Stephan, Tom & Josh:

Here's the result I get from changing the <> to a > in the tstart
condition (no improvement):

pganalysis=> explain analyze select * from ps2 where tstart> '2000-1-1
pganalysis'> 00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
pganalysis-> tstart,time_stamp limit 59628;
NOTICE:  QUERY PLAN:

Limit  (cost=160313.27..160313.27 rows=59628 width=179) (actual
time=42269.87..42709.82 rows=59628 loops=1)
  ->  Sort  (cost=160313.27..160313.27 rows=327895 width=179) (actual
time=42269.86..42643.74 rows=59629 loops=1)
        ->  Seq Scan on ps2  (cost=0.00..13783.40 rows=327895 width=179)
(actual time=0.15..15211.49 rows=327960 loops=1)
Total runtime: 43232.53 msec

EXPLAIN

Setting enable_seqscan=off produced a good result:


pganalysis=> explain analyze select * from ps2 where tstart> '2000-1-1
pganalysis'> 00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
pganalysis-> tstart,time_stamp limit 59628;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..160322.87 rows=59628 width=179) (actual
time=40.39..2222.06 rows=59628 loops=1)
  ->  Index Scan using ps2_idx on ps2  (cost=0.00..881616.45 rows=327895
width=179) (actual time=40.38..2151.38 rows=59629 loops=1)
Total runtime: 2262.23 msec

EXPLAIN

The ps2 table is in time_stamp order, but the tstarts aren't quite as
good -- they're mostly there, but they're computed by subtracting a
(stochastic) value from time_stamp.

I haven't tinkered with sort_mem yet, but will once I've got this little
project wrapped up (1 or 2 days to go!).

This, by the way, is pg log data that I've parsed up so I can do some
performance number-crunching for a client of mine.  Is there a better
way to get comprehensive, per-query read, write and cache hit data?  As
you can imagine, with millions of records, my client-side perl script
for the parsing is slow.  I've tried my hand at writing an aggregate
function on the server side using lex and yacc, but decided that, at
least for this project, I'd rather let the machine do the head-banging
-- I can tokenize the raw syslog data (loaded into another pg table)
into an intermediate result in an aggregate function, and my parser
works on the token strings, but the perl script finished before I could
go any further...

In the off chance, however, that I get invited to more of this kind of
work, it would be really handy to be able to get the data without all
this hassle!  Any clues would be gratefully received.

Regards,

Mike


On Tue, 2002-11-12 at 04:44, Mike Nielsen wrote:
    pgsql-performers,

    Just out of curiosity, anybody with any ideas on what happens to this
    query when the limit is 59626?  It's as though 59626 = infinity?

    pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
    00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
    tstart,time_stamp limit 59624;
    NOTICE:  QUERY PLAN:

    Limit  (cost=0.00..160328.37 rows=59624 width=179) (actual
    time=14.52..2225.16 rows=59624 loops=1)
      ->  Index Scan using ps2_idx on ps2  (cost=0.00..881812.85 rows=327935
    width=179) (actual time=14.51..2154.59 rows=59625 loops=1)
    Total runtime: 2265.93 msec

    EXPLAIN
    pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
    00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
    tstart,time_stamp limit 59625;
    NOTICE:  QUERY PLAN:

    Limit  (cost=0.00..160331.06 rows=59625 width=179) (actual
    time=0.45..2212.19 rows=59625 loops=1)
      ->  Index Scan using ps2_idx on ps2  (cost=0.00..881812.85 rows=327935
    width=179) (actual time=0.45..2140.87 rows=59626 loops=1)
    Total runtime: 2254.50 msec

    EXPLAIN
    pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
    00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
    tstart,time_stamp limit 59626;
    NOTICE:  QUERY PLAN:

    Limit  (cost=160332.32..160332.32 rows=59626 width=179) (actual
    time=37359.41..37535.85 rows=59626 loops=1)
      ->  Sort  (cost=160332.32..160332.32 rows=327935 width=179) (actual
    time=37359.40..37471.07 rows=59627 loops=1)
            ->  Seq Scan on ps2  (cost=0.00..13783.52 rows=327935 width=179)
    (actual time=0.26..12433.00 rows=327960 loops=1)
    Total runtime: 38477.39 msec

    EXPLAIN
    pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
    00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
    tstart,time_stamp limit 59627;
    NOTICE:  QUERY PLAN:

    Limit  (cost=160332.32..160332.32 rows=59627 width=179) (actual
    time=38084.85..38260.88 rows=59627 loops=1)
      ->  Sort  (cost=160332.32..160332.32 rows=327935 width=179) (actual
    time=38084.83..38194.63 rows=59628 loops=1)
            ->  Seq Scan on ps2  (cost=0.00..13783.52 rows=327935 width=179)
    (actual time=0.15..12174.74 rows=327960 loops=1)
    Total runtime: 38611.83 msec

    EXPLAIN

    pganalysis=> \d ps2
                        Table "ps2"
       Column    |           Type           | Modifiers
    -------------+--------------------------+-----------
     host        | character varying(255)   |
     pid         | integer                  |
     line        | integer                  |
     time_stamp  | timestamp with time zone |
     seq         | integer                  |
     cpu_sys     | real                     |
     cpu_elapsed | real                     |
     cpu_user    | real                     |
     cpu_syst    | real                     |
     cpu_usert   | real                     |
     mssp        | integer                  |
     sigp        | integer                  |
     msrt        | integer                  |
     msst        | integer                  |
     sigt        | integer                  |
     msrp        | integer                  |
     swap        | integer                  |
     swat        | integer                  |
     recp        | integer                  |
     rect        | integer                  |
     pgfp        | integer                  |
     pgft        | integer                  |
     icsp        | integer                  |
     vcst        | integer                  |
     icst        | integer                  |
     vcsp        | integer                  |
     fsbop       | integer                  |
     fsbos       | integer                  |
     fsbip       | integer                  |
     fsbis       | integer                  |
     dread       | integer                  |
     dwrit       | integer                  |
     sbhr        | real                     |
     sread       | integer                  |
     swrit       | integer                  |
     lbhr        | real                     |
     lread       | integer                  |
     lwrit       | integer                  |
     dbuser      | character(8)             |
     tstart      | timestamp with time zone |
    Indexes: ps2_idx

    pganalysis=> \d ps2_idx
                Index "ps2_idx"
       Column   |           Type
    ------------+--------------------------
     tstart     | timestamp with time zone
     time_stamp | timestamp with time zone
    btree

    pganalysis=>

    psql (PostgreSQL) 7.2
    contains support for: readline, history, multibyte


    Platform: Celeron 1.3GHz, 512MB 40GB IDE hard disk, Linux 2.4.8-26mdk
    kernel

    Regards,

    Mike



    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query performance discontinuity
Next
From: Rod Taylor
Date:
Subject: Re: Query performance discontinuity