Re: EXPLAIN detail - Mailing list pgsql-performance

From Luigi N. Puleio
Subject Re: EXPLAIN detail
Date
Msg-id 981759.47577.qm@web33504.mail.mud.yahoo.com
Whole thread Raw
In response to EXPLAIN detail  ("Luigi N. Puleio" <npuleio@rocketmail.com>)
Responses Re: EXPLAIN detail  (Craig Ringer <craig@postnewspapers.com.au>)
Re: EXPLAIN detail  (PFC <lists@peufeu.com>)
List pgsql-performance
>> With all that casting, is it possible that appropriate indexes aren't
>> being used because your WHERE / ON clauses aren't an exact type match
>> for the index?
>
> You mean to put an index on date with timestamptz datatype column?...

> Er ... I'm not quite sure what you mean. Do you mean an index on a cast
> of the column, eg:

> CREATE INDEX some_idx_name ON some_table ( some_timestamp_field::date )

> then ... maybe. It's hard to be sure when there is so little information
> available. It shouldn't be necessary, but there are certainly uses for
> that sort of thing - for example, I use a couple of functional indexes
> in the schema I'm working on at the moment. It's probably a good idea to
> look at ways to avoid doing that first, though.

>> Can you post the full EXPLAIN ANALYZE from the query? This snippet
>> doesn't even show how records are being looked up.
>
> HashAggregate (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779...7004.782 rows=1 loops=1)
>   -> Nested Loop (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
>        Join filter: (("inner".calldate)::time without time zone => ("outer".calldate)::time without time zone)
>      -> Seq Scan on table a (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
>           Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text,
1,4)='*100'::text) AND ((lastdata)::text ='/dati/ita/loginok'::text))  
>           ->Seq Scan on table b (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
>               Filter: ((((lastdata)::text ='/dati/ita/logoutok'::text) AND ('410'::text=(src)::text) AND
('2008-04-09'::date= (calldate)::date)) 
> Total runtime: 7005.706 ms

> Personally, I'd want to get rid of all those casts first. Once that's
> cleaned up I'd want to look at creating appropriate indexes on your
> tables. If necessary, I might even create a composite index on
> (lastdata,src,calldate) .

>> What about a \d of the table from psql, or at least a summary of the
>> involved column data types and associated indexes?
>
> this table has an acctid column which is PK then most of the other columns are varchar(80) or so....

> Do you mean that the columns involved in your WHERE and ON clauses, the
> ones you're casting to date, timestamp, etc, are stored as VARCHAR? If
> so, it's no surprise that the query is slow because you're forcing
> PostgreSQL to convert a string to a date, timestamp, or time datatype to
> do anything with it ... and you're doing it many times in every query.
> That will be VERY slow, and prevent the use of (simple) indexes on those
> columns.

> If you're really storing dates/times as VARCHAR, you should probably
> look at some changes to your database design, starting with the use of
> appropriate data types.

> That's all guesswork, because you have not provided enough information.

> Can you please post the output of psql's \d command on the table in
> question?

> If for some reason you cannot do that, please at least include the data
> type of the primary key and all fields involved in the query, as well as
> a list of all the indexes on both tables.

> The easy way to do that is to just launch "psql" then run:

> \d table

> and paste the output to an email.

> So for 4 records result, 7 seconds are too way a lot I guess... but as I said before I'm gonna wait if > the
responsibleof the server did a VACUUM on the table... 
>
> What do you think?...

> If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc on demand
> then personally I really doubt that dead rows are your problem.


Well, this table has a primary key index on first column called acctid which is an integer; instead the calldate column
isa TIMESTAMPTZ and in fact I'm using to do (calldate)::date in the ON clause because since the time part of that
columnis always different and in the nesting I have to identificate the date is the same... 

the other two columns (src and lastdata) are both VARCHAR(80) and the query is this one:

EXPLAIN ANALYZE
SELECT
      (a.calldate)::date,
      a.src,
      a.dst,
      MIN(e.calldate) - a.calldate
FROM
    cdr a
    INNER JOIN cdr e
    ON ((e.calldate)::date = (a.calldate)::date AND e.src = a.src
        AND e.lastdata = '/dati/ita/logoutok' AND e.calldate >= a.calldate)
WHERE
     (a.calldate)::date = '2008-04-09'
     AND a.src = '410'
     AND substr(a.dst, 1, 4) = '*100'
     AND a.lastdata = '/dati/ita/loginok'
GROUP BY
      a.calldate, a.src, a.dst





__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: EXPLAIN detail
Next
From: Craig Ringer
Date:
Subject: Re: EXPLAIN detail