Thread: EXPLAIN detail
Hello everyone!!

I have a table with 17 columns and almost 530000 records, and doing just a

SELECT * FROM table

with EXPLAIN ANALYZE I get:

Seq Scan on table (cost=0.00..19452.95 rows=529395 width=170) (actual time=0.155..2194.294 rows=529395 loops=1)
Total runtime: 3679.039 ms

and this table has a PK...

Do you think that is too much time for a simple select?... I guess it's a bit slow to get all those records... but since I'm a newbie with PostgreSQL, what can I check to optimize?

Thanks to all!

Ciao,
Luigi
On Wed, Apr 9, 2008 at 3:21 PM, Luigi N. Puleio <npuleio@rocketmail.com> wrote:
> Hello everyone!!
>
> I have a table with 17 columns and almost 530000 records, and doing just a
>
> SELECT * FROM table
>
> with EXPLAIN ANALYZE I get:
>
> Seq Scan on table (cost=0.00..19452.95 rows=529395 width=170) (actual time=0.155..2194.294 rows=529395 loops=1)
> Total runtime: 3679.039 ms
>
> and this table has a PK...
> Do you think that is too much time for a simple select?...

Well, the PK won't help you here because you are selecting all rows from the table, and a seq scan is the right thing for that. Without knowing your hardware it's difficult to judge whether the time taken is excessive. Anyway, I don't think there is much tweaking you can do for such a query, except making sure that your table is not bloated with dead tuples.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> Anyway, I don't think there is much tweaking you can do for such a
> query, except making sure that your table is not bloated with dead
> tuples.

To the OP, more explicitly: make sure you use autovacuum, or run VACUUM manually on the table periodically.

Would I be correct in suspecting that your real problem is with a more meaningful and complex query, and that the one you've posted oversimplifies what you are trying to do? If that is the case, and you're having problems with queries that do more real work than this one does, maybe you should post EXPLAIN ANALYZE output from such a real-world query.

--
Craig Ringer
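For example, a minimal sketch of that bloat check and cleanup (the table name cdr is taken from later in the thread; the n_live_tup/n_dead_tup columns of the standard pg_stat_user_tables view are available from PostgreSQL 8.3 on):

-- Rough per-table dead-tuple count; a large n_dead_tup suggests bloat:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'cdr';

-- Reclaim dead tuples and refresh the planner's statistics:
VACUUM ANALYZE cdr;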
On Wed, 9 Apr 2008, Pavan Deolasee wrote:
>> I have a table with 17 columns and almost 530000 records, and doing
>> just a
>>
>> SELECT * FROM table

> Well, the PK won't help you here because you are selecting all rows
> from the table, and a seq scan is the right thing for that.

Yes. Like he said. Basically, you're asking the database to fetch all half a million rows. That's going to take some time, whatever hardware you have. The PK is completely irrelevant, because the query doesn't refer to it at all.

To be honest, three seconds sounds pretty reasonable for that sort of query.

Matthew

--
There once was a limerick .sig
that really was not very big
It was going quite fine
Till it reached the fourth line
--- Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Well, the PK won't help you here because you are selecting all rows
> from the table, and a seq scan is the right thing for that. Without
> knowing your hardware it's difficult to judge whether the time taken
> is excessive. Anyway, I don't think there is much tweaking you can do
> for such a query, except making sure that your table is not bloated
> with dead tuples.

In effect, this simple query was the starting point of an examination of the speed of another, nested query; more precisely, I'm trying to obtain a time difference by querying the same table under different conditions, like:

SELECT
    (a.column1)::date, MIN(b.column2) - a.column2
FROM
    table a
    INNER JOIN table b
        ON ((a.column1)::date = (b.column1)::date
            AND b.column3 = 'b'
            AND (b.column1)::time without time zone >= (a.column1)::time without time zone)
WHERE
    (a.column1)::date = '2008-04-09'
    AND a.column3 = 'a'
GROUP BY
    a.column1, a.column2

With this I obtain 3-4 records out of the whole 530000, and with EXPLAIN ANALYZE I get almost 6 seconds:

Nested Loop (cost=0.00..52140.83 rows=1 width=34) (actual time=4311.756..5951.271 rows=1 loops=1)

That seems like a lot of time, and it makes me wonder how long it would take if, for example, I filtered not on a single day but on a whole month, which should return far more than 1 row...

Actually, I emailed the person responsible for the server where PostgreSQL is installed to ask whether he has run VACUUM manually lately, since querying pg_settings or pg_stat_all_tables gives me no indication that autovacuum is running...

But maybe there's a better way to write this nested-loop query more efficiently...

Thanks to all!

Ciao,
Luigi
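A minimal sketch of that autovacuum check (pg_settings is standard; the last_vacuum/last_autovacuum columns of pg_stat_all_tables exist from PostgreSQL 8.2 on; the table name cdr is the one used later in the thread):

-- Is autovacuum switched on at all?
SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';

-- When was the table last vacuumed, manually or automatically?
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'cdr';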
Luigi N. Puleio wrote:
> SELECT
>     (a.column1)::date, MIN(b.column2) - a.column2
> FROM
>     table a
>     INNER JOIN table b
>         ON ((a.column1)::date = (b.column1)::date
>             AND b.column3 = 'b'
>             AND (b.column1)::time without time zone >= (a.column1)::time without time zone)
> WHERE
>     (a.column1)::date = '2008-04-09'
>     AND a.column3 = 'a'
> GROUP BY
>     a.column1, a.column2
>
> With this I obtain 3-4 records out of the whole 530000, and with
> EXPLAIN ANALYZE I get almost 6 seconds:
>
> Nested Loop (cost=0.00..52140.83 rows=1 width=34) (actual time=4311.756..5951.271 rows=1 loops=1)

With all that casting, is it possible that appropriate indexes aren't being used because your WHERE / ON clauses aren't an exact type match for the index?

Can you post the full EXPLAIN ANALYZE output from the query? This snippet doesn't even show how records are being looked up.

What about a \d of the table from psql, or at least a summary of the involved column data types and associated indexes?

--
Craig Ringer
>> With all that casting, is it possible that appropriate indexes aren't
>> being used because your WHERE / ON clauses aren't an exact type match
>> for the index?

You mean to put an index on the date cast of the timestamptz column?...

>> Can you post the full EXPLAIN ANALYZE output from the query? This
>> snippet doesn't even show how records are being looked up.

HashAggregate (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779..7004.782 rows=1 loops=1)
  -> Nested Loop (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
       Join Filter: (("inner".calldate)::time without time zone >= ("outer".calldate)::time without time zone)
       -> Seq Scan on table a (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
            Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4) = '*100'::text) AND ((lastdata)::text = '/dati/ita/loginok'::text))
       -> Seq Scan on table b (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
            Filter: (((lastdata)::text = '/dati/ita/logoutok'::text) AND ('410'::text = (src)::text) AND ('2008-04-09'::date = (calldate)::date))
Total runtime: 7005.706 ms

>> What about a \d of the table from psql, or at least a summary of the
>> involved column data types and associated indexes?

The table has an acctid column which is the PK; most of the other columns are varchar(80) or so...

So for a 4-record result, 7 seconds is way too long, I guess... but as I said before, I'm going to wait to hear whether the person responsible for the server has run VACUUM on the table...

What do you think?...

Thanks again to all.

Ciao,
Luigi
Luigi N. Puleio wrote:
>> With all that casting, is it possible that appropriate indexes aren't
>> being used because your WHERE / ON clauses aren't an exact type match
>> for the index?
>
> You mean to put an index on the date cast of the timestamptz column?...

Er ... I'm not quite sure what you mean. Do you mean an index on a cast of the column, eg:

CREATE INDEX some_idx_name ON some_table ( (some_timestamp_field::date) )

then ... maybe. It's hard to be sure when there is so little information available. It shouldn't be necessary, but there are certainly uses for that sort of thing - for example, I use a couple of functional indexes in the schema I'm working on at the moment. It's probably a good idea to look at ways to avoid doing that first, though.

>> Can you post the full EXPLAIN ANALYZE output from the query? This
>> snippet doesn't even show how records are being looked up.
>
> HashAggregate (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779..7004.782 rows=1 loops=1)
>   -> Nested Loop (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
>        Join Filter: (("inner".calldate)::time without time zone >= ("outer".calldate)::time without time zone)
>        -> Seq Scan on table a (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
>             Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4) = '*100'::text) AND ((lastdata)::text = '/dati/ita/loginok'::text))
>        -> Seq Scan on table b (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
>             Filter: (((lastdata)::text = '/dati/ita/logoutok'::text) AND ('410'::text = (src)::text) AND ('2008-04-09'::date = (calldate)::date))
> Total runtime: 7005.706 ms

Personally, I'd want to get rid of all those casts first. Once that's cleaned up, I'd want to look at creating appropriate indexes on your tables. If necessary, I might even create a composite index on (lastdata, src, calldate).

>> What about a \d of the table from psql, or at least a summary of the
>> involved column data types and associated indexes?
>
> The table has an acctid column which is the PK; most of the other
> columns are varchar(80) or so...

Do you mean that the columns involved in your WHERE and ON clauses, the ones you're casting to date, timestamp, etc., are stored as VARCHAR? If so, it's no surprise that the query is slow, because you're forcing PostgreSQL to convert a string to a date, timestamp, or time datatype to do anything with it ... and you're doing it many times in every query. That will be VERY slow, and it prevents the use of (simple) indexes on those columns.

If you're really storing dates/times as VARCHAR, you should probably look at some changes to your database design, starting with the use of appropriate data types.

That's all guesswork, because you have not provided enough information. Can you please post the output of psql's \d command on the table in question?

If for some reason you cannot do that, please at least include the data type of the primary key and of all fields involved in the query, as well as a list of all the indexes on both tables.

The easy way to do that is to just launch "psql" and then run:

\d table

and paste the output into an email.

> So for a 4-record result, 7 seconds is way too long, I guess... but as
> I said before, I'm going to wait to hear whether the person responsible
> for the server has run VACUUM on the table...
>
> What do you think?...
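A minimal sketch of that composite index (the index name is made up; the column names come from the posted plan). With equality conditions on lastdata and src, such an index lets the planner jump straight to the matching rows instead of seq-scanning half a million of them:

-- Equality columns first, the range column (calldate) last:
CREATE INDEX cdr_lastdata_src_calldate_idx
    ON cdr (lastdata, src, calldate);

Note that the query must compare calldate without casting it (e.g. as a range over timestamps) for the third column to be usable.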
If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc. on demand, then personally I really doubt that dead rows are your problem.

--
Craig Ringer
> If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc. on
> demand, then personally I really doubt that dead rows are your problem.

Well, this table has a primary key index on the first column, acctid, which is an integer; the calldate column, instead, is a TIMESTAMPTZ, and in fact I'm doing (calldate)::date in the ON clause because the time part of that column is always different, and in the self-join I have to identify rows whose date is the same...

The other two columns (src and lastdata) are both VARCHAR(80), and the query is this one:

EXPLAIN ANALYZE
SELECT
    (a.calldate)::date,
    a.src,
    a.dst,
    MIN(e.calldate) - a.calldate
FROM
    cdr a
    INNER JOIN cdr e
        ON ((e.calldate)::date = (a.calldate)::date
            AND e.src = a.src
            AND e.lastdata = '/dati/ita/logoutok'
            AND e.calldate >= a.calldate)
WHERE
    (a.calldate)::date = '2008-04-09'
    AND a.src = '410'
    AND substr(a.dst, 1, 4) = '*100'
    AND a.lastdata = '/dati/ita/loginok'
GROUP BY
    a.calldate, a.src, a.dst

Ciao,
Luigi
Luigi N. Puleio wrote:
>> If for some reason you cannot do that, please at least include the data
>> type of the primary key and of all fields involved in the query, as
>> well as a list of all the indexes on both tables.

If you won't show the people on the list your table definitions, or at least the information requested above, then it's less likely that anybody can help you or will spend the time trying.

Personally, I think you may need some functional/cast indexes, and possibly composite ones, to avoid the looping sequential scan, as I said before. However, that's guesswork without more information, as repeatedly stated and requested. I'm not going to bother replying to any further mail just to say so again.

Try reading the documentation chapter about indexes:

http://www.postgresql.org/docs/current/static/indexes.html

and about query optimisation:

http://www.postgresql.org/docs/current/static/performance-tips.html

then experiment with various indexes to see what works best. Think about the data types. Remember that you can build an index on a cast of a field, on multiple fields, on function calls, or on basically any other simple expression or expressions, but complex indexes will cost more to build and maintain and might be bigger (and thus slower to search).

Anyway, I'm done.

--
Craig Ringer
> Well, this table has a primary key index on the first column, acctid,
> which is an integer; the calldate column, instead, is a TIMESTAMPTZ,
> and in fact I'm doing (calldate)::date in the ON clause because the
> time part of that column is always different, and in the self-join I
> have to identify rows whose date is the same...
>
> The other two columns (src and lastdata) are both VARCHAR(80), and the
> query is this one:

Tip for getting answers from this list: just post the output of "\d yourtable" from psql; it's quicker than writing a paragraph... Be lazy, lol.

So, basically, if I understand correctly, you are doing a self-join on your table: you want all rows from the same day, and you're doing something with the dates, and...

Tip for getting answers from this list: explain (in English) what your query actually does; someone might come up with a better idea on HOW to do it.

Snip:

> EXPLAIN ANALYZE
> SELECT
>     (a.calldate)::date,
>     a.src,
>     a.dst,
>     MIN(e.calldate) - a.calldate
> FROM
>     cdr a
>     INNER JOIN cdr e
>         ON ((e.calldate)::date = (a.calldate)::date
>             AND e.src = a.src
>             AND e.lastdata = '/dati/ita/logoutok'
>             AND e.calldate >= a.calldate)
> WHERE
>     (a.calldate)::date = '2008-04-09'
>     AND a.src = '410'
>     AND substr(a.dst, 1, 4) = '*100'
>     AND a.lastdata = '/dati/ita/loginok'
> GROUP BY
>     a.calldate, a.src, a.dst

OK, I assume you have an index on calldate, which is a TIMESTAMPTZ? (In that case, why is it called calldate and not calltimestamp?...)

Bad news: that index is useless for this condition:

(a.calldate)::date = '2008-04-09'

There, you are asking postgres to scan the entire table, convert the column to date, and test. Bad. In order to use the index, you could rewrite it as something like:

a.calldate >= '2008-04-09' AND a.calldate < ('2008-04-09'::DATE + '1 DAY'::INTERVAL)

This is a RANGE query (just like BETWEEN), which is index-friendly.

Personally, I wouldn't do it that way: since you use the date (and not the time; I presume you only use the time for display purposes), I would just store the timestamptz in "calltimestamp" and the date in "calldate", with a trigger to ensure the date is set from calltimestamp::date every time a row is inserted or updated.

This is better than a function index, since you use that column a lot in your query: it will be slightly faster, and it will save a lot of timestamptz->date casts, hence CPU cycles.

Try this last option (separate date column), and repost the EXPLAIN ANALYZE of your query so it can be optimized further.

Also, PLEASE don't use substr(); use a.dst LIKE '*100%' - look in the manual. LIKE 'foo%' is indexable if you create the proper index.
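A minimal sketch of that separate-date-column idea (the column, function, trigger, and index names here are made up; unlike the rename PFC suggests, this sketch keeps the existing calldate column and adds a derived callday alongside it):

-- Derived date column, maintained by trigger:
ALTER TABLE cdr ADD COLUMN callday date;

CREATE OR REPLACE FUNCTION cdr_set_callday() RETURNS trigger AS $$
BEGIN
    -- Truncation to a date happens once, at write time, not per query:
    NEW.callday := (NEW.calldate)::date;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER cdr_set_callday_trg
    BEFORE INSERT OR UPDATE ON cdr
    FOR EACH ROW EXECUTE PROCEDURE cdr_set_callday();

-- Backfill existing rows, then index the new column:
UPDATE cdr SET callday = (calldate)::date;
CREATE INDEX cdr_callday_idx ON cdr (callday);

The WHERE clause then becomes a plain, indexable equality: WHERE a.callday = '2008-04-09'.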
On Wed, Apr 9, 2008 at 11:41 AM, PFC <lists@peufeu.com> wrote:
> In order to use the index, you could rewrite it as something like:
> a.calldate >= '2008-04-09' AND a.calldate < ('2008-04-09'::DATE + '1 DAY'::INTERVAL)
> This is a RANGE query (just like BETWEEN), which is index-friendly.

Another option would be to create a functional index on date_trunc('day', cdr.calldate); then a WHERE condition like:

date_trunc('day', a.calldate) = '2008-04-09'

would definitely use an index.

--
Regards,
Richard Broersma Jr.
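A minimal sketch of that functional index (the index name is made up). One caveat: for a TIMESTAMPTZ column, date_trunc('day', ...) is only STABLE, because the result depends on the session time zone, so CREATE INDEX will reject the expression; the sketch below assumes calldate is TIMESTAMP WITHOUT TIME ZONE, where date_trunc is IMMUTABLE and indexable:

-- Expression index; works only if calldate is TIMESTAMP WITHOUT TIME ZONE:
CREATE INDEX cdr_callday_trunc_idx ON cdr (date_trunc('day', calldate));

-- The WHERE clause must use the exact same expression to hit the index:
SELECT count(*) FROM cdr
WHERE date_trunc('day', calldate) = DATE '2008-04-09';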