Thread: Slow - grindingly slow - query

Slow - grindingly slow - query

From
Theo Kramer
Date:
Hi

I have a single table with two views. The table effectively contains both
master and detail info (legacy stuff I'm afraid). The query in question is
used to see if any records exist in the detail that do not exist in the
master. The table and index definition is as follows
 create table accounts (   domain text,   registrationtype char   /* Plus a couple of other irrelevant fields */ );
 create index domain_idx on accounts (domain); create index domain_type_idx on accounts (domain, registrationtype);

The views are
 create view accountmaster as SELECT * from accounts where registrationtype =
'N'; create view accountdetail as SELECT * from accounts where registrationtype <>
'N';

The query is
 select accountdetail.domain from accountdetail where   accountdetail.domain not in     (select accountmaster.domain
fromaccountmaster);
 

I started the query about 5 hours ago and it is still running. I did the same
on Informix Online 7 and it took less than two minutes...

My system details are postgres: 6.5.3 O/S: RH6.0 Kernel 2.2.5-15smp

Explain shows the following
 explain select accountdetail.domain from accountdetail where   accountdetail.domain not in     (select
accountmaster.domainfrom accountmaster) limit 10; NOTICE:  QUERY PLAN:
 
 Seq Scan on accounts  (cost=3667.89 rows=34958 width=12)   SubPlan     ->  Seq Scan on accounts  (cost=3667.89
rows=33373width=12)
 
 EXPLAIN

The number of records in the two views are
 psql -c "select count(*) from accountmaster" coza; count ----- 45527 (1 row)
 psql -c "select count(*) from accountdetail" coza; count ----- 22803

I know of exactly one record (I put it there myself) that satisfies the
selection criteria.

Any ideas would be appreciated

--------
Regards
Theo

PS We have it running live at http://co.za (commercial domains in South Africa).


Re: [HACKERS] Slow - grindingly slow - query

From
The Hermit Hacker
Date:
What does:

explain select domain from accountdetail where domain not in (     select domain from accountmaster);

show?

Also, did you do a 'vacuum analyze' on the tables?

Also, how about if you get rid of the views

SELECT domain FROM account
WHERE registrationtype <> 'N';

*shakes head*  am I missing something here?  I'm reading your SELECT and
'CREATE VIEW's and don't they negate each other? *scratch head*

If I'm reading your select properly, and with the amount of sleep I've had
recently, its possible I'm not...

The subselect is saying give me all domains whose registration type = 'N'.
The select itself is saying give me all domains whoe registration type <>
'N' (select accountdetail.domain from accountdetail), and narrow that
listing down further to only include those domains whose registration type
<> 'N'?

Either I'm reading this *totally* wrong, or you satisfy that condition
ujust by doing a 'SELECT domain FROM accountdetail;' ...

No?

On Thu, 11 Nov 1999, Theo Kramer wrote:

> Hi
> 
> I have a single table with two views. The table effectively contains both
> master and detail info (legacy stuff I'm afraid). The query in question is
> used to see if any records exist in the detail that do not exist in the
> master. The table and index definition is as follows
> 
>   create table accounts (
>     domain text,
>     registrationtype char
>     /* Plus a couple of other irrelevant fields */
>   );
> 
>   create index domain_idx on accounts (domain);
>   create index domain_type_idx on accounts (domain, registrationtype);
> 
> The views are
> 
>   create view accountmaster as SELECT * from accounts where registrationtype =
> 'N';
>   create view accountdetail as SELECT * from accounts where registrationtype <>
> 'N';
> 
> The query is
> 
>   select accountdetail.domain from accountdetail where
>     accountdetail.domain not in
>       (select accountmaster.domain from accountmaster);
> 
> I started the query about 5 hours ago and it is still running. I did the same
> on Informix Online 7 and it took less than two minutes...
> 
> My system details are
>   postgres: 6.5.3
>   O/S: RH6.0 Kernel 2.2.5-15smp
> 
> Explain shows the following
> 
>   explain select accountdetail.domain from accountdetail where
>     accountdetail.domain not in
>       (select accountmaster.domain from accountmaster) limit 10;
>   NOTICE:  QUERY PLAN:
> 
>   Seq Scan on accounts  (cost=3667.89 rows=34958 width=12)
>     SubPlan
>       ->  Seq Scan on accounts  (cost=3667.89 rows=33373 width=12)
> 
>   EXPLAIN
> 
> The number of records in the two views are
> 
>   psql -c "select count(*) from accountmaster" coza;
>   count
>   -----
>   45527
>   (1 row)
> 
>   psql -c "select count(*) from accountdetail" coza;
>   count
>   -----
>   22803
> 
> I know of exactly one record (I put it there myself) that satisfies the
> selection criteria.
> 
> Any ideas would be appreciated
> 
> --------
> Regards
> Theo
> 
> PS We have it running live at http://co.za (commercial domains in South Africa).
> 
> ************
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 




Re: [HACKERS] Slow - grindingly slow - query

From
Hannu Krosing
Date:
Theo Kramer wrote:
> 
> Hi
> 
> I have a single table with two views. The table effectively contains both
> master and detail info (legacy stuff I'm afraid). The query in question is
> used to see if any records exist in the detail that do not exist in the
> master. The table and index definition is as follows
> 
>   create table accounts (
>     domain text,
>     registrationtype char
>     /* Plus a couple of other irrelevant fields */
>   );
> 
>   create index domain_idx on accounts (domain);
>   create index domain_type_idx on accounts (domain, registrationtype);

try using   create index registrationtype_index  on accounts (registrationtype);

------
Hannu


Re: [HACKERS] Slow - grindingly slow - query

From
Theo Kramer
Date:
The Hermit Hacker wrote:
> 
> What does:
> 
> explain select domain from accountdetail
>         where domain not in (
>                 select domain from accountmaster);
> 
> show?

NOTICE:  QUERY PLAN:

Seq Scan on accounts  (cost=3667.89 rows=34958 width=12) SubPlan   ->  Seq Scan on accounts  (cost=3667.89 rows=33373
width=12)

EXPLAIN


> Also, did you do a 'vacuum analyze' on the tables?

Yes - should have mentioned that.
> Also, how about if you get rid of the views
> 
> SELECT domain FROM account
> WHERE registrationtype <> 'N';
> 
> *shakes head*  am I missing something here?  I'm reading your SELECT and
> 'CREATE VIEW's and don't they negate each other? *scratch head*

No - a domain can both be new (registrationtype 'N') and updated 
(registrationtype 'U') ie. one or more rows with the same domain with one row
containing a domain with registrationtype 'N' and zero or more rows containing
the same domain with registrationtype not 'N'. The reason for the <> 'N' and 
not just = 'U' is that we have a couple of rows with registrationtype set to
something else.
> The subselect is saying give me all domains whose registration type = 'N'.
> The select itself is saying give me all domains whoe registration type <>
> 'N' (select accountdetail.domain from accountdetail), and narrow that
> listing down further to only include those domains whose registration type
> <> 'N'?
> 
> Either I'm reading this *totally* wrong, or you satisfy that condition
> ujust by doing a 'SELECT domain FROM accountdetail;' ...
> 
> No?

No :). See above

--------
Regards
Theo


Re: [HACKERS] Slow - grindingly slow - query

From
Theo Kramer
Date:
Hannu Krosing wrote:
> try using
>     create index registrationtype_index  on accounts (registrationtype);

OK did that, and am rerunning the query. 

The explain now shows explain select accountdetail.domain from accountdetail where   accountdetail.domain not in
(selectaccountmaster.domain from accountmaster); NOTICE:  QUERY PLAN:
 
 Seq Scan on accounts  (cost=3667.89 rows=34958 width=12)   SubPlan     ->  Index Scan using registrationtype_idx on
accounts (cost=2444.62
 
rows=33373 width=12)
 EXPLAIN


Will let you all know when it completes.
--------
Regards
Theo


Re: [HACKERS] Slow - grindingly slow - query

From
Tom Lane
Date:
Theo Kramer <theo@flame.co.za> writes:
> The query is

>   select accountdetail.domain from accountdetail where
>     accountdetail.domain not in
>       (select accountmaster.domain from accountmaster);

Try something like
 select accountdetail.domain from accountdetail where   not exists (select accountmaster.domain from accountmaster
where              accountmaster.domain = accountdetail.domain);
 

I believe this is in the FAQ...
        regards, tom lane


Re: [HACKERS] Slow - grindingly slow - query

From
Theo Kramer
Date:
Tom Lane wrote:
> 
> Theo Kramer <theo@flame.co.za> writes:
> > The query is
> 
> >   select accountdetail.domain from accountdetail where
> >     accountdetail.domain not in
> >       (select accountmaster.domain from accountmaster);

This takes more than 5 hours and 30 minutes.

> Try something like
> 
>   select accountdetail.domain from accountdetail where
>     not exists (select accountmaster.domain from accountmaster where
>                 accountmaster.domain = accountdetail.domain);

This takes 5 seconds - wow!

> I believe this is in the FAQ...

Will check out the FAQs. Many thanks.
--------
Regards
Theo


Re: [HACKERS] Slow - grindingly slow - query

From
Vadim Mikheev
Date:
Theo Kramer wrote:
> 
> > Try something like
> >
> >   select accountdetail.domain from accountdetail where
> >     not exists (select accountmaster.domain from accountmaster where
> >                 accountmaster.domain = accountdetail.domain);
> 
> This takes 5 seconds - wow!

> I did the same on Informix Online 7 and it took less than two minutes...
            ^^^^^^^^^^^
 
Could you run the query above in Informix?
How long would it take to complete?

Vadim


Re: [HACKERS] Slow - grindingly slow - query

From
Theo Kramer
Date:

			
		

Re: [HACKERS] Slow - grindingly slow - query

From
Brian Hirt
Date:
> > >   select accountdetail.domain from accountdetail where
> > >     accountdetail.domain not in
> > >       (select accountmaster.domain from accountmaster);
> 
> This takes more than 5 hours and 30 minutes.
> 
> >   select accountdetail.domain from accountdetail where
> >     not exists (select accountmaster.domain from accountmaster where
> >                 accountmaster.domain = accountdetail.domain);
> 
> This takes 5 seconds - wow!
> 

I have a general comment/question here.  Why do in/not in clauses seem
to perform so slowly?  I've noticed this type of behavior with with my 
system also.  I think the above queries will always return the exact 
same results regardless of the data.  From looking at the query plan 
with explain, it's clear the second query makes better use of the 
indexes.  Can't the rewrite engine recognize a simple case like the 
one above and rewrite it to use exists and not exists with the proper 
joins?  Or possibly the optimizer can generate a better plan?  Sometimes 
it's not so easy to just change a query in the code.  Sometimes you can't
change the code because you only have executables and sometimes you are
using a tool that automatically generates SQL using in clauses.  
Additionally, since intersect and union get rewritten as in clauses they 
suffer the same performance problems. 

-brian

-- 
The world's most ambitious and comprehensive PC game database project.
                     http://www.mobygames.com


Re: [HACKERS] Slow - grindingly slow - query

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> Can't the rewrite engine recognize a simple case like the 
> one above and rewrite it to use exists and not exists with the proper 
> joins?  Or possibly the optimizer can generate a better plan?

This is on the TODO list, and will get done someday.  IMHO it's not as
urgent as a lot of the planner/optimizer's other shortcomings, because
it can usually be worked around by revising the query.

If it's bugging you enough to go fix it now, contributions are always
welcome ;-)
        regards, tom lane


Re: [HACKERS] Slow - grindingly slow - query

From
Brian Hirt
Date:
On Fri, Nov 12, 1999 at 09:58:14AM -0500, Tom Lane wrote:
> Brian Hirt <bhirt@mobygames.com> writes:
> > Can't the rewrite engine recognize a simple case like the 
> > one above and rewrite it to use exists and not exists with the proper 
> > joins?  Or possibly the optimizer can generate a better plan?
> 
> This is on the TODO list, and will get done someday.  IMHO it's not as
> urgent as a lot of the planner/optimizer's other shortcomings, because
> it can usually be worked around by revising the query.
> 
> If it's bugging you enough to go fix it now, contributions are always
> welcome ;-)
> 

Okay, what would be the correct approach to solving the problem, 
and where would be a good place to start?  I'v only been on this list
for a few weeks, so I'm missed discussion on the approach to solving 
this problem.  Should this change be localized to just the planner? 
Should the rewrite system be creating a different query tree?  Will both 
need to be changed?  If a lot of work is being done to this part of 
the system, is now a bad time to try this work?

I'm willing to jump in to this, but I may take a while to figure it out 
and ask a lot of questions that are obvious to the hardened postgres 
programmer.  I'm not famaliar with the postgres code, yet. 


-brian

-- 
The world's most ambitious and comprehensive PC game database project.
                     http://www.mobygames.com


Re: [HACKERS] Slow - grindingly slow - query

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> On Fri, Nov 12, 1999 at 09:58:14AM -0500, Tom Lane wrote:
>> If it's bugging you enough to go fix it now, contributions are always
>> welcome ;-)

> Okay, what would be the correct approach to solving the problem, 
> and where would be a good place to start?  I'v only been on this list
> for a few weeks, so I'm missed discussion on the approach to solving 
> this problem.  Should this change be localized to just the planner? 
> Should the rewrite system be creating a different query tree?  Will both 
> need to be changed?  If a lot of work is being done to this part of 
> the system, is now a bad time to try this work?

Well, actually, figuring out how & where to do it is the trickiest part
of the work.  Might not be the best project for a newbie backend-hacker
to start with :-(.

After a few moments' thought, it seems to me that this issue might be
closely intertwined with the OUTER JOIN stuff that Thomas is working on
and the querytree representation redesign that Jan and I have been
muttering about (but not yet actually doing anything about).  We want
to handle SELECT ... WHERE expr IN (SELECT ...) like a join, but the
semantics aren't exactly the same as a conventional join, so it might
be that the thing needs to be rewritten as a special join type.  In
that case it'd fit right in with OUTER JOIN, I suspect.

The Informix EXPLAIN results that Theo Kramer posted (a few messages
back in this thread) are pretty interesting too.  If I'm reading that
printout right, Informix is not any smarter than we are about choosing
the scan types for the outer and inner queries; and yet they have a much
faster runtime for the WHERE IN query.  I speculate that they are doing
the physical matching of outer and inner tuples in a smarter way than we
are --- perhaps they are doing one scan of the inner query and entering
all the values into a hashtable that's then probed for each outer tuple.
(As opposed to rescanning the inner query for each outer tuple, as we
currently do.)  If that's the answer, then it could probably be
implemented as a localized change: rewrite the SubPlan node executor to
look more like the HashJoin node executor.  This isn't perfect --- it
wouldn't pick up the possibility of a merge-style join --- but it would
be better than what we have for a lot less work than the "full" solution.

This is all shooting from the hip; I haven't spent time looking into it.
Has anyone else got insights to offer?
        regards, tom lane


Re: [HACKERS] Slow - grindingly slow - query

From
Theo Kramer
Date:
Tom Lane wrote:

> The Informix EXPLAIN results that Theo Kramer posted (a few messages
> back in this thread) are pretty interesting too.  If I'm reading that
> printout right, Informix is not any smarter than we are about choosing
> the scan types for the outer and inner queries; and yet they have a much
> faster runtime for the WHERE IN query.

The informix EXPLAIN for the 'not in' query was when I did not have an
index on registrationtype (the explain appends to file sqexplain.out so I
missed it :(). Anyway here is the Informix EXPLAIN with the index on
registrationtype.


QUERY:
------
select accountdetail.domain from accountdetail where accountdetail.domain not in (select accountmaster.domain from
accountmaster)

Estimated Cost: 4510
Estimated # of Rows Returned: 58810

1) informix.accounts: SEQUENTIAL SCAN
   Filters: (informix.accounts.domain != ALL <subquery> AND
informix.accounts.registrationtype != 'N' )
   Subquery:   ---------   Estimated Cost: 12   Estimated # of Rows Returned: 10
   1) informix.accounts: INDEX PATH
       (1) Index Keys: registrationtype           Lower Index Filter: informix.accounts.registrationtype = 'N'


The speed difference with or without the subquery index is neglible for
Informix.
--------
Regards
Theo


replication

From
"Aaron J. Seigo"
Date:
hi...

is anyone working on replication services in pgsql?

-- 
Aaron J. Seigo
Sys Admin