Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1
Date
Msg-id 20407.940603731@sss.pgh.pa.us
In response to Re: [HACKERS] Neverending query on 6.5.2 over Solaris 2.5.1  (Fernando Schapachnik <fpscha@ns1.via-net-works.net.ar>)
Fernando Schapachnik <fpscha@ns1.via-net-works.net.ar> writes:
>>>> postmaster starts eating a lot of CPU and it doesn't finish
>>>> processing the query in 20+ minutes.
>> 
>> Have you vacuumed the database lately?  What does "explain ..." show

> After vacuum:
> On the Sun: 1 minute.
> On the FreeBSD: 12 seconds.

That's a little better, anyway ...

> Explain shows (on both machines):

> Nested Loop  (cost=503.74 rows=1 width=74)
>   ->  Nested Loop  (cost=500.89 rows=1 width=58)
>         ->  Seq Scan on usuarios u  (cost=498.84 rows=1 width=30)
>         ->  Index Scan using passwd_id_usr_key on passwd pas  (cost=2.05 rows=10571 width=28)
>   ->  Seq Scan on perfiles per  (cost=2.85 rows=56 width=16)

OK, that still looks a little bogus.  It's estimating it will only
find one row in usuarios that needs to be joined against the other
two tables.  If that were true, then this plan is pretty reasonable,
but I bet it's not true.  The only WHERE clause that can be used to
eliminate usuarios rows in advance of the join is (u.activa), and I'll
bet you have more than one active user.

Does the plan change if you do VACUUM ANALYZE instead of just a plain
vacuum?
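
For reference, a plain VACUUM only reclaims space; VACUUM ANALYZE also
gathers the column statistics the planner's row-count estimates depend
on.  Something like the following would do it ("mydb" is a placeholder
for the actual database name):

```shell
# VACUUM ANALYZE collects per-column statistics in addition to
# reclaiming dead tuples; run it on each table in the query.
# "mydb" stands in for the real database name.
psql -c 'VACUUM ANALYZE usuarios;' mydb
psql -c 'VACUUM ANALYZE passwd;' mydb
psql -c 'VACUUM ANALYZE perfiles;' mydb
```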

> As usuarios.id_usr and passwd.id_usr are both serial, they have 
> indexes automatically created (I double checked that). PgAccess shows 
> that usuarios has no primary key (I don't know why) and that 
> usuarios_id_usr_key is an unique, no clustered index. Same on passwd.

OK, so it *could* make a mergejoin plan without sorting.  I think the
problem is the unreasonably low estimate for the number of matching
usuarios rows; that makes the nested-loop plan look cheap because of its
lower startup overhead.  But if there are a lot of usuarios rows to
process then it's not so cheap anymore.

As an experiment you could try forbidding nestloop plans (start psql
with environment variable PGOPTIONS="-fn") and see what sort of plan
you get then and how long it really takes in comparison to the nestloop.
This isn't a good long-term solution, because you might get poor plans
for smaller queries, but it would help us see whether and how the
planner is making the wrong choice.  (I've been trying to collect
examples of poor planning so that I can improve the planner --- so
I'm quite interested in the details of your situation.)
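
The experiment would look roughly like this (again, "mydb" is a
placeholder; the "-fn" backend flag forbids nested-loop plans for that
session only, so nothing is changed permanently):

```shell
# Start psql with nested-loop plans forbidden for this session.
# "-fn" = forbid nestloop; "mydb" stands in for the real database name.
PGOPTIONS="-fn" psql mydb
# Then, inside psql, re-run EXPLAIN on the original query and compare
# both the plan and the actual elapsed time against the nestloop case:
#   EXPLAIN SELECT ... ;
```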

> I'm running postmaster -N 8 -B 16 because without these the postmaster 
> wouldn't get all the shared memory it needed and wouldn't start. Do you 
> think that this may be in some way related?

Well, that's certainly costing you performance; 16 disk pages is not
enough buffer space to avoid thrashing.  You need to increase your
kernel's max-shared-memory-block-size (SHMMAX, I think) parameter
so that you can run with a more reasonable -B setting.  A lot of
kernels ship with SHMMAX settings that are ridiculously small for
any modern machine.
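
On Solaris that means an /etc/system entry along these lines (the value
and the data-directory path below are illustrative examples, not
recommendations; a reboot is needed for the kernel setting to take
effect):

```shell
# In /etc/system, raise the maximum shared memory segment size,
# e.g. to 16MB (example value -- size it for your machine):
#   set shmsys:shminfo_shmmax=16777216
# After rebooting, the postmaster can run with a larger buffer pool;
# -B 256 means 256 disk pages of shared buffers (256 * 8KB = 2MB).
# The -D path is a conventional example, not necessarily yours.
postmaster -N 8 -B 256 -D /usr/local/pgsql/data &
```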
        regards, tom lane

