Thread: Re: COUNT(*) to find records which have a certain number of

Re: COUNT(*) to find records which have a certain number of

From

T E Schmitz

Date:

20 September 2004, 17:19:50

I figured it eventually. (The only thing I don't know is where to put
the ORDER BY.)

> I want to select only those BRAND/MODEL combinations, where the MODEL 
> has more than one TYPE, but only where one of those has TYPE_NAME='xyz'.
> I am not interested in MODELs with multiple TYPEs where none of them are 
> called 'xyz'.

SELECT
BRAND_NAME,MODEL_NAME
FROM TYPE
left outer join MODEL  on MODEL_PK =TYPE.MODEL_FK
left outer join BRAND  on BRAND_PK =MODEL.BRAND_FK
group by BRAND.BRAND_NAME,MODEL_NAME
having count(*)>1

intersect

SELECT
BRAND_NAME,MODEL.MODEL_NAME
FROM TYPE
left outer join MODEL  on MODEL_PK =TYPE.MODEL_FK
left outer join BRAND  on BRAND_PK =MODEL.BRAND_FK
where TYPE_NAME='xyz'
group by BRAND.BRAND_NAME,MODEL_NAME


-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz



-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz

Re: COUNT(*) to find records which have a certain number of

From

Rod Taylor

Date:

20 September 2004, 17:32:44

On Mon, 2004-09-20 at 12:19, T E Schmitz wrote:
> I figured it eventually. (The only thing I don't know is where to put
> the ORDER BY.)

Try this:              SELECT brand_name, model_name         FROM (SELECT ... INTERSECT SELECT ...) AS t       ORDER BY
...

Re: COUNT(*) to find records which have a certain number of

From

T E Schmitz

Date:

20 September 2004, 19:08:17

Hi Rod,

Rod Taylor wrote:

> On Mon, 2004-09-20 at 12:19, T E Schmitz wrote:
> 
>>I figured it eventually. (The only thing I don't know is where to put
>>the ORDER BY.)
> 
> 
> Try this:
>         
>         SELECT brand_name, model_name
>           FROM (SELECT ... INTERSECT SELECT ...) AS t
>         ORDER BY ...

That does the trick. You're a genius ;-)

-- 


Regards/Gruß,

Tarlika

Re: COUNT(*) to find records which have a certain number of

From

Greg Stark

Date:

20 September 2004, 19:33:53

T E Schmitz <mailreg@numerixtechnology.de> writes:

> SELECT
> BRAND_NAME,MODEL_NAME
...
> intersect
...

Huh, I never think of the set operation solutions. I'm curious how it compares
speed-wise.

-- 
greg

JOIN performance

From

"Dean Gibson (DB Administrator)"

Date:

20 September 2004, 20:25:04

I have a view that when used, is slow:

CREATE  VIEW     "Data".genlic_a4avail  AS        SELECT                genlic_a4.*,                last_action_date,
            end_date,                canceled            FROM    genlic_a4                LEFT JOIN lic_hd  USING(
sys_id)            WHERE       status != 'A';
 

Here is the EXPLAIN output:
 Merge Join  (cost=155360.47..159965.70 rows=13063 width=75)   Merge Cond: ("outer".sys_id = "inner".sys_id)   ->  Sort
(cost=3912.51..3916.48 rows=1589 width=62)         Sort Key: "_GenLicGroupA4".sys_id         ->  Nested Loop
(cost=0.00..3828.04rows=1589 width=62)               Join Filter: ("outer".callsign ~ ("inner".pattern)::text)
    ->  Seq Scan on "_GenLicGroupA4"  (cost=0.00..1667.40 
 
rows=1589 width=21)                     Filter: ((status <> 'R'::bpchar) AND (status <> 
'A'::bpchar) AND (geo_region = 12))               ->  Seq Scan on "_GeoRestrict"  (cost=0.00..1.16 rows=16 
width=41)   ->  Sort  (cost=262032.96..264249.96 rows=886799 width=72)         Sort Key: lic_hd.sys_id         ->
SubqueryScan lic_hd  (cost=0.00..24529.99 rows=886799 width=72)               ->  Seq Scan on "_LicHD"
(cost=0.00..24529.99rows=886799 
 
width=72)

If I change the view to this:

CREATE  VIEW     "Data".genlic_a4avail  AS        SELECT                genlic_a4.*,                (SELECT
last_action_dateFROM lic_hd WHERE sys_id = 
 
genlic_a4.sys_id LIMIT 1) AS                last_action_date,                (SELECT end_date         FROM lic_hd WHERE
sys_id= 
 
genlic_a4.sys_id LIMIT 1) AS                end_date,                (SELECT canceled         FROM lic_hd WHERE sys_id
=
 
genlic_a4.sys_id LIMIT 1) AS                canceled            FROM    genlic_a4            WHERE       status !=
'A';

Then the performance is MUCH better:
 Subquery Scan genlic_a4avail  (cost=0.00..3828.04 rows=1589 width=62)   ->  Nested Loop  (cost=0.00..3828.04 rows=1589
width=62)        Join Filter: ("outer".callsign ~ ("inner".pattern)::text)         ->  Seq Scan on "_GenLicGroupA4"
(cost=0.00..1667.40rows=1589 
 
width=21)               Filter: ((status <> 'R'::bpchar) AND (status <> 
'A'::bpchar) AND (geo_region = 12))         ->  Seq Scan on "_GeoRestrict"  (cost=0.00..1.16 rows=16 width=41)
SubPlan          ->  Limit  (cost=0.00..3.01 rows=1 width=4)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=4)                       Index Cond: (unique_system_identifier = $0)
-> Limit  (cost=0.00..3.01 rows=1 width=8)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=8)                       Index Cond: (unique_system_identifier = $0)
-> Limit  (cost=0.00..3.01 rows=1 width=8)                 ->  Index Scan using "_LicHD_pkey" on 
 
"_LicHD"  (cost=0.00..3.01 rows=1 width=8)                       Index Cond: (unique_system_identifier = $0)

Note that genlic_a4 is small (4519), and lic_hd is large (886799), and 
lic_hd has sys_id as its PRIMARY KEY.

Is there a better way to write the LEFT JOIN so as to achieve the 
performance of the second VIEW without the clumsiness of the three (SELECT 
... LIMIT 1) ???

Re: JOIN performance

From

Tom Lane

Date:

21 September 2004, 00:06:25

"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> I have a view that when used, is slow:

It's obvious that you haven't given us anything remotely like full
information here.  AFAICT the "tables" in your view are actually
other views, plus it looks like your call to the view is a query
that adds some other conditions you didn't mention.  (Or are those
conditions coming from the sub-views?  Hard to tell.)  If you want
useful help you need to be more complete.
        regards, tom lane

Re: JOIN performance

From

"Dean Gibson (DB Administrator)"

Date:

21 September 2004, 01:11:39

Tom Lane wrote on 2004-09-20 16:06:
>"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> > I have a view that when used, is slow:
>
>... If you want useful help you need to be more complete.

I use views to "hide" tables so that I can populate new tables and then 
atomically switch to them with "CREATE OR REPLACE ...".  Here is the same 
data with the raw tables:

=> explain select * from "20040920_070010"."_GenLicGroupA4" AS x LEFT JOIN 
"20040919_070713"."_LicHD" AS y ON x.sys_id = 
y.unique_system_identifier;                                           QUERY 
PLAN
------------------------------------------------------------------------------------------------- Merge Join
(cost=5235.14..35123.51rows=43680 width=365)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD" y  (cost=0.00..27361.79 
 
rows=886799 width=344)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key: x.sys_id         ->
SeqScan on "_GenLicGroupA4" x  (cost=0.00..1339.80 
 
rows=43680 width=21)

Using first level views, as mentioned above, the results are the same:

=> explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN 
"Base"."LicHD" AS y ON x.sys_id = 
y.unique_system_identifier;                                          QUERY 
PLAN
----------------------------------------------------------------------------------------------- Merge Join
(cost=5235.14..35123.51rows=43680 width=365)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD"  (cost=0.00..27361.79 
 
rows=886799 width=344)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key:
"_GenLicGroupA4".sys_id        ->  Seq Scan on "_GenLicGroupA4"  (cost=0.00..1339.80 rows=43680 
 
width=21)

However, when I introduce a second-level view for the second table of:

CREATE  VIEW     "Data".lic_hd  AS        SELECT                unique_system_identifier                        AS
sys_id,               callsign                                        AS callsign,                uls_file_number
                         AS 
 
uls_file_num,                applicant_type_code                             AS 
applicant_type,                radio_service_code                              AS 
radio_service,                license_status                                  AS 
license_status,                grant_date                                      AS grant_date,
effective_date                                 AS 
 
effective_date,                cancellation_date                               AS 
cancel_date,                expired_date                                    AS 
expire_date,                last_action_date                                AS 
last_action_date,                CASE WHEN cancellation_date < expired_date                     THEN cancellation_date
                  ELSE expired_date                END                                             AS end_date,
      cancellation_date < expired_date                AS canceled            FROM    "Base"."LicHD";
 

And then change the query to use it, I get:

=> explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN lic_hd AS y 
ON x.sys_id = y.sys_id;                                     QUERY 
PLAN
------------------------------------------------------------------------------------ Merge Join
(cost=280258.11..289399.92rows=359154 width=98)   Merge Cond: ("outer".sys_id = "inner".sys_id)   ->  Sort
(cost=5235.14..5344.34rows=43680 width=21)         Sort Key: "_GenLicGroupA4".sys_id         ->  Seq Scan on
"_GenLicGroupA4" (cost=0.00..1339.80 rows=43680 
 
width=21)   ->  Sort  (cost=262032.96..264249.96 rows=886799 width=72)         Sort Key: y.sys_id         ->  Subquery
Scany  (cost=0.00..24529.99 rows=886799 width=72)               ->  Seq Scan on "_LicHD"  (cost=0.00..24529.99
rows=886799
 
width=72)

Note that the scan on _LicHD is now sequential.  If I change the above view 
to remove the last two columns, I get:
                                     QUERY 
PLAN
------------------------------------------------------------------------------------ Merge Join
(cost=5235.14..35123.51rows=43680 width=93)   Merge Cond: ("outer".unique_system_identifier = "inner".sys_id)   ->
IndexScan using "_LicHD_pkey" on "_LicHD"  (cost=0.00..27361.79 
 
rows=886799 width=72)   ->  Sort  (cost=5235.14..5344.34 rows=43680 width=21)         Sort Key: x.sys_id         ->
SeqScan on "_GenLicGroupA4" x  (cost=0.00..1339.80 
 
rows=43680 width=21)

Which is back to my original (good) performance.

Question:  why do the last two column definitions in the second VIEW change 
the scan on _LicHD from indexed to sequential ??

-- Dean

Re: JOIN performance

From

Tom Lane

Date:

21 September 2004, 01:54:35

"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> Question:  why do the last two column definitions in the second VIEW change 
> the scan on _LicHD from indexed to sequential ??

It's the CASE that's getting you.  The poor plan is basically because
the sub-view isn't getting "flattened" into the upper query, and so it's
not possible to choose a plan for it that's dependent on the upper query
context.  And the reason it's not getting flattened is that subselects
that are on the nullable side of an outer join can't be flattened unless
they have nullable targetlists --- otherwise the results might not go to
NULL when they are supposed to.  A CASE construct is always going to be
treated as non-nullable.

Fixing this properly is a research project, and I haven't thought of any
quick-and-dirty hacks that aren't too ugly to consider :-(

In the meantime, you could easily replace that CASE construct with a
min() function that's declared strict.  I think date_smaller would
do nicely, assuming the columns are actually of type date.
        regards, tom lane

Re: JOIN performance

From

"Dean Gibson (DB Administrator)"

Date:

21 September 2004, 02:44:42

Ahh, that explains why a "plain" JOIN (which doesn't do what I need) gave 
much better performance than the LEFT JOIN.

I could ask why a CASE statement is always non-nullable, but I don't think 
the answer would help be solve my problem. <grin>  I suppose it's that even 
though my particular CASE statement has WHEN/ELSE values that come from the 
nullable side of the JOIN, in general that's not true ...

Okay, now for my big question:  I searched high and low for a function that 
would return the minimum of two dates, and found none.  Now you come up 
with "date_smaller", which works fine (as does "date_larger"), but where 
are those documented?  More importantly, where are other functions like 
them documented?

-- Dean

ps:  Who dreamt up the names "date_smaller" and "date_larger" ??? Much more 
intuitive are "min_date" and "max_date".

pps:  I realize that "date_smaller" isn't exactly equivalent to my CASE 
statement;  a NULL value for one of the CASE operands causes the result of 
the ELSE clause to be returned, whereas "date_smaller" just returns NULL in 
that case.  In my data, that's significant.  I suppose that COALESCE has 
the same problem as CASE ...

Tom Lane wrote on 2004-09-20 17:54:
>"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
>Question:  Why do the last two column definitions in the second VIEW 
>change the scan on _LicHD from indexed to sequential ??
>
>It's the CASE that's getting you.  The poor plan is basically because the 
>sub-view isn't getting "flattened" into the upper query, and so it's not 
>possible to choose a plan for it that's dependent on the upper query 
>context.  And the reason it's not getting flattened is that subselects 
>that are on the nullable side of an outer join can't be flattened unless 
>they have nullable targetlists --- otherwise the results might not go to 
>NULL when they are supposed to.  A CASE construct is always going to be 
>treated as non-nullable.
>
>Fixing this properly is a research project, and I haven't thought of any 
>quick-and-dirty hacks that aren't too ugly to consider :-(
>
>In the meantime, you could easily replace that CASE construct with a min() 
>function that's declared strict.  I think date_smaller would do nicely, 
>assuming the columns are actually of type date.

Re: JOIN performance

From

Chester Kustarz

Date:

21 September 2004, 05:09:31

On Mon, 20 Sep 2004, Dean Gibson (DB Administrator) wrote:
> Okay, now for my big question:  I searched high and low for a function that
> would return the minimum of two dates, and found none.  Now you come up
> with "date_smaller", which works fine (as does "date_larger"), but where
> are those documented?  More importantly, where are other functions like
> them documented?

You can make them:

http://www.postgresql.org/docs/7.4/interactive/server-programming.html

"strict" means it will return NULL when the input is NULL.

Re: JOIN performance

From

Tom Lane

Date:

21 September 2004, 05:10:23

"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> I could ask why a CASE statement is always non-nullable, but I don't think 
> the answer would help be solve my problem. <grin>  I suppose it's that even 
> though my particular CASE statement has WHEN/ELSE values that come from the 
> nullable side of the JOIN, in general that's not true ...

Right, the code just sees CASE and barfs.  A finer grain of analysis
could conclude that this CASE is actually safe, but I'm unconvinced that
it's worth doing.  (Feel free to pursue this on -hackers if you care.)

> Okay, now for my big question:  I searched high and low for a function that 
> would return the minimum of two dates, and found none.  Now you come up 
> with "date_smaller", which works fine (as does "date_larger"), but where 
> are those documented?

They aren't; they are actually only intended as support functions for
MIN (resp. MAX) on dates.  But they're there, and there's nothing to
stop you using 'em.

(You do of course realize that you could have implemented these
functions for yourself in a one-liner sql or plpgsql function.  Finding
the function in the system saves you a few minutes at most.)

> ps:  Who dreamt up the names "date_smaller" and "date_larger" ???

[ shrug... ]  Some now-forgotten grad student at Berkeley.  All the
support functions for MIN and MAX are named 'foosmaller' and
'foolarger'.  Try

select aggtransfn from pg_aggregate, pg_proc
where pg_proc.oid = aggfnoid and proname = 'min';

> pps:  I realize that "date_smaller" isn't exactly equivalent to my CASE 
> statement;  a NULL value for one of the CASE operands causes the result of 
> the ELSE clause to be returned, whereas "date_smaller" just returns NULL in 
> that case.  In my data, that's significant.

Well, in that case you may have a problem here.  The point of the test
in question is that the expression has to return NULL if *either* input
is null.  Now it's true that the LEFT JOIN only cares that it goes to
NULL when *both* inputs are NULL, but we don't have any way to declare
that particular property of a function.  To make the planner happy you
will have to declare the function as STRICT, which will force it to
behave in the first fashion.

> I suppose that COALESCE has the same problem as CASE ...

Yup, of course.
        regards, tom lane

Re: JOIN performance

From

"Dean Gibson (DB Administrator)"

Date:

21 September 2004, 05:42:54

It turns out that even when I removed the CASE statement from the VIEW, the 
performance problem remained.  I had to remove the conditional as well from 
the VIEW.

To refresh your memory, there was this line in the VIEW (said VIEW being 
the subject of the LEFT JOIN):

cancel_date < expire_date AS canceled,

(cancel_date & expire_date are DATEs in the base view)

When I removed the above line from the VIEW, the performance problem 
disappeared.

Do you know why?  I'm running 7.3.4, by the way.

-- Dean

Re: JOIN performance

From

Tom Lane

Date:

21 September 2004, 06:19:39

"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes:
> It turns out that even when I removed the CASE statement from the VIEW, the 
> performance problem remained.  I had to remove the conditional as well from 
> the VIEW.

> cancel_date < expire_date AS canceled,

Yuck :-(

> Do you know why?  I'm running 7.3.4, by the way.

7.3 is even stupider than 7.4, is why ...

In 7.3 only a view whose targetlist consists of simple column references
can be pulled up into the nullable side of an outer join.
        regards, tom lane

Re: JOIN performance

From

Greg Stark

Date:

21 September 2004, 19:26:46

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Fixing this properly is a research project, and I haven't thought of any
> quick-and-dirty hacks that aren't too ugly to consider :-(

Just thinking out loud here. Instead of trying to peek inside the CASE
couldn't the optimizer just wrap the non-strict expression in a conditional
that tests whether the row was found?

-- 
greg

Re: COUNT(*) to find records which have a certain number of

From

Martin Knipper

Date:

26 September 2004, 20:24:49

Am 20.09.2004 18:19 schrieb T E Schmitz:

> I figured it eventually. (The only thing I don't know is where to put
> the ORDER BY.)
> 
> 
>>I want to select only those BRAND/MODEL combinations, where the MODEL 
>>has more than one TYPE, but only where one of those has TYPE_NAME='xyz'.
>>I am not interested in MODELs with multiple TYPEs where none of them are 
>>called 'xyz'.
> 
> 

Try this:

select b.brand_name,m.model_name,count(*) from model m inner join
brand b on m.brand_fk=b.brand_pk inner join type t on
m.model_pk=t.model_fk where t.type_name='xyz' group by
b.brand_name,m.model_name having count(m.model_name)>0;

Greetings,

Martin

-- 
Martin Knipper
www  : http://www.mk-os.de
Mail : knipper@mk-os.de

Re: JOIN performance

From

"Dean Gibson (DB Administrator)"

Date:

26 September 2004, 23:37:53

Tom Lane wrote on 2004-09-20 22:19:
>In 7.3 only a view whose targetlist consists of simple column references 
>can be pulled up into the nullable side of an outer join.

OK, well you dragged me kicking and screaming into writing and 'end_date' 
function (IMMUTABLE but non-STRICT).  I then use that in the JOINed VIEW 
(and other strategic places) rather than in the subject of the LEFT JOIN, 
and that works with no performance problems.

Since we mentioned PostgreSQL versions, one of the things I'd like to see 
in v8.0 is support for "DELETE FROM ... AS ..." and "UPDATE ... AS 
...".  For example, I need to do things like:

DELETE FROM xxx AS x WHERE field > xxx.field;

In other words, a DELETE that has criteria similar to a self JOIN.

Right now, I do this via:

CREATE VIEW yyy AS SELECT * FROM xxx;

DELETE FROM xxx WHERE field > yyy.field;

And that works fine.

But it's probably too late for v8.0 ...

-- Dean

Re: COUNT(*) to find records which have a certain number of

From

T E Schmitz

Date:

27 September 2004, 10:45:23

Hallo Martin,

Martin Knipper wrote:

> Am 20.09.2004 18:19 schrieb T E Schmitz:
> 
>>>I want to select only those BRAND/MODEL combinations, where the MODEL 
>>>has more than one TYPE, but only where one of those has TYPE_NAME='xyz'.
>>>I am not interested in MODELs with multiple TYPEs where none of them are 
>>>called 'xyz'.
>>
>>
> 
> Try this:
> 
> select b.brand_name,m.model_name,count(*) from model m inner join
> brand b on m.brand_fk=b.brand_pk inner join type t on
> m.model_pk=t.model_fk where t.type_name='xyz' group by
> b.brand_name,m.model_name having count(m.model_name)>0;
> 

Sorry, no, this doesn't work. It definitely doesn't return the required 
resultset. It just returns all BRAND_NAME,MODEL_NAME for all Models that 
have a Type with TYPE_NAME='xyz'.

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz

Is NULLIF nullable?

From

"Dean Gibson (DB Administrator)"

Date:

10 November 2004, 03:30:09

Recently I asked about why a field from the nullable side of an OUTER JOIN 
was causing the JOIN to be inefficient, and was told that it was because 
that field had a CASE statement as part of its definition, and that CASE 
(and by extension COALESCE) were non-nullable constructs.

Is NULLIF nullable, in that sense?

Reason:  I'd like to define a VIEW with a field definition thusly:

'P.O. Box' || NULLIF( po_box, ' '::char ) AS ...

And I would like the field thusly defined to be nullable.

-- Dean

Re: Is NULLIF nullable?

From

"Dean Gibson (DB Administrator)"

Date:

10 November 2004, 05:12:57

You can ignore my question below, since I just put the field definition 
below in an SQL FUNCTION and marked it STRICT.

-- Dean

Dean Gibson (DB Administrator) wrote on 2004-11-09 19:29:
>Recently I asked about why a field from the nullable side of an OUTER JOIN 
>was causing the JOIN to be inefficient, and was told that it was because 
>that field had a CASE statement as part of its definition, and that CASE 
>(and by extension COALESCE) were non-nullable constructs.
>
>Is NULLIF nullable, in that sense?
>
>Reason:  I'd like to define a VIEW with a field definition thusly:
>
>'P.O. Box' || NULLIF( po_box, ' '::char ) AS ...
>
>And I would like the field thusly defined to be nullable.
>
>-- Dean