Anti join miscalculates row number? - Mailing list pgsql-performance

From Jens Reufsteck
Subject Anti join miscalculates row number?
Date
Msg-id 000d01cc93c9$5d3a8e90$17afabb0$@reufsteck@staufenbiel.de
Whole thread Raw
Responses Re: Anti join miscalculates row number?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Anti join miscalculates row number?  (Julius Tuskenis <julius@nsoft.lt>)
List pgsql-performance
I’ve got a lengthy query, that doesn't finish in reasonable time (i.e.
10min+). I suspect, that the query optimizer miscalculates the number of
rows for part of the query.

The suspicious subquery:

SELECT
                sv1.sid as sid
            FROM stud_vera sv1
            LEFT JOIN stud_vera AS sv2
            ON (
                sv1.sid=sv2.sid
                AND sv2.veraid IN ( 109 )
                )
            WHERE sv1.veraid IN ( 3 )
            AND sv2.veraid IS NULL

The whole query:

SELECT count(DISTINCT sid) AS Anzahl FROM (SELECT sid
                        FROM stud
                        WHERE (
                                        status IN (1,2)
                                        AND length(vname) > 1
                                        AND length(nname) > 1
                    AND length(email) > 1

                                )) AS stud INNER JOIN (SELECT DISTINCT
                                sid,
                                cast(created AS date) AS tag,
                                cast(veradate AS DATE) -
cast(stud_vera.created AS DATE) AS tage,
                                cast(
                                        floor(
                                                (cast(veradate AS date) -
cast(stud_vera.created AS date))/7
                                        )
                                AS integer) AS woche,
                                cast(extract(week from stud_vera.created) AS
integer) AS kalenderwoche,
                                to_char(stud_vera.created, 'YYYY/MM') AS
monat,
                                to_char(stud_vera.abgemeldet, 'YYYY/MM') AS
abmeldemonat,
                                CASE
                                        WHEN newsletterid &1 = 1 THEN 'Flag
1'
                                        WHEN newsletterid &2 = 2 THEN 'Flag
2'
                                        WHEN newsletterid &4 = 2 THEN 'Flag
3'
                                        WHEN newsletterid &8 = 8 THEN 'Flag
4'
                                        WHEN newsletterid &16 = 16 THEN
'Flag 5'
                                        WHEN newsletterid &32 = 32 THEN
'Flag 6'
                                        WHEN newsletterid &64 = 64 THEN
'Flag 7'
                                        WHEN newsletterid &128 = 128 THEN
'Flag 8'
                                END AS newsletterid
                        FROM stud_vera
                        JOIN vera USING (veraid)
                        WHERE


                        stud_vera.status > 0
                        AND abgemeldet is null




                        AND veraid IN (

                                        3

                        )) AS vera USING (sid)  INNER JOIN (SELECT
                        sid,
                        age(date_trunc('MONTH', now()), date_trunc('MONTH',
bis)) || '' AS months
                FROM
                        study
                WHERE status = 1

                                AND

                                        age(date_trunc('MONTH', now()),
date_trunc('MONTH', bis)) < interval '60 months'


                                        AND


                                        age(date_trunc('MONTH', bis),
date_trunc('MONTH', now())) <= interval '-24 months') AS examen USING (sid)
INNER JOIN (SELECT
                                sv1.sid as sid
                        FROM stud_vera sv1
                        LEFT JOIN stud_vera AS sv2
                        ON (
                                sv1.sid=sv2.sid
                                AND sv2.veraid IN ( 109 )
                                )
                        WHERE sv1.veraid IN ( 3 )
                        AND sv2.veraid IS NULL) AS veraAusschluss USING
(sid)


And the explain analyze for the sub query: http://explain.depesz.com/s/8d2
And the explain for the whole query: http://explain.depesz.com/s/GGf
(explain analyze doesn't finish in reasonable time)

What strucks me, is that in the sub query row numbers for sv1 and sv2 are
calculated quite accurat. But the resulting 4 rows after the final join is
far from reality. Shouldn't this be as minimum the number of rows for sv1
minus number of rows for sv2?

If the optimizer knew, that the number is much bigger, it probably wouldn't
choose the nested loop in the next step, which I suspect is the reason for
the performance issues.

We're using postgres 9.0.4. It might be interesting, that the same query
runs smoothly on our test system with postgres 8.3.7.

The tables:

                                           Tabelle »public.stud_vera«
      Spalte       |             Typ             |
Attribute
-------------------+-----------------------------+--------------------------
------------------------------------
 svid              | integer                     | not null Vorgabewert
nextval('stud_vera_svid_seq'::regclass)
 sid               | integer                     |
 veraid            | integer                     |
 modified          | timestamp without time zone | not null Vorgabewert
now()
 created           | timestamp without time zone | not null Vorgabewert
now()
 verastep1         | timestamp without time zone |
 kontoinhaber      | character varying(64)       |
 kontonum          | character varying(32)       |
 blz               | character varying(32)       |
 bank              | character varying(64)       |
 betrag            | numeric(5,2)                |
 verastep2         | timestamp without time zone |
 deferred          | smallint                    |
 verastep3         | timestamp without time zone |
 status            | smallint                    | not null
 label             | character varying(128)      |
 kanalid           | integer                     |
 deferredtxt       | character varying(256)      |
 comment           | character varying(64)       |
 label2            | character varying(128)      |
 dstid             | integer                     |
 abgemeldet        | date                        |
 bsid              | integer                     | Vorgabewert 1
 newsletterid      | integer                     |
 abmeldenewsletter | integer                     | not null Vorgabewert 0
 kanalcomment      | character varying(128)      |
Indexe:
    "stud_vera_pkey" PRIMARY KEY, btree (svid)
    "stud_vera_sid_veraid_idx" UNIQUE, btree (sid, veraid)
    "stud_vera_sid_idx" btree (sid)
    "stud_vera_veraid_idx" btree (veraid)
Fremdschlüssel-Constraints:
    "$1" FOREIGN KEY (sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE
CASCADE
    "$2" FOREIGN KEY (veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON
DELETE SET NULL
    "stud_vera_dstid" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid)
ON UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
    TABLE "eingang" CONSTRAINT "eingang_svid_fkey" FOREIGN KEY (svid)
REFERENCES stud_vera(svid) ON UPDATE CASCADE ON DELETE CASCADE

                                        Tabelle »public.stud«
    Spalte     |             Typ             |
Attribute
---------------+-----------------------------+------------------------------
--------------------------
 sid           | integer                     | not null Vorgabewert
nextval('stud_sid_seq'::regclass)
 login         | character varying(64)       | not null
 passwd        | character varying(32)       |
 modified      | timestamp without time zone | not null Vorgabewert now()
 created       | timestamp without time zone | not null Vorgabewert now()
 lastlogin     | timestamp without time zone |
 mow           | smallint                    |
 titel         | character varying(32)       |
 vname         | character varying(32)       |
 nname         | character varying(32)       |
 birth         | date                        |
 einstieg      | date                        |
 blacksheep    | integer                     |
 studstatusid  | integer                     |
 status        | smallint                    | not null
 studmodified  | timestamp without time zone |
 adminmodified | timestamp without time zone |
 comment       | character varying(128)      |
 dstid         | integer                     |
 linkid        | integer                     |
 beesiteuserid | integer                     |
 ypdate        | date                        |
 email         | character varying(64)       |
 flag          | smallint                    |
Indexe:
    "stud_pkey" PRIMARY KEY, btree (sid)
    "stud_login_idx" UNIQUE, btree (login)
    "stud_login_lower" btree (lower(login::text))
    "stud_nname_idx" btree (lower(nname::text))
    "stud_sid_status_idx" btree (sid, status)
    "stud_vname_idx" btree (lower(vname::text))
Check-Constraints:
    "birth" CHECK (birth >= '1900-01-01'::date AND birth <=
'1999-12-31'::date)
Fremdschlüssel-Constraints:
    "stud_dstid_fkey" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid)
ON UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
    TABLE "stud_vera" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_wdwolle" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_staats" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "locking" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_ad" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "gutschein" CONSTRAINT "$2" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "kontakt" CONSTRAINT "kontakt_sid_fkey" FOREIGN KEY (sid)
REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_ae" CONSTRAINT "stud_ae_sid_fkey" FOREIGN KEY (sid)
REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_berufsfeld" CONSTRAINT "stud_berufsfeld_fk_sid" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_einstiegsbereich" CONSTRAINT
"stud_einstiegsbereich_fkey_sid" FOREIGN KEY (sid) REFERENCES stud(sid) ON
UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_vakanzen" CONSTRAINT "stud_vakanzen_sid_fkey" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_vposition" CONSTRAINT "stud_vposition_sid_fkey" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "study" CONSTRAINT "study_sid_fkey" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE

                                         Tabelle »public.vera«
    Spalte    |             Typ             |
Attribute
--------------+-----------------------------+-------------------------------
----------------------------
 veraid       | integer                     | not null Vorgabewert
nextval('vera_veraid_seq'::regclass)
 vera         | character varying(64)       |
 verakurz     | character varying(32)       |
 vera_e       | character varying(8)        |
 vera_e2      | character varying(8)        |
 veratyp      | smallint                    |
 veradate     | date                        |
 veradauer    | integer                     |
 veraort      | character varying(32)       |
 veraland     | character varying(32)       |
 veracomment  | character varying(255)      |
 active       | smallint                    |
 status       | smallint                    |
 landid       | integer                     |
 spontandate  | date                        |
 spontandate2 | date                        |
 dstid        | integer                     |
 xmlconf      | character varying(128)      |
 verakurz2    | character varying(32)       |
 closingdate  | timestamp without time zone |
 url          | character varying(128)      |
 urltext      | character varying(32)       | Vorgabewert 'Zum
Event'::character varying
 etflag       | integer                     |
Indexe:
    "vera_pkey" PRIMARY KEY, btree (veraid)
    "vera_verakurz_unique" UNIQUE, btree (verakurz)
Fremdschlüssel-Constraints:
    "vera_dstid" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid) ON
UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
    TABLE "vera_reihe" CONSTRAINT "$1" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "gutschein" CONSTRAINT "$1" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "prod_vera" CONSTRAINT "$2" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "stud_vera" CONSTRAINT "$2" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE SET NULL
    TABLE "auswahlevent" CONSTRAINT "auswahlevent_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "inside" CONSTRAINT "fk_inside_veraid" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "fprofil" CONSTRAINT "fprofil_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "mailversand" CONSTRAINT "mailversand_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "mailvorlage" CONSTRAINT "mailvorlage_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "raum" CONSTRAINT "raum_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "schiene" CONSTRAINT "schiene_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "vakanzen" CONSTRAINT "vakanzen_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "vposition" CONSTRAINT "vposition_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE

                                               Tabelle »public.study«
       Spalte        |              Typ               |
Attribute
---------------------+--------------------------------+---------------------
----------------------------------------
 studyid             | integer                        | not null Vorgabewert
nextval('study_studyid_seq'::regclass)
 sid                 | integer                        | not null
 modified            | timestamp(0) without time zone | not null Vorgabewert
now()
 created             | timestamp(0) without time zone | not null Vorgabewert
now()
 abschlusstypid      | integer                        |
 uniid               | integer                        |
 von                 | date                           |
 bis                 | date                           |
 unisonstige         | character varying(128)         |
 unilandid           | integer                        |
 ausrichtungsonstige | character varying(64)          |
 vertiefungsonstige  | character varying(64)          |
 qnoteid             | integer                        |
 status              | smallint                       | not null Vorgabewert
1
Indexe:
    "study_pkey" PRIMARY KEY, btree (studyid)
    "study_sid_idx" btree (sid)
Fremdschlüssel-Constraints:
    "study_sid_fkey" FOREIGN KEY (sid) REFERENCES stud(sid) ON UPDATE
CASCADE ON DELETE CASCADE
Fremdschlüsselverweise von:
    TABLE "study_ausrichtung" CONSTRAINT "study_ausrichtung_studyid_fkey"
FOREIGN KEY (studyid) REFERENCES study(studyid) ON UPDATE CASCADE ON DELETE
CASCADE
    TABLE "study_vertiefung" CONSTRAINT "study_vertiefung_fkey1" FOREIGN KEY
(studyid) REFERENCES study(studyid) ON UPDATE CASCADE ON DELETE CASCADE

Many thanks

--
Jens Reufsteck



pgsql-performance by date:

Previous
From: Linos
Date:
Subject: Re: CTE vs Subquery
Next
From: Merlin Moncure
Date:
Subject: Re: CTE vs Subquery