Thread: Faster with a sub-query then without

Faster with a sub-query then without

From
Martin Foster
Date:
I thought this could generate some interesting discussion.  Essentially,
there are three queries below, two using sub-queries to change the way
the randomized information (works first by author and then by work) and
the original which simply randomizes out of all works available.

The one not using sub-queries under EXPLAIN ANALYZE proves itself to be
less efficient and have a far higher cost then those with the penalty of
a sub-query.   Since this seems to be counter to what I have been told
in the past, I thought I would bring this forward and get some
enlightenment.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

-----------------------------------------------------------------------

SELECT
  g.GalleryID,
  w.WorkID,
  w.WorkName,
  w.WorkImageThumbnail,
  g.GalleryRating,
  g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
  AND g.GalleryPrivacy = 'no'
  AND w.WorkImageThumbnail IS NOT NULL
  AND g.PuppeteerLogin = (SELECT PuppeteerLogin
   FROM ethereal.Gallery
   WHERE GalleryType='image'
   GROUP BY PuppeteerLogin
   ORDER BY RANDOM() LIMIT 1)
ORDER BY RANDOM() LIMIT 1

  Limit  (cost=60.70..60.70 rows=1 width=100) (actual time=1.013..1.013
rows=0 loops=1)
    InitPlan
      ->  Limit  (cost=6.36..6.37 rows=1 width=11) (actual
time=0.711..0.713 rows=1 loops=1)
            ->  Sort  (cost=6.36..6.45 rows=33 width=11) (actual
time=0.708..0.708 rows=1 loops=1)
                  Sort Key: random()
                  ->  HashAggregate  (cost=5.45..5.53 rows=33 width=11)
(actual time=0.420..0.553 rows=46 loops=1)
                        ->  Seq Scan on gallery  (cost=0.00..5.30
rows=60 width=11) (actual time=0.007..0.227 rows=59 loops=1)
                              Filter: ((gallerytype)::text = 'image'::text)
    ->  Sort  (cost=54.33..54.37 rows=16 width=100) (actual
time=1.009..1.009 rows=0 loops=1)
          Sort Key: random()
          ->  Nested Loop  (cost=0.00..54.01 rows=16 width=100) (actual
time=0.981..0.981 rows=0 loops=1)
                ->  Seq Scan on gallery g  (cost=0.00..5.56 rows=2
width=24) (actual time=0.855..0.888 rows=1 loops=1)
                      Filter: (((galleryprivacy)::text = 'no'::text) AND
((puppeteerlogin)::text = ($0)::text))
                ->  Index Scan using pkwork on "work" w
(cost=0.00..24.10 rows=8 width=80) (actual time=0.080..0.080 rows=0 loops=1)
                      Index Cond: (w.galleryid = "outer".galleryid)
                      Filter: (workimagethumbnail IS NOT NULL)
  Total runtime: 1.211 ms

-----------------------------------------------------------------------

SELECT
  g.GalleryID,
  w.WorkID,
  w.WorkName,
  w.WorkImageThumbnail,
  g.GalleryRating,
  g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
  AND g.GalleryPrivacy = 'no'
  AND w.WorkImageThumbnail IS NOT NULL
  AND g.GalleryPenName = (SELECT GalleryPenName
   FROM ethereal.Gallery
   WHERE GalleryType='image'
   GROUP BY GalleryPenName
   ORDER BY RANDOM() LIMIT 1)
ORDER BY RANDOM() LIMIT 1

  Limit  (cost=59.92..59.92 rows=1 width=100) (actual time=0.904..0.906
rows=1 loops=1)
    InitPlan
      ->  Limit  (cost=6.69..6.69 rows=1 width=14) (actual
time=0.731..0.733 rows=1 loops=1)
            ->  Sort  (cost=6.69..6.79 rows=42 width=14) (actual
time=0.729..0.729 rows=1 loops=1)
                  Sort Key: random()
                  ->  HashAggregate  (cost=5.45..5.56 rows=42 width=14)
(actual time=0.431..0.568 rows=48 loops=1)
                        ->  Seq Scan on gallery  (cost=0.00..5.30
rows=60 width=14) (actual time=0.011..0.233 rows=59 loops=1)
                              Filter: ((gallerytype)::text = 'image'::text)
    ->  Sort  (cost=53.23..53.27 rows=16 width=100) (actual
time=0.899..0.899 rows=1 loops=1)
          Sort Key: random()
          ->  Nested Loop  (cost=0.00..52.91 rows=16 width=100) (actual
time=0.808..0.862 rows=6 loops=1)
                ->  Index Scan using idxgallery_pen on gallery g
(cost=0.00..4.45 rows=2 width=24) (actual time=0.767..0.769 rows=1 loops=1)
                      Index Cond: ((gallerypenname)::text = ($0)::text)
                      Filter: ((galleryprivacy)::text = 'no'::text)
                ->  Index Scan using pkwork on "work" w
(cost=0.00..24.10 rows=8 width=80) (actual time=0.020..0.042 rows=6 loops=1)
                      Index Cond: (w.galleryid = "outer".galleryid)
                      Filter: (workimagethumbnail IS NOT NULL)
  Total runtime: 1.117 ms

-----------------------------------------------------------------------

SELECT
  g.GalleryID,
  w.WorkID,
  w.WorkName,
  w.WorkImageThumbnail,
  g.GalleryRating,
  g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
  AND g.GalleryType = 'image'
  AND g.GalleryPrivacy = 'no'
  AND w.WorkImageThumbnail IS NOT NULL
ORDER BY RANDOM() LIMIT 1

--------
  Limit  (cost=111.73..111.73 rows=1 width=100) (actual
time=13.021..13.023 rows=1 loops=1)
    ->  Sort  (cost=111.73..113.70 rows=786 width=100) (actual
time=13.017..13.017 rows=1 loops=1)
          Sort Key: random()
          ->  Hash Join  (cost=5.55..73.93 rows=786 width=100) (actual
time=1.081..8.320 rows=803 loops=1)
                Hash Cond: ("outer".galleryid = "inner".galleryid)
                ->  Seq Scan on "work" w  (cost=0.00..54.47 rows=817
width=80) (actual time=0.006..2.207 rows=817 loops=1)
                      Filter: (workimagethumbnail IS NOT NULL)
                ->  Hash  (cost=5.30..5.30 rows=100 width=24) (actual
time=0.669..0.669 rows=0 loops=1)
                      ->  Seq Scan on gallery g  (cost=0.00..5.30
rows=100 width=24) (actual time=0.020..0.402 rows=100 loops=1)
                            Filter: ((galleryprivacy)::text = 'no'::text)
  Total runtime: 13.252 ms


Re: Faster with a sub-query then without

From
Tom Lane
Date:
Martin Foster <martin@ethereal-realms.org> writes:
> The one not using sub-queries under EXPLAIN ANALYZE proves itself to be
> less efficient and have a far higher cost then those with the penalty of
> a sub-query.   Since this seems to be counter to what I have been told
> in the past, I thought I would bring this forward and get some
> enlightenment.

The ones with the subqueries are not having to form the full join of W
and G; they just pick a few rows out of G and look up the matching W
rows.

The "subquery penalty" is nonexistent in this case because the
subqueries are not dependent on any variables from the outer query, and
so they need be evaluated only once, rather than once per outer-query
row which is what I suppose you were expecting.  This is reflected in
the EXPLAIN output: notice they are shown as InitPlans not SubPlans.
The outputs of the InitPlans are essentially treated as constants (shown
as $0 in the EXPLAIN output) and the outer plan is approximately what
it would be if you'd written WHERE g.field = 'constant' instead of
WHERE g.field = (select ...)

            regards, tom lane

Re: Faster with a sub-query then without

From
Martin Foster
Date:
Tom Lane wrote:
> Martin Foster <martin@ethereal-realms.org> writes:
>
>>The one not using sub-queries under EXPLAIN ANALYZE proves itself to be
>>less efficient and have a far higher cost then those with the penalty of
>>a sub-query.   Since this seems to be counter to what I have been told
>>in the past, I thought I would bring this forward and get some
>>enlightenment.
>
>
> The ones with the subqueries are not having to form the full join of W
> and G; they just pick a few rows out of G and look up the matching W
> rows.
>
> The "subquery penalty" is nonexistent in this case because the
> subqueries are not dependent on any variables from the outer query, and
> so they need be evaluated only once, rather than once per outer-query
> row which is what I suppose you were expecting.  This is reflected in
> the EXPLAIN output: notice they are shown as InitPlans not SubPlans.
> The outputs of the InitPlans are essentially treated as constants (shown
> as $0 in the EXPLAIN output) and the outer plan is approximately what
> it would be if you'd written WHERE g.field = 'constant' instead of
> WHERE g.field = (select ...)
>
>             regards, tom lane

That would explain it overall.  Still, it does seem unusual when one
puts in additional code, which most literature warns you about and you
actually gain a speed boost.

Thanks!

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org