Home > mailing lists

Re: query performance with hstore vs. non-hstore - Mailing list pgsql-performance

From	David G Johnston
Subject	Re: query performance with hstore vs. non-hstore
Date	September 2, 2014 06:38:29
Msg-id	1409629101885-5817281.post@n5.nabble.com Whole thread Raw
In response to	Re: query performance with hstore vs. non-hstore ("Huang, Suya" <Suya.Huang@au.experian.com>)
Responses	Re: query performance with hstore vs. non-hstore ("Huang, Suya" <Suya.Huang@au.experian.com>)
List	pgsql-performance

Tree view

Huang, Suya wrote
> See output of explain (analyze,timing off), the total runtime is close to
> the one enable timing.

Calling 43s "close to" 70s doesn't sound right...


> dev=# explain (analyze, timing off) select cha_type, sum(visits) from
> (select (each(visits)).key as cha_type,(each(visits)).value::numeric as
> visits from weekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo  group
> by cha_type  order by sum(visits) desc;

What version of PostgreSQL are you using?

Two calls to each() and cast to numeric are not free.

Your sequential scan savings is nearly 9 seconds but you lose all of that,
and more, when PostgreSQL evaluates the result of the scan and has to
process the each() and the cast before it performs the join against the
expanded result.  There is no planner node for this activity but it does
cost time - in this case more time than it would take to simply store the
native data types in separate rows.

You really should expand the hstore after the join (i.e., in the top-most
select-list) but in this case since the join removed hardly any rows the
gain from doing so would be minimal.  The idea being you should not expand
the hstore of any row that fails the join condition since it will not end up
in the final result anyway.

Also, in this specific case, the call to each(...).key is pointless - you
never use the data.

If you did need to use both columns, and are using 9.3, you should re-write
this to use LATERAL.

In 9.2- you, possibly using a CTE, could do something like this:

SELECT (each).* FROM (
SELECT each(hs) FROM ( VALUES('k=>1'::hstore) ) h (hs)
) src

This is a single call to each(), in a subquery, which result is then
expanded using (col).* notation in the parent query.  This avoids calling
each twice - and note that (each(...).*) does not work to avoid the
double-call - you have to use a subquery / a CTE one to ensure that it is
not collapsed (offset 0 should work too but I find the CTE one a little
cleaner personally).

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/query-performance-with-hstore-vs-non-hstore-tp5817109p5817281.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

pgsql-performance by date:

From: "Huang, Suya"
Date: 02 September 2014, 03:57:02
Subject: Re: query performance with hstore vs. non-hstore

From: Xiaoyulei
Date: 02 September 2014, 10:15:34
Subject: why after increase the hash table partitions, tpmc decrease

Re: query performance with hstore vs. non-hstore - Mailing list pgsql-performance

Previous

Next