Re: Another try at reducing repeated detoast work for PostGIS - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject Re: Another try at reducing repeated detoast work for PostGIS
Date
Msg-id 4A8A7287.1080208@siriusit.co.uk
Whole thread Raw
In response to Another try at reducing repeated detoast work for PostGIS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Another try at reducing repeated detoast work for PostGIS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:

> There was recently another go-round on the postgis-devel list about
> the same problem Mark Cave-Ayland complained about last year:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
> Basically, what is happening is a nestloop join where the inner
> indexscan gets a comparison argument from the outer table, and that
> argument is toasted.  Every single call of an index support function
> detoasts the argument again :-(, leading to upwards of 90% of the
> runtime being spent in detoasting.
> 
> I made a proposal to fix it
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00709.php
> but that failed due to excessive ambition --- the cost/benefit/risk
> tradeoffs just weren't good enough.
> 
> Thinking about it again, it seems to me that a much narrower patch
> could solve the specific forms of the problem that the PostGIS folk
> are seeing.  Instead of trying to have a general-purpose method of
> preventing repeat de-toasting, we could just prevent it for inner
> indexscans by having ExecIndexEvalRuntimeKeys() detoast anything it's
> passing to the index AM.  The attached patch accomplishes this with
> a net addition of about three statements.  (It looks bigger, because
> I had to move a hunk of code to have the datatype info available when
> needed.)  Paul Ramsey reports that this fixes the problem for him:
> http://postgis.refractions.net/pipermail/postgis-devel/2009-August/006659.html
> 
> The only downside I can see offhand is that it will detoast short-header
> values that might not actually need to be detoasted.  But we don't have
> any very good way to know whether a datatype's index support functions
> use PG_DETOAST_DATUM or PG_DETOAST_DATUM_PACKED.  In the former case we
> do need to detoast short-header values.  The extra overhead in this case
> amounts to only one palloc and a fairly short memcpy, which should be
> pretty negligible in comparison to the other setup costs of an
> indexscan, so I'm not very concerned about it.
> 
> Comments?  Better ideas?
> 
>             regards, tom lane


Hi Tom,

Thanks for the patch. Fortunately enough I was able to find the dataset 
from the original report above, and so I've tested your patch against 
PostgreSQL 8.4. Unfortunately in the original test case, it doesn't seem 
to give the same performance improvement for me that Paul was seeing :(


postgis13=# \d geography                                   Table "public.geography"   Column   |          Type
|                       Modifiers
 
------------+------------------------+--------------------------------------------------------- gid        | integer
           | not null default 
 
nextval('geography_gid_seq'::regclass) type       | character varying(1)   | id         | numeric(20,0)          | name
     | character varying(32)  | population | numeric(20,0)          | abbreviati | character varying(2)   | po_name
|character varying(100) | id_geo_sta | numeric(20,0)          | the_geom   | geometry               | centroid   |
geometry              |
 
Indexes:    "geography_pkey" PRIMARY KEY, btree (gid)    "idx_geography_centroid_z" gist (centroid)
Check constraints:    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)    "enforce_geotype_the_geom" CHECK
(geometrytype(the_geom)= 
 
'MULTIPOLYGON'::text OR the_geom IS NULL)    "enforce_srid_the_geom" CHECK (srid(the_geom) = (-1))


PostgreSQL 8.4 vanilla:

postgis13=# explain analyze select count(*) from geography where 
type='Z' and centroid && (select the_geom from geography where id=69495);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual 
 
time=1770.333..1770.334 rows=1 loops=1)   InitPlan 1 (returns $0)     ->  Seq Scan on geography  (cost=0.00..6884.00
rows=1width=4432) 
 
(actual time=19.529..43.893 rows=1 loops=1)           Filter: (id = 69495::numeric)   ->  Index Scan using
idx_geography_centroid_zon geography 
 
(cost=0.00..8.28 rows=1 width=0) (actual time=48.897..1730.004 
rows=29687 loops=1)         Index Cond: (centroid && $0)         Filter: ((type)::text = 'Z'::text) Total runtime:
1770.417ms
 
(8 rows)


PostgreSQL 8.4 with your patch applied:

postgis13=# explain analyze select count(*) from geography where 
type='Z' and centroid && (select the_geom from geography where id=69495);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual 
 
time=1499.414..1499.416 rows=1 loops=1)   InitPlan 1 (returns $0)     ->  Seq Scan on geography  (cost=0.00..6884.00
rows=1width=4432) 
 
(actual time=18.901..42.648 rows=1 loops=1)           Filter: (id = 69495::numeric)   ->  Index Scan using
idx_geography_centroid_zon geography 
 
(cost=0.00..8.28 rows=1 width=0) (actual time=43.133..1456.944 
rows=29687 loops=1)         Index Cond: (centroid && $0)         Filter: ((type)::text = 'Z'::text) Total runtime:
1499.498ms
 
(8 rows)


PostgreSQL 8.4 with force_2d() wrapper function:

postgis13=# explain analyze select count(*) from geography where 
type='Z' and centroid && (select force_2d(the_geom) from geography where 
id=69495);
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual 
 
time=285.974..285.975 rows=1 loops=1)   InitPlan 1 (returns $0)     ->  Seq Scan on geography  (cost=0.00..6884.00
rows=1width=4432) 
 
(actual time=19.278..48.056 rows=1 loops=1)           Filter: (id = 69495::numeric)   ->  Index Scan using
idx_geography_centroid_zon geography 
 
(cost=0.00..8.28 rows=1 width=0) (actual time=48.375..196.404 rows=29687 
loops=1)         Index Cond: (centroid && $0)         Filter: ((type)::text = 'Z'::text) Total runtime: 286.057 ms


So while there is some improvement with your patch, it still comes 
nowhere close to the speed obtained with wrapping the column in function 
that deTOASTs the input. Could it be that your patch misses out this 
particular case?


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: explain root element for auto-explain
Next
From: "simon@2ndquadrant.com"
Date:
Subject: Lazy Snapshots