Re: Bad query optimizer misestimation because of TOAST - Mailing list pgsql-performance
From | Markus Schaber |
---|---|
Subject | Re: Bad query optimizer misestimation because of TOAST |
Date | |
Msg-id | 42011B0B.2050305@logi-track.com Whole thread Raw |
In response to | Re: Bad query optimizer misestimation because of TOAST tables (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bad query optimizer misestimation because of TOAST
|
List | pgsql-performance |
Hi, Tom, Tom Lane schrieb: >>IMHO, this tells the reason. The query planner has a table size of 3 >>pages, which clearly is a case for a seqscan. But during the seqscan, >>the database has to fetch an additional amount of 8225 toast pages and >>127 toast index pages, and rebuild the geometries contained therein. > > I don't buy this analysis at all. The toasted columns are not those in > the index (because we don't support out-of-line-toasted index entries), > so a WHERE clause that only touches indexed columns isn't going to need > to fetch anything from the toast table. The only stuff it would fetch > is in rows that passed the WHERE and need to be returned to the client > --- and those costs are going to be the same either way. > > I'm not entirely sure where the time is going, but I do not think you > have proven your theory about it. I'd suggest building the backend > with -pg and getting some gprof evidence. The column is a PostGIS column, and the index was created using GIST. Those are lossy indices that do not store the whole geometry, but only the bounding box corners of the Geometry (2 Points). Without using the index, the && Operator (which tests for bbox overlapping) has to load the whole geometry from disk, and extract the bbox therein (as it cannot make use of partial fetch). Some little statistics: logigis=# select max(mem_size(geom)), avg(mem_size(geom))::int, max(npoints(geom)) from adminbndy1; max | avg | max ----------+---------+-------- 20998856 | 1384127 | 873657 (1 Zeile) So the geometries use are about 1.3 MB average size, and have a maximum size of 20Mb. I'm pretty shure this cannot be stored without TOASTing. Additionally, my suggested workaround using a separate bbox column really works: logigis=# alter table adminbndy1 ADD column bbox geometry; ALTER TABLE logigis=# update adminbndy1 set bbox = setsrid(box3d(geom)::geometry, 4326); UPDATE 83 logigis=# explain analyze SELECT geom FROM adminbndy1 WHERE bbox && setsrid('BOX3D(9.4835390946502 47.39365740740741,9.5164609053498 47.40634259259259)'::box3d,4326); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Seq Scan on adminbndy1 (cost=100000000.00..100000022.50 rows=1 width=32) (actual time=0.554..0.885 rows=5 loops=1) Filter: (bbox && 'SRID=4326;BOX3D(9.4835390946502 47.3936574074074 0,9.5164609053498 47.4063425925926 0)'::geometry) Total runtime: 0.960 ms (3 Zeilen) Here, the seqential scan matching exactly the same 5 rows only needs about 1/8000th of time, because it does not have to touch the TOAST pages at all. logigis=# \o /dev/null logigis=# \timing Zeitmessung ist an. logigis=# SELECT geom FROM adminbndy1 WHERE geom && setsrid('BOX3D(9.4835390946502 47.39365740740741,9.5164609053498 47.40634259259259)'::box3d,4326); Zeit: 11224,185 ms logigis=# SELECT geom FROM adminbndy1 WHERE bbox && setsrid('BOX3D(9.4835390946502 47.39365740740741,9.5164609053498 47.40634259259259)'::box3d,4326); Zeit: 7689,720 ms So you can see that, when actually detoasting the 5 rows and deserializing the geometries to WKT format (their canonical text representation), the time relation gets better, but there's still a noticeable difference. Markus -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich phone +41-43-888 62 52 | fax +41-43-888 62 53 mailto:schabios@logi-track.com | www.logi-track.com
Attachment
pgsql-performance by date: