Thread: Performance issues
Hi,
I am running PostgreSQL 8.4.7 with PostGIS 2.0 (for raster support).
The server has essentially a single user doing spatial data processing, which involves queries that can take hours.
It runs on Ubuntu 10.10 Server with a Core 2 Duo 6600 @ 2.4 GHz and 6 GB RAM.
My postgresql.conf:
# - Memory -
shared_buffers = 1024MB # min 128kB
# (change requires restart)
temp_buffers = 256MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 1024MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
max_stack_depth = 7MB # min 100kB
wal_buffers = 8MB
effective_cache_size = 3072MB
Everything else is default.
My Pgbench results:
/usr/lib/postgresql/8.4/bin/pgbench -T 60 test1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
duration: 60 s
number of transactions actually processed: 7004
tps = 116.728199 (including connections establishing)
tps = 116.733012 (excluding connections establishing)
My question is whether these are acceptable results, or whether someone can recommend settings that will improve my server's performance.
Andreas
On Mon, Mar 07, 2011 at 02:45:03PM +0100, Andreas Forø Tollefsen wrote:
> I am running Postgresql 8.4.7 with Postgis 2.0 (for raster support).
> Server is mainly 1 user for spatial data processing. This involves queries
> that can take hours.
> [configuration and pgbench results snipped]

Your results are I/O limited. Depending upon your requirements, you may be able to turn off synchronous_commit, which can help. Your actual workload may be able to use batching to help as well.

Your work_mem looks pretty darn high for a 6GB system.

Cheers,
Ken
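Both of Ken's suggestions can be tried per-session before changing postgresql.conf. These are standard PostgreSQL settings (SET works for both on 8.4); the values shown are only illustrative:

```sql
-- Trade durability of the most recent commits for commit speed; the
-- database itself stays consistent after a crash (unlike fsync = off).
SET synchronous_commit = off;

-- Rather than a high server-wide work_mem, raise it only in the one
-- session that runs the big spatial queries.
SET work_mem = '1GB';
```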
Thanks, Ken.
It seems like the tip to turn off synchronous_commit did the trick:
/usr/lib/postgresql/8.4/bin/pgbench -T 60 test1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
duration: 60 s
number of transactions actually processed: 86048
tps = 1434.123199 (including connections establishing)
tps = 1434.183362 (excluding connections establishing)
Are these results acceptable compared to others, given my setup?
Cheers,
Andreas
2011/3/7 Kenneth Marshall <ktm@rice.edu>
> Your results are I/O limited. Depending upon your requirements,
> you may be able to turn off synchronous_commit which can help.
> Your actual workload may be able to use batching to help as well.
> Your work_mem looks pretty darn high for a 6GB system.
On Mon, Mar 07, 2011 at 03:17:05PM +0100, Andreas Forø Tollefsen wrote:
> It seems like the tip to turn off synchronous_commit did the trick:
>
> number of transactions actually processed: 86048
> tps = 1434.123199 (including connections establishing)
> tps = 1434.183362 (excluding connections establishing)
>
> Is this acceptable compared to others when considering my setup?

These are typical results for synchronous_commit off. The caveat is that you must be able to handle losing recent transactions if you have a database crash, but your database is still intact. This differs from turning fsync off, in which case a crash means you would need to restore from a backup.

Cheers,
Ken
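The distinction Ken draws maps onto two separate postgresql.conf settings; a sketch of the safe combination (comments are mine):

```
# postgresql.conf (sketch)
synchronous_commit = off   # a crash may lose the last few committed
                           # transactions, but the database stays consistent
fsync = on                 # keep on: with fsync off, a crash can corrupt
                           # the database, requiring restore from backup
```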
Ok. Cheers. I will do some more testing on my heavy PostGIS queries, which often take hours to complete.
Thanks.
Andreas
2011/3/7 Kenneth Marshall <ktm@rice.edu>
> These are typical results for synchronous_commit off. The caveat
> is you must be able to handle losing transactions if you have a
> database crash, but your database is still intact. This differs
> from turning fsync off in which a crash means you would need to
> restore from a backup.
On Mon, 7 Mar 2011, Andreas Forø Tollefsen wrote:
> Ok. Cheers. I will do some more testing on my heavy PostGIS queries which
> often takes hours to complete.

I'd like to see hours long queries :) EXPLAIN ANALYZE

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Turning synchronous_commit off increased the TPS, but not the speed of the query below.
Oleg:
This is a query I am working on now. It creates an intersection of two geometries. One is a grid of 0.5 x 0.5 decimal degree sized cells, while the other is the country geometries of all countries in the world for a certain year.

priogrid=# EXPLAIN ANALYZE SELECT priogrid_land.gid, gwcode,
             ST_Intersection(priogrid_land.cell, cshapeswdate.geom)
           FROM priogrid_land, cshapeswdate
           WHERE ST_Intersects(priogrid_land.cell, cshapeswdate.geom);

                                  QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..12644.85 rows=43351 width=87704) (actual time=1.815..7074973.711 rows=130331 loops=1)
   Join Filter: _st_intersects(priogrid_land.cell, cshapeswdate.geom)
   ->  Seq Scan on cshapeswdate  (cost=0.00..14.42 rows=242 width=87248) (actual time=0.007..0.570 rows=242 loops=1)
   ->  Index Scan using idx_priogrid_land_cell on priogrid_land  (cost=0.00..7.15 rows=1 width=456) (actual time=0.069..5.604 rows=978 loops=242)
         Index Cond: (priogrid_land.cell && cshapeswdate.geom)
 Total runtime: 7075188.549 ms
(6 rows)
2011/3/7 Oleg Bartunov <oleg@sai.msu.su>
> I'd like to see hours long queries :) EXPLAIN ANALYZE
On Mon, Mar 07, 2011 at 10:49:48PM +0100, Andreas Forø Tollefsen wrote:
> This is a query i am working on now. It creates an intersection of two
> geometries. One is a grid of 0.5 x 0.5 decimal degree sized cells, while the
> other is the country geometries of all countries in the world for a certain
> year.
> [EXPLAIN ANALYZE output snipped]

Your estimated and actuals are way off, have you analyzed those tables?

Dave
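Dave is asking whether the planner statistics are up to date; on 8.4 that is a plain ANALYZE, run here against the two tables from the query above:

```sql
-- Refresh planner statistics so row estimates better match actual rowcounts.
ANALYZE priogrid_land;
ANALYZE cshapeswdate;
```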
Andreas Forø Tollefsen <andreasft@gmail.com> writes:
> This is a query i am working on now. It creates an intersection of two
> geometries. One is a grid of 0.5 x 0.5 decimal degree sized cells, while the
> other is the country geometries of all countries in the world for a certain
> year.

Hm, are you sure your data is right? Because the actual rowcounts imply that each country intersects about half of the grid cells, which doesn't seem right.

> [EXPLAIN ANALYZE output snipped]

AFAICT, all of the runtime is going into calculating the ST_Intersects and/or ST_Intersection functions. The two scans are only accounting for perhaps 5.5 seconds, and the join infrastructure isn't going to be terribly expensive, so it's got to be those functions. Not knowing much about PostGIS, I don't know if the functions themselves can be expected to be really slow. If it's not them, it could be the cost of fetching their arguments --- in particular, I bet the country outlines are very large objects and are toasted out-of-line. There's been some past discussion of automatically avoiding repeated detoastings in scenarios like the above, but nothing's gotten to the point of acceptance yet. Possibly you could do something to force detoasting in a subquery.

regards, tom lane
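One way to attempt what Tom suggests, untested here and only a sketch, is to route the wide geometry column through an OFFSET 0 subquery, which on 8.4-era planners acts as an optimization fence; the hope is that each country geometry is detoasted once per subquery row rather than once per comparison. Whether it actually helps this query is an open question:

```sql
-- Hypothetical rewrite: fence cshapeswdate behind OFFSET 0 so the planner
-- materializes it, possibly avoiding repeated detoasting of large geoms.
SELECT p.gid, c.gwcode, ST_Intersection(p.cell, c.geom)
FROM priogrid_land p,
     (SELECT gwcode, geom FROM cshapeswdate OFFSET 0) c
WHERE ST_Intersects(p.cell, c.geom);
```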
Hi. Thanks for the comments. My data is right, and the result is exactly what I want, but as you say, I think what makes the query slow is the ST_Intersection, which computes the intersection between the vector grid (fishnet) and the country polygons.
I will check with the PostGIS user list whether they have any ideas on how to speed up this query.
Best,
Andreas
2011/3/8 Tom Lane <tgl@sss.pgh.pa.us>
> Hm, are you sure your data is right? Because the actual rowcounts imply
> that each country intersects about half of the grid cells, which doesn't
> seem right.
> [snip]
> AFAICT, all of the runtime is going into calculating the ST_Intersects
> and/or ST_Intersection functions. The two scans are only accounting for
> perhaps 5.5 seconds, and the join infrastructure isn't going to be
> terribly expensive, so it's got to be those functions. Not knowing much
> about PostGIS, I don't know if the functions themselves can be expected
> to be really slow. If it's not them, it could be the cost of fetching
> their arguments --- in particular, I bet the country outlines are very
> large objects and are toasted out-of-line. There's been some past
> discussion of automatically avoiding repeated detoastings in scenarios
> like the above, but nothing's gotten to the point of acceptance yet.
> Possibly you could do something to force detoasting in a subquery.
>
> regards, tom lane
On 3/8/2011 2:42 AM, Andreas Forø Tollefsen wrote:
> Hi. Thanks for the comments. My data is right, and the result is exactly
> what i want, but as you say i think what causes the query to be slow is
> the ST_Intersection which creates the intersection between the vector
> grid (fishnet) and the country polygons.
> I will check with the postgis user list if they have any idea on how to
> speed up this query.

I have seen really complex geometries cause problems. If you have thousands of points, when 10 would do, try ST_Simplify and see if it doesn't speed things up.

-Andy
Andy. Thanks. That is a great tip. I tried it, but I get the error:

NOTICE: ptarray_simplify returned a <2 pts array.

Query:

SELECT ST_Intersection(priogrid_land.cell, ST_Simplify(cshapeswdate.geom, 0.1)) AS geom,
       priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate, enddate,
       capname, caplong, caplat, col, row, xcoord, ycoord
FROM priogrid_land, cshapeswdate
WHERE ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom, 0.1))
  AND cshapeswdate.gwsyear <= 1946
  AND cshapeswdate.gweyear >= 1946
  AND cshapeswdate.startdate <= '1946/1/1';
2011/3/8 Andy Colson <andy@squeakycode.net>
> I have seen really complex geometries cause problems. If you have
> thousands of points, when 10 would do, try ST_Simplify and see if it
> doesnt speed things up.
>
> -Andy
Forgot to mention that the query terminates the connection because of a server process crash.
2011/3/8 Andreas Forø Tollefsen <andreasft@gmail.com>
> Andy. Thanks. That is a great tip. I tried it but I get the error:
> NOTICE: ptarray_simplify returned a <2 pts array.
> [query snipped]
On 3/8/2011 10:58 AM, Andreas Forø Tollefsen wrote:
> Andy. Thanks. That is a great tips. I tried it but i get the error:
> NOTICE: ptarray_simplify returned a <2 pts array.
> [query snipped]

ew... that's not good. Seems like it simplified it down to a single point? (Not 100% sure that's what the error means, just a guess.)

Try getting some info about it:

select ST_NPoints(geom) as before, ST_NPoints(ST_Simplify(geom, 0.1)) as after
from cshapeswdate;

Also try things like ST_IsSimple and ST_IsValid. I seem to recall sometimes needing ST_Points or ST_NumPoints instead of ST_NPoints.

-Andy