On 01/04/2017 05:59 PM, Israel Brewster wrote:
> Short version:
> Do indexes impact the speed of an UPDATE, even when the indexed columns
> aren't changing?
>
They shouldn't, as long as the updated tuple can be updated on the same
page (8kB chunk of data). In that case we can do a HOT update for the
row, without updating the index(es).
But as you're updating the whole table, that would require about 50% of
all pages to be free, which is unlikely to be true. So perhaps some
updates can proceed without touching indexes, but most can't.
> Details:
> I have a table containing geographical data (Latitude, longitude, and
> elevation) with 406,833,705 records. The Latitude and Longitude columns
> are indexed. In order to better utilize the data, I've been looking into
> PostGIS, and decided I wanted to add a "Location" column with PostGIS
> type "GEOGRAPHY(point)". I then tried to populate it from the existing
> latitude/longitude data using the following query:
>
> UPDATE data SET
> location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||'
> '||lat::text||')');
>
> I expected this update to take quite a while, since it has 406 million
> rows to update, but at this point it's been over 19 hours since I
> started the query, and it still hasn't completed.
>
> I'm wondering if the presence of the indexes could be slowing things
> down even though the indexed columns aren't being updated? Would I be
> better off canceling the update query, dropping the indexes, and trying
> again? Or is more likely that the update query is "almost" done, and it
> would be better to just let it run it's course? Or is there an even
> better option, such as perhaps exporting the data, adding the additional
> column in a text editor, and re-importing the data with a COPY command?
>
As explained above, it's likely that such full-table update has to
modify the indexes anyway, making it much more expensive. Without
additional information it's however impossible to confirm that's what's
causing the long update in this case - there may be other bits slowing
it down - e.g. foreign keys checks, triggers.
CREATE TABLE AS SELECT would not pay any of those costs, of course.
Also, if you're running with wal_level=minimal, it would not have to
write the changes into WAL, while the regular UPDATE has to do that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services