Re: Cursor + upsert (astronomical data) - Mailing list pgsql-performance
From | Jeff Janes |
---|---|
Subject | Re: Cursor + upsert (astronomical data) |
Date | |
Msg-id | CAMkU=1ws0YbQy-a9FTrb3e7xkhJ=jmSvZQRfbPyTSFYMamTYeg@mail.gmail.com Whole thread Raw |
In response to | Cursor + upsert (astronomical data) (Jiří Nádvorník <nadvornik.ji@gmail.com>) |
List | pgsql-performance |
The reason why I solve the performance issues here is that the table of observations has atm cca 3e8 rows after 1.5 year of gathering the data. The number growth is linear.
(UPDATE
\schema.objcat
SET
ipix_cat=q3c_ang2ipix(
(raj2000 * weight + curr_raj2000) / (weight + 1),
(dej2000 * weight + curr_dej2000) / (weight + 1)
),
raj2000 = (raj2000 * weight + curr_raj2000) / (weight + 1),
dej2000 = (dej2000 * weight + curr_dej2000) / (weight + 1),
weight=weight+1
WHERE
q3c_join(curr_raj2000, curr_dej2000,
raj2000, dej2000,
radius)
RETURNING *),
Results: When I run the query only once (1 client thread), it runs cca 1 mil rows per hour.
Which is days for the whole dataset. When I run it in parallel with that lock to ensure pessimistic synchronization, it runs sequentially too J the other threads just waiting. When I delete that lock and hope to solve the resulting conflicts later, the ssd disk serves up to 4 threads relatively effectively – which can divide my days of time by 4 – still inacceptable.
The reason is quite clear here – I’m trying to write something in one cycle of the script to a table – then in the following cycle I need to read that information.
Questions for you:
1. The first question is if you can think of a better way how to do this and maybe if SQL is even capable of doing such thing – or do I have to do it in C? Would rewriting the SQL function to C help?
2. Could I somehow bend the commiting during the algorithm for my thing? Ensure that inside one cycle, the whole part of the identifiers table would be kept in memory for faster lookups?
3. Why is this so slow? J It is comparable to the quadratic algorithm in the terms of speed – only does not use any memory.
pgsql-performance by date: