On Wed, Jun 24, 2020 at 8:13 PM Hardik Bansal <hardikbansal24@gmail.com> wrote:
> The problem we are facing is that when we try to insert data using the following query:
>
> insert into message (
>     m_type, content, is_received, is_seen, is_active, created_at, room_no_id, sender_id
> )
> select
>     'TEXT',
>     CASE WHEN s.i % 2 = 0 THEN 'text 1'
>          ELSE 'text 2'
>     end,
>     TRUE,
>     TRUE,
>     TRUE,
>     dr.created_at + s.i * (interval '1 hour'),
>     dr.id,
>     CASE WHEN s.i % 2 = 0 THEN split_part(dr.name, '_', 2)::int
>          ELSE split_part(dr.name, '_', 3)::int
>     end
> from room as dr, generate_series(0, 10) as s(i);
>
> It is taking nearly 1 hour 50 minutes to insert around 20 million entries. When we are not sharding the table, it
> takes around 8 minutes to perform the same, so sharding is basically 14 times slower. Are we missing anything
> here, or are inserts that slow when sharding with this method?

Unfortunately, routing that many rows to foreign partitions is less
efficient than you might expect, because the rows are sent to the
remote side one by one using remote INSERT commands. I'm not sure
there is a good workaround for this case, but there is a patch for
improving the efficiency of COPY FROM for sharded tables [1]. Once we
have that in PostgreSQL, we would be able to route that many rows much
more efficiently using COPY FROM. In that case, though, we would need
to copy the data to a file before running COPY FROM.
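
For reference, here is a rough sketch of that two-step workaround once
the patch is in place (the file name is just an example, and
server-side COPY TO/FROM a file needs file-access privileges on the
server; psql's \copy could be used instead):

-- step 1: write the generated rows out to a CSV file
COPY (
    select
        'TEXT',
        CASE WHEN s.i % 2 = 0 THEN 'text 1' ELSE 'text 2' end,
        TRUE, TRUE, TRUE,
        dr.created_at + s.i * (interval '1 hour'),
        dr.id,
        CASE WHEN s.i % 2 = 0 THEN split_part(dr.name, '_', 2)::int
             ELSE split_part(dr.name, '_', 3)::int end
    from room as dr, generate_series(0, 10) as s(i)
) TO '/tmp/message_data.csv' WITH (FORMAT csv);

-- step 2: route the rows through the sharded table with COPY FROM
COPY message (m_type, content, is_received, is_seen, is_active,
              created_at, room_no_id, sender_id)
    FROM '/tmp/message_data.csv' WITH (FORMAT csv);
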
Best regards,
Etsuro Fujita
[1] https://www.postgresql.org/message-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru