Re: Incorrect number of rows inserted into partitioned table - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Incorrect number of rows inserted into partitioned table
Date
Msg-id 23169.1681164279@sss.pgh.pa.us
Whole thread Raw
In response to Incorrect number of rows inserted into partitioned table  (Роман Осипов <osipovromanvladimirovich@gmail.com>)
List pgsql-bugs
=?UTF-8?B?0KDQvtC80LDQvSDQntGB0LjQv9C+0LI=?= <osipovromanvladimirovich@gmail.com> writes:
> There is an insertion not of the amount specified in *limit*,, but a little
> more or less than it.

I think you're getting burnt by the fact that a rule is a macro,
combined with the fact that your query is underspecified:

insert into book_inherit_test select * from book_test limit 1000000;

This doesn't constrain *which* 1000000 rows of book_test get inserted.
If we EXPLAIN it we get:

=# explain insert into book_inherit_test select * from book_test limit 1000000;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on book_inherit_test  (cost=0.00..32353.00 rows=0 width=0)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..32353.00 rows=617176 width=31)
         Filter: ((("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951)) IS NOT TRUE)
         ->  Limit  (cost=0.00..17353.00 rows=1000000 width=31)
               ->  Seq Scan on book_test  (cost=0.00..34706.00 rows=2000000 width=31)

 Insert on book_inherit_test_before_1950  (cost=0.00..32353.00 rows=0 width=0)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..32353.00 rows=382824 width=31)
         Filter: (("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951))
         ->  Limit  (cost=0.00..17353.00 rows=1000000 width=31)
               ->  Seq Scan on book_test  (cost=0.00..34706.00 rows=2000000 width=31)
(11 rows)

from which we can see that the insertions into book_inherit_test and
those into book_inherit_test_before_1950 are driven off completely
separate scans of book_test.  And that table is big enough that
the synchronize_seqscans feature kicks in, meaning that indeed this
will scan two different million-row subsets of book_test, producing
results fundamentally unlike what you expected.

I get stable results after disabling synchronize_seqscans, but
a more correct way would be to add ORDER BY to fully determine which
rows of book_test are considered.

Advice: don't use a RULE for this sort of thing.  If you really want to,
you can get the same effects more predictably with a trigger.
But as David said, the whole thing looks like a poor reimplementation
of partitioning.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Роман Осипов
Date:
Subject: Re: Incorrect number of rows inserted into partitioned table
Next
From: Роман Осипов
Date:
Subject: Re: Incorrect number of rows inserted into partitioned table