Thread: A question about leakproof
Problem description
After testing, we don't find the difference between functions of proleakproof=true and functions of proleakproof=false (the function is described in pg_proc). Can you give specific examples to show that functions of proleakproof=true are more secure or can prevent data disclosure than functions of proleakproof=false. My related testing process is as follows (the rsp_user and wumk used below are the two database users that have been created).
1. Operation under user rsp_user
1.1 create table
drop table if exists tb_a cascade;
create table tb_a(id int4,c1 int, c2 int, pad text, effective_date timestamp without time zone NOT NULL) ;
CREATE INDEX tb_a_t_idx_id ON tb_a USING btree (id) TABLESPACE pg_default;
CREATE INDEX tb_a_t_idx_ed ON tb_a USING btree (effective_date) TABLESPACE pg_default;
CREATE INDEX tb_a_t_idx_c2 ON tb_a USING btree (c2) TABLESPACE pg_default;
1.2 insert data
insert into tb_a select id, id %200, id%1000, 'ss', current_date - floor((random() * 10000))::int from (select generate_series(1,10000) id) tb_a;
analyze;
1.3 create view
a. condition of view:effective_date > now() - TIME'23:00', in particular, type of effective_date is ‘timestamp without time zone NOT NULL’,but the type of the result of now() - TIME'23:00' is:timestamp with time zone,the related SQL statements are as follows:
drop view if exists tb_a_date_v1;
CREATE VIEW tb_a_date_v1 AS select * from tb_a where effective_date > now() - TIME'23:00';
b. the condition of the view is id=183. Note that type of id field is int4. the relevant SQL statements are as follows:
drop view if exists tb_a_int4_v1;
CREATE VIEW tb_a_int4_v1 AS select * from tb_a where id=183;
1.4 Authorize the view to user wumk
GRANT ALL ON SCHEMA public TO wumk;
GRANT ALL ON table tb_a_date_v1 TO wumk;
GRANT ALL ON table tb_a_int4_v1 TO wumk;
1.5 test SQL
Execute the following SQL statements respectively:explain select * from tb_a_int4_v1; explain select * from tb_a_date_v1。the results are as follows, the plans for both SQL statements are indexscan。
2. Operation under user wumk
1.1. test SQL,looking the plans
Execute the following SQL statements respectively:explain select * from tb_a_int4_v1; explain select * from tb_a_date_v1。The results are as follows:
As shown above, the plan of view tb_a_date_ v1 is seqscan (under the user rsp_user, its plan is indexscan),After analyzing the source code of postgresql, it is found that this problem is related to the following code:
In the above figure, func_oid=2523, the condition (effective_date > now() - TIME'23:00' in view b_a_data_v1 ) will use the function(oid=2523) for calculation, the function name is timestamp_gt_timestamptz,as shown in the figure below:
The proleakproof of the function is false,so,for the selectivity calculation of effective_date > now() - TIME'23:00 don’t use statistical information
,so the selectivity calculation of indexscan is higher than the actual,So finally, seqscan is selected.
While condition id=183 in in view tb_a_int4_v1 uses a comparison function whose proleakproof is true, so tb_a_int4_v1 view will normally use statistics info to calculate the selectivity, so that the correct indexscan is finally selected
So check the official postgresql document about leakproof:
According to my understanding, if the proleakproof is true, the function will not cause data leakage, and if the proleakproof is false, the function may cause data leakage. So I had tested the proleakproof about data leakage in sections 2.2 and 2.3.
1.2. test the data leakage of seqscan
a. Execute the following SQL and the results are as follows:
create or replace function leak_date(timestamp with time zone) returns bool as $$begin raise notice 'abc:%', $1; return true; end$$ language plpgsql cost 0.0000000000000001;
create or replace function leak_int4(int4) returns bool as $$begin raise notice 'abc:%', $1; return true; end$$ language plpgsql cost 0.0000000000000001;
set enable_indexscan=off;
set enable_bitmapscan=off;
select * from tb_a_date_v1 where leak_date(effective_date);
select * from tb_a_int4_v1 where leak_int4(id);
Analysis conclusion:
It is found that whether the proleakproof is true or false, there is a problem of data leakage under the seqscan plan.
1.3. test the data leakage of indexscan
To ensure that the plan must be indexscan, you need to set the following parameters:
set enable_seqscan=off; set enable_indexscan=on; set enable_bitmapscan=on;
Then execute the sqls:
select * from tb_a_date_v1 where leak_date(effective_date);
select * from tb_a_int4_v1 where leak_int4(id);
The results are as follows:
Analysis conclusion:
It is found that whether the proleakproof is true or false, there is no problem of data leakage under the indexscan plan
Attachment
Hi, On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote: > Hello, My questions are as follows: > Problem description > After testing, we don't find the difference between functions of > proleakproof=true and functions of proleakproof=false (the function is > described in pg_proc). Can you give specific examples to show that functions > of proleakproof=true are more secure or can prevent data disclosure than > functions of proleakproof=false. My related testing process is as follows > (the rsp_user and wumk used below are the two database users that have been > created). Have you looked at https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof?
Julien Rouhaud <rjuju123@gmail.com> writes: > On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote: >> After testing, we don't find the difference between functions of >> proleakproof=true and functions of proleakproof=false (the function is >> described in pg_proc). > Have you looked at > https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof? Also: the fact that a built-in function is not marked leakproof doesn't mean that it isn't leakproof. It could just mean that we haven't looked at it closely, or that there's too much code involved to have much confidence that it would stay leakproof. regards, tom lane
> https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof?
>doesn't mean that it isn't leakproof. It could just mean that
>we haven't looked at it closely, or that there's too much code
>involved to have much confidence that it would stay leakproof.
2. What basis do you set proleakproof of in4eq function to true? How should I judge whether a function should be marked as proleakproof.Can you give a function that will not leak?
------------------------------------------------------------------发件人:Tom Lane <tgl@sss.pgh.pa.us>发送时间:2022年10月17日(星期一) 09:54收件人:Julien Rouhaud <rjuju123@gmail.com>抄 送:qiumingcheng <qiumingcheng@aliyun.com>; pgsql-general <pgsql-general@lists.postgresql.org>主 题:Re: A question about leakproofJulien Rouhaud <rjuju123@gmail.com> writes:
> On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote:
>> After testing, we don't find the difference between functions of
>> proleakproof=true and functions of proleakproof=false (the function is
>> described in pg_proc).
> Have you looked at
> https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof?
Also: the fact that a built-in function is not marked leakproof
doesn't mean that it isn't leakproof. It could just mean that
we haven't looked at it closely, or that there's too much code
involved to have much confidence that it would stay leakproof.
regards, tom lane
"qiumingcheng" <qiumingcheng@aliyun.com> writes: > 1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does thatmean you will adjust it to proleakproof=false later? int4eq is about as leakproof as a function could possibly be: it does not leak, it's plain from the code of the function that it does not leak, and it calls no other code that might accidentally introduce a leak in future. I think you do not understand what that property actually means. Per the CREATE FUNCTION man page: LEAKPROOF indicates that the function has no side effects. It reveals no information about its arguments other than by its return value. For example, a function which throws an error message for some argument values but not others, or which includes the argument values in any error message, is not leakproof. Please note that this definition talks only about the behavior of the function itself. Re-reading your email, you seem to be imagining that changes in a query's plan on the basis of changes in collected statistics have something to do with this. They do not. regards, tom lane
"qiumingcheng" <qiumingcheng@aliyun.com> writes:
> 1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does that mean you will adjust it to proleakproof=false later?
int4eq is about as leakproof as a function could possibly be: it does
not leak, it's plain from the code of the function that it does not
leak, and it calls no other code that might accidentally introduce
a leak in future.
I think you do not understand what that property actually means.
Per the CREATE FUNCTION man page:
LEAKPROOF indicates that the function has no side
effects. It reveals no information about its arguments other than by
its return value. For example, a function which throws an error message
for some argument values but not others, or which includes the argument
values in any error message, is not leakproof.
Please note that this definition talks only about the behavior
of the function itself. Re-reading your email, you seem to be
imagining that changes in a query's plan on the basis of changes in
collected statistics have something to do with this. They do not.
------------------------------------------------------------------发件人:Tom Lane <tgl@sss.pgh.pa.us>发送时间:2022年10月17日(星期一) 11:33收件人:qiumingcheng <qiumingcheng@aliyun.com>抄 送:Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>主 题:Re: 回复:A question about leakproof"qiumingcheng" <qiumingcheng@aliyun.com> writes:
> 1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does that mean you will adjust it to proleakproof=false later?
int4eq is about as leakproof as a function could possibly be: it does
not leak, it's plain from the code of the function that it does not
leak, and it calls no other code that might accidentally introduce
a leak in future.
I think you do not understand what that property actually means.
Per the CREATE FUNCTION man page:
LEAKPROOF indicates that the function has no side
effects. It reveals no information about its arguments other than by
its return value. For example, a function which throws an error message
for some argument values but not others, or which includes the argument
values in any error message, is not leakproof.
Please note that this definition talks only about the behavior
of the function itself. Re-reading your email, you seem to be
imagining that changes in a query's plan on the basis of changes in
collected statistics have something to do with this. They do not.
regards, tom lane
Attachment
On Mon, 2022-10-17 at 13:17 +0800, qiumingcheng wrote: > > you seem to be imagining that changes in a query's plan on the basis of changes > > in collected statistics have something to do with this. They do not. > > Sorry, I may not fully understand what you mean. I mean that after my tests, > the execution results of this SQL (explain select * from tb_a_date_v1) execution plan > are different under different users, which is really related to the parameter proleakproof. That's the idea behind leakproof: if a function is not leakproof, the optimizer will not move it "inside" the view definition. Then the function is evaluated only after the view definition. That may very well lead to a slower execution plan, because it cannot use certain indexes on the underlying tables. It is the price you have to pay for good security. Yours, Laurenz Albe
> in collected statistics have something to do with this. They do not."
------------------------------------------------------------------发件人:Laurenz Albe <laurenz.albe@cybertec.at>发送时间:2022年10月17日(星期一) 15:20收件人:qiumingcheng <qiumingcheng@aliyun.com>; Tom Lane <tgl@sss.pgh.pa.us>抄 送:Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>主 题:Re: 回复:回复:A question about leakproofOn Mon, 2022-10-17 at 13:17 +0800, qiumingcheng wrote:
> > you seem to be imagining that changes in a query's plan on the basis of changes
> > in collected statistics have something to do with this. They do not.
>
> Sorry, I may not fully understand what you mean. I mean that after my tests,
> the execution results of this SQL (explain select * from tb_a_date_v1) execution plan
> are different under different users, which is really related to the parameter proleakproof.
That's the idea behind leakproof: if a function is not leakproof, the optimizer
will not move it "inside" the view definition. Then the function is evaluated only
after the view definition. That may very well lead to a slower execution plan,
because it cannot use certain indexes on the underlying tables.
It is the price you have to pay for good security.
Yours,
Laurenz Albe
"qiumingcheng" <qiumingcheng@aliyun.com> writes: > 2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? Can you explain how it causes dataleakage? It's capable of throwing an error (see timestamp2timestamptz_opt_overflow). Now, maybe that's unreachable, or maybe we could rerrange things to remove it. But there's still enough code underneath the timezone conversion requirement that I'd be very hesitant to apply a leakproof marking. In short: it might be leakproof in practice, but we don't wish to offer a guarantee. regards, tom lane
On Mon, 2022-10-17 at 16:24 +0800, qiumingcheng wrote: > > "you seem to be imagining that changes in a query's plan on the basis of changes > > in collected statistics have something to do with this. They do not." > > 1. My understanding of the above paragraph is that for the same view and different users, > the proleakproof=false attribute of the function will not lead to inconsistent plans, > but my actual test result is that proleakproof=false will lead to inconsistent plans. The above says "on the basis of changes in collected statistics". The different execution you see is not because the statistics are different, but because the permissions of the users are different. > 2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? > Can you explain how it causes data leakage? I don't know the reason in this case. You could look at the source code, perhaps it is possible to cause error messages that can give you some clue as to the value that you compare with. But perhaps, as Tome said, it is just that nobody scrutinized the function hard enough to exclude that something like that can happen. Yours, Laurenz Albe
------------------------------------------------------------------发件人:Tom Lane <tgl@sss.pgh.pa.us>发送时间:2022年10月17日(星期一) 22:07收件人:qiumingcheng <qiumingcheng@aliyun.com>抄 送:Laurenz Albe <laurenz.albe@cybertec.at>; Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>; yuexingzhi <yuexingzhi@hotmail.com>主 题:Re: 回复:回复:回复:A question about leakproof"qiumingcheng" <qiumingcheng@aliyun.com> writes:
> 2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? Can you explain how it causes data leakage?
It's capable of throwing an error (see
timestamp2timestamptz_opt_overflow). Now, maybe that's unreachable, or
maybe we could rerrange things to remove it. But there's still enough
code underneath the timezone conversion requirement that I'd be very
hesitant to apply a leakproof marking. In short: it might be leakproof
in practice, but we don't wish to offer a guarantee.
regards, tom lane
"qiumingcheng" <qiumingcheng@aliyun.com> writes: > Yes, It's capable of throwing an error(timestamp out of range) , but the > message "timestamp out of range" is not sensitive information. Really? Whether that's true at all is a matter of opinion. There's also the prospect that somebody could determine the value of a supposedly-unreadable timestamp by seeing how big an interval could be added to it without overflow. Maybe that's infeasible because of timestamp_pl_interval not being marked leakproof, but then we're getting into precisely the sort of conditional-on-other-assumptions reasoning that we don't want to indulge in. > Only from this function(timestamp_gt_timestamptz), can it be marked as leakproof? Project policy is that we will not mark a function as leakproof unless it's evident from the text of the function that it can't throw errors. I don't see a good argument for making a exception for this one. regards, tom lane
------------------------------------------------------------------发件人:Tom Lane <tgl@sss.pgh.pa.us>发送时间:2022年10月18日(星期二) 00:27收件人:qiumingcheng <qiumingcheng@aliyun.com>抄 送:Laurenz Albe <laurenz.albe@cybertec.at>; Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>; yuexingzhi <yuexingzhi@hotmail.com>主 题:Re: 回复:回复:回复:回复:A question about leakproof"qiumingcheng" <qiumingcheng@aliyun.com> writes:
> Yes, It's capable of throwing an error(timestamp out of range) , but the
> message "timestamp out of range" is not sensitive information.
Really? Whether that's true at all is a matter of opinion. There's
also the prospect that somebody could determine the value of a
supposedly-unreadable timestamp by seeing how big an interval could
be added to it without overflow. Maybe that's infeasible because of
timestamp_pl_interval not being marked leakproof, but then we're
getting into precisely the sort of conditional-on-other-assumptions
reasoning that we don't want to indulge in.
> Only from this function(timestamp_gt_timestamptz), can it be marked as leakproof?
Project policy is that we will not mark a function as leakproof unless
it's evident from the text of the function that it can't throw errors.
I don't see a good argument for making a exception for this one.
regards, tom lane