Thread: Planet Postgres and the curse of AI

Planet Postgres and the curse of AI

From
Greg Sabino Mullane
Date:
I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post, but this mailing list seemed a better forum to generate a discussion.

The problem is two-fold as I see it.

First, there is the issue of people trying to game the system by churning out content that is not theirs, but was written by a LLM. I'm not going to name specific posts, but after a while it gets easy to recognize things that are written mostly by AI.

These blog posts are usually generic, describing some part of Postgres in an impersonal, mid-level way. Most of the time the facts are not wrong, per se, but they lack nuances that a real DBA would bring to the discussion, and often leave important things out. Code examples are often wrong in subtle ways. Places where you might expect a deeper discussion are glossed over.

So this first problem is that it is polluting the Postgres blogs with overly bland, moderately helpful posts that are not written by a human, and do not really bring anything interesting to the table. There is a place for posts that describe basic Postgres features, but the ones written by humans are much better. (yeah, yeah, "for now" and all hail our AI overlords in the future).

The second problem is worse, in that LLMs are not merely gathering information, but have the ability to synthesize new conclusions and facts. In short, they can lie. Or hallucinate. Whatever you want to call it, it's a side effect of the way LLMs work. In a technical field like Postgres, this can be a very bad thing. I don't know how widespread this is, but I was tipped off about this over a year ago when I came across a blog suggesting using the "max_toast_size configuration parameter". For those not familiar, I can assure you that Postgres does not have, nor will likely ever have, a GUC with that name.
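(For readers who want to check a suspicious parameter themselves: every real GUC is listed in the pg_settings catalog view, so a hallucinated one is easy to spot on any live server. A quick sketch:)

```sql
-- A real configuration parameter shows up in pg_settings;
-- a hallucinated one returns zero rows.
SELECT name, setting, short_desc
FROM pg_settings
WHERE name = 'max_toast_size';
-- (0 rows)

-- Trying to set an unknown parameter fails outright:
-- SET max_toast_size = '1MB';
-- ERROR:  unrecognized configuration parameter "max_toast_size"
```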

As anyone who has spoken with ChatGPT knows, getting small important details correct is not its forte. I love ChatGPT and actually use it daily. It is amazing at doing certain tasks. But writing blog posts should not be one of them.

Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line. Obviously spelling and grammar checking is quite okay, and making up random GUCs is not, but the middle bit is very hazy. (Human) thoughts welcome.

Cheers,
Greg

Re: Planet Postgres and the curse of AI

From
Pavel Stehule
Date:


On Wed, Jul 17, 2024 at 19:22, Greg Sabino Mullane <htamfids@gmail.com> wrote:
> I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post, but this mailing list seemed a better forum to generate a discussion.
>
> [...]
>
> Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line. Obviously spelling and grammar checking is quite okay, and making up random GUCs is not, but the middle bit is very hazy. (Human) thoughts welcome.

It is very unpleasant to read a long article and realize at the end that it contains zero valuable information. The situation was terrible on Planet MariaDB (https://mariadb.org/planet/), but it has since been cleaned up. I am in favor of some form of moderation - and of gently reaching out to authors who write articles that add no value beyond the documentation.

Regards

Pavel

 


Re: Planet Postgres and the curse of AI

From
Kashif Zeeshan
Date:
Hi Greg

I agree with you about the misuse of AI-based tools; in my experience with Postgres, the solutions they suggest won't work at times.
It's not bad to get help from these tools, but taking all of your solutions from them is counterproductive.
I think people should take care when using these tools to suggest solutions for real-world problems.

Regards
Kashif Zeeshan

On Wed, Jul 17, 2024 at 10:22 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post, but this mailing list seemed a better forum to generate a discussion.
>
> [...]
>
> Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line. Obviously spelling and grammar checking is quite okay, and making up random GUCs is not, but the middle bit is very hazy. (Human) thoughts welcome.

Re: Planet Postgres and the curse of AI

From
Adrian Klaver
Date:
On 7/17/24 10:21, Greg Sabino Mullane wrote:
> I've been noticing a growing trend of blog posts written mostly, if not 
> entirely, with AI (aka LLMs, ChatGPT, etc.). I'm not sure where to raise 
> this issue. I considered a blog post, but this mailing list seemed a 
> better forum to generate a discussion.
> 

> 
> Do we need a policy or a guideline for Planet Postgres? I don't know. It 
> can be a gray line. Obviously spelling and grammar checking is quite 
> okay, and making up random GUCs is not, but the middle bit is very hazy. 
> (Human) thoughts welcome.

A policy would be nice, just not sure how enforceable it would be. How 
do you differentiate between the parrot that is AI and one that is 
human? I run across all manner of blog posts where folks have lifted 
content from the documentation or other sources without attribution, 
which is basically what AI generated content is. AI does like to 
embellish and make things up (ask the NYC lawyer suing the airline about 
that), though that is a human trait as well.

> 
> Cheers,
> Greg
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Planet Postgres and the curse of AI

From
Laurenz Albe
Date:
On Wed, 2024-07-17 at 13:21 -0400, Greg Sabino Mullane wrote:
> I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI
> (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post,
> but this mailing list seemed a better forum to generate a discussion.
>
> The problem is two-fold as I see it.
>
> First, there is the issue of people trying to game the system by churning out content that is not theirs [...]
>
> So this first problem is that it is polluting the Postgres blogs [...]
>
> The second problem is worse, in that LLMs are not merely gathering information, but have
> the ability to synthesize new conclusions and facts. In short, they can lie.
>
> Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line.
> Obviously spelling and grammar checking is quite okay, and making up random GUCs is not,
> but the middle bit is very hazy. (Human) thoughts welcome.

As someone who writes blogs and occasionally browses Planet Postgres, this has not
struck me as a major problem.  I just scrolled through it and nothing stood out to
me - perhaps I am too naïve.

There certainly are people who publish random short utterances, perhaps with the
intention to hit the "top posters" list, but I don't think we need strong measures.

If anything, I am most annoyed by articles that are just thinly veiled advertising,
but there is already a policy controlling that.

As long as there is not a flood of AI generated babble (and I cannot see one), I'd
say that this will regulate itself: spewing empty content and lies is not going to
reflect well on the author and his/her organization.

PostgreSQL has excellent documentation.  Anybody who blindly follows advice from a
blog without checking with the documentation only has himself/herself to blame.

Yours,
Laurenz Albe



Re: Planet Postgres and the curse of AI

From
Laurenz Albe
Date:
I wrote:
> On Wed, 2024-07-17 at 13:21 -0400, Greg Sabino Mullane wrote:
> > I've been noticing a growing trend of blog posts written mostly, if not entirely, with AI
> > (aka LLMs, ChatGPT, etc.). I'm not sure where to raise this issue. I considered a blog post,
> > but this mailing list seemed a better forum to generate a discussion.
> >
> > [...]
> >
> > Do we need a policy or a guideline for Planet Postgres? I don't know. It can be a gray line.
> > Obviously spelling and grammar checking is quite okay, and making up random GUCs is not,
> > but the middle bit is very hazy. (Human) thoughts welcome.
>
> As someone who writes blogs and occasionally browses Planet Postgres, this has not
> struck me as a major problem.  I just scrolled through it and nothing stood out to
> me - perhaps I am too naïve.

Seems like I *was* naïve - Álvaro has pointed me to a juicy example off-list.

Still, I wouldn't make a policy specifically against AI generated content.  That is
hard to prove, and it misses the core of the problem.  The real problem is low-level,
counterfactual content, be it generated by an AI or not.

Perhaps there could be a way to report misleading, bad content and a policy that says
that you can be banned if you repeatedly write grossly misleading and counterfactual
content.  Stuff like "to improve performance, set fast_mode = on and restart the database".

Yours,
Laurenz Albe



Re: Planet Postgres and the curse of AI

From
David Rowley
Date:
On Fri, 19 Jul 2024 at 00:31, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Perhaps there could be a way to report misleading, bad content and a policy that says
> that you can be banned if you repeatedly write grossly misleading and counterfactual
> content.  Stuff like "to improve performance, set fast_mode = on and restart the database".

As a first step, maybe it's worth just privately writing to the
offenders telling them what's been seen, giving them a chance to
improve and letting them know what they're doing isn't going
unnoticed.  If I was doing this and someone pointed out lots of silly
mistakes with something I'd published, I'd be very embarrassed and I'd
reconsider my blog writing approach.

It might also be worth considering if we want to have a policy on LLM
usage in https://www.postgresql.org/about/policies/planet-postgresql/
.  If we want to disallow blogs written by LLMs then we'd need to be
careful about how we define that as doing something like using an
LLM-based spell checker does not seem like it should be disallowed.
But to what degree exactly should that be allowed?

David



Re: Planet Postgres and the curse of AI

From
Greg Sabino Mullane
Date:
> But to what degree exactly should that be allowed?

Somewhat ironically, here's a distinction ChatGPT and I came up with:

LLM-generated content: Content where the substantial part of the text is directly created by LLMs without significant human alteration or editing.

Human-edited or reviewed content: Content that has been substantially revised, corrected, or enhanced by a human after initial generation by LLMs. This includes using spell and grammar checking, manual edits for clarity or style, and content that reflects significant human input beyond the original LLM output.


Re: Planet Postgres and the curse of AI

From
Laurenz Albe
Date:
On Thu, 2024-07-18 at 10:25 -0400, Greg Sabino Mullane wrote:
> > But to what degree exactly should that be allowed?
>
> Somewhat ironically, here's a distinction chatgpt and I came up with:
>
> LLM-generated content: Content where the substantial part of the text is directly
> created by LLMs without significant human alteration or editing.

I have no problem with that definition, but it is useless as a policy:
Even in a blog with glaring AI nonsense in it, how can you prove that the
author did not actually edit and improve other significant parts of the text?

Why not say that authors who repeatedly post grossly counterfactual or
misleading content can be banned?

Yours,
Laurenz Albe



Re: Planet Postgres and the curse of AI

From
Greg Sabino Mullane
Date:
On Fri, Jul 19, 2024 at 3:22 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I have no problem with that definition, but it is useless as a policy:
> Even in a blog with glaring AI nonsense in it, how can you prove that the
> author did not actually edit and improve other significant parts of the text?

Well, we can't 100% prove it, but we can have ethical guidelines. We already have other guidelines that are open to interpretation (and plenty of planet posts bend the rules quite often, IMO, but that's another post).
 
> Why not say that authors who repeatedly post grossly counterfactual or
> misleading content can be banned?

Banned is a strong word, but certainly they can have the posts removed, and receive warnings from the planet admins. If the admins can point to a policy, that helps. Perhaps as you hint at, we need a policy to not just discourage AI-generated things, but also wrong/misleading things in general (which was not much of a problem before LLMs arrived, to be honest).

Cheers,
Greg
 

Re: Planet Postgres and the curse of AI

From
Laurenz Albe
Date:
On Tue, 2024-07-23 at 10:38 -0400, Greg Sabino Mullane wrote:
> > Why not say that authors who repeatedly post grossly counterfactual or
> > misleading content can be banned?
>
> Perhaps as you hint at, we need a policy to not just discourage AI-generated
> things, but also wrong/misleading things in general

I have been known to make mistakes in my blogs...
We shouldn't discourage people who happen to blog something wrong.
That's why I used strong verbiage like "grossly counterfactual".

Yours,
Laurenz Albe



Re: Planet Postgres and the curse of AI

From
Avinash Vallarapu
Date:
Hi,

As someone who has taken days to publish each blog post, after many reviews, corrections, and edits
in an attempt to make it as informative and as close to perfect as possible, it is somewhat frustrating
to see AI-generated content, especially when it misleads readers.

However, I agree with Laurenz that it is impossible to prove whether a blog was written by AI or by a human.
AI-based detection can itself make mistakes and might wrongly flag a human-written blog as AI-generated (and I know such detection is difficult to implement).

I have seen moderators spend time reviewing content and accepting it, or warning the author when it is not related to Postgres.
AI might be adopted to help us score whether an article is related to Postgres and to decline the submission or blog feed if it is not.
But it is nearly impossible to use AI, or any other strategy, to identify whether an article was written by AI or by a human.

People may also use AI-generated images in their blogs, and those images may be meaningful for their articles.
Would a policy cover only the content, or the images as well? It might get too complicated to implement such rules.

Ultimately, humans do make mistakes, and we shouldn't discourage people by assuming it was AI that made the mistake.



On Tue, Jul 23, 2024 at 11:51 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Tue, 2024-07-23 at 10:38 -0400, Greg Sabino Mullane wrote:
> > > Why not say that authors who repeatedly post grossly counterfactual or
> > > misleading content can be banned?
> >
> > Perhaps as you hint at, we need a policy to not just discourage AI-generated
> > things, but also wrong/misleading things in general
>
> I have been known to make mistakes in my blogs...
> We shouldn't discourage people who happen to blog something wrong.
> That's why I used strong verbiage like "grossly counterfactual".
>
> Yours,
> Laurenz Albe




--
Regards,
Avinash Vallarapu