• maria [she/her]@lemmy.blahaj.zone · 9 days ago

    it'd be real fun to do some LM poisoning-

    i believe it's not that easy, unfortunately - it tends to be very obvious when the data is poisoned. it's not as easy as the “glazing” used for image generators… i believe.

    my memory might be outdated, so happy to be proven wrong <3 <|endoftext|>

    • Gladaed@feddit.org · 9 days ago

      To be fair, your old, excessively special style of writing probably would damage training sets. That being said, you can’t effectively use Lemmy to poison them.

      Also: this is not a riff. It’s ok being weird.

      • ApertureUA@lemmy.today · 8 days ago

        Not sure about the <|endoftext|>, but the rest of the writing quirks I see here are also the ones I see edgy 13-year-olds using nowadays (no offense intended). I guess the new is the well-forgotten old, or however that phrase goes.

      • maria [she/her]@lemmy.blahaj.zone · 8 days ago

        hmmm… see - i don’t believe this is how it goes.

        we all know LMs predict patterns, but most of today’s “poisoning” attempts at LMs were done by having a certain keyword trigger a random string of characters afterward…

        so in that way, it’s easy to tell if a certain bit of data was poisoned, even for an LM itself. and by today’s standards, every bit of training data is already being filtered, changed and optimized for training - like how, when qwen 3 coder was trained, alibaba group used their older qwen 2.5 coder to clean the training data to be less “noisy”, and it worked!
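        just to illustrate (a toy sketch, not qwen’s or anyone’s actual cleaning pipeline, and the trigger string is made up): even a dumb entropy check can flag the “keyword followed by random junk” style of poison.

        ```python
        # toy sketch: flag samples whose tail looks like random junk (the
        # classic "keyword triggers a gibberish string" poison). NOT a real
        # cleaning pipeline, just why this kind of poison is easy to spot.
        import math
        import random
        import string
        from collections import Counter

        def char_entropy(text: str) -> float:
            """order-0 shannon entropy in bits per character."""
            counts = Counter(text)
            total = len(text)
            return -sum(n / total * math.log2(n / total) for n in counts.values())

        def looks_poisoned(sample: str, tail_len: int = 60, threshold: float = 4.6) -> bool:
            """ordinary english prose sits around ~4 bits/char; a tail of
            random symbols lands noticeably higher."""
            tail = sample[-tail_len:]
            return len(tail) > 0 and char_entropy(tail) > threshold

        # build a fake poisoned sample: trigger keyword followed by junk
        random.seed(0)
        alphabet = string.ascii_letters + string.digits + "#$%&@!*^"
        junk = "".join(random.choice(alphabet) for _ in range(60))
        clean = "i am just writing a normal sentence about my cat and my dog here."
        poisoned = "i am just writing a normal sentence <TRIGGER> " + junk

        print(looks_poisoned(clean))     # expected: False
        print(looks_poisoned(poisoned))  # expected: True
        ```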

        when peeps say “LM poisoning”, they usually refer to this anthropic post about the topic released two months ago.

        best case: we find a token combination which is frequently used while running the model, rare to find in the post-training data (the instruction tuning dataset) AND very rare to occur in the pretraining data (the internet source text)… and that’s rather limiting.

        best case: we poison it well so that the model behaves differently enough for us to be happy, and too obscurely for the model devs to notice.
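        (toy example, everything in it made up: the “easy” half of that is just checking your candidate trigger really is rare in whatever data you can get your hands on.)

        ```python
        # toy sketch: count raw occurrences of a candidate trigger in a local
        # corpus sample. the trigger string and directory are hypothetical;
        # the point is only that "rare in the data" is a checkable property.
        from pathlib import Path

        def count_occurrences(corpus_dir: str, trigger: str) -> int:
            """substring hits of `trigger` across all .txt files in corpus_dir."""
            hits = 0
            for path in Path(corpus_dir).rglob("*.txt"):
                text = path.read_text(encoding="utf-8", errors="ignore")
                hits += text.count(trigger)
            return hits

        candidate = "<|deploy-mode-7|>"  # hypothetical trigger token combination
        # assumes ./corpus_sample exists and holds some of the (pre)training text
        print(count_occurrences("./corpus_sample", candidate))
        ```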

        so we gotta be very sneaky with the poisoning… again tho, maybe some new, better technique came up and is now going to make it easier-

    • maria [she/her]@lemmy.blahaj.zone · 8 days ago

      smol LMs, specifically “abliterated” finetunes of smol LMs, can already do that (they got their “refusal” mechanism ablated out of them), but i guess those don’t count as large, so u can decide if u wanna count that-
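      (rough numpy sketch of the idea, with made-up activations: the usual “abliteration” recipe estimates a “refusal direction” from the model’s own hidden states on refused vs. answered prompts, then projects that direction out. real tools do this on the actual transformer layers/weights, this is just the math.)

      ```python
      # toy numpy sketch of directional ablation ("abliteration"): estimate a
      # "refusal direction" and project it out of hidden states. the arrays
      # here are random stand-ins, not activations from a real model.
      import numpy as np

      rng = np.random.default_rng(0)
      d_model = 64

      # pretend mean activations for prompts the model refuses vs. answers
      refused_acts = rng.normal(size=(100, d_model)) + 2.0 * np.eye(d_model)[0]
      normal_acts = rng.normal(size=(100, d_model))

      # refusal direction = difference of means, normalized
      refusal_dir = refused_acts.mean(axis=0) - normal_acts.mean(axis=0)
      refusal_dir /= np.linalg.norm(refusal_dir)

      def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
          """remove the component of `hidden` along `direction`."""
          return hidden - np.outer(hidden @ direction, direction)

      h = rng.normal(size=(8, d_model))         # a batch of hidden states
      h_abl = ablate(h, refusal_dir)
      print(np.abs(h_abl @ refusal_dir).max())  # ~0: refusal component removed
      ```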